
# mcfly cheatsheet

This document can be found at https://github.com/NLeSC/mcfly-tutorial/blob/master/cheatsheet.md

Detailed documentation can be found in the mcfly wiki.

Notebook tutorials can be found in the mcfly-tutorial repository.

## Jargon terms

* accuracy: the proportion of correctly classified samples out of all samples in a dataset.
* convolutional filter: a set of weights that is applied to neighbouring data points.
* convolutional layer: a type of network layer in which a convolutional filter is slid over the input.
* CNN: Convolutional Neural Network, a deep learning network that includes convolutional layers, often combined with dense (fully connected) layers.
* LSTM layer: Long Short-Term Memory layer. This is a special type of recurrent layer that takes a sequence as input and outputs a sequence.
* DeepConvLSTM: a deep learning network that includes both convolutional layers and LSTM layers.
* epoch: one full pass through a dataset (all data points are seen once) in the process of training the weights of a network.
* loss: an indicator of overall classification error; more errors mean a greater loss. In mcfly we use categorical cross-entropy.
* gradient descent: the algorithm used to find locally optimal weights for the nodes in the network. It iteratively improves the weights in order to minimize the classification loss. The search space can be interpreted as a landscape whose lowest point is the optimum, hence the term 'descent'. In each step, the weights are adjusted with a step against the direction of the gradient ('slope'); see the sketch after this list.
* hyperparameters: in mcfly, the hyperparameters are the architectural choices of the model (number of layers, LSTM or convolutional layers, etc.), the learning rate, and the regularization rate.
* layer: a deep learning network consists of multiple layers; the more layers, the deeper your network.
* learning rate: the step size to take in each iteration of the gradient descent algorithm.
* regularization rate: how strongly L2 regularization is applied to avoid overfitting on the training data.
* validation set: the part of the data that is kept apart to evaluate the performance of your model and to choose hyperparameters.
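
To make the gradient descent and learning rate definitions concrete, here is a minimal NumPy sketch of gradient descent on a toy loss function. The variable names and the toy loss are illustrative only; they are not part of the mcfly API.

```python
import numpy as np

# Toy loss L(w) = sum(w**2); its gradient is 2*w.
# Each step moves the weights against the gradient ('downhill'),
# scaled by the learning rate (the step size).
weights = np.array([3.0, -2.0])
learning_rate = 0.1

for step in range(5):
    gradient = 2 * weights                        # slope of the loss at the current weights
    weights = weights - learning_rate * gradient  # descend: step against the gradient
    print(f"step {step}: loss = {np.sum(weights ** 2):.4f}")
```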

## Input data

```
X_train        => nr samples x nr timesteps x nr channels
y_train_binary => nr samples x nr classes
```
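
As an illustration, a toy dataset with these shapes could be built as follows. The sizes and the `tensorflow.keras` import are assumptions for the example; mcfly itself only needs arrays of the shapes above.

```python
import numpy as np
from tensorflow.keras.utils import to_categorical

nr_samples, nr_timesteps, nr_channels, nr_classes = 100, 512, 3, 4

# One multi-channel time series per sample
X_train = np.random.rand(nr_samples, nr_timesteps, nr_channels)

# Integer class labels, one-hot encoded into the binary label matrix
labels = np.random.randint(nr_classes, size=nr_samples)
y_train_binary = to_categorical(labels, nr_classes)

print(X_train.shape)         # (100, 512, 3)
print(y_train_binary.shape)  # (100, 4)
```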

## Generate models

Generate one or multiple untrained Keras models with random hyperparameters.

```python
from mcfly import modelgen

num_classes = y_train_binary.shape[1]
models = modelgen.generate_models(X_train.shape,
                                  number_of_classes=num_classes,
                                  number_of_models=2)
```
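
Each entry in `models` is a `(model, params, model_type)` tuple (the same structure that is unpacked in the selection step below), so you can inspect what was generated:

```python
for i, (model, params, model_type) in enumerate(models):
    print(f"Model {i}: {model_type}, hyperparameters: {params}")
    model.summary()  # Keras layer-by-layer summary
```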

## Train multiple models

This tries out the candidate models on a subsample of the data and returns their training histories and validation performance, from which the best architecture and hyperparameters can be selected.

```python
from mcfly import find_architecture

# X_val / y_val_binary: validation set, shaped like X_train / y_train_binary
outputfile = 'modelcomparison.json'  # example path; per-model results are stored here

histories, val_accuracies, val_losses = find_architecture.train_models_on_samples(
    X_train, y_train_binary, X_val, y_val_binary,
    models, nr_epochs=5, subset_size=300,
    verbose=True, outputfile=outputfile)
```
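
The returned lists contain one entry per candidate model, so a quick side-by-side comparison is possible; a minimal sketch:

```python
for i, (acc, loss) in enumerate(zip(val_accuracies, val_losses)):
    print(f"Model {i}: validation accuracy {acc:.3f}, validation loss {loss:.3f}")
```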

## Select best model

```python
import numpy as np

best_model_index = np.argmax(val_accuracies)
best_model, best_params, best_model_types = models[best_model_index]
```

## Train one specific model

This is done with the Keras `fit` function:

```python
best_model.fit(X_train, y_train_binary,
               epochs=25,  # 'epochs' replaces the older Keras 'nb_epoch' argument
               validation_data=(X_val, y_val_binary))
```
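
After training, the standard Keras calls apply for evaluation and prediction. A sketch, assuming held-out arrays `X_test` and `y_test_binary` (not defined in this cheatsheet):

```python
# Overall loss and accuracy on held-out data
test_loss, test_acc = best_model.evaluate(X_test, y_test_binary)
print(f"test accuracy: {test_acc:.3f}")

# Class probabilities and hard label predictions
probabilities = best_model.predict(X_test)
predicted_classes = probabilities.argmax(axis=1)
```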