The network being used is an LSTM -> Linear -> Sigmoid network trained on the Mackey-Glass dataset.
```rust
net_cfg.add_layer(LayerConfig::new(
    // Layer name is only used internally - can be changed to anything
    "LSTMInitial",
    RnnConfig {
        hidden_size: 5,
        num_layers: 1,
        dropout_seed: 123,
        dropout_probability: 0.5,
        rnn_type: RnnNetworkMode::LSTM,
        input_mode: RnnInputMode::LinearInput,
        direction_mode: DirectionMode::UniDirectional,
    },
));
net_cfg.add_layer(LayerConfig::new("linear1", LinearConfig { output_size: 1 }));
net_cfg.add_layer(LayerConfig::new("sigmoid", LayerType::Sigmoid));
```
Using a batch size of 12 and a learning rate of 0.1, the network begins to converge successfully. During training it was observed that the RNN -> Linear stage produced similar output predictions despite differences in the RNN outputs, which is believed to be a weight initialisation issue.
Ideally this network could be trained at a higher batch size across a single epoch, reaching an MSE of 0.05, as the function being approximated is fairly simple.
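To make the 0.05 target concrete, this is the metric being referred to, computed over a batch of predictions. The helper below is purely illustrative and is not taken from the framework's API.

```rust
// Illustrative helper: mean squared error over a batch of predictions.
fn mse(predictions: &[f32], targets: &[f32]) -> f32 {
    assert_eq!(predictions.len(), targets.len());
    let sum_sq: f32 = predictions
        .iter()
        .zip(targets)
        .map(|(p, t)| (p - t).powi(2))
        .sum();
    sum_sq / predictions.len() as f32
}

fn main() {
    let preds = vec![0.9f32, 0.7, 0.4];
    let targets = vec![1.0f32, 0.5, 0.3];
    // The training goal above is for this value to reach roughly 0.05.
    println!("MSE = {:.4}", mse(&preds, &targets));
}
```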
Current theories
- SGD is not well suited to this problem; RMSProp may be.
- Weight initialisation is done incorrectly somewhere, or Glorot is unsuitable for the LSTM we're using (see the initialisation sketch after this list).
- The LSTM is improperly set up and is causing a performance issue.
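For the Glorot theory, the expected behaviour is Glorot (Xavier) uniform initialisation with bounds of plus/minus sqrt(6 / (fan_in + fan_out)). The sketch below only illustrates that convention; the use of the `rand` crate and the fan-in/fan-out choices are assumptions, not how the framework necessarily initialises the LSTM internally.

```rust
// Sketch of Glorot (Xavier) uniform initialisation under the usual
// fan-in/fan-out convention. Requires the `rand` crate (assumed dependency).
use rand::Rng;

fn glorot_uniform(fan_in: usize, fan_out: usize, len: usize) -> Vec<f32> {
    let limit = (6.0f32 / (fan_in + fan_out) as f32).sqrt();
    let mut rng = rand::thread_rng();
    (0..len).map(|_| rng.gen_range(-limit..limit)).collect()
}

fn main() {
    // For the LSTM above with input size 1 and hidden size 5, the
    // input-to-hidden weight block would use fan_in = 1, fan_out = 5
    // under this convention; the factor of 4 accounts for the LSTM gates.
    let weights = glorot_uniform(1, 5, 1 * 5 * 4);
    println!("first few weights: {:?}", &weights[..4]);
}
```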