New features

* Parameter `gradients_accum` to accumulate gradients and delay parameter updates (see the configuration sketch after this list)
* Expose lower-level decoder APIs (see the decoding sketch after this list):
  * `Decoder.step_fn`: returns a callable and an initial state to run step-by-step decoding
  * `Decoder.decode_from_inputs`: decodes from full inputs (e.g. embeddings)
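A minimal sketch of how `gradients_accum` might be set, assuming it lives in the `params` section of the run configuration. It is shown here as the Python dict that the YAML file maps to; the placement and the value are illustrative assumptions, not a documented snippet:

```python
# Run configuration sketch (dict form of the YAML file). Placing
# gradients_accum under "params" and the value 4 are assumptions.
config = {
    "params": {
        # Accumulate the gradients of 4 consecutive steps before applying
        # a single parameter update, simulating a 4x larger effective
        # batch size without the extra memory cost.
        "gradients_accum": 4,
    },
}
```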
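A rough sketch of the two decoding entry points, assuming a `SelfAttentionDecoder` and call shapes inferred from the descriptions above; the exact signatures and return values are assumptions, not the documented API:

```python
import tensorflow as tf
import opennmt

decoder = opennmt.decoders.SelfAttentionDecoder(num_layers=6, num_units=512)

# Full-input decoding: consume already-embedded target inputs in one call.
inputs = tf.random.normal([8, 20, 512])  # batch x time x depth embeddings
lengths = tf.fill([8], 20)
# Assumed return shape: (outputs, state, attention).
outputs, state, attention = decoder.decode_from_inputs(inputs, lengths)

# Step-by-step decoding: step_fn returns a callable and an initial state.
step, state = decoder.step_fn(tf.estimator.ModeKeys.PREDICT, batch_size=8)
# Assumed callable shape: (step_index, inputs, state) -> (outputs, state, attention).
outputs_0, state, attention_0 = step(0, inputs[:, 0], state)
```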
Fixes and improvements
* Make the learning rate decay configuration more generic: parameters can be set via a `decay_params` map, which allows using more meaningful parameter names (see the example configurations; a sketch follows this list)
* By default, auto-configured Transformer models will accumulate gradients to simulate training with 8 synchronous replicas (e.g. if you train with 4 GPUs, the gradients of 2 consecutive steps will be accumulated; see the illustration after this list)
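A configuration sketch for `decay_params`, again as the dict form of the YAML `params` section. The decay type and the names inside the map are assumptions modeled on the Noam schedule commonly used for Transformer training, chosen to illustrate the more meaningful names the map enables:

```python
# Sketch of a params section using a decay_params map. decay_type and
# the keys inside decay_params (model_dim, warmup_steps) are assumptions.
config = {
    "params": {
        "learning_rate": 2.0,
        "decay_type": "noam_decay_v2",
        "decay_params": {
            "model_dim": 512,      # a named value instead of an opaque decay_rate
            "warmup_steps": 4000,  # a named value instead of an opaque decay_steps
        },
    },
}
```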
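A small illustration of the stated default, assuming the number of accumulated steps is simply the target replica count divided by the number of GPUs:

```python
# Illustrative only: derive how many consecutive steps must be
# accumulated to simulate 8 synchronous replicas for a given GPU count.
TARGET_REPLICAS = 8

for num_gpus in (1, 2, 4, 8):
    gradients_accum = TARGET_REPLICAS // num_gpus
    print("%d GPU(s): accumulate %d step(s) per update" % (num_gpus, gradients_accum))
```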