This is a patch release containing the following changes to v0.21.3:
- Fixed handling of large padding in input tensor transposition for bfloat16 weights gradient convolution (6df67fe)
- Fixed performance of reference convolution (2e1d048)
- Fixed "code is too big" error for extremely large spatial sizes (ed0be61, 4dee389, 59759ba)