Code like the following can cause problems:

```python
# given layers 'input' and 'x_true' of suitable shapes/types/etc
...
x = lbann.Convolution(input, ..., parallel_strategy=<not None>)
y = lbann.L2Norm2(x)
z = lbann.Subtract(x, x_true)
...
```
It seems that the Split layer that LBANN's runtime inserts between `x` and its children `y` and `z` doesn't gracefully handle the fact that `x`'s tensors are actually managed by DistConv. I was seeing error messages like:

```
layer "conv_norm" expected an input tensor stored in a 4096 x 1 matrix from layer "convolution_layer_split", but got a 0 x 0 matrix
```
To fix this, I replaced `x` with:

```python
x = lbann.Identity(lbann.Convolution(input, ..., parallel_strategy=<not None>), parallel_strategy=None)
```

(where the `parallel_strategy=None` is just to make very explicit that I do NOT want this layer to be DistConv-managed). This seems to have worked.
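For concreteness, here is a minimal sketch of the workaround wired into a small layer graph. The `Input` arguments, convolution hyperparameters, and the parallel-strategy dictionary (`height_groups` here) are illustrative assumptions that may need to be adapted to a particular LBANN version; the point is only the structure: the DistConv-enabled `Convolution` is wrapped in a non-DistConv `Identity` before it fans out to `y` and `z`, so the implicit Split is attached to an ordinary layer.

```python
import lbann

# Hypothetical stand-ins for the data-reader layers; names and arguments
# here are illustrative and may differ across LBANN versions.
input_ = lbann.Input(data_field='samples')
x_true = lbann.Identity(input_)

# Assumed DistConv parallel strategy; the exact fields (e.g. 'height_groups')
# depend on how the tensors are partitioned in your setup.
ps = {'height_groups': 2}

# DistConv-managed convolution. The hyperparameter names below are
# assumptions for illustration; use whatever your LBANN version expects.
conv = lbann.Convolution(
    input_,
    num_dims=2,
    num_output_channels=16,
    conv_dims_i=3,
    conv_pads_i=1,
    conv_strides_i=1,
    has_bias=False,
    parallel_strategy=ps,
)

# Workaround: a plain (non-DistConv) Identity between the convolution and
# its two consumers, so the Split layer that LBANN inserts for the fan-out
# operates on ordinary tensors rather than DistConv-managed ones.
x = lbann.Identity(conv, parallel_strategy=None)

y = lbann.L2Norm2(x)
z = lbann.Subtract(x, x_true)
```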