DSEM grows as the layers advance, but it is significantly smaller for activation-function outputs than for feature-map outputs.
Simpleconv with a 7x7 kernel is not learning MNIST at r16. With r8 it seems to learn slowly, and learning is normal for r2/r0.