@titu1994,
Awesome work!
In figure 3 of the paper:
Can you shed light on how these three different formulations of aggregated transforms are equivalent? From looking at your code, it looks like you chose to implement method (b). Is this accurate? Also, I saw another implementation that uses lambda layers to do something more akin to method (c). That is, if the previous layer's channel dimension is 64-d, for instance, and C = 32 (the cardinality), then this results in 64/32 = 2 feature maps per cardinality group as input to each of the 32 convolutions. These feature maps do not overlap, and their total across the cardinality groups always equals 64-d in our example.
How is this the same as having 32 different convolutions all with 64-d channels as input? Your thoughts would be much appreciated!
EDIT: Other implementation - https://gist.github.com/mjdietzx/0cb95922aac14d446a6530f87b3a04ce
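For concreteness, here is a minimal sketch of what I mean by the lambda-layer slicing (my own simplified version, assuming the Keras functional API with channels-last data; the name `grouped_conv` and its defaults are mine, not the gist's):

```python
from keras.layers import Lambda, Conv2D, Concatenate

def grouped_conv(x, channels=64, cardinality=32):
    """Method (c): split the input into non-overlapping channel groups,
    convolve each group separately, then concatenate the results."""
    group_size = channels // cardinality  # 64 / 32 = 2 feature maps per group
    groups = []
    for i in range(cardinality):
        # Each of the 32 convolutions sees only its own 2-channel slice.
        sliced = Lambda(
            lambda z, j=i: z[:, :, :, j * group_size:(j + 1) * group_size]
        )(x)
        groups.append(Conv2D(group_size, (3, 3), padding='same')(sliced))
    # Concatenation restores the full 64-d channel dimension.
    return Concatenate(axis=-1)(groups)
```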
I believe they are equivalent because the model built using (b) has the exact same number of parameters as the one built using (c); (c) is just a more succinct implementation, but achieves the same thing. The intuition is that each output channel of a 1x1 convolution depends on all input channels but is independent of the other output channels, so slicing the output of one wide 1x1 convolution into C groups is equivalent to running C narrow 1x1 convolutions that each see the full input.
However, if one tries to port weights from the Torch code to Keras, they would find that the lambda layer version fits, whereas this one would not. I will look into it, and update my code to match the lambda layer version when possible, as weight translation would be easier.
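As a quick sanity check on the parameter-count claim, here is the plain arithmetic for the paper's 256-d block from figure 3 (a rough sketch ignoring biases; the variable names are mine):

```python
# Paper's figure 3 block: 256-d input, cardinality C = 32, 4-d per group.
d_in, C, d_group = 256, 32, 4

# Method (b): 32 separate 1x1 convs on the full input, 32 separate 3x3 convs,
# then concatenate to 128-d and project back to 256-d with one 1x1 conv.
params_b = (C * (1 * 1 * d_in * d_group)       # 32 x (1x1: 256 -> 4)
            + C * (3 * 3 * d_group * d_group)  # 32 x (3x3: 4 -> 4)
            + 1 * 1 * (C * d_group) * d_in)    # 1x1: 128 -> 256

# Method (c): one wide 1x1 conv, a grouped 3x3 conv (32 groups of 4 channels
# each), then the same 1x1 projection. Term by term, the counts match.
params_c = (1 * 1 * d_in * (C * d_group)       # 1x1: 256 -> 128
            + C * (3 * 3 * d_group * d_group)  # grouped 3x3: 4 -> 4 per group
            + 1 * 1 * (C * d_group) * d_in)    # 1x1: 128 -> 256

assert params_b == params_c
print(params_b, params_c)  # 70144 70144
```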
Interesting, thank you so much! Just making sure I understand all of the code that's going on. It seems the lambda layer is indeed doing what method (c) in figure 3 displays. Awesome to be on the right track! Thanks again!