In the DeepFusion paper it is said that:

> For each query (i.e., voxel cell), we conduct inner product between the query and the keys to obtain the attention affinity matrix that contains 1 × N correlations between the voxel and all its corresponding N camera features.
So I think this should lead to V × N correlations for V voxel cells, or B × V × N once we include the batch dimension. However, the implementation's `affinity = tf.einsum('bnc,bnc->bn', q, k)` produces a B × N shaped tensor. I feel like this should instead be something like `affinity = tf.einsum('bvc,bnc->bvn', q, k)`, contracting only the shared channel axis. I couldn't manage to wrap my head around this; what am I missing?
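For concreteness, here is a minimal shape check (the sizes B, V, N, C below are made up) comparing the einsum in the repo with the cross-attention form I would have expected:

```python
import tensorflow as tf

# Made-up sizes: B batches, V voxel cells, N camera features per voxel, C channels.
B, V, N, C = 2, 4, 5, 8

q = tf.random.normal([B, N, C])  # queries as shaped for the repo's einsum
k = tf.random.normal([B, N, C])  # keys: N camera features per batch element

# The repo's einsum: a per-slot dot product over channels -> shape (B, N).
affinity = tf.einsum('bnc,bnc->bn', q, k)
print(affinity.shape)  # (2, 5)

# The cross-attention form I expected: V queries against N keys, contracting
# only the shared channel axis c -> shape (B, V, N).
q_v = tf.random.normal([B, V, C])
affinity_vn = tf.einsum('bvc,bnc->bvn', q_v, k)
print(affinity_vn.shape)  # (2, 4, 5)
```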
Finally, thanks to the team for this great work. @LiYingwei
It sounds like the voxels they are talking about are in fact pillars, one per BEV grid cell, but I'm not 100% sure.
Another interesting question is the definition of the "corresponding N camera features": do you know which camera points are considered for a given lidar feature?