[Bug] Mysterious Dimension Swapping in BEVFusion's TransfusionHead #3020

barrydoooit · 2024-08-12T12:53:28Z

Prerequisite

I have searched Issues and Discussions but cannot get the expected help.
I have read the FAQ documentation but cannot get the expected help.
The bug has not been fixed in the latest version (dev-1.x) or latest version (dev-1.0).

Task

I have modified the scripts/configs, or I'm working on my own tasks/models/datasets.

Branch

main branch https://github.com/open-mmlab/mmdetection3d

Environment

sys.platform: linux
Python: 3.8.10 (default, Nov 22 2023, 10:22:35) [GCC 9.4.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.4, V11.4.152
GCC: x86_64-linux-gnu-gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
PyTorch: 1.10.1+cu113
PyTorch compiling details: PyTorch built with:

GCC 7.3
C++ Version: 201402
Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
OpenMP 201511 (a.k.a. OpenMP 4.5)
LAPACK is enabled (usually provided by MKL)
NNPACK is enabled
CPU capability usage: AVX2
CUDA Runtime 11.3
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
CuDNN 8.2
Magma 2.5.2
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.11.2+cu113
OpenCV: 4.10.0
MMEngine: 0.10.4
MMDetection: 3.2.0+d509b75

Reproduces the problem - code sample

bash tools/dist_train.sh [configs] 1

Reproduces the problem - command or script

bash tools/dist_train.sh [configs] 1

Reproduces the problem - error message

  File "/usr/local/lib/python3.8/dist-packages/mmdet3d/models/detectors/base.py", line 75, in forward
    return self.loss(inputs, data_samples, **kwargs)
  File "/workspace/bevfusion/bevfusion.py", line 301, in loss
    bbox_loss = self.bbox_head.loss(feats, batch_data_samples)
  File "/workspace/bevfusion/transfusion_head.py", line 761, in loss
    loss = self.loss_by_feat(preds_dicts, batch_gt_instances_3d)
  File "/workspace/bevfusion/transfusion_head.py", line 786, in loss_by_feat
    loss_heatmap = self.loss_heatmap(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/mmdet/models/losses/gaussian_focal_loss.py", line 176, in forward
    loss_reg = self.loss_weight * gaussian_focal_loss(
  File "/usr/local/lib/python3.8/dist-packages/mmdet/models/losses/utils.py", line 121, in wrapper
    loss = loss_func(pred, target, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/mmdet/models/losses/gaussian_focal_loss.py", line 35, in gaussian_focal_loss
    pos_loss = -(pred + eps).log() * (1 - pred).pow(alpha) * pos_weights
RuntimeError: The size of tensor a (136) must match the size of tensor b (120) at non-singleton dimension 3

Additional information

Here I use a custom dataset with a non-square BEV feature map: the sparse shape is [960, 1088, z], which makes the BEV feature map's spatial shape [120, 136]. The sparse shape is passed as "grid_size", which is used in:

mmdetection3d/projects/BEVFusion/bevfusion/transfusion_head.py

Lines 701 to 702 in fe25f7a

    feature_map_size = (grid_size[:2] // self.train_cfg['out_size_factor']
                        )  # [x_len, y_len]

The feature_map_size is then used to create the heatmap, but with the X and Y dimensions swapped, which gives the heatmap a spatial shape of [136, 120]:

mmdetection3d/projects/BEVFusion/bevfusion/transfusion_head.py

Lines 703 to 704 in fe25f7a

    heatmap = gt_bboxes_3d.new_zeros(self.num_classes, feature_map_size[1],
                                     feature_map_size[0])
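Redoing the shape arithmetic by hand makes the mismatch explicit. `heatmap_shapes` below is a hypothetical helper, not part of mmdet3d; it mirrors lines 701 to 704 with plain integers. It assumes `out_size_factor=8` (consistent with 960 -> 120), an illustrative `num_classes=10`, and that the dense heatmap keeps the [X, Y] layout of the BEV feature map; the batch dimension is left out.

```python
def heatmap_shapes(grid_size, out_size_factor, num_classes):
    """Return (target_shape, pred_shape) for one sample, without the batch dim.

    Hypothetical helper: feature_map_size is [x_len, y_len], but the target
    heatmap is allocated as (num_classes, y_len, x_len), as in lines 703-704.
    """
    # Equivalent of grid_size[:2] // out_size_factor on plain ints
    feature_map_size = [g // out_size_factor for g in grid_size[:2]]  # [x_len, y_len]
    # Target allocation as currently written: X and Y swapped
    target_shape = (num_classes, feature_map_size[1], feature_map_size[0])
    # Assumed: the predicted dense heatmap keeps the BEV [X, Y] layout
    pred_shape = (num_classes, feature_map_size[0], feature_map_size[1])
    return target_shape, pred_shape


target, pred = heatmap_shapes([960, 1088, 41], 8, 10)
print(target)  # (10, 136, 120)
print(pred)    # (10, 120, 136)

# A square grid (1440 x 1440 is an illustrative value) hides the swap
sq_target, sq_pred = heatmap_shapes([1440, 1440, 41], 8, 10)
print(sq_target == sq_pred)  # True
```

The last dimension differs (120 vs 136), which matches "dimension 3" in the traceback once the batch dimension is added back.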

The problem is finally triggered at

mmdetection3d/projects/BEVFusion/bevfusion/transfusion_head.py

Lines 785 to 790 in fe25f7a

    # compute heatmap loss
    loss_heatmap = self.loss_heatmap(
        clip_sigmoid(preds_dict['dense_heatmap']).float(),
        heatmap.float(),
        avg_factor=max(heatmap.eq(1).float().sum().item(), 1),
    )

(more precisely, in https://github.com/open-mmlab/mmdetection/blob/cfd5d3a985b0249de009b67d04f37263e11cdf3d/mmdet/models/losses/gaussian_focal_loss.py#L35), where the heatmap and the dense heatmap (which has the same spatial shape as the BEV feature map) must have the same spatial shape.
This problem does not occur with nuScenes because its BEV feature map is square, so the swap has no effect. In short, swapping X and Y when initializing the heatmap is ambiguous and makes it impossible for the heatmap to match the shape of a non-square BEV feature map.
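The failing line is an element-wise product, so prediction and target need identical spatial sizes (PyTorch broadcasting would only help if one of the mismatched sizes were 1). The pure-Python sketch below restates the Gaussian focal loss formula from the linked line for a single 2-D map. It is a simplified illustration, not mmdet's implementation; `alpha=2.0` and `gamma=4.0` are my reading of that function's defaults, and the 2 x 3 toy map stands in for [X, Y] = [120, 136].

```python
import math


def shape2d(grid):
    """Return (rows, cols) of a rectangular nested list."""
    return (len(grid), len(grid[0]) if grid else 0)


def gaussian_focal_loss_2d(pred, target, alpha=2.0, gamma=4.0, eps=1e-12):
    """Summed Gaussian focal loss over one 2-D map (simplified sketch)."""
    if shape2d(pred) != shape2d(target):
        raise ValueError(
            f"shape mismatch: pred {shape2d(pred)} vs target {shape2d(target)}")
    total = 0.0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            pos_w = 1.0 if t == 1 else 0.0   # target.eq(1)
            neg_w = (1 - t) ** gamma          # (1 - target)^gamma
            pos = -math.log(p + eps) * (1 - p) ** alpha * pos_w
            neg = -math.log(1 - p + eps) * p ** alpha * neg_w
            total += pos + neg
    return total


pred = [[0.1, 0.2, 0.9],
        [0.1, 0.1, 0.1]]        # 2 x 3, same layout as the BEV map
good = [[0.0, 0.0, 1.0],
        [0.0, 0.0, 0.0]]        # target in the same layout
swapped = [[0.0, 0.0],
           [0.0, 0.0],
           [1.0, 0.0]]          # 3 x 2: X and Y swapped

print(gaussian_focal_loss_2d(pred, good) > 0)  # True
try:
    gaussian_focal_loss_2d(pred, swapped)
except ValueError as err:
    print(err)  # shape mismatch: pred (2, 3) vs target (3, 2)
```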


barrydoooit · 2024-08-12T20:08:19Z

In addition, reverting this swap, i.e. initializing the heatmap with shape (feature_map_size[0], feature_map_size[1]), makes it possible to train a model.
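A minimal nested-list stand-in for that allocation, assuming feature_map_size = [x_len, y_len] as computed at lines 701 to 702. `allocate_heatmap` is a hypothetical name; the only difference from the current code is the argument order.

```python
def allocate_heatmap(num_classes, feature_map_size):
    """Zero heatmap laid out as (num_classes, x_len, y_len), like the BEV map.

    Stand-in for new_zeros(num_classes, feature_map_size[0], feature_map_size[1])
    instead of the current (..., feature_map_size[1], feature_map_size[0]).
    """
    x_len, y_len = feature_map_size
    return [[[0.0] * y_len for _ in range(x_len)] for _ in range(num_classes)]


heatmap = allocate_heatmap(10, [120, 136])
print(len(heatmap), len(heatmap[0]), len(heatmap[0][0]))  # 10 120 136
```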

cxnaive · 2024-08-13T03:53:43Z

Are you training a camera-only model?

barrydoooit · 2024-08-13T07:09:40Z

Are you training a camera-only model?

No, I'm working with LiDAR-only and LC-fusion models. But the bug would remain in camera-only BEVFusion too, since the vtransform outputs a BEV feature map with the same spatial shape as the SCN output.

cxnaive · 2024-08-13T07:29:50Z

In fact, this operation is present in the original TransFusion head code of BEVFusion. However, mit-bevfusion and the BEVFusion from NeurIPS 2022 differ in the final step of the vtransform: the X and Y axes of their vtransform outputs are reversed.

barrydoooit · 2024-08-13T07:57:13Z

In fact, this operation is present in the original code of transfusion head in BEVFusion. However, the mit-bevfusion and the BEVFusion from NeurIPS 2022 differ in the final step of vtransform. The outputs X and Y from their vtransform are reversed

Thanks for pointing that out. But what do you mean by "reversed"? Do you mean that the vtransform output has spatial shape [y, x] rather than [x, y] like the LiDAR feature from the SCN? I'm a bit confused, because if that were the case, the two feature maps couldn't be stacked and processed by the 2D pts_backbone.
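This objection can be checked with shapes alone. Assuming the fusion layer concatenates camera and LiDAR BEV maps along the channel axis (which, as far as I know, is what a ConvFuser-style fuser does), a non-square [Y, X] camera map cannot be stacked with an [X, Y] LiDAR map, whereas square maps stack without complaint even if their axes are transposed relative to each other. The channel counts (256 LiDAR, 80 camera) are illustrative assumptions.

```python
def concat_channels(shape_a, shape_b):
    """Shape of concatenating two (N, C, H, W) tensors along dim 1.

    Raises ValueError where torch.cat would fail: every dim except
    the channel dim must match.
    """
    n_a, c_a, h_a, w_a = shape_a
    n_b, c_b, h_b, w_b = shape_b
    if (n_a, h_a, w_a) != (n_b, h_b, w_b):
        raise ValueError(f"cannot concatenate {shape_a} and {shape_b} along dim 1")
    return (n_a, c_a + c_b, h_a, w_a)


lidar_xy = (1, 256, 120, 136)   # [X, Y] from the sparse encoder
camera_yx = (1, 80, 136, 120)   # hypothetical [Y, X] vtransform output

try:
    concat_channels(lidar_xy, camera_yx)
except ValueError as err:
    print(err)

# Square maps: shapes agree, so a transposed layout would go unnoticed
print(concat_channels((1, 256, 180, 180), (1, 80, 180, 180)))  # (1, 336, 180, 180)
```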

cxnaive · 2024-08-13T08:04:40Z

I'm confused about this too, but in the vtransform of the NeurIPS 2022 BEVFusion the output is [y, x], and in its (and BEVDet's) TransFusion head the position you mention is also [y, x].

cxnaive · 2024-08-13T08:06:30Z

Have you tried LC fusion with a non-square BEV? Does the 2D backbone of the BEV model accept the input properly?

barrydoooit · 2024-08-13T08:09:25Z

Have you tried a non-square bev LCFusion? Does the 2d backbone of the bev model accept input properly?

Yes, that's exactly my case. I have sparse_shape=[960, 1088, 41], corresponding to x, y, z in LiDAR coordinates, and the x, y, z bounds of the LSS are adjusted accordingly. In this setup the 2D backbone (pts_backbone) accepts the feature maps properly.

barrydoooit · 2024-08-13T09:37:46Z

@cxnaive Actually, there is another confusing snippet, which might be a hint for understanding these ambiguous spatial shapes:

mmdetection3d/projects/BEVFusion/bevfusion/transfusion_head.py

Lines 727 to 731 in fe25f7a

    # original
    # draw_heatmap_gaussian(heatmap[gt_labels_3d[idx]], center_int, radius)  # noqa: E501
    # NOTE: fix
    draw_heatmap_gaussian(heatmap[gt_labels_3d[idx]],
                          center_int[[1, 0]], radius)
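The effect of `center_int[[1, 0]]` can be reproduced on a toy grid. The helper `draw_peak` below is hypothetical: it follows the same (x, y) -> `heatmap[y][x]` convention that `draw_heatmap_gaussian` uses in the gaussian.py lines linked later in this thread, but writes a single cell instead of a Gaussian. Swapping the center makes the first heatmap axis the X axis.

```python
def draw_peak(heatmap, center):
    """Mark one peak; center = (x, y) is written to heatmap[y][x]."""
    x, y = int(center[0]), int(center[1])
    heatmap[y][x] = 1.0
    return heatmap


x_len, y_len = 4, 6          # non-square toy grid
center_int = (3, 5)          # (x, y) in feature-map cells

# Heatmap stored in the BEV feature layout [X, Y]
hm_xy = [[0.0] * y_len for _ in range(x_len)]
draw_peak(hm_xy, (center_int[1], center_int[0]))  # same as center_int[[1, 0]]
print(hm_xy[3][5])           # 1.0: row is x = 3, column is y = 5

# Unswapped center: row y = 5 does not exist in a 4-row [X, Y] heatmap
hm_xy2 = [[0.0] * y_len for _ in range(x_len)]
try:
    draw_peak(hm_xy2, center_int)
except IndexError:
    print("out of range")
```

The real function clips its slices instead of raising, so with the wrong convention the peak ends up misplaced or truncated rather than producing an error, which fits the observation that training silently fails to converge.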

Say sparse_shape (xyz) is [960, 1088, 41]. The predicted center (which should be [x, y], since it is used to form a LidarInstance3DBbox) is reversed when it is used to index the hotspot in the heatmap. That means the heatmap should also be reversed (as it currently is), with shape [1088/8, 960/8]. But the SCN outputs a feature map of shape [960/8, 1088/8] when using xyz voxelization, as shown in:

mmdetection3d/projects/BEVFusion/bevfusion/sparse_encoder.py

Lines 144 to 146 in fe25f7a

    N, C, H, W, D = spatial_features.shape
    spatial_features = spatial_features.permute(0, 1, 4, 2, 3).contiguous()
    spatial_features = spatial_features.view(N, C * D, H, W)
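Tracking only the shapes through these three lines shows why the LiDAR BEV map comes out as [X, Y]. The sketch assumes, as described above, that the dense volume's H, W, D axes follow the sparse_shape order [x, y, z]; the channel count (128) and remaining depth (2) are illustrative assumptions, not values from a specific config.

```python
def permute(shape, dims):
    """Shape after tensor.permute(*dims)."""
    return tuple(shape[d] for d in dims)


def flatten_depth(shape):
    """Shape after .view(N, C * D, H, W) on an (N, C, D, H, W) tensor."""
    n, c, d, h, w = shape
    return (n, c * d, h, w)


# Dense SCN output unpacked as N, C, H, W, D, with H and W taken from
# sparse_shape [960, 1088, 41] at stride 8 (C and D are assumed values)
dense = (1, 128, 960 // 8, 1088 // 8, 2)
bev = flatten_depth(permute(dense, (0, 1, 4, 2, 3)))
print(bev)  # (1, 256, 120, 136): spatial layout [X, Y]
```

H and W are only positional names here: the first spatial axis of the BEV map is the x extent (120) and the second is the y extent (136), the same [120, 136] shape reported in the issue.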

This is a conflict, yet the operations in TransfusionHead are, confusingly, correct (except for the swapped heatmap coordinates): using 'center' instead of 'center[[1, 0]]' to index a heatmap of shape [960/8, 1088/8] ruins training, and the model never converges.

cxnaive · 2024-08-13T09:53:07Z

draw_heatmap_gaussian(heatmap[gt_labels_3d[idx]], center_int, radius) is the original version of the TransFusion head. The original version should correspond to BEV features in [Y, X] format, while center_int[[1, 0]] corresponds to [X, Y].

cxnaive · 2024-08-13T09:55:44Z

So grid_size should also be reversed in the same way, but this was forgotten in this version of the TransFusion head. Alternatively, consider using the original TransFusion head but reversing the BEV features.
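Both options should place the peak on the same physical cell. The toy check below (hypothetical helpers, nested lists standing in for tensors) compares option 1, an [X, Y] heatmap with the swapped center, against option 2, the original [Y, X] heatmap with the unswapped center, which presumes the BEV features are transposed to [Y, X] as well. It only covers the heatmap target, not the other parts of the head that read positions from the feature map.

```python
def transpose2d(grid):
    """Swap the two spatial axes of a nested-list map: [A][B] -> [B][A]."""
    return [list(col) for col in zip(*grid)]


def draw_peak(heatmap, center):
    """Toy version of the draw_heatmap_gaussian convention: (x, y) -> heatmap[y][x]."""
    x, y = center
    heatmap[y][x] = 1.0


x_len, y_len = 4, 6
center_int = (3, 5)  # (x, y)

# Option 1: heatmap in [X, Y] (like the sparse-encoder BEV map), swapped center
hm_a = [[0.0] * y_len for _ in range(x_len)]
draw_peak(hm_a, (center_int[1], center_int[0]))

# Option 2: original head, heatmap in [Y, X], unswapped center
hm_b = [[0.0] * x_len for _ in range(y_len)]
draw_peak(hm_b, center_int)

# Transposing option 2 into the [X, Y] layout reproduces option 1
print(transpose2d(hm_b) == hm_a)  # True
```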

cxnaive · 2024-08-13T10:08:48Z

https://github.com/open-mmlab/mmdetection3d/blob/fe25f7a51d36e3702f961e198894580d83c4387b/mmdet3d/models/utils/gaussian.py#L46C5-L53C69
From this, it can be seen that the default heatmap shape in mmdet3d is [Y, X].
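To make the [Y, X] convention in the linked lines concrete, here is a pure-Python re-sketch of the bounds logic as I read it. `draw_gaussian_yx` and `gaussian_kernel` are hypothetical names; unlike the library function, this works on nested lists and skips the step that zeroes near-zero kernel values.

```python
import math


def gaussian_kernel(diameter, sigma):
    """Square Gaussian kernel as nested lists, 1.0 at the centre cell."""
    r = (diameter - 1) / 2.0
    return [[math.exp(-((i - r) ** 2 + (j - r) ** 2) / (2 * sigma * sigma))
             for j in range(diameter)]
            for i in range(diameter)]


def draw_gaussian_yx(heatmap, center, radius):
    """Splat a Gaussian at center = (x, y) into heatmap[y][x], clipped at borders.

    The first heatmap axis is read as height (Y) and the second as width (X).
    """
    diameter = 2 * radius + 1
    kernel = gaussian_kernel(diameter, sigma=diameter / 6)
    x, y = int(center[0]), int(center[1])
    height, width = len(heatmap), len(heatmap[0])
    left, right = min(x, radius), min(width - x, radius + 1)
    top, bottom = min(y, radius), min(height - y, radius + 1)
    for dy in range(-top, bottom):
        for dx in range(-left, right):
            value = kernel[radius + dy][radius + dx]
            heatmap[y + dy][x + dx] = max(heatmap[y + dy][x + dx], value)
    return heatmap


hm = [[0.0] * 7 for _ in range(5)]   # height (Y) = 5, width (X) = 7
draw_gaussian_yx(hm, center=(6, 2), radius=1)
print(hm[2][6])                      # 1.0: row is y = 2, column is x = 6
```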

barrydoooit · 2024-08-13T10:33:31Z

https://github.com/open-mmlab/mmdetection3d/blob/fe25f7a51d36e3702f961e198894580d83c4387b/mmdet3d/models/utils/gaussian.py#L46C5-L53C69 From this, it can be seen that the default heatmap shape in mmdet3d is [Y, X]

This clears up most of the confusion. So initializing the heatmap with shape (feature_map_size[0], feature_map_size[1]), i.e. the same as the BEV feature map, is consistent with everything else, right?

cxnaive · 2024-08-13T10:37:46Z

Yes. You can check the implementation of CenterHead for CenterPoint in mmdet3d, which also uses [Y, X] for BEV features. However, the BEV features produced by the sparse encoder in BEVFusion are [X, Y].

chenwen60 · 2024-08-29T03:19:09Z

barrydoooit changed the title from "[Bug] Mysterious Dimension Swapping in TransfusionHead" to "[Bug] Mysterious Dimension Swapping in BEVFusion's TransfusionHead" on Aug 12, 2024.
