[Bug] Mysterious Dimension Swapping in BEVFusion's TransfusionHead #3020

Open
3 tasks done
barrydoooit opened this issue Aug 12, 2024 · 15 comments

barrydoooit commented Aug 12, 2024

Prerequisite

Task

I have modified the scripts/configs, or I'm working on my own tasks/models/datasets.

Branch

main branch https://github.com/open-mmlab/mmdetection3d

Environment

sys.platform: linux
Python: 3.8.10 (default, Nov 22 2023, 10:22:35) [GCC 9.4.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.4, V11.4.152
GCC: x86_64-linux-gnu-gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
PyTorch: 1.10.1+cu113
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • LAPACK is enabled (usually provided by MKL)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 11.3
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  • CuDNN 8.2
  • Magma 2.5.2
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.11.2+cu113
OpenCV: 4.10.0
MMEngine: 0.10.4
MMDetection: 3.2.0+d509b75

Reproduces the problem - code sample

bash tools/dist_train.sh [configs] 1

Reproduces the problem - command or script

bash tools/dist_train.sh [configs] 1

Reproduces the problem - error message

  File "/usr/local/lib/python3.8/dist-packages/mmdet3d/models/detectors/base.py", line 75, in forward
    return self.loss(inputs, data_samples, **kwargs)
  File "/workspace/bevfusion/bevfusion.py", line 301, in loss
    bbox_loss = self.bbox_head.loss(feats, batch_data_samples)
  File "/workspace/bevfusion/transfusion_head.py", line 761, in loss
    loss = self.loss_by_feat(preds_dicts, batch_gt_instances_3d)
  File "/workspace/bevfusion/transfusion_head.py", line 786, in loss_by_feat
    loss_heatmap = self.loss_heatmap(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/mmdet/models/losses/gaussian_focal_loss.py", line 176, in forward
    loss_reg = self.loss_weight * gaussian_focal_loss(
  File "/usr/local/lib/python3.8/dist-packages/mmdet/models/losses/utils.py", line 121, in wrapper
    loss = loss_func(pred, target, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/mmdet/models/losses/gaussian_focal_loss.py", line 35, in gaussian_focal_loss
    pos_loss = -(pred + eps).log() * (1 - pred).pow(alpha) * pos_weights
RuntimeError: The size of tensor a (136) must match the size of tensor b (120) at non-singleton dimension 3

Additional information

Here I use a custom dataset with non-square BEV features: the sparse shape is [960, 1088, z], which gives the BEV feature map a spatial shape of [120, 136]. The sparse shape is passed as grid_size, which is used in:

feature_map_size = (grid_size[:2] // self.train_cfg['out_size_factor'])  # [x_len, y_len]

feature_map_size is then used to create the heatmap, but with the X and Y dimensions swapped, which gives the heatmap a spatial shape of [136, 120]:
heatmap = gt_bboxes_3d.new_zeros(self.num_classes, feature_map_size[1],
                                 feature_map_size[0])

The error is finally triggered here:
# compute heatmap loss
loss_heatmap = self.loss_heatmap(
    clip_sigmoid(preds_dict['dense_heatmap']).float(),
    heatmap.float(),
    avg_factor=max(heatmap.eq(1).float().sum().item(), 1),
)
(specifically at https://github.com/open-mmlab/mmdetection/blob/cfd5d3a985b0249de009b67d04f37263e11cdf3d/mmdet/models/losses/gaussian_focal_loss.py#L35), where the target heatmap and the predicted dense heatmap (which has the same spatial shape as the BEV feature map) must have the same spatial shape.
This problem does not occur with nuScenes because its BEV feature map is square, so the swap has no effect. In short, swapping the dimensions when initializing the heatmap is ambiguous, and for non-square grids it makes it impossible for the heatmap to match the shape of the BEV feature.
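The mismatch can be reproduced without mmdet3d. The NumPy sketch below uses the grid size from this issue and assumes out_size_factor=8 and num_classes=10 (illustrative values). pos_term is a hypothetical simplification of only the failing line in the traceback, not mmdet's gaussian_focal_loss (which also has a negative term and a reduction); NumPy raises ValueError where PyTorch raises the RuntimeError above.

```python
import numpy as np

GRID_SIZE = np.array([960, 1088, 41])  # sparse shape from this issue, [x, y, z]
OUT_SIZE_FACTOR = 8                    # assumed downsampling factor
NUM_CLASSES = 10                       # assumed, illustration only

# Same arithmetic as the head: grid_size[:2] // out_size_factor -> [x_len, y_len]
feature_map_size = GRID_SIZE[:2] // OUT_SIZE_FACTOR
x_len, y_len = int(feature_map_size[0]), int(feature_map_size[1])

# Prediction follows the BEV feature map: [B, C, X, Y]
pred = np.full((1, NUM_CLASSES, x_len, y_len), 0.5)
# Target as currently allocated (X and Y swapped): [B, C, Y, X]
target_swapped = np.zeros((1, NUM_CLASSES, y_len, x_len))
# Target with the swap reverted: [B, C, X, Y]
target_fixed = np.zeros((1, NUM_CLASSES, x_len, y_len))


def pos_term(pred, target, alpha=2.0, eps=1e-12):
    """Elementwise positive term of a Gaussian focal loss (simplified sketch)."""
    pos_weights = target == 1
    return -np.log(pred + eps) * (1 - pred) ** alpha * pos_weights


try:
    pos_term(pred, target_swapped)
    swapped_ok = True
except ValueError:  # NumPy's counterpart of the size-mismatch RuntimeError
    swapped_ok = False

fixed_shape = pos_term(pred, target_fixed).shape
print(feature_map_size.tolist(), swapped_ok, fixed_shape)
# [120, 136] False (1, 10, 120, 136)
```

With a square grid (x_len == y_len) the swapped and unswapped shapes coincide, which is why the swap goes unnoticed on square BEV grids.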

barrydoooit changed the title from "[Bug] Mysterious Dimension Swapping in TransfusionHead" to "[Bug] Mysterious Dimension Swapping in BEVFusion's TransfusionHead" on Aug 12, 2024
barrydoooit commented Aug 12, 2024

In addition, reverting this swap, i.e. initializing the heatmap with shape (feature_map_size[0], feature_map_size[1]), makes it possible to train a model.
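A sketch of the reverted allocation. init_target_heatmap is a hypothetical helper standing in for the gt_bboxes_3d.new_zeros call inside the head; NumPy replaces torch, and num_classes=10 and out_size_factor=8 are assumed. Only the allocation is shown here; the indices used when drawing the Gaussian still have to follow the same [X, Y] layout.

```python
import numpy as np


def init_target_heatmap(num_classes, grid_size, out_size_factor, swap_xy):
    """Hypothetical stand-in for the target-heatmap allocation in the head.

    swap_xy=True mirrors the current code: new_zeros(C, y_len, x_len).
    swap_xy=False is the reverted version: new_zeros(C, x_len, y_len).
    """
    x_len, y_len = (int(v) for v in np.asarray(grid_size)[:2] // out_size_factor)
    shape = (num_classes, y_len, x_len) if swap_xy else (num_classes, x_len, y_len)
    return np.zeros(shape, dtype=np.float32)


bev_hw = (960 // 8, 1088 // 8)  # spatial shape of the BEV feature and dense_heatmap
for swap in (True, False):
    hm = init_target_heatmap(10, [960, 1088, 41], 8, swap_xy=swap)
    print(swap, hm.shape, hm.shape[1:] == bev_hw)
# True (10, 136, 120) False
# False (10, 120, 136) True
```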

cxnaive commented Aug 13, 2024

Are you training a camera-only model?

barrydoooit commented

> Are you training a camera-only model?

No, I'm working on LiDAR-only and LiDAR-camera fusion models. But the bug would remain in camera-only BEVFusion as well, since the vtransform outputs a BEV feature map with the same spatial shape as the SCN output.

cxnaive commented Aug 13, 2024

In fact, this operation is present in the original TransFusion head code in BEVFusion. However, mit-bevfusion and the BEVFusion from NeurIPS 2022 differ in the final step of the vtransform: the X and Y axes of their vtransform outputs are reversed.

barrydoooit commented

> In fact, this operation is present in the original code of transfusion head in BEVFusion. However, the mit-bevfusion and the BEVFusion from NeurIPS 2022 differ in the final step of vtransform. The outputs X and Y from their vtransform are reversed

Thanks for pointing that out. But what do you mean by "reversed"? Do you mean that the vtransform output has spatial shape [y, x] rather than [x, y] like the LiDAR feature from the SCN? I'm a bit confused, because if that were the case, the two feature maps could not be stacked and processed by the 2D pts_backbone.
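The stacking concern can be checked with shapes alone: channel-wise concatenation (as a convolutional fuser would do before the 2D backbone) needs matching spatial dimensions. lidar_bev and the cam_bev_* arrays are hypothetical, with assumed channel counts; only the 120 x 136 grid comes from this issue.

```python
import numpy as np

x_len, y_len = 960 // 8, 1088 // 8           # 120, 136 as in this issue
lidar_bev = np.zeros((1, 256, x_len, y_len))  # [B, C, X, Y], channels assumed
cam_bev_xy = np.zeros((1, 80, x_len, y_len))  # camera BEV, same layout
cam_bev_yx = np.zeros((1, 80, y_len, x_len))  # camera BEV with X and Y reversed

fused = np.concatenate([lidar_bev, cam_bev_xy], axis=1)

try:
    np.concatenate([lidar_bev, cam_bev_yx], axis=1)
    reversed_ok = True
except ValueError:  # spatial dims differ, so channel concatenation fails
    reversed_ok = False

# Swapping the last two axes restores a compatible layout
fused_fixed = np.concatenate([lidar_bev, cam_bev_yx.transpose(0, 1, 3, 2)], axis=1)
print(fused.shape, reversed_ok, fused_fixed.shape)
# (1, 336, 120, 136) False (1, 336, 120, 136)
```

So in the non-square case, a branch that really produced [y, x] would fail at fusion unless something transposed it (in PyTorch, e.g. tensor.transpose(2, 3)).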

cxnaive commented Aug 13, 2024

I'm confused about this too, but in the vtransform of the NeurIPS 2022 BEVFusion the output is [y, x], and in its (and BEVDet's) TransFusion head the position you mention is also [y, x].

cxnaive commented Aug 13, 2024

Have you tried LC-fusion with a non-square BEV? Does the 2D backbone of the BEV model accept the input properly?

barrydoooit commented Aug 13, 2024

> Have you tried a non-square bev LCFusion? Does the 2d backbone of the bev model accept input properly?

Yes, that's exactly my case. I have sparse_shape=[960, 1088, 41], which corresponds to x, y, z in LiDAR coordinates, and the x, y, z bounds of the LSS are adjusted accordingly. In this setup the 2D backbone (pts_backbone) accepts the feature maps properly.

barrydoooit commented

@cxnaive There is actually another confusing snippet, which might be a hint for understanding these ambiguous spatial shapes:

# original
# draw_heatmap_gaussian(heatmap[gt_labels_3d[idx]], center_int, radius) # noqa: E501
# NOTE: fix
draw_heatmap_gaussian(heatmap[gt_labels_3d[idx]],
                      center_int[[1, 0]], radius)

Say sparse_shape (xyz) is [960, 1088, 41]. The center (which should be [x, y], since it is used to form LiDARInstance3DBoxes) is reversed when used to index the hotspot in the heatmap. That means the heatmap should also be reversed (as it is now), with shape [1088/8, 960/8]. But the SCN outputs a feature map of [960/8, 1088/8] when using xyz voxelization, as shown in:
N, C, H, W, D = spatial_features.shape
spatial_features = spatial_features.permute(0, 1, 4, 2, 3).contiguous()
spatial_features = spatial_features.view(N, C * D, H, W)

This is a conflict, yet the operations in TransfusionHead are, confusingly, correct (except for the heatmap coordinate swap): using 'center' instead of 'center[[1,0]]' to index a heatmap of shape [960/8, 1088/8] ruins training, and the model never converges.
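To see which axis ends up where, the reshape quoted above can be mirrored in NumPy. This assumes the sparse encoder's dense output is laid out as [N, C, X, Y, Z] for xyz voxelization (so the snippet's H, W, D are X, Y, Z); the channel count and the downsampled Z size of 2 are illustrative guesses, not values from the BEVFusion config.

```python
import numpy as np

N, C = 1, 128                     # assumed batch size and channels
X, Y, Z = 960 // 8, 1088 // 8, 2  # assumed dense output size after downsampling
rng = np.random.default_rng(0)
spatial_features = rng.random((N, C, X, Y, Z))  # assumed [N, C, X, Y, Z] layout

# Same steps as the snippet: permute(0, 1, 4, 2, 3), then view(N, C * D, H, W)
n, c, h, w, d = spatial_features.shape
bev = spatial_features.transpose(0, 1, 4, 2, 3).reshape(n, c * d, h, w)
print(bev.shape)  # (1, 256, 120, 136): H is still X and W is still Y
```

Folding D into the channel axis leaves H and W untouched, so under this assumption the BEV map keeps the [X, Y] order of the voxel grid, i.e. [960/8, 1088/8].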

cxnaive commented Aug 13, 2024

draw_heatmap_gaussian(heatmap[gt_labels_3d[idx]], center_int, radius) is from the original version of the TransFusion head. The original version corresponds to BEV features in [Y, X] format, while center_int[[1, 0]] corresponds to [X, Y].
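A simplified stand-in for draw_heatmap_gaussian illustrates the convention under discussion: it reads center as (x, y) and writes heatmap[y, x], so x selects the column and y the row (this reading is based on the mmdet3d gaussian.py lines linked in this thread). draw_peak is hypothetical and marks a single cell instead of a Gaussian; the center value is made up.

```python
import numpy as np


def draw_peak(heatmap, center):
    """Hypothetical simplification of draw_heatmap_gaussian: mark one cell.

    Uses the same convention: center = (x, y), x picks the column, y the row.
    Centers outside the map are skipped instead of being clipped.
    """
    x, y = int(center[0]), int(center[1])
    height, width = heatmap.shape[0:2]
    if 0 <= x < width and 0 <= y < height:
        heatmap[y, x] = 1.0
    return heatmap


x_len, y_len = 960 // 8, 1088 // 8  # 120, 136
center_int = np.array([100, 20])    # made-up (coor_x, coor_y)

# Original call on a [Y, X] heatmap: peak at [coor_y, coor_x]
hm_yx = draw_peak(np.zeros((y_len, x_len)), center_int)
# "Fix" call on an [X, Y] heatmap: peak at [coor_x, coor_y]
hm_xy = draw_peak(np.zeros((x_len, y_len)), center_int[[1, 0]])
# Current mix: [Y, X] allocation with the swapped center, so the peak
# lands at row 100, column 20, i.e. y=100, x=20: the wrong cell
hm_mixed = draw_peak(np.zeros((y_len, x_len)), center_int[[1, 0]])

print(np.argwhere(hm_yx).tolist(), np.argwhere(hm_xy).tolist(),
      np.argwhere(hm_mixed).tolist())
# [[20, 100]] [[100, 20]] [[100, 20]]
```

The first two maps are transposes of each other, so either convention is self-consistent; only the mixed case disagrees with the BEV feature layout.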

cxnaive commented Aug 13, 2024

So grid_size should also be reversed in the same way, but this was overlooked in this version of the TransFusion head. Alternatively, consider using the original TransFusion head and transposing the BEV features instead.
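The alternative can be sketched as a transpose of the last two axes before the head, so the BEV feature becomes [B, C, Y, X] and matches a heatmap allocated as (num_classes, y_len, x_len) and drawn with the unswapped center_int. Shapes and names are illustrative assumptions, not code from either BEVFusion repository.

```python
import numpy as np

B, C, num_classes = 1, 512, 10      # assumed sizes
x_len, y_len = 960 // 8, 1088 // 8  # 120, 136

rng = np.random.default_rng(0)
bev_xy = rng.random((B, C, x_len, y_len))                    # [B, C, X, Y]
bev_yx = np.ascontiguousarray(bev_xy.transpose(0, 1, 3, 2))  # [B, C, Y, X]

# Heatmap as allocated by the original head: (num_classes, y_len, x_len)
heatmap = np.zeros((num_classes, y_len, x_len))

print(bev_yx.shape[2:] == heatmap.shape[1:])           # True
print(bev_yx[0, 0, 20, 100] == bev_xy[0, 0, 100, 20])  # True: cell (x=100, y=20)
```

In PyTorch the same step would be bev.transpose(2, 3) (or .permute(0, 1, 3, 2).contiguous()). Whatever decodes heatmap peaks back into x/y box centers would presumably also have to assume the [Y, X] layout.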

cxnaive commented Aug 13, 2024

barrydoooit commented

> https://github.com/open-mmlab/mmdetection3d/blob/fe25f7a51d36e3702f961e198894580d83c4387b/mmdet3d/models/utils/gaussian.py#L46C5-L53C69 From this, it can be seen that the default heatmap shape in mmdet3d is [Y, X]

This clears up most of the confusion. So that's why initializing the heatmap with shape (feature_map_size[0], feature_map_size[1]), i.e. the same as the BEV feature, is consistent with everything else, right?

cxnaive commented Aug 13, 2024

Yes, you can check the implementation of CenterHead in CenterPoint within mmdet3d, which also uses [Y, X] for BEV features. However, the BEV features obtained from the sparse encoder in BEVFusion are [X, Y].

