
[Enhance] Support RGB images on ScanNet for multi-view detector #696

Merged · 9 commits into open-mmlab:master · Jul 20, 2021

Conversation

@filaPro (Contributor) commented on Jul 3, 2021

This PR aims to support RGB images on ScanNet for a multi-view detector. More discussion in #620.

It is a bit WIP now, as I also plan to add images to ScanNetDataset. For now I have just adapted SensReader from the original ScanNet repo. Can you please check whether we are OK with it? cc @Tai-Wang
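For context, here is a minimal sketch of how one extracted frame could be consumed downstream. It assumes SensReader dumps per-frame color images plus 4x4 camera-to-world pose and padded intrinsic text files, as in the original ScanNet repo; the file names and the depth2img construction are illustrative, not the exact code of this PR:

```python
import numpy as np
from PIL import Image

def load_posed_frame(color_path, pose_path, intrinsic_path):
    """Load one extracted frame and build a global-to-image projection."""
    img = np.asarray(Image.open(color_path))  # H x W x 3 RGB image
    pose = np.loadtxt(pose_path)              # 4x4 camera-to-world matrix
    intrinsic = np.loadtxt(intrinsic_path)    # 4x4 padded intrinsic matrix K
    # Points in the global (depth) frame project as x_img ~ K @ world-to-camera.
    depth2img = intrinsic @ np.linalg.inv(pose)
    return img, depth2img
```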

codecov bot commented on Jul 3, 2021

Codecov Report

Merging #696 (e8b6c40) into master (1f79fc7) will increase coverage by 0.02%.
The diff coverage is 37.73%.

❗ Current head e8b6c40 differs from pull request most recent head aea1385. Consider uploading reports for the commit aea1385 to get more accurate results.

@@            Coverage Diff             @@
##           master     #696      +/-   ##
==========================================
+ Coverage   48.93%   48.96%   +0.02%     
==========================================
  Files         208      208              
  Lines       15861    15880      +19     
  Branches     2543     2538       -5     
==========================================
+ Hits         7761     7775      +14     
+ Misses       7606     7603       -3     
- Partials      494      502       +8     
Flag      | Coverage Δ
unittests | 48.96% <37.73%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files | Coverage Δ
--- | ---
mmdet3d/core/bbox/structures/coord_3d_mode.py | 60.99% <0.00%> (-0.88%) ⬇️
mmdet3d/models/voxel_encoders/voxel_encoder.py | 54.40% <0.00%> (ø)
mmdet3d/models/detectors/single_stage.py | 58.62% <12.50%> (-17.57%) ⬇️
...et3d/models/dense_heads/anchor_free_mono3d_head.py | 65.31% <33.33%> (+6.67%) ⬆️
mmdet3d/datasets/scannet_dataset.py | 78.51% <42.85%> (-9.34%) ⬇️
mmdet3d/datasets/pipelines/transforms_3d.py | 90.85% <50.00%> (-0.51%) ⬇️
mmdet3d/models/dense_heads/fcos_mono3d_head.py | 12.10% <66.66%> (+0.13%) ⬆️
mmdet3d/models/detectors/voxelnet.py | 33.87% <100.00%> (ø)

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1f79fc7...aea1385.

[Resolved review thread on data/scannet/README.md]
@Tai-Wang (Member) left a review comment:

We had better add some basic comments or docstrings to extract_posed_images.py and add the corresponding contents/explanations to the ScanNet dataset doc.

@Tai-Wang (Member) commented on Jul 4, 2021

> This PR aims to support RGB images on ScanNet for a multi-view detector. More discussion in #620.
>
> It is a bit WIP now, as I also plan to add images to ScanNetDataset. For now I have just adapted SensReader from the original ScanNet repo. Can you please check whether we are OK with it? cc @Tai-Wang

It is fine to borrow some official code from ScanNet with explicit acknowledgement/citation of their credit. We just need some basic comments and explanations in the doc to make this new feature clearer.

@filaPro (Contributor, Author) commented on Jul 5, 2021

Hi @Tai-Wang,

I have a question about LoadMultiViewImageFromFiles. It looks like it is not currently used in the project, and in my opinion it should be replaced with another class. The issue is how to apply per-image transforms like Resize or Normalize to the output of this class: doing so would probably lead to MultiViewResize and MultiViewNormalize classes, which does not look good. I think we should reuse the idea of MultiScaleFlipAug3D(..., transforms=[...]) in the form of MultiViewPipeline(..., transforms=[..., Resize, Normalize]), as sketched below. Do you agree, or do you see a better solution?
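To make the proposal concrete, here is a minimal sketch of such a wrapper, assuming mmdet-style transforms that consume and return a results dict; the class and key names are illustrative:

```python
class MultiViewPipeline:
    """Apply ordinary per-image transforms (Resize, Normalize, ...) to each
    view, instead of duplicating them as MultiViewResize / MultiViewNormalize."""

    def __init__(self, transforms):
        self.transforms = transforms  # e.g. [LoadImageFromFile(), Resize(...), Normalize(...)]

    def __call__(self, results):
        imgs = []
        for img_info in results['img_info']:
            # Run the ordinary single-image pipeline on each view in turn.
            _results = dict(img_info=img_info, img_prefix=None)
            for transform in self.transforms:
                _results = transform(_results)
            imgs.append(_results['img'])
        results['img'] = imgs
        return results
```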

I am also a bit curious about your LoadImageFromFileMono3D. For some reason, all other datasets return calibration matrices directly in the dataset, while for NuScenesMonoDataset we have a standalone pipeline. Can we move this matrix to NuScenesMonoDataset.pre_pipeline?

Also, the new depth2img key for ScanNetDataset can interact with the GlobalAlignment transform. In fact, I do not fully understand why this transform was added in a recent release. It was motivated by unaligned points in the segmentation task, but for that purpose we have the separate ScanNetSegData and ScanNetSegDataset; how do they affect ScanNetData and ScanNetDataset? As it stands, forgetting to add GlobalAlignment can cause bugs, since it rotates only the points and not gt_bboxes_3d. In my opinion all the alignment logic could be done in ScanNetData and GlobalAlignment could be removed, but maybe I am completely wrong. So my question is: if we keep GlobalAlignment, should it also rotate the depth2img values for each RGB image (see the sketch below), or can depth2img already be returned in the aligned version?
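To illustrate the interaction: if GlobalAlignment maps points as p' = A p while the camera poses were recorded against the unaligned world frame, each per-view projection has to absorb A⁻¹. A minimal numpy sketch (the function name is illustrative):

```python
import numpy as np

def align_depth2img(depth2img, align_mat):
    """Make a projection matrix valid for globally aligned points.

    If points are transformed as p' = align_mat @ p, projecting the
    aligned points requires undoing the alignment first:
    x_img ~ depth2img @ inv(align_mat) @ p'.
    """
    return depth2img @ np.linalg.inv(align_mat)
```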

Sorry if these questions are naive.

@Tai-Wang (Member) commented on Jul 6, 2021

Hi @filaPro,

LoadMultiViewImageFromFiles is a pipeline designed for MVXNet on nuScenes. Because the release of moca is blocked, it has not been used by any currently supported model. You can keep it for potential future usage and add your own pipelines. I agree that following the approach of MultiScaleFlipAug3D to apply transformations to multi-view images is a good choice.

As for LoadImageFromFileMono3D, when I first designed this pipeline it included some other processing, and in the end only the cam_intrinsic information was preserved. I agree that moving it into dataset functions like get_ann_infos or parse_ann_infos would be a good idea to make it consistent with the other datasets and unify the pipelines; a rough sketch follows. I will consider refactoring it before the next release.
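A sketch of that refactor, with method and key names assumed for illustration only (not the actual mmdet3d API):

```python
import numpy as np

class NuScenesMonoDataset:
    # ... existing dataset code ...

    def get_ann_info(self, idx):
        info = self.data_infos[idx]
        ann = self._parse_ann_info(info)  # hypothetical existing helper
        # Return the calibration together with the annotations instead of
        # loading it in a dedicated LoadImageFromFileMono3D pipeline step.
        ann['cam_intrinsic'] = np.array(info['cam_intrinsic'])
        return ann
```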

@filaPro (Contributor, Author) commented on Jul 12, 2021

Hi @Tai-Wang,
I think this is enough for this PR; MultiViewPipeline can be committed in a separate one. I've also visualized a couple of boxes on the RGB images, and they look fine.
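For reference, such a sanity check can project box corners with the new depth2img matrices; a minimal sketch (corner layout and names assumed):

```python
import numpy as np

def project_corners(corners, depth2img):
    """Project N x 3 box corners from the global (depth) frame to pixels."""
    homo = np.concatenate([corners, np.ones((len(corners), 1))], axis=1)  # N x 4
    uvz = homo @ depth2img.T          # homogeneous image coordinates
    return uvz[:, :2] / uvz[:, 2:3]   # perspective divide -> N x 2 pixels
```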

@Tai-Wang (Member) left a review comment:

Almost ready to be merged

@Wuziyi616 (Contributor) left a review comment:

@filaPro Good job! From the ImVoxelNet paper, it seems training is performed on the training set and results are reported on the val set. So all the gt boxes it uses are axis-aligned, and the test set is not involved, right? If that's true, this PR is done.

I just want to confirm this point because we perform point cloud alignment in the data loading pipeline, and gt boxes are provided in both aligned and unaligned coordinates.

@filaPro (Contributor, Author) commented on Jul 14, 2021

Sure, in the ImVoxelNet paper only the train and val scenes are used.

Btw, I don't quite understand how the train/val/test splits are connected with point cloud alignment. Can't we apply this alignment at test time if necessary?

@Wuziyi616 (Contributor) commented:

> Sure, in the ImVoxelNet paper only the train and val scenes are used.
>
> Btw, I don't quite understand how the train/val/test splits are connected with point cloud alignment. Can't we apply this alignment at test time if necessary?

Good question. This is mainly because the axis-align matrix required to perform the alignment is considered part of the annotation by the ScanNet team. As a result, test scans don't have this matrix and can't be aligned. This doc provides more details about the design of our ScanNet pre-processing scripts (which I believe is a bit complicated and confusing at first glance).
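Concretely, that per-scene annotation is a 4x4 axis-alignment matrix shipped with each train/val scan's metadata, applied with the usual homogeneous-coordinate recipe; a minimal sketch (names assumed):

```python
import numpy as np

def align_points(points, axis_align_matrix):
    """Apply ScanNet's per-scene 4x4 axis-alignment matrix to N x 3 points."""
    homo = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    return (homo @ axis_align_matrix.T)[:, :3]
```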

BTW, it reminds me that it would be great if you could describe this PR's changes in the compatibility doc, so that users know they may need to re-generate their ScanNet data for your ImVoxelNet models (and maybe more multi-view methods in the future :)

@filaPro (Contributor, Author) commented on Jul 14, 2021

Can I add an MMDetection3D 0.16.0 section to this compatibility.md?

@Wuziyi616 (Contributor) commented on Jul 14, 2021

> Can I add an MMDetection3D 0.16.0 section to this compatibility.md?

Yes, sure! Just follow the pattern :)

Edit: I think you can refer to this part, which is very similar to your changes here.

[Resolved review thread on docs/compatibility.md]
@Wuziyi616 (Contributor) commented:

Ready to merge after fixing that typo.

@ZwwWayne merged commit da4c3af into open-mmlab:master on Jul 20, 2021.