Hi, thanks a lot for open-sourcing the code for this great work. I went through the shared data and found several types of masks. Would you mind elaborating on them? Specifically:
1. `mv_masks`: used in `eval_nvidia.py`. My guess is that this is the mask from multiview processing, following the procedure described in NSFF's supplementary material. Is this correct?
2. `coarse_masks`: used in `eval_nvidia.py`. Is this the mask computed by the motion segmentation module in Sec. 3.3 of the paper? I guess it is provided for reproducibility, since the segmentation module itself is not released, as discussed in issue #9 (Question about static_masks, dynamic_masks). Is this correct?
    a. Relatedly, `coarse_masks` is used to feed a masked RGB image to `feature_net_fine` (see here). However, searching the code repo, `feature_net_fine` is only used in `eval_nvidia.py` and is never called during training. Is this intentional, or am I missing something?
3. `static_masks` and `dynamic_masks`: these are used in the dataloader as `motion_mask` and `static_mask`, and then in the training losses `dynamic_rgb_loss` and `static_rgb_loss`. I am confused by this for the following reasons (I sketch my understanding in code after this list):
    a. For the kid-running example, `static_masks` and `dynamic_masks` do not appear to be complementary (even taking erosion into account); combining the two masks does not produce an all-white mask. May I know how you obtain these two?
    b. Based on the paper's Eq. 7 and issue #9, it seems the mask should ideally come from the motion segmentation module. Is the provided `static_mask` in the NVIDIA Dynamic Dataset an approximation? If so, what tools did you use to obtain it? I am also confused about why these two masks are needed at all if `coarse_masks` are already provided.
    c. Actually, from Eq. 7, I would guess only one mask is needed, either static or dynamic. Is there a specific reason for keeping the other, separate mask?
4. Related to point 3 above: to obtain consistent depth estimates, we need to run `dynamic_cvd`, which requires a mask as input. May I know which mask DynIBaR uses to obtain the disparity, e.g., the coarse mask, the dynamic mask, etc.?
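For concreteness, here is a minimal sketch of how I understand the masked losses in point 3. The function body, shapes, and the toy complementary masks are my assumptions, not the actual DynIBaR implementation:

```python
# Hypothetical sketch in the spirit of dynamic_rgb_loss / static_rgb_loss
# (the loss names come from the repo; the bodies here are my guesses).
import torch

def masked_rgb_loss(pred, gt, mask):
    """Mean squared error restricted to pixels where mask == 1."""
    diff = (pred - gt) ** 2 * mask
    return diff.sum() / mask.sum().clamp(min=1.0)

pred_static = torch.rand(1, 3, 64, 64)   # static model rendering
pred_dynamic = torch.rand(1, 3, 64, 64)  # dynamic model rendering
gt = torch.rand(1, 3, 64, 64)            # ground-truth frame
static_mask = torch.randint(0, 2, (1, 1, 64, 64)).float()  # 1 = static pixel
dynamic_mask = 1.0 - static_mask  # toy complement; the shipped masks are not

static_rgb_loss = masked_rgb_loss(pred_static, gt, static_mask)
dynamic_rgb_loss = masked_rgb_loss(pred_dynamic, gt, dynamic_mask)
```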
Thanks in advance for your time, and congrats again on this great work.
`coarse_masks` is used for masking potential moving objects. It was actually derived from coarse-model renderings by thresholding: a pixel is marked as moving where the percentage of rendering weight from the dynamic model exceeds that from the static model. It is not necessary for the benchmarks (i.e., even without `coarse_masks`, the rendering quality and metrics should be the same); I used it in both cases just for compatibility reasons, and you can ignore this part.
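If helpful, a minimal sketch of that thresholding rule, assuming per-pixel accumulated rendering weights `static_weight` and `dynamic_weight` from the two coarse models (the names and shapes are illustrative, not from the codebase):

```python
import numpy as np

def coarse_motion_mask(static_weight: np.ndarray,
                       dynamic_weight: np.ndarray) -> np.ndarray:
    """Mark a pixel as moving when the dynamic model contributes a larger
    fraction of the coarse rendering than the static model does."""
    total = static_weight + dynamic_weight + 1e-8  # avoid division by zero
    dynamic_fraction = dynamic_weight / total
    static_fraction = static_weight / total
    return dynamic_fraction > static_fraction      # boolean (H, W) motion mask
```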
The training code for monocular video is slightly different from the evaluation code for the benchmarks. For monocular video training, we did not incorporate the coarse-to-fine strategy, since we did not find a big difference and it made training much slower. In the training code, this is where the masked RGB image is fed to the encoder: https://github.com/google/dynibar/blob/main/train.py#L167C56-L167C71.
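As a rough illustration of that training-time path (the encoder here is a stand-in module, and the 1 = static / 0 = moving mask convention is my assumption):

```python
import torch
import torch.nn as nn

# Stand-in for the actual feature encoder called in train.py.
feature_net = nn.Conv2d(3, 32, kernel_size=3, padding=1)

rgb = torch.rand(1, 3, 288, 512)           # (B, 3, H, W) source view
motion_mask = torch.ones(1, 1, 288, 512)   # assumed: 1 = static, 0 = moving

masked_rgb = rgb * motion_mask             # black out potential moving regions
features = feature_net(masked_rgb)         # per-view features from masked input
```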
The black regions in both `static_masks` and `dynamic_masks` correspond to moving regions; we construct the two masks by applying different threshold values to the motion segmentation module's output. As also mentioned in the supplementary material, to make sure any potentially moving pixels are covered in a real monocular video, we combine the result from the motion segmentation module with the output of a semantic segmentation model (taking their union, similar to the NSFF preprocessing code) to get the final masks. The `static_masks` in the Nvidia dataset were obtained from the motion segmentation module alone; no semantic segmentation model was used in that case.
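A hedged sketch of that construction, where `motion_prob` is an assumed per-pixel motion probability from the motion segmentation module and `semantic_mask` a binary mask from the semantic segmentation model; the threshold values, and which threshold feeds which mask, are guesses:

```python
import numpy as np

def build_masks(motion_prob, semantic_mask, low=0.3, high=0.7):
    # Two thresholds on the same motion probability give two binary motion
    # estimates; the union with the semantic mask covers any pixel that
    # either module considers potentially moving.
    moving_loose = (motion_prob > low) | semantic_mask.astype(bool)
    moving_tight = (motion_prob > high) | semantic_mask.astype(bool)
    # Black (0) = moving, white (255) = static, matching the convention above.
    dynamic_mask = np.where(moving_loose, 0, 255).astype(np.uint8)
    static_mask = np.where(moving_tight, 0, 255).astype(np.uint8)
    return static_mask, dynamic_mask
```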
For `dynamic_cvd` we actually use the semantic mask from Mask R-CNN. It should be possible to replace it with the mask from the motion segmentation module, but we found that the mask is not critical for the regularization loss to work.
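For illustration, one way to get such a semantic mask with an off-the-shelf Mask R-CNN from torchvision; the class id and score cutoff are assumptions, not the exact preprocessing used here:

```python
import torch
import torchvision

# Pretrained Mask R-CNN; "DEFAULT" weights require torchvision >= 0.13.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 480, 640)  # placeholder RGB frame in [0, 1]
with torch.no_grad():
    output = model([image])[0]

# Union of instance masks for the COCO "person" class (label 1) above a cutoff.
keep = (output["labels"] == 1) & (output["scores"] > 0.5)
person_mask = (output["masks"][keep, 0] > 0.5).any(dim=0)  # (H, W) bool
```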