
bounds and downsampling factor for load_llff_data_multi_view #11

Open
andrewsonga opened this issue Nov 27, 2021 · 5 comments

@andrewsonga

First of all, thank you for releasing your impactful work!
I'm trying to train NR-NeRF on multi-view data from 8 synchronized cameras with known intrinsics and extrinsics, and I ran into a couple of questions regarding the bounds and the downsampling factor.

1. Are the parameters min_bound and max_bound defined as the minimum and maximum across all cameras?

I noticed in the README.md that calibration.json specifies a single min_bound and max_bound shared by all cameras, as opposed to one pair per camera.
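
For concreteness, this is the kind of calibration.json layout I'm assuming, with a single pair of bounds at the top level shared by every camera. Only min_bound and max_bound come from the README; the per-camera field names and values below are my own placeholders, not necessarily the repository's actual schema:

```json
{
  "min_bound": 2.0,
  "max_bound": 6.0,
  "cam_000": {
    "intrinsics": [[1111.0, 0.0, 960.0],
                   [0.0, 1111.0, 540.0],
                   [0.0, 0.0, 1.0]],
    "extrinsics": "4x4 camera-to-world matrix (placeholder)"
  },
  "cam_001": { "...": "..." }
}
```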

2. When using load_llff_data_multi_view, if our training images are downsampled from their original resolution by some factor, are there any parts of calibration.json (e.g. the camera intrinsics or extrinsics) that we have to adjust accordingly?

I'm asking because downsampling images by a factor is not implemented in load_llff_data_multi_view, whereas load_llff_data does use factor in a couple of places (https://github.com/yenchenlin/nerf-pytorch/blob/a15fd7cb363e93f933012fd1f1ad5395302f63a4/load_llff.py#L76, https://github.com/yenchenlin/nerf-pytorch/blob/a15fd7cb363e93f933012fd1f1ad5395302f63a4/load_llff.py#L103).
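
For reference, the adjustment I'd expect downsampling to imply is the usual one: focal length and principal point live in pixel units and scale with the image, while the extrinsics (a rigid transform in world units) and the metric bounds do not. A minimal sketch, not taken from this codebase:

```python
import numpy as np

def downsample_intrinsics(K, factor):
    """Rescale a 3x3 intrinsics matrix for images downsampled by `factor`.

    fx, fy, cx, cy are in pixel units, so they shrink with the image;
    the extrinsics and the (metric) near/far bounds stay unchanged.
    """
    K = np.asarray(K, dtype=np.float64).copy()
    K[0, 0] /= factor  # fx
    K[1, 1] /= factor  # fy
    K[0, 2] /= factor  # cx
    K[1, 2] /= factor  # cy
    return K

# e.g. 1920x1080 intrinsics downsampled by a factor of 4
K_full = np.array([[1111.0, 0.0, 960.0],
                   [0.0, 1111.0, 540.0],
                   [0.0, 0.0, 1.0]])
K_small = downsample_intrinsics(K_full, factor=4)
```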

Thank you in advance for reading this long question.
I look forward to reading your response.

@edgar-tr
Contributor

edgar-tr commented Nov 27, 2021 via email

@andrewsonga
Author

andrewsonga commented Nov 27, 2021

Thank you for the swift response!
I have just a few more follow-up questions:

1. Do we have to adjust min_bound and max_bound according to the downsampling factor?

2. Do you think that deriving min_bounds and max_bounds from the poses_bounds.npy file generated by running COLMAP with known camera poses (https://colmap.github.io/faq.html#reconstruct-sparse-dense-model-from-known-camera-poses) would be a good heuristic for multi-view data?

  • I ran COLMAP on multi-view images from a single timestep to estimate the 3D points, and used the 1st and 99th percentile depth values to define the min_bounds and max_bounds for each camera; the shared min_bound and max_bound would then be the minimum and maximum, respectively, across all cameras (see the sketch after this list).

3. Where are min_bound and max_bound used? Are they used as the integration bounds for volume rendering?

4. If so, what is the harm of heuristically setting min_bound to 0 and max_bound to a very large number?
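
Here is the sketch of the heuristic from question 2. It is my own code, not COLMAP's or this repository's, and it assumes COLMAP's convention that the camera looks down the +z axis (flip the sign of the depths for OpenGL-style -z conventions):

```python
import numpy as np

def bounds_from_points(points_world, c2w, lo=1.0, hi=99.0):
    """Per-camera (near, far) from triangulated 3D points.

    points_world: (N, 3) sparse points from COLMAP
    c2w: (4, 4) camera-to-world matrix of one camera
    Returns the lo-th and hi-th percentile of depth along the optical axis.
    """
    w2c = np.linalg.inv(c2w)
    pts_cam = points_world @ w2c[:3, :3].T + w2c[:3, 3]
    depths = pts_cam[:, 2]        # +z viewing axis (COLMAP convention)
    depths = depths[depths > 0]   # keep points in front of the camera
    return np.percentile(depths, lo), np.percentile(depths, hi)

# shared bounds = extremes over all cameras (c2w_list: the 8 camera poses)
# per_cam = [bounds_from_points(points, c2w) for c2w in c2w_list]
# min_bound = min(near for near, far in per_cam)
# max_bound = max(far for near, far in per_cam)
```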
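Regarding questions 3 and 4, my understanding is that in standard NeRF-style pipelines the bounds become the near/far of ray sampling, so a fixed sample budget is stretched across the whole interval. A sketch of the usual stratified sampling (again, not this repository's code) to illustrate why an enormous max_bound would leave few samples inside the actual scene:

```python
import torch

def sample_along_rays(rays_o, rays_d, near, far, n_samples=64):
    """Stratified samples between near and far (the standard NeRF recipe).

    Bin width grows linearly with (far - near): with min_bound = 0 and a
    huge max_bound, most of the n_samples land in empty space.
    """
    t = torch.linspace(0.0, 1.0, n_samples)
    z = near * (1.0 - t) + far * t                    # (n_samples,)
    z = z.expand(rays_o.shape[0], n_samples).clone()
    mids = 0.5 * (z[:, 1:] + z[:, :-1])
    upper = torch.cat([mids, z[:, -1:]], dim=-1)
    lower = torch.cat([z[:, :1], mids], dim=-1)
    z = lower + (upper - lower) * torch.rand_like(z)  # jitter within bins
    pts = rays_o[:, None, :] + rays_d[:, None, :] * z[:, :, None]
    return pts, z
```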

@edgar-tr
Contributor

edgar-tr commented Nov 28, 2021 via email

@andrewsonga
Author

andrewsonga commented Nov 29, 2021

Thank you for the detailed response! I followed your instructions carefully, but my renderings are coming out completely wrong and I can't figure out why. The following are the first five renderings for --camera_path spiral:
[five screenshots: the first five spiral renderings, all garbled]

The first frame of each of my multi-view videos looks like this:

[eight screenshots: the first frame from each of the eight cameras]

Are there any modifications I need to make to free_viewpoint_rendering.py in order to make it work for multi-view datasets? For instance, do we have to change load_llff_data to load_llff_data_multi_view in free_viewpoint_rendering.py as well as train.py?
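
For concreteness, the kind of change I'm imagining is just routing to the multi-view loader in both scripts; the signatures below are guesses, since I haven't traced how free_viewpoint_rendering.py calls the loader:

```python
# hypothetical dispatch shared by train.py and free_viewpoint_rendering.py;
# argument lists are assumptions, not the repository's actual API
from load_llff import load_llff_data, load_llff_data_multi_view

def load_dataset(datadir, factor, multi_view=False):
    if multi_view:
        # multi-view path: no downsampling factor implemented here
        return load_llff_data_multi_view(datadir)
    return load_llff_data(datadir, factor=factor)
```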

@edgar-tr
Contributor

I have never tried running the multi-view code with rendering. The spiral code might be too sensitive; you could try the static or input-reconstruction rendering instead. Changing to load_llff_data_multi_view sounds reasonable, but again, I have not tried that part.
