
Can't do sky? #69

Open
noobtoob4lyfe opened this issue Aug 9, 2024 · 3 comments

Comments

@noobtoob4lyfe

noobtoob4lyfe commented Aug 9, 2024

Thanks for sharing your great work.
I'm encountering an issue with outdoor shots that show the sky. Some frames will place the sky in the foreground and some will not, causing huge temporal inconsistency. Is there any way to get it to ignore the sky?
[attached screenshot]

@lpiccinelli-eth (Owner)

Which model are you using?

Many stereo-based GT sources (HRWSI, CityScapes, or even BlendedMVS) assign the sky region values that are very close to the camera, while sensor-based datasets never have GT on the sky at all. In addition, we do not model the sky in any way, e.g. via an external segmentation model that sets it to an arbitrarily large value.
All of this combined leaves the model extremely uncertain in sky regions.

There are a couple of approaches you can try:

  1. Hopefully the sky regions come out as low-confidence, so you can use the confidence map to mask them out.
  2. Use an external segmentation model: models targeted at sky segmentation alone are quite efficient and fast; you do not need (Grounded) SAM or similar foundation models for this.
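Option 1 can be sketched with plain NumPy. The array names and the quantile threshold below are assumptions for illustration, not the repo's API; substitute whatever depth and confidence maps your model actually returns:

```python
import numpy as np

def mask_low_confidence(depth, confidence, quantile=0.1):
    """Invalidate depth where confidence falls below a per-frame quantile.

    Sky pixels tend to land in the low-confidence tail, so thresholding
    the confidence map can remove them without a segmentation model.
    """
    threshold = np.quantile(confidence, quantile)
    masked = depth.copy()
    masked[confidence < threshold] = np.nan  # NaN marks invalid pixels
    return masked

# Toy example: a 2x2 "depth" map whose top-left pixel has low confidence.
depth = np.array([[5.0, 10.0], [12.0, 8.0]])
conf = np.array([[0.05, 0.9], [0.8, 0.7]])
out = mask_low_confidence(depth, conf, quantile=0.25)
```

A per-frame quantile is only one possible cutoff; a fixed absolute threshold tuned on a few frames may be more temporally stable across a video.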

Anyway, I see that the output is extremely blurry: are you using the infer method, or similar resizing logic?
An out-of-distribution input shape leads to quite degraded performance, especially for ViT-based architectures. As a general rule of thumb for geometric tasks, it is better to resize based on the shorter edge and pad to fit the aspect ratio given in the config, or to follow the original papers' implementation details. For instance, the ZoeDepth and DepthAnything implementations brute-force resize the input image to a given fixed shape, i.e. they modify the original aspect ratio.
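The resize-and-pad preprocessing described above can be sketched as follows. This is a minimal NumPy version with nearest-neighbor resampling; the target shape and the centered padding are assumptions, and a real pipeline would use proper interpolation from e.g. OpenCV or torchvision:

```python
import numpy as np

def resize_nn(image, new_h, new_w):
    """Nearest-neighbor resize of a 2D array via index mapping."""
    h, w = image.shape[:2]
    rows = (np.arange(new_h) * h / new_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    return image[rows][:, cols]

def fit_and_pad(image, target_h, target_w, pad_value=0.0):
    """Scale to fit inside the target shape, then pad instead of stretching.

    The aspect ratio is preserved, so the network never sees a
    distorted (out-of-distribution) image geometry.
    """
    h, w = image.shape[:2]
    scale = min(target_h / h, target_w / w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    resized = resize_nn(image, new_h, new_w)
    out = np.full((target_h, target_w), pad_value, dtype=image.dtype)
    top = (target_h - new_h) // 2
    left = (target_w - new_w) // 2
    out[top:top + new_h, left:left + new_w] = resized
    return out

# A 100x200 image fitted into a 64x64 canvas: it becomes 32x64,
# centered vertically, with zero padding above and below.
image = np.ones((100, 200))
out = fit_and_pad(image, 64, 64)
```

Remember to undo the padding (and rescale the depth map back) when mapping predictions onto the original frames.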

@noobtoob4lyfe (Author)

Thanks for your reply. I'm using this ONNX implementation with the model linked on Hugging Face: https://github.com/ibaiGorordo/ONNX-Unidepth-Monocular-Metric-Depth-Estimation
Do you think I'd have better luck with the main implementation?
Thanks for your suggestions.

@lpiccinelli-eth (Owner)

I would first try the GPU model from this repo. If you see the same artifacts, then the problem comes from the model itself; otherwise, I suspect some tiny operator mismatch was introduced when forcing the model to be ONNX-compliant.
