Add ZoeDepth #30136
Merged
108 commits
c074ce8 First draft (NielsRogge)
ca4d141 Fix merge (NielsRogge)
420397f Add docs (NielsRogge)
02d775a Merge remote-tracking branch 'upstream/main' into add_zoedepth (NielsRogge)
de9d51e Clean up code (NielsRogge)
705f4c6 Convert model (NielsRogge)
8080b35 Add image processor (NielsRogge)
7e511c2 Convert Zoe_K (NielsRogge)
ceb079b More improvements (NielsRogge)
331b48d Improve variable names and docstrings (NielsRogge)
be57cc6 Improve variable names (NielsRogge)
27c013c Improve variable names (NielsRogge)
090bb82 Replace nn.sequential (NielsRogge)
74088b3 Merge remote-tracking branch 'upstream/main' into add_zoedepth (NielsRogge)
712b483 More improvements (NielsRogge)
a1f9520 Convert ZoeD_NK (NielsRogge)
04cd658 Fix most tests (NielsRogge)
73bd15e Verify pixel values (NielsRogge)
2198070 Verify pixel values (NielsRogge)
470856b Add squeeze (NielsRogge)
ad188e5 Update beit to support arbitrary window sizes (NielsRogge)
f422f24 Improve image processor (NielsRogge)
8c611c3 Improve docstring (NielsRogge)
69b3593 Improve beit (NielsRogge)
35f86df Improve model outputs (NielsRogge)
0146011 Add figure (NielsRogge)
a8d7739 Fix beit (NielsRogge)
46a6479 Update checkpoint (NielsRogge)
b693dd4 Fix repo id (NielsRogge)
bb16289 Add _keys_to_ignore_on_load_unexpected (NielsRogge)
a9b3070 More improvements (NielsRogge)
47ac850 Address comments (NielsRogge)
ba496e5 Merge remote-tracking branch 'upstream/main' into add_zoedepth (NielsRogge)
ef8412e Address comments (NielsRogge)
40c8f3d Address comments (NielsRogge)
d52b425 Address comments (NielsRogge)
ff70bff Rename variable name (NielsRogge)
75a8ccb Add backbone_hidden_size (NielsRogge)
94afdbd Vectorize (NielsRogge)
66f3b21 Vectorize more (NielsRogge)
da53475 Address comments (NielsRogge)
76cc537 Clarify docstring (NielsRogge)
b9f59e2 Remove backbone_hidden_size (NielsRogge)
c7020b1 Fix image processor (NielsRogge)
313e5f1 Remove print statements (NielsRogge)
2ec7f1a Remove print statement (NielsRogge)
070aed2 Fix merge (NielsRogge)
c1d2954 Add integration test (NielsRogge)
3d1cc8f Address comments (NielsRogge)
973b538 Address comments (NielsRogge)
dcdcc7f Address comments (NielsRogge)
5f3cdf8 Address comments (NielsRogge)
0f5b499 Add requires_backends (NielsRogge)
f9bd852 Clean up (NielsRogge)
5febc7f Simplify conversion script (NielsRogge)
fd97f14 Simplify more (NielsRogge)
e69fbb8 Simplify more (NielsRogge)
e097e1f Simplify more (NielsRogge)
53a1992 Clean up (NielsRogge)
f06d878 Make sure beit is loaded correctly (NielsRogge)
d044b2b Address comment (NielsRogge)
373ff0f Address bin_configurations (NielsRogge)
ccc8d46 Use bin_configurations (NielsRogge)
46bf21c Convert models, add integration tests (NielsRogge)
2331935 Fix doc test (NielsRogge)
57af8d7 Address comments (NielsRogge)
bb8e7b2 Unify regressor classes (NielsRogge)
96777f8 Clarify arguments (NielsRogge)
30a3ac9 Improve resize_image (NielsRogge)
fb727f1 Add num_relative_features (NielsRogge)
7fc2b3b Merge remote-tracking branch 'upstream/main' into add_zoedepth (NielsRogge)
a558e02 Address comment (NielsRogge)
25e4319 [run-slow]beit,data2vec,zoedepth (NielsRogge)
a592c8d Merge remote-tracking branch 'upstream/main' into add_zoedepth (NielsRogge)
1397644 [run-slow]beit,data2vec,zoedepth (NielsRogge)
d77929f Address comments (NielsRogge)
db1e4ad Address comment (NielsRogge)
5c80182 Fix merge (NielsRogge)
979a4bb Address comment (NielsRogge)
234cbf2 Replace nn.TransformerEncoderLayer and nn.TransformerEncoder (NielsRogge)
1244d59 Replace nn.MultiheadAttention (NielsRogge)
7d0e82b Add attributes for patch transformer to config (NielsRogge)
d194245 Add tests for ensure_multiple_of (NielsRogge)
7415bd4 Update organization (NielsRogge)
e19921b Add tests (NielsRogge)
fbcffcc [run-slow] beit data2vec (NielsRogge)
bcc3108 Merge remote-tracking branch 'upstream/main' into add_zoedepth (NielsRogge)
3ece046 Update ruff (NielsRogge)
4893ec9 Merge remote-tracking branch 'upstream/main' into add_zoedepth (NielsRogge)
9d82897 [run-slow] beit data2vec (NielsRogge)
dcdf9b6 Add comment (NielsRogge)
344a565 Merge remote-tracking branch 'upstream/main' into add_zoedepth (NielsRogge)
525869e Improve docstrings, add test (NielsRogge)
5aad6c7 Fix merge (NielsRogge)
bcd7ae1 Fix interpolate_pos_encoding (NielsRogge)
39e2ca3 Fix slow tests (NielsRogge)
bde8dda Add docstring (NielsRogge)
80bb3ed Update src/transformers/models/zoedepth/image_processing_zoedepth.py (NielsRogge)
6137387 Update src/transformers/models/zoedepth/image_processing_zoedepth.py (NielsRogge)
e6d8aac Improve tests and docstrings (NielsRogge)
dffbfea Fix merge (NielsRogge)
34a1abb Use run_common_tests (NielsRogge)
1bcd19f Improve docstrings (NielsRogge)
c6e5d6f Improve docstrings (NielsRogge)
8ac163f Improve tests (NielsRogge)
6cb3c56 Improve tests (NielsRogge)
b2534b2 Remove print statements (NielsRogge)
617487d Merge remote-tracking branch 'upstream/main' into add_zoedepth (NielsRogge)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,108 @@ | ||
<!--Copyright 2024 The HuggingFace Team. All rights reserved. | ||
|
||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with | ||
the License. You may obtain a copy of the License at | ||
|
||
http://www.apache.org/licenses/LICENSE-2.0 | ||
|
||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on | ||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the | ||
specific language governing permissions and limitations under the License. | ||
|
||
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be | ||
rendered properly in your Markdown viewer. | ||
|
||
--> | ||
|
||
# ZoeDepth | ||
|
||
## Overview | ||
|
||
The ZoeDepth model was proposed in [ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth](https://arxiv.org/abs/2302.12288) by Shariq Farooq Bhat, Reiner Birkl, Diana Wofk, Peter Wonka, and Matthias Müller. ZoeDepth extends the [DPT](dpt) framework for metric (also called absolute) depth estimation. ZoeDepth is pre-trained on 12 datasets using relative depth and fine-tuned on two domains (NYU and KITTI) using metric depth. Each domain gets a lightweight head with a novel bin adjustment design called the metric bins module. During inference, each input image is automatically routed to the appropriate head using a latent classifier. | ||
|
||
The abstract from the paper is the following: | ||
|
||
*This paper tackles the problem of depth estimation from a single image. Existing work either focuses on generalization performance disregarding metric scale, i.e. relative depth estimation, or state-of-the-art results on specific datasets, i.e. metric depth estimation. We propose the first approach that combines both worlds, leading to a model with excellent generalization performance while maintaining metric scale. Our flagship model, ZoeD-M12-NK, is pre-trained on 12 datasets using relative depth and fine-tuned on two datasets using metric depth. We use a lightweight head with a novel bin adjustment design called metric bins module for each domain. During inference, each input image is automatically routed to the appropriate head using a latent classifier. Our framework admits multiple configurations depending on the datasets used for relative depth pre-training and metric fine-tuning. Without pre-training, we can already significantly improve the state of the art (SOTA) on the NYU Depth v2 indoor dataset. Pre-training on twelve datasets and fine-tuning on the NYU Depth v2 indoor dataset, we can further improve SOTA for a total of 21% in terms of relative absolute error (REL). Finally, ZoeD-M12-NK is the first model that can jointly train on multiple datasets (NYU Depth v2 and KITTI) without a significant drop in performance and achieve unprecedented zero-shot generalization performance to eight unseen datasets from both indoor and outdoor domains.* | ||
|
||
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/zoedepth_architecture_bis.png" | ||
alt="drawing" width="600"/> | ||
|
||
<small> ZoeDepth architecture. Taken from the <a href="https://arxiv.org/abs/2302.12288">original paper.</a> </small> | ||
|
||
This model was contributed by [nielsr](https://huggingface.co/nielsr). | ||
The original code can be found [here](https://github.com/isl-org/ZoeDepth). | ||
|
||
## Usage tips | ||
|
||
- ZoeDepth is an absolute (also called metric) depth estimation model, unlike DPT which is a relative depth estimation model. This means that ZoeDepth is able to estimate depth in metric units like meters. | ||
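To make the relative versus metric distinction concrete: a relative depth map is only meaningful up to an unknown scale and shift, so it has to be aligned against reference measurements before it can be read in meters, whereas a metric prediction needs no such step. The sketch below illustrates that alignment with a least-squares fit. It is a minimal illustration with made-up numbers, not part of the ZoeDepth or DPT code; the helper name `fit_scale_and_shift` is hypothetical.

```python
from typing import Tuple

import numpy as np


def fit_scale_and_shift(relative: np.ndarray, metric: np.ndarray) -> Tuple[float, float]:
    """Least-squares fit of metric ~= scale * relative + shift (hypothetical helper)."""
    # Design matrix with one column for the relative values and one for the constant term.
    design = np.stack([relative.ravel(), np.ones(relative.size)], axis=1)
    solution, *_ = np.linalg.lstsq(design, metric.ravel(), rcond=None)
    scale, shift = solution
    return float(scale), float(shift)


# Made-up values: a unitless relative map and reference depths in meters.
relative = np.array([[0.1, 0.2], [0.4, 0.8]])
metric = 5.0 * relative + 1.0

scale, shift = fit_scale_and_shift(relative, metric)
aligned = scale * relative + shift  # now comparable to the metric reference
```

A metric model such as ZoeDepth is meant to produce values like `metric` directly, so no reference measurements are required at inference time.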
|
||
The easiest way to perform inference with ZoeDepth is to use the [pipeline API](../main_classes/pipelines.md): | ||
|
||
```python | ||
from transformers import pipeline | ||
from PIL import Image | ||
import requests | ||
|
||
url = "http://images.cocodataset.org/val2017/000000039769.jpg" | ||
image = Image.open(requests.get(url, stream=True).raw) | ||
|
||
pipe = pipeline(task="depth-estimation", model="Intel/zoedepth-nyu-kitti") | ||
result = pipe(image) | ||
depth = result["depth"] | ||
``` | ||
|
||
Alternatively, you can perform inference using the image processor and model classes directly: | ||
|
||
```python | ||
from transformers import AutoImageProcessor, ZoeDepthForDepthEstimation | ||
import torch | ||
import numpy as np | ||
from PIL import Image | ||
import requests | ||
|
||
url = "http://images.cocodataset.org/val2017/000000039769.jpg" | ||
image = Image.open(requests.get(url, stream=True).raw) | ||
|
||
image_processor = AutoImageProcessor.from_pretrained("Intel/zoedepth-nyu-kitti") | ||
model = ZoeDepthForDepthEstimation.from_pretrained("Intel/zoedepth-nyu-kitti") | ||
|
||
# prepare image for the model | ||
inputs = image_processor(images=image, return_tensors="pt") | ||
|
||
with torch.no_grad(): | ||
    outputs = model(**inputs) | ||
    predicted_depth = outputs.predicted_depth | ||
|
||
# interpolate to original size | ||
prediction = torch.nn.functional.interpolate( | ||
    predicted_depth.unsqueeze(1), | ||
    size=image.size[::-1], | ||
    mode="bicubic", | ||
    align_corners=False, | ||
) | ||
|
||
# visualize the prediction | ||
output = prediction.squeeze().cpu().numpy() | ||
formatted = (output * 255 / np.max(output)).astype("uint8") | ||
depth = Image.fromarray(formatted) | ||
``` | ||
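Note that the visualization step in the snippet above rescales the prediction to the 0 to 255 range, so the resulting image no longer carries metric values; keep the unnormalized array if you need depths in meters. A minimal sketch of that normalization as a standalone function, with a guard for an all-zero map, is shown below. The helper name `depth_to_uint8` and the sample values are assumptions for illustration, not part of the library.

```python
import numpy as np


def depth_to_uint8(depth: np.ndarray) -> np.ndarray:
    """Rescale a depth map to 0..255 for display (hypothetical helper).

    The returned image is for viewing only; it is no longer in metric units.
    """
    depth = np.asarray(depth, dtype=np.float32)
    max_value = float(depth.max())
    if max_value <= 0.0:
        # Nothing to normalize by; return a black image instead of dividing by zero.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = np.clip(depth * 255.0 / max_value, 0.0, 255.0)
    return scaled.astype(np.uint8)


# Made-up depths in meters, standing in for a model prediction.
metric_depth = np.array([[0.5, 1.0], [2.0, 4.0]], dtype=np.float32)
preview = depth_to_uint8(metric_depth)  # for display; metric_depth keeps the meters
```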
|
||
## Resources | ||
|
||
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with ZoeDepth. | ||
|
||
- A demo notebook regarding inference with ZoeDepth models can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/ZoeDepth). 🌎 | ||
|
||
## ZoeDepthConfig | ||
|
||
[[autodoc]] ZoeDepthConfig | ||
|
||
## ZoeDepthImageProcessor | ||
|
||
[[autodoc]] ZoeDepthImageProcessor | ||
    - preprocess | ||
|
||
## ZoeDepthForDepthEstimation | ||
|
||
[[autodoc]] ZoeDepthForDepthEstimation | ||
    - forward |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -259,4 +259,5 @@ | |
    xmod, | ||
    yolos, | ||
    yoso, | ||
    zoedepth, | ||
) |
This should be part of a post processing method in the image processor
If this is to be done in a follow-up, an issue should be made to make sure it's actually done
Issue has been opened: #30917
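For context, here is a minimal sketch of what such a post-processing helper might look like. It wraps the bicubic resize from the docs example so callers do not have to write the interpolation themselves. The function name `post_process_depth` and its signature are hypothetical and chosen only for illustration; they are not taken from this PR or from #30917.

```python
from typing import List, Sequence, Tuple

import torch


def post_process_depth(
    predicted_depth: torch.Tensor,
    target_sizes: Sequence[Tuple[int, int]],
) -> List[torch.Tensor]:
    """Resize each predicted depth map to its original (height, width).

    Hypothetical helper: `predicted_depth` has shape (batch, height, width) and
    `target_sizes` holds one (height, width) pair per image in the batch.
    """
    if predicted_depth.ndim != 3:
        raise ValueError("predicted_depth must have shape (batch, height, width)")
    if len(target_sizes) != predicted_depth.shape[0]:
        raise ValueError("target_sizes must contain one entry per image")

    results = []
    for depth, size in zip(predicted_depth, target_sizes):
        resized = torch.nn.functional.interpolate(
            depth[None, None],  # add batch and channel dims: (1, 1, H, W)
            size=tuple(size),
            mode="bicubic",
            align_corners=False,
        )
        results.append(resized[0, 0])  # drop the added dims again
    return results


# Random tensor standing in for `outputs.predicted_depth`.
fake_prediction = torch.rand(2, 12, 16)
resized_maps = post_process_depth(fake_prediction, [(24, 32), (48, 64)])
```

With a PIL image, the matching target size would be `image.size[::-1]`, since PIL reports `(width, height)`.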