
Improve vision models #17731

Merged — 16 commits merged into huggingface:main on Jun 24, 2022
Conversation

NielsRogge (Contributor) commented on Jun 16, 2022

What does this PR do?

This PR improves the vision models by:

  • removing to_2tuple
  • sanity checking that the channel dimension of the pixel values provided to the model matches config.num_channels
  • replacing the hardcoded 3 with config.num_channels in the xxxForMaskedImageModeling models (fixes #17727, "SimMIM output num_channels should not be hardcoded")
  • replacing the hardcoded 3 with config.num_channels in the Flax models (ViT, BEiT)
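The channel-dimension sanity check described above can be sketched as follows. This is a minimal illustration assuming the usual (batch, channels, height, width) layout; `check_num_channels` is a hypothetical helper for this sketch, not the library's exact code:

```python
import numpy as np

def check_num_channels(pixel_values: np.ndarray, num_channels: int) -> None:
    # Hypothetical helper: pixel values are assumed to be in
    # (batch, channels, height, width) layout, and the channel dimension
    # must match the value configured in config.num_channels.
    if pixel_values.shape[1] != num_channels:
        raise ValueError(
            "Make sure that the channel dimension of the pixel values matches "
            f"the one set in the configuration (got {pixel_values.shape[1]}, "
            f"expected {num_channels})."
        )

check_num_channels(np.zeros((1, 3, 224, 224)), 3)    # passes silently
# check_num_channels(np.zeros((1, 1, 224, 224)), 3)  # would raise ValueError
```

Failing fast here gives a readable error instead of an opaque shape mismatch deep inside the first convolution.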

To do:

  • ViT
  • BEiT
  • DeiT
  • Swin
  • PoolFormer
  • DPT
  • YOLOS
  • ViLT
  • GLPN
  • Data2VecVision
  • MaskFormer
  • ViTMAE
  • TF and Flax implementations
  • Corresponding test files
  • add more Copied from statements (e.g. DropPath)
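For context on the first to-do item: `to_2tuple` was a small helper that normalized an int or an iterable into a (height, width) pair. A simplified sketch of the helper and of the inline normalization that could replace it (an assumed illustration, not the exact library code):

```python
import collections.abc

def to_2tuple(x):
    # Simplified version of the removed helper: turn an int into a
    # (height, width) pair, and pass iterables through as tuples.
    if isinstance(x, collections.abc.Iterable):
        return tuple(x)
    return (x, x)

# After removal, the same normalization can be done inline where it is
# needed, e.g. when a patch-embedding layer receives its image size:
image_size = 224
image_size = (
    image_size
    if isinstance(image_size, collections.abc.Iterable)
    else (image_size, image_size)
)
```

Inlining the normalization keeps each model file self-contained, which matters for the repository's copy-checking between model implementations.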

HuggingFaceDocBuilderDev commented on Jun 16, 2022

The documentation is not available anymore as the PR was closed or merged.

NielsRogge mentioned this pull request on Jun 22, 2022
sgugger (Collaborator) left a comment

Nice cleanup! Thanks for working on it!

Review comments (all resolved) on:

  • src/transformers/models/beit/modeling_beit.py
  • src/transformers/models/deit/modeling_deit.py
  • src/transformers/models/dpt/modeling_dpt.py
  • src/transformers/models/vit/modeling_vit.py
  • src/transformers/models/vit_mae/modeling_tf_vit_mae.py
  • src/transformers/models/vit_mae/modeling_vit_mae.py
  • src/transformers/models/yolos/modeling_yolos.py
amyeroberts (Collaborator) left a comment

Nice! Thanks for making all these changes 🧹🧹🧹

Just some small comments about tests, but otherwise LGTM :)

Review comments (all resolved) on:

  • src/transformers/models/cvt/modeling_cvt.py
  • tests/models/swin/test_modeling_swin.py
  • tests/models/deit/test_modeling_deit.py
  • tests/models/yolos/test_modeling_yolos.py
NielsRogge merged commit 0917870 into huggingface:main on Jun 24, 2022
amyeroberts added a commit to amyeroberts/transformers that referenced this pull request Jun 24, 2022
younesbelkada pushed a commit to younesbelkada/transformers that referenced this pull request Jun 25, 2022
* Improve vision models

* Add a lot of improvements

* Remove to_2tuple from swin tests

* Fix TF Swin

* Fix more tests

* Fix copies

* Improve more models

* Fix ViTMAE test

* Add channel check for TF models

* Add proper channel check for TF models

* Apply suggestion from code review

* Apply suggestions from code review

* Add channel check for Flax models, apply suggestion

* Fix bug

* Add tests for greyscale images

* Add test for interpolation of pos encodings

Co-authored-by: Niels Rogge <[email protected]>
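The greyscale tests mentioned in the commit list exercise the point of the PR: once nothing hardcodes 3 channels, a single-channel input flows through the patch projection unchanged. A rough NumPy sketch (the `patch_embed` helper below is hypothetical, not the library code) of why this works:

```python
import numpy as np

def patch_embed(pixel_values, weight, patch_size):
    # A non-overlapping convolution is equivalent to cutting the image into
    # patches and applying one matmul. weight has shape
    # (hidden_size, num_channels, patch_size, patch_size), so the channel
    # count comes from the weight/config, never from a hardcoded 3.
    b, c, h, w = pixel_values.shape
    hidden_size = weight.shape[0]
    patches = pixel_values.reshape(
        b, c, h // patch_size, patch_size, w // patch_size, patch_size
    )
    patches = patches.transpose(0, 2, 4, 1, 3, 5).reshape(
        b, -1, c * patch_size * patch_size
    )
    return patches @ weight.reshape(hidden_size, -1).T

num_channels, patch_size, hidden = 1, 16, 32  # greyscale configuration
x = np.random.rand(2, num_channels, 64, 64)
w = np.random.rand(hidden, num_channels, patch_size, patch_size)
out = patch_embed(x, w, patch_size)
assert out.shape == (2, 16, hidden)  # (64/16)**2 = 16 patches per image
```

The same code path handles 3-channel RGB and 1-channel greyscale; only the weight shape (driven by the configured channel count) changes.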
younesbelkada pushed a commit to younesbelkada/transformers that referenced this pull request Jun 29, 2022
amyeroberts added a commit that referenced this pull request Jul 13, 2022
* Initial TF DeiT implementation

* Fix copies naming issues

* Fix up + docs

* Properly name main layer

* Name layers properly

* Fixup

* Fix import

* Fix import

* Fix import

* Fix weight loading for tests whilst not on hub

* Add doc tests and remove to_2tuple

* Add back to_2tuple
Removing to_2tuple results in many downstream changes needed because of the copies checks

* Incorporate updates in Improve vision models #17731 PR

* Don't hard code num_channels

* Copy PyTorch DeiT embeddings and remove pytorch operations with mask

* Fix patch embeddings & tidy up

* Update PixelShuffle to move logic into class layer

* Update doc strings - remove PT references

* Use NHWC format in internal layers

* Fix up

* Use linear activation layer

* Remove unused import

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <[email protected]>
Co-authored-by: NielsRogge <[email protected]>

Co-authored-by: NielsRogge <[email protected]>
Co-authored-by: Sylvain Gugger <[email protected]>

* Move dataclass to top of file

* Remove from_pt now weights on hub

* Fixup

Co-authored-by: NielsRogge <[email protected]>
Co-authored-by: Sylvain Gugger <[email protected]>
Co-authored-by: Amy Roberts <[email protected]>
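Two of the commits above ("Update PixelShuffle to move logic into class layer", "Use NHWC format in internal layers") concern the pixel-shuffle (depth-to-space) upsampling used by the masked-image-modeling head. A hedged NumPy sketch of the operation in NHWC layout, matching the block ordering of TF's depth-to-space (an illustration, not the library's exact layer code):

```python
import numpy as np

def pixel_shuffle_nhwc(x, upscale_factor):
    # Depth-to-space in NHWC layout: trade channels for spatial resolution.
    # Each group of upscale_factor**2 channels becomes an
    # upscale_factor x upscale_factor spatial block.
    b, h, w, c = x.shape
    r = upscale_factor
    assert c % (r * r) == 0, "channels must be divisible by upscale_factor**2"
    out_c = c // (r * r)
    x = x.reshape(b, h, w, r, r, out_c)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # interleave block rows/cols spatially
    return x.reshape(b, h * r, w * r, out_c)

x = np.arange(2 * 2 * 2 * 4, dtype=float).reshape(2, 2, 2, 4)
y = pixel_shuffle_nhwc(x, 2)
assert y.shape == (2, 4, 4, 1)
```

Note that PyTorch's nn.PixelShuffle operates on NCHW input with a different channel ordering, which is one reason the TF port keeps this logic in its own layer rather than copying the PyTorch code verbatim.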
viclzhu pushed a commit to viclzhu/transformers that referenced this pull request Jul 18, 2022
Successfully merging this pull request may close these issues.

SimMIM output num_channels should not be hardcoded
5 participants