
Improve vision models #17731

Merged — 16 commits merged into huggingface:main on Jun 24, 2022
Conversation

NielsRogge (Contributor) commented on Jun 16, 2022

What does this PR do?

This PR improves the vision models by:

  • removing to_2tuple
  • sanity checking that the channel dimension of the pixel values provided to the model matches config.num_channels
  • replacing the hardcoded 3 with config.num_channels in the xxxForMaskedImageModeling models (fixes #17727, "SimMIM output num_channels should not be hardcoded")
  • replacing the hardcoded 3 with config.num_channels in the Flax models (ViT, BEiT)
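The channel-dimension sanity check described above can be sketched as follows. This is a minimal illustration assuming the usual (batch, channels, height, width) layout; `check_num_channels` is a hypothetical helper for this sketch, not the library's exact code:

```python
import numpy as np

def check_num_channels(pixel_values: np.ndarray, num_channels: int) -> None:
    # Hypothetical helper: pixel values are assumed to be in
    # (batch, channels, height, width) layout, and the channel dimension
    # must match the value configured in config.num_channels.
    if pixel_values.shape[1] != num_channels:
        raise ValueError(
            "Make sure that the channel dimension of the pixel values matches "
            f"the one set in the configuration (got {pixel_values.shape[1]}, "
            f"expected {num_channels})."
        )

check_num_channels(np.zeros((1, 3, 224, 224)), 3)    # passes silently
# check_num_channels(np.zeros((1, 1, 224, 224)), 3)  # would raise ValueError
```

Failing fast here gives a readable error instead of an opaque shape mismatch deep inside the first convolution.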

To do:

  • ViT
  • BEiT
  • DeiT
  • Swin
  • PoolFormer
  • DPT
  • YOLOS
  • ViLT
  • GLPN
  • Data2VecVision
  • MaskFormer
  • ViTMAE
  • TF and Flax implementations
  • Corresponding test files
  • add more Copied from statements (e.g. DropPath)
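For context on the first to-do item: `to_2tuple` was a small helper that normalized an int or an iterable into a (height, width) pair. A simplified sketch of the helper and of the inline normalization that could replace it (an assumed illustration, not the exact library code):

```python
import collections.abc

def to_2tuple(x):
    # Simplified version of the removed helper: turn an int into a
    # (height, width) pair, and pass iterables through as tuples.
    if isinstance(x, collections.abc.Iterable):
        return tuple(x)
    return (x, x)

# After removal, the same normalization can be done inline where it is
# needed, e.g. when a patch-embedding layer receives its image size:
image_size = 224
image_size = (
    image_size
    if isinstance(image_size, collections.abc.Iterable)
    else (image_size, image_size)
)
```

Inlining the normalization keeps each model file self-contained, which matters for the repository's copy-checking between model implementations.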

HuggingFaceDocBuilderDev commented on Jun 16, 2022

The documentation is not available anymore as the PR was closed or merged.

NielsRogge mentioned this pull request on Jun 22, 2022
sgugger (Collaborator) left a comment

Nice cleanup! Thanks for working on it!

Review comments (all resolved) on:

  • src/transformers/models/beit/modeling_beit.py
  • src/transformers/models/deit/modeling_deit.py
  • src/transformers/models/dpt/modeling_dpt.py
  • src/transformers/models/vit/modeling_vit.py
  • src/transformers/models/vit_mae/modeling_tf_vit_mae.py
  • src/transformers/models/vit_mae/modeling_vit_mae.py
  • src/transformers/models/yolos/modeling_yolos.py
amyeroberts (Collaborator) left a comment

Nice! Thanks for making all these changes 🧹🧹🧹

Just some small comments about tests, but otherwise LGTM :)

Review comments (all resolved) on:

  • src/transformers/models/cvt/modeling_cvt.py
  • tests/models/swin/test_modeling_swin.py
  • tests/models/deit/test_modeling_deit.py
  • tests/models/yolos/test_modeling_yolos.py
NielsRogge merged commit 0917870 into huggingface:main on Jun 24, 2022
amyeroberts added a commit to amyeroberts/transformers that referenced this pull request Jun 24, 2022
younesbelkada pushed a commit to younesbelkada/transformers that referenced this pull request Jun 25, 2022
* Improve vision models

* Add a lot of improvements

* Remove to_2tuple from swin tests

* Fix TF Swin

* Fix more tests

* Fix copies

* Improve more models

* Fix ViTMAE test

* Add channel check for TF models

* Add proper channel check for TF models

* Apply suggestion from code review

* Apply suggestions from code review

* Add channel check for Flax models, apply suggestion

* Fix bug

* Add tests for greyscale images

* Add test for interpolation of pos encodings

Co-authored-by: Niels Rogge <[email protected]>
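The greyscale tests mentioned in the commit list exercise the point of the PR: once nothing hardcodes 3 channels, a single-channel input flows through the patch projection unchanged. A rough NumPy sketch (the `patch_embed` helper below is hypothetical, not the library code) of why this works:

```python
import numpy as np

def patch_embed(pixel_values, weight, patch_size):
    # A non-overlapping convolution is equivalent to cutting the image into
    # patches and applying one matmul. weight has shape
    # (hidden_size, num_channels, patch_size, patch_size), so the channel
    # count comes from the weight/config, never from a hardcoded 3.
    b, c, h, w = pixel_values.shape
    hidden_size = weight.shape[0]
    patches = pixel_values.reshape(
        b, c, h // patch_size, patch_size, w // patch_size, patch_size
    )
    patches = patches.transpose(0, 2, 4, 1, 3, 5).reshape(
        b, -1, c * patch_size * patch_size
    )
    return patches @ weight.reshape(hidden_size, -1).T

num_channels, patch_size, hidden = 1, 16, 32  # greyscale configuration
x = np.random.rand(2, num_channels, 64, 64)
w = np.random.rand(hidden, num_channels, patch_size, patch_size)
out = patch_embed(x, w, patch_size)
assert out.shape == (2, 16, hidden)  # (64/16)**2 = 16 patches per image
```

The same code path handles 3-channel RGB and 1-channel greyscale; only the weight shape (driven by the configured channel count) changes.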
younesbelkada pushed a commit to younesbelkada/transformers that referenced this pull request Jun 29, 2022
amyeroberts added a commit that referenced this pull request Jul 13, 2022
* Initial TF DeiT implementation

* Fix copies naming issues

* Fix up + docs

* Properly name main layer

* Name layers properly

* Fixup

* Fix import

* Fix import

* Fix import

* Fix weight loading for tests whilst not on hub

* Add doc tests and remove to_2tuple

* Add back to_2tuple
Removing to_2tuple results in many downstream changes needed because of the copies checks

* Incorporate updates in Improve vision models #17731 PR

* Don't hard code num_channels

* Copy PyTorch DeiT embeddings and remove pytorch operations with mask

* Fix patch embeddings & tidy up

* Update PixelShuffle to move logic into class layer

* Update doc strings - remove PT references

* Use NHWC format in internal layers

* Fix up

* Use linear activation layer

* Remove unused import

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <[email protected]>
Co-authored-by: NielsRogge <[email protected]>

Co-authored-by: NielsRogge <[email protected]>
Co-authored-by: Sylvain Gugger <[email protected]>

* Move dataclass to top of file

* Remove from_pt now weights on hub

* Fixup

Co-authored-by: NielsRogge <[email protected]>
Co-authored-by: Sylvain Gugger <[email protected]>
Co-authored-by: Amy Roberts <[email protected]>
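Two of the commits above ("Update PixelShuffle to move logic into class layer", "Use NHWC format in internal layers") concern the pixel-shuffle (depth-to-space) upsampling used by the masked-image-modeling head. A hedged NumPy sketch of the operation in NHWC layout, matching the block ordering of TF's depth-to-space (an illustration, not the library's exact layer code):

```python
import numpy as np

def pixel_shuffle_nhwc(x, upscale_factor):
    # Depth-to-space in NHWC layout: trade channels for spatial resolution.
    # Each group of upscale_factor**2 channels becomes an
    # upscale_factor x upscale_factor spatial block.
    b, h, w, c = x.shape
    r = upscale_factor
    assert c % (r * r) == 0, "channels must be divisible by upscale_factor**2"
    out_c = c // (r * r)
    x = x.reshape(b, h, w, r, r, out_c)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # interleave block rows/cols spatially
    return x.reshape(b, h * r, w * r, out_c)

x = np.arange(2 * 2 * 2 * 4, dtype=float).reshape(2, 2, 2, 4)
y = pixel_shuffle_nhwc(x, 2)
assert y.shape == (2, 4, 4, 1)
```

Note that PyTorch's nn.PixelShuffle operates on NCHW input with a different channel ordering, which is one reason the TF port keeps this logic in its own layer rather than copying the PyTorch code verbatim.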
viclzhu pushed a commit to viclzhu/transformers that referenced this pull request Jul 18, 2022
Successfully merging this pull request may close these issues.

SimMIM output num_channels should not be hardcoded
5 participants