Add TF ResNet model #17427
Conversation
The documentation is not available anymore as the PR was closed or merged.
Co-authored-by: Sayak Paul <[email protected]>
@Rocketknight1 Digging into this - I believe this is because of the batch norm layers. Every time the layer is called in training mode it updates its moving statistics. Weirdly, the test was passing before. I'm guessing just a fluke?
Ahhh, of course! I had thought that running a single iteration of training with a learning rate of 0 would leave the weights unchanged, but that isn't true for batch norm layers.
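For context, a minimal sketch (not from the PR) of why a training step with a learning rate of 0 still changes a batch norm layer: the moving statistics are updated as a side effect of calling the layer with training=True, independently of any gradient update.

```python
import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()
x = tf.random.normal((8, 4))

_ = bn(x, training=False)          # builds the layer; inference calls leave the stats alone
before = bn.moving_mean.numpy().copy()

_ = bn(x, training=True)           # a "training" call updates moving_mean / moving_variance
after = bn.moving_mean.numpy()

print((before != after).any())     # True - the layer changed without any optimizer step
```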
@sgugger Sorry - I didn't mean to re-request for you as you'd already approved!
@slow
def test_model_from_pretrained(self):
    for model_name in TF_RESNET_PRETRAINED_MODEL_ARCHIVE_LIST[:1]:
        model = TFResNetModel.from_pretrained(model_name, from_pt=True)
To remove once all approved and weights pushed to hub
@slow
def test_inference_image_classification_head(self):
    model = TFResNetForImageClassification.from_pretrained(
        TF_RESNET_PRETRAINED_MODEL_ARCHIVE_LIST[0], from_pt=True
    )
To remove once all approved and weights pushed to hub
@NielsRogge @Rocketknight1 friendly nudge - let me know if there are any other changes or if I'm good to merge :)
Overall looks good - my main complaint is that I think we could swap the data format for some Conv and MaxPool layers to improve performance and save memory by skipping some transposes!
if tf.executing_eagerly() and num_channels != self.num_channels:
    raise ValueError(
        "Make sure that the channel dimension of the pixel values match with the one set in the configuration."
    )
I think this check should be okay without tf.executing_eagerly(), since it will still work if it's encountered during function tracing, right?
I added this as @ydshieh suggested to all vision models for which we can check the number of channels.
I did get an error for one of the tests of TFViTMAE when not adding this check (during save_pretrained).
The following 2 tests (in test_modeling_tf_core.py) will fail:
- test_saved_model_creation
- test_saved_model_creation_extended
These tests are only run for a few core models, but I think it is good to keep them working.
(I originally introduced these kinds of conditions when adding TF ViT in order to keep @tooslow tests working)
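As a rough illustration (assumed, not the PR's code) of why the guard matters: when the model is traced for SavedModel export with a partially-unknown input signature, the static channel dimension can be None, so an unguarded Python comparison would raise during tracing even though real inputs are fine. Since tf.executing_eagerly() is False inside the trace, the check is skipped there. EXPECTED_CHANNELS below is a made-up stand-in for self.num_channels.

```python
import tensorflow as tf

EXPECTED_CHANNELS = 3  # stand-in for self.num_channels

@tf.function(input_signature=[tf.TensorSpec([None, None, None, None], tf.float32)])
def embed(pixel_values):
    num_channels = pixel_values.shape[1]  # None during this trace
    # Without the tf.executing_eagerly() guard, `None != 3` is True and the
    # ValueError would be raised while building the SavedModel signature.
    if tf.executing_eagerly() and num_channels != EXPECTED_CHANNELS:
        raise ValueError("channel dimension does not match the configuration")
    return pixel_values

embed.get_concrete_function()  # traces successfully; the check is skipped in the graph
```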
Alright, that makes sense. It can stay!
hidden_state = tf.transpose(hidden_state, (0, 2, 3, 1))
hidden_state = self.convolution(hidden_state)
# (batch_size, height, width, num_channels) -> (batch_size, num_channels, height, width)
hidden_state = tf.transpose(hidden_state, (0, 3, 1, 2))
Could we just do an NCHW convolution here (data_format="channels_first" for the Conv2D layer) instead of transpose -> NHWC convolution -> untranspose? I believe TF stores the weights in the same shape (H, W, in_dim, out_dim) regardless, so it shouldn't affect crossloading.
The reason I have it this way is that Conv2D can't run on CPU with channels_first. You get this error:
The Conv2D op currently only supports the NHWC tensor format on the CPU. The op was given the format: NCHW [Op:Conv2D]
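To make the trade-off concrete, here is a minimal sketch (assumed, not the PR's code) of the suggested channels_first convolution and the CPU limitation being described:

```python
import tensorflow as tf

# A channels_first Conv2D avoids the transpose -> conv -> untranspose dance,
# but the op is only registered for NHWC on CPU, so this raises there.
conv_nchw = tf.keras.layers.Conv2D(filters=8, kernel_size=3, data_format="channels_first")
x = tf.random.normal((1, 3, 32, 32))  # NCHW input

try:
    y = conv_nchw(x)                  # fine on GPU
except tf.errors.OpError as e:
    print(e.message)                  # "...only supports the NHWC tensor format on the CPU..."
```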
hidden_state = tf.transpose(hidden_state, (0, 2, 3, 1))
hidden_state = self.conv(hidden_state)
# (batch_size, height, width, num_channels) -> (batch_size, num_channels, height, width)
hidden_state = tf.transpose(hidden_state, (0, 3, 1, 2))
Could we just do an NCHW convolution here (data_format="channels_first" for the Conv2D layer) instead of transpose -> NHWC convolution -> untranspose? I believe TF stores the weights in the same shape (H, W, in_dim, out_dim) regardless, so it shouldn't affect crossloading.
hidden_state = tf.transpose(hidden_state, (0, 2, 3, 1))
hidden_state = self.pooler(hidden_state)
# (batch_size, height, width, num_channels) -> (batch_size, num_channels, height, width)
hidden_state = tf.transpose(hidden_state, (0, 3, 1, 2))
Could also change the data format on the pooler to channels_first to avoid needing to transpose and untranspose.
MaxPool doesn't work with channels_first on CPU either 😞
Default MaxPoolingOp only supports NHWC on device type CPU [Op:MaxPool]
God I hate Tensorflow
🤣
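Since neither Conv2D nor MaxPool supports channels_first on CPU, the way out (described in the commit list below as "Remove transposes - keep NHWC throughout forward pass") is to transpose once at the model boundary and stay NHWC internally. A rough sketch with made-up layer names, not the PR's actual code:

```python
import tensorflow as tf

class TinyEmbedder(tf.keras.layers.Layer):
    """Toy stand-in: accepts NCHW inputs (to mirror the PyTorch API) but runs NHWC inside."""

    def __init__(self):
        super().__init__()
        self.conv = tf.keras.layers.Conv2D(8, kernel_size=7, strides=2, padding="same")
        self.pool = tf.keras.layers.MaxPool2D(pool_size=3, strides=2, padding="same")

    def call(self, pixel_values):
        features = tf.transpose(pixel_values, (0, 2, 3, 1))  # NCHW -> NHWC, once at the boundary
        features = self.conv(features)                        # CPU-friendly NHWC ops throughout
        return self.pool(features)

print(TinyEmbedder()(tf.random.normal((1, 3, 64, 64))).shape)  # (1, 16, 16, 8)
```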
Co-authored-by: matt <[email protected]>
Co-authored-by: Matt <[email protected]>
…nto add-tf-resnet
* Rough TF conversion outline
* Tidy up
* Fix padding differences between layers
* Add back embedder - whoops
* Match test file to main
* Match upstream test file
* Correctly pass and assign image_size parameter (Co-authored-by: Sayak Paul <[email protected]>)
* Add in MainLayer
* Correctly name layer
* Tidy up AdaptivePooler
* Small tidy-up - more accurate type hints and remove whitespaces
* Change AdaptiveAvgPool - use the AdaptiveAvgPool implementation by @Rocketknight1, which correctly pools if the output shape does not evenly divide the input shape, c.f. https://github.com/huggingface/transformers/pull/17554/files/9e26607e22aa8d069c86b50196656012ff0ce62a#r900109509 (Co-authored-by: matt <[email protected]>, Sayak Paul <[email protected]>)
* Use updated AdaptiveAvgPool (Co-authored-by: matt <[email protected]>)
* Make AdaptiveAvgPool compatible with CPU
* Remove image_size from configuration
* Fixup
* Tensorflow -> TensorFlow
* Fix pt references in tests
* Apply suggestions from code review - grammar and wording (Co-authored-by: NielsRogge <[email protected]>)
* Add TFResNet to doc tests
* PR comments - GlobalAveragePooling and clearer comments
* Remove unused import
* Add in keepdims argument
* Add num_channels check
* grammar fix: by -> of (Co-authored-by: matt <[email protected]>, Matt <[email protected]>)
* Remove transposes - keep NHWC throughout forward pass
* Fixup look sharp
* Add missing layer names
* Final tidy up - remove from_pt now weights on hub

Co-authored-by: Sayak Paul <[email protected]>
Co-authored-by: matt <[email protected]>
Co-authored-by: NielsRogge <[email protected]>
Co-authored-by: Matt <[email protected]>
Adds a TensorFlow implementation of the ResNet model + associated tests.
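A minimal usage sketch of what the PR enables (assuming a recent transformers release and that the microsoft/resnet-50 checkpoint on the Hub has TF weights):

```python
import numpy as np
import tensorflow as tf
from transformers import AutoImageProcessor, TFResNetForImageClassification

processor = AutoImageProcessor.from_pretrained("microsoft/resnet-50")
model = TFResNetForImageClassification.from_pretrained("microsoft/resnet-50")

image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)  # stand-in for a real image
inputs = processor(images=image, return_tensors="tf")             # resize/normalize to pixel_values
logits = model(**inputs).logits
print(model.config.id2label[int(tf.argmax(logits, axis=-1)[0])])
```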
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.