Add TF ResNet model #17427

Merged
amyeroberts merged 47 commits into huggingface:main from add-tf-resnet on Jul 4, 2022

Conversation

amyeroberts
Collaborator

Adds a TensorFlow implementation of the ResNet model and associated tests.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guidelines,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.


HuggingFaceDocBuilderDev commented May 25, 2022

The documentation is not available anymore as the PR was closed or merged.

amyeroberts mentioned this pull request Jun 13, 2022
@amyeroberts
Collaborator Author

I'm seeing a failure in test_keras_fit - it looks like the outputs differ depending on whether the labels are passed in the input dict or separately. That might have nothing to do with the labels, though, and instead be caused by random differences in the model outputs - maybe the training flag isn't being passed correctly, so layers like dropout are still run in training mode at eval time? Alternatively, maybe the tolerances we use for NLP models are just too strict for this one?

@Rocketknight1 Digging into this - I believe this is because of the batch norm layers. Every time the layer is called it updates its moving_mean and moving_variance parameters. During training, the batches are normalised based on the batch stats, which will be exactly the same for both fit calls because the data isn't shuffled - and we see this: the training losses for the two histories in test_keras_fit are exactly the same. At inference, however, the batches are normalised based on the moving_mean and moving_variance parameters. I'm not sure how to address this. @ydshieh have we handled anything like this in tests before?

Weirdly, the test was passing before. I'm guessing just a fluke?
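
A minimal sketch of the mechanism being described (standalone Keras, not the PR's test code): with training=True, a BatchNormalization layer normalises with the current batch statistics and updates its moving averages as a side effect; with training=False, it normalises with the accumulated moving_mean/moving_variance instead.

import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()
x = tf.random.normal((8, 4), mean=5.0)

train_out = bn(x, training=True)   # uses batch stats; also nudges the moving averages
eval_out = bn(x, training=False)   # uses moving_mean/moving_variance instead

print(bn.moving_mean.numpy())             # no longer all zeros after the training call
print(tf.reduce_mean(train_out).numpy())  # ~0: normalised with the batch mean
print(tf.reduce_mean(eval_out).numpy())   # ~5: the moving averages have barely caught up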

@Rocketknight1
Member

Ahhh, of course! I had thought that running a single iteration of training with a learning rate of 0 would leave the weights unchanged, but that isn't true for BatchNorm, because BatchNorm weights aren't updated by gradient descent. The test was broken and we only got away with it because NLP models generally don't use BatchNorm. I'll fix it tomorrow!
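
A quick sketch of that failure mode on a toy model (not the PR's code): a single fit() step with learning rate 0 leaves the Dense kernel untouched, but the BatchNorm moving statistics still drift, because they are updated by an exponential moving average rather than by gradient descent.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, input_shape=(4,)),
    tf.keras.layers.BatchNormalization(),
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.0), loss="mse")

x = tf.random.normal((16, 4))
y = tf.random.normal((16, 4))

kernel_before = model.layers[0].kernel.numpy().copy()
mean_before = model.layers[1].moving_mean.numpy().copy()

model.fit(x, y, epochs=1, verbose=0)

assert (model.layers[0].kernel.numpy() == kernel_before).all()     # lr=0: weights unchanged
assert (model.layers[1].moving_mean.numpy() != mean_before).any()  # moving stats still drifted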

@amyeroberts
Collaborator Author

@sgugger Sorry - I didn't mean to re-request for you as you'd already approved!

@slow
def test_model_from_pretrained(self):
    for model_name in TF_RESNET_PRETRAINED_MODEL_ARCHIVE_LIST[:1]:
        model = TFResNetModel.from_pretrained(model_name, from_pt=True)
Collaborator Author


To remove once all approved and weights pushed to hub

@slow
def test_inference_image_classification_head(self):
    model = TFResNetForImageClassification.from_pretrained(
        TF_RESNET_PRETRAINED_MODEL_ARCHIVE_LIST[0], from_pt=True
Collaborator Author


To remove once all approved and weights pushed to hub

@amyeroberts
Collaborator Author

@NielsRogge @Rocketknight1 friendly nudge - let me know if there are any other changes or if I'm good to merge :)

Member

Rocketknight1 left a comment


Overall looks good - my main complaint is that I think we could swap the data format for some Conv and MaxPool layers to improve performance and save memory by skipping some transposes!

Comment on lines +115 to +118
if tf.executing_eagerly() and num_channels != self.num_channels:
    raise ValueError(
        "Make sure that the channel dimension of the pixel values match with the one set in the configuration."
    )
Member


I think this check should be okay without tf.executing_eagerly(), since it will still work if it's encountered during function tracing, right?

Contributor


I added this, as @ydshieh suggested, to all the vision models for which we can check the number of channels.

I did get an error in one of the TFViTMAE tests (during save_pretrained) when this check was not added.

Collaborator

ydshieh commented Jun 28, 2022


The following 2 tests (in test_modeling_tf_core.py) will fail

  • test_saved_model_creation
  • test_saved_model_creation_extended

These tests are only run for a few core models, but I think it is good to keep them working.

(I originally introduced these kinds of conditions when adding TF ViT in order to keep @tooslow tests working)
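
For context, a rough sketch of why the guard matters (hypothetical standalone function, not the model code): under a SavedModel-style trace with an unknown channel dimension, the static shape entry is None, so an unguarded Python comparison against the configured value would raise at trace time. tf.executing_eagerly() is False inside the trace, which short-circuits the check.

import tensorflow as tf

@tf.function(input_signature=[tf.TensorSpec([None, None, 224, 224], tf.float32)])
def forward(pixel_values):
    num_channels = pixel_values.shape[1]  # None under this signature
    if tf.executing_eagerly() and num_channels != 3:
        raise ValueError("channel dimension mismatch")
    return pixel_values

# Tracing succeeds: executing_eagerly() is False while the function body is
# traced, so the (None != 3) comparison is never reached.
concrete_fn = forward.get_concrete_function()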

Member


Alright, that makes sense. It can stay!

Comment on lines 149 to 152
# (batch_size, num_channels, height, width) -> (batch_size, height, width, num_channels)
hidden_state = tf.transpose(hidden_state, (0, 2, 3, 1))
hidden_state = self.convolution(hidden_state)
# (batch_size, height, width, num_channels) -> (batch_size, num_channels, height, width)
hidden_state = tf.transpose(hidden_state, (0, 3, 1, 2))
Member

Rocketknight1 commented Jun 28, 2022


Could we just do an NCHW convolution here (data_format="channels_first" for the Conv2D layer) instead of transpose -> NHWC convolution -> untranspose? I believe TF stores the weights in the same shape (H, W, in_dim, out_dim) regardless, so it shouldn't affect crossloading.

Collaborator Author


The reason I have it this way is that Conv2D can't run on CPU with channels_first. You get this error:

The Conv2D op currently only supports the NHWC tensor format on the CPU. The op was given the format: NCHW [Op:Conv2D]
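
For reference, the limitation is easy to reproduce in isolation (a sketch assuming a CPU-only machine; the exact exception class may vary across TF versions):

import tensorflow as tf

conv = tf.keras.layers.Conv2D(8, 3, data_format="channels_first")
x = tf.random.normal((1, 3, 32, 32))  # NCHW input

with tf.device("/CPU:0"):
    try:
        conv(x)
    except tf.errors.UnimplementedError as e:
        print(e)  # "The Conv2D op currently only supports the NHWC tensor format on the CPU ..."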

Comment on lines 83 to 86
# (batch_size, num_channels, height, width) -> (batch_size, height, width, num_channels)
hidden_state = tf.transpose(hidden_state, (0, 2, 3, 1))
hidden_state = self.conv(hidden_state)
# (batch_size, height, width, num_channels) -> (batch_size, num_channels, height, width)
hidden_state = tf.transpose(hidden_state, (0, 3, 1, 2))
Member


Could we just do an NCHW convolution here (data_format="channels_first" for the Conv2D layer) instead of transpose -> NHWC convolution -> untranspose? I believe TF stores the weights in the same shape (H, W, in_dim, out_dim) regardless, so it shouldn't affect crossloading.

Comment on lines 123 to 126
# (batch_size, num_channels, height, width) -> (batch_size, height, width, num_channels)
hidden_state = tf.transpose(hidden_state, (0, 2, 3, 1))
hidden_state = self.pooler(hidden_state)
# (batch_size, height, width, num_channels) -> (batch_size, num_channels, height, width)
hidden_state = tf.transpose(hidden_state, (0, 3, 1, 2))
Member


Could also change the data format on the pooler to channels_first to avoid needing to transpose and untranspose.

Collaborator Author


MaxPool doesn't work with channels_first on CPU either 😞

Default MaxPoolingOp only supports NHWC on device type CPU [Op:MaxPool]
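
The approach the PR eventually converged on (see the "Remove transposes - keep NHWC throughout forward pass" commit below) avoids these per-layer round trips: transpose once at the model boundary and keep everything channels_last in between. A rough sketch with hypothetical layer names:

import tensorflow as tf

class StemSketch(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.conv = tf.keras.layers.Conv2D(64, 7, strides=2, padding="same")
        self.pool = tf.keras.layers.MaxPool2D(3, strides=2, padding="same")

    def call(self, pixel_values):
        # (batch, channels, height, width) -> (batch, height, width, channels), once
        x = tf.transpose(pixel_values, (0, 2, 3, 1))
        x = self.conv(x)  # channels_last: runs fine on CPU
        x = self.pool(x)  # stays NHWC; no per-layer transposes
        # transpose back only at the output boundary, if NCHW output is expected
        return tf.transpose(x, (0, 3, 1, 2))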

Member


God I hate Tensorflow

Member


🤣

amyeroberts merged commit 77ea513 into huggingface:main Jul 4, 2022
amyeroberts deleted the add-tf-resnet branch July 4, 2022 09:59
viclzhu pushed a commit to viclzhu/transformers that referenced this pull request Jul 18, 2022
* Rough TF conversion outline

* Tidy up

* Fix padding differences between layers

* Add back embedder - whoops

* Match test file to main

* Match upstream test file

* Correctly pass and assign image_size parameter

Co-authored-by: Sayak Paul <[email protected]>

* Add in MainLayer

* Correctly name layer

* Tidy up AdaptivePooler

* Small tidy-up

More accurate type hints and whitespace removal

* Change AdaptiveAvgPool

Use the AdaptiveAvgPool implementation by @Rocketknight1, which correctly pools when the input shape is not evenly divisible by the output shape (a reference sketch follows this commit list), cf. https://github.com/huggingface/transformers/pull/17554/files/9e26607e22aa8d069c86b50196656012ff0ce62a#r900109509

Co-authored-by: matt <[email protected]>
Co-authored-by: Sayak Paul <[email protected]>

* Use updated AdaptiveAvgPool

Co-authored-by: matt <[email protected]>

* Make AdaptiveAvgPool compatible with CPU

* Remove image_size from configuration

* Fixup

* Tensorflow -> TensorFlow

* Fix pt references in tests

* Apply suggestions from code review - grammar and wording

Co-authored-by: NielsRogge <[email protected]>

Co-authored-by: NielsRogge <[email protected]>

* Add TFResNet to doc tests

* PR comments - GlobalAveragePooling and clearer comments

* Remove unused import

* Add in keepdims argument

* Add num_channels check

* grammar fix: by -> of

Co-authored-by: matt <[email protected]>

Co-authored-by: Matt <[email protected]>

* Remove transposes - keep NHWC throughout forward pass

* Fixup look sharp

* Add missing layer names

* Final tidy up - remove from_pt now that weights are on the hub

Co-authored-by: Sayak Paul <[email protected]>
Co-authored-by: matt <[email protected]>
Co-authored-by: NielsRogge <[email protected]>
Co-authored-by: Matt <[email protected]>
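
For the AdaptiveAvgPool change referenced above: TF has no built-in equivalent of PyTorch's AdaptiveAvgPool2d when the input size is not a multiple of the output size. A naive reference sketch of the semantics (per-bin slicing with floor/ceil boundaries; illustrative only, not the PR's actual implementation):

import tensorflow as tf

def adaptive_avg_pool2d_nhwc(x, output_size):
    # x: (batch, height, width, channels) with statically known height/width.
    # Output bin i averages the input slice [floor(i*H/oh), ceil((i+1)*H/oh)),
    # so it behaves correctly even when H is not divisible by oh.
    height, width = x.shape[1], x.shape[2]
    out_h, out_w = output_size
    rows = []
    for i in range(out_h):
        h0, h1 = (i * height) // out_h, -(-((i + 1) * height) // out_h)
        cols = []
        for j in range(out_w):
            w0, w1 = (j * width) // out_w, -(-((j + 1) * width) // out_w)
            cols.append(tf.reduce_mean(x[:, h0:h1, w0:w1, :], axis=[1, 2]))
        rows.append(tf.stack(cols, axis=1))  # (batch, out_w, channels)
    return tf.stack(rows, axis=1)  # (batch, out_h, out_w, channels)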