Finetuning doesn't initialize microsoft/resnet classifier weights with _fast_init #31841

williford · 2024-07-08T14:33:15Z

System Info

It seems that the changes in #11471 broke fine-tuning of ResNet when the number of classes is changed.

Most other models seem to handle this by initializing Linear layers in _init_weights, so one fix would be to add a Linear branch here:

transformers/src/transformers/models/resnet/modeling_resnet.py (line 274 at commit ae9dd02):

def _init_weights(self, module):

However, it would probably be better to handle this where the size mismatch is detected, in modeling_utils.py:

transformers/src/transformers/modeling_utils.py (line 4282 at commit ae9dd02):

mismatched_keys += _find_mismatched_keys(
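A rough sketch of that alternative, in plain Python rather than PyTorch and under simplifying assumptions: compare checkpoint and model shapes, and re-initialize every mismatched key with a generic default initializer at the point of detection, instead of relying on each model's _init_weights to cover the layer. The names find_mismatched_keys, default_init and reinit_mismatched are hypothetical; this is not the actual transformers _find_mismatched_keys implementation, shapes are plain tuples, and parameters are flat lists.

```python
import math
import random


def find_mismatched_keys(model_shapes, checkpoint_shapes):
    """Return (key, checkpoint_shape, model_shape) for each key present in both
    dicts whose shapes differ. Shapes are plain tuples in this sketch."""
    mismatched = []
    for key, model_shape in model_shapes.items():
        ckpt_shape = checkpoint_shapes.get(key)
        if ckpt_shape is not None and ckpt_shape != model_shape:
            mismatched.append((key, ckpt_shape, model_shape))
    return mismatched


def default_init(shape, fan_in):
    """Hypothetical stand-in for a layer's own default initializer:
    uniform on [-1/sqrt(fan_in), 1/sqrt(fan_in)], returned as a flat list."""
    bound = 1.0 / math.sqrt(fan_in)
    return [random.uniform(-bound, bound) for _ in range(math.prod(shape))]


def reinit_mismatched(params, mismatched, fan_in_of):
    """Overwrite every mismatched parameter right where the mismatch is found,
    so correctness no longer depends on the model's _init_weights."""
    for key, _ckpt_shape, model_shape in mismatched:
        params[key] = default_init(model_shape, fan_in_of(key))
    return params
```

With the shapes from the log in the reproduction, (10, 2048) and (10,) in the model versus (1000, 2048) and (1000,) in the checkpoint, find_mismatched_keys flags both classifier.1 entries, and reinit_mismatched replaces them regardless of what the model-specific init hook covers.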

Who can help?

@amyeroberts

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

E.g.

> AutoModelForImageClassification.from_pretrained("microsoft/resnet-50", num_labels=10, ignore_mismatched_sizes=True).classifier[1].weight.absolute().max()
Some weights of ResNetForImageClassification were not initialized from the model checkpoint at microsoft/resnet-50 and are newly initialized because the shapes did not match:
- classifier.1.bias: found shape torch.Size([1000]) in the checkpoint and torch.Size([10]) in the model instantiated
- classifier.1.weight: found shape torch.Size([1000, 2048]) in the checkpoint and torch.Size([10, 2048]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
tensor(1.7014e+38, grad_fn=<MaxBackward1>)

# Sometimes the same command gives NaN:
> AutoModelForImageClassification.from_pretrained("microsoft/resnet-50", num_labels=10, ignore_mismatched_sizes=True).classifier[1].weight.absolute().max()
Some weights of ResNetForImageClassification were not initialized from the model checkpoint at microsoft/resnet-50 and are newly initialized because the shapes did not match:
- classifier.1.bias: found shape torch.Size([1000]) in the checkpoint and torch.Size([10]) in the model instantiated
- classifier.1.weight: found shape torch.Size([1000, 2048]) in the checkpoint and torch.Size([10, 2048]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
tensor(nan, grad_fn=<MaxBackward1>)


# no change in the number of labels
> AutoModelForImageClassification.from_pretrained("microsoft/resnet-50", num_labels=1000, ignore_mismatched_sizes=True).classifier[1].weight.absolute().max()
tensor(4.7245, grad_fn=<MaxBackward1>)

# one more label than the checkpoint (1001 vs. 1000)
> AutoModelForImageClassification.from_pretrained("microsoft/resnet-50", num_labels=1001, ignore_mismatched_sizes=True).classifier[1].weight.absolute().max()
Some weights of ResNetForImageClassification were not initialized from the model checkpoint at microsoft/resnet-50 and are newly initialized because the shapes did not match:
- classifier.1.bias: found shape torch.Size([1000]) in the checkpoint and torch.Size([1001]) in the model instantiated
- classifier.1.weight: found shape torch.Size([1000, 2048]) in the checkpoint and torch.Size([1001, 2048]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
tensor(1.8520e-40, grad_fn=<MaxBackward1>)

> AutoModelForImageClassification.from_pretrained("microsoft/resnet-50", num_labels=10000, ignore_mismatched_sizes=True).classifier[1].weight.absolute().max()
Some weights of ResNetForImageClassification were not initialized from the model checkpoint at microsoft/resnet-50 and are newly initialized because the shapes did not match:
- classifier.1.bias: found shape torch.Size([1000]) in the checkpoint and torch.Size([10000]) in the model instantiated
- classifier.1.weight: found shape torch.Size([1000, 2048]) in the checkpoint and torch.Size([10000, 2048]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
tensor(0., grad_fn=<MaxBackward1>)

Disabling _fast_init fixes the issue:

> AutoModelForImageClassification.from_pretrained("microsoft/resnet-50", num_labels=10000, ignore_mismatched_sizes=True, _fast_init=False).classifier[1].weight.absolute().max()
Some weights of ResNetForImageClassification were not initialized from the model checkpoint at microsoft/resnet-50 and are newly initialized because the shapes did not match:
- classifier.1.bias: found shape torch.Size([1000]) in the checkpoint and torch.Size([10000]) in the model instantiated
- classifier.1.weight: found shape torch.Size([1000, 2048]) in the checkpoint and torch.Size([10000, 2048]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
tensor(0.0221, grad_fn=<MaxBackward1>)
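The 0.0221 in this last run appears to be exactly what PyTorch's default nn.Linear initialization predicts: Linear.reset_parameters uses Kaiming-uniform init with a = sqrt(5), which works out to a uniform distribution on [-1/sqrt(fan_in), 1/sqrt(fan_in)], and fan_in is 2048 for this classifier. The plain-Python sketch below (no torch needed; the sample count and seed are arbitrary choices) computes that bound and shows that the maximum of many uniform draws approaches it.

```python
import math
import random


def kaiming_uniform_bound(fan_in, a=math.sqrt(5)):
    """Bound b of U(-b, b) for Kaiming-uniform init with leaky_relu slope a,
    the setting nn.Linear uses for its default weight initialization."""
    gain = math.sqrt(2.0 / (1.0 + a * a))
    std = gain / math.sqrt(fan_in)
    return math.sqrt(3.0) * std


fan_in = 2048  # in_features of the ResNet-50 classifier (weight shape [N, 2048])
bound = kaiming_uniform_bound(fan_in)
print(f"bound = {bound:.4f}, 1/sqrt(fan_in) = {1 / math.sqrt(fan_in):.4f}")

# The max |w| over many draws from U(-bound, bound) creeps up to the bound,
# which is why a 10000 x 2048 weight matrix reports roughly 0.0221.
random.seed(0)
samples = [random.uniform(-bound, bound) for _ in range(100_000)]
print(f"max |w| over {len(samples)} draws = {max(abs(s) for s in samples):.4f}")
```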

Expected behavior

The statistics of the initialized weights should be similar with and without _fast_init. In particular, they shouldn't contain NaNs, and the maximum absolute value shouldn't be 0 or extremely large (e.g. > 1e20).
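One way to turn this expectation into a quick regression check is to compare the maximum absolute weight against a reference run with _fast_init=False. The helper init_looks_sane below is a hypothetical sketch in plain Python; the tolerance factor of 10 is an arbitrary assumption, while the 1e20 ceiling comes from the criterion above. It is fed the values reported in the reproduction.

```python
import math


def init_looks_sane(max_abs, reference, factor=10.0, ceiling=1e20):
    """Hypothetical check: the max |weight| of a freshly initialized layer must be
    finite, nonzero, at most `ceiling`, and within `factor` of a reference value
    (e.g. the same layer loaded with _fast_init=False)."""
    if math.isnan(max_abs) or math.isinf(max_abs):
        return False
    if max_abs == 0.0 or max_abs > ceiling:
        return False
    return reference / factor <= max_abs <= reference * factor


reference = 0.0221  # max |w| reported with _fast_init=False and num_labels=10000
observed = {
    "num_labels=10 (run 1)": 1.7014e38,
    "num_labels=10 (run 2)": float("nan"),
    "num_labels=1001": 1.8520e-40,
    "num_labels=10000": 0.0,
    "num_labels=10000, _fast_init=False": 0.0221,
}
for label, value in observed.items():
    print(f"{label:40s} -> {'ok' if init_looks_sane(value, reference) else 'BROKEN'}")
```

The relative comparison matters for the num_labels=1001 case: 1.85e-40 is neither 0 nor above 1e20, so only the comparison with the _fast_init=False reference flags it.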


NielsRogge · 2024-07-08T15:54:47Z

cc @ydshieh who worked on a similar issue which was fixed by #28122

ydshieh · 2024-07-08T15:58:01Z

Hi @williford

Could you share your system info with us? You can run the command transformers-cli env and copy-paste its output below.

williford · 2024-07-08T16:39:57Z

For the reproduction I installed transformers with pip install git+https://github.com/huggingface/transformers:

transformers version: 4.43.0.dev0
Platform: Linux
Python version: 3.12.4
Huggingface_hub version: 0.23.4
Safetensors version: 0.4.3
Accelerate version: not installed
Accelerate config: not found
PyTorch version (GPU?): 2.3.1.post300 (True)
Tensorflow version (GPU?): not installed (NA)
Flax version (CPU?/GPU?/TPU?): not installed (NA)
Jax version: not installed
JaxLib version: not installed
Using distributed or parallel set-up in script?: no
Using GPU in script?: no
GPU type: NVIDIA GeForce RTX 3090

williford · 2024-07-08T17:24:20Z

@ydshieh If I understand the code correctly, your change ensures that model._initialize_weights is called. ResNetForImageClassification inherits from ResNetPreTrainedModel, which overrides _init_weights. However, ResNetPreTrainedModel._init_weights does nothing when the module is a torch.nn.modules.linear.Linear.

When _fast_init is disabled, the Linear module initializes its weights itself via its reset_parameters method.
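A toy model of the mechanism described above, in plain Python rather than PyTorch. Under fast init, parameters are allocated without running each layer's default initializer; entries whose sizes match are copied from the checkpoint, and everything else is handed to the model's _init_weights hook. If that hook has no branch for Linear, the classifier keeps whatever the allocation left behind. ToyLayer, load_with_fast_init, build_model and both init functions are hypothetical; NaN stands in for arbitrary uninitialized memory (which in the real runs also showed up as 0, 1.85e-40 or 1.7e38), flat-list length stands in for shape, and every layer uses the Linear-style uniform default for simplicity.

```python
import math
import random

UNINITIALIZED = float("nan")  # stand-in for arbitrary leftover memory


class ToyLayer:
    def __init__(self, kind, fan_in, fan_out):
        self.kind = kind  # "conv" or "linear"
        self.fan_in = fan_in
        # Fast init: allocate storage but skip the default initializer.
        self.weight = [UNINITIALIZED] * (fan_in * fan_out)

    def reset_parameters(self):
        # Simplified default init: U(-1/sqrt(fan_in), 1/sqrt(fan_in)) for every kind.
        bound = 1.0 / math.sqrt(self.fan_in)
        self.weight = [random.uniform(-bound, bound) for _ in self.weight]


def init_weights_without_linear(layer):
    # Mirrors the reported bug: only conv layers are handled.
    if layer.kind == "conv":
        layer.reset_parameters()


def init_weights_with_linear(layer):
    # In the spirit of the fix: Linear layers are initialized too.
    if layer.kind in ("conv", "linear"):
        layer.reset_parameters()


def load_with_fast_init(layers, checkpoint, init_weights):
    for name, layer in layers.items():
        values = checkpoint.get(name)
        if values is not None and len(values) == len(layer.weight):
            layer.weight = list(values)  # sizes match: take checkpoint values
        else:
            init_weights(layer)  # missing or mismatched: model's init hook
    return layers


def build_model(num_labels):
    return {"conv": ToyLayer("conv", 4, 4), "classifier": ToyLayer("linear", 8, num_labels)}


checkpoint = {"conv": [0.5] * 16, "classifier": [0.1] * (8 * 1000)}  # trained with 1000 labels
for init_fn in (init_weights_without_linear, init_weights_with_linear):
    model = load_with_fast_init(build_model(num_labels=10), checkpoint, init_fn)
    has_nan = any(math.isnan(w) for w in model["classifier"].weight)
    print(f"{init_fn.__name__}: classifier contains NaN -> {has_nan}")
```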

ydshieh · 2024-07-09T08:41:27Z

@williford Thank you for diving into this issue. Yes, you are correct! I opened a PR to fix it and it works now.

ydshieh self-assigned this Jul 8, 2024

ydshieh mentioned this issue Jul 9, 2024

Fix _init_weights for ResNetPreTrainedModel #31851

Merged

ydshieh closed this as completed in #31851 Jul 9, 2024
