
Pre-trained weights incompatible with backbone #65

Open
LucFrachon opened this issue Jan 31, 2024 · 4 comments

Comments


LucFrachon commented Jan 31, 2024

The ConvNeXt-V2-Base weights pretrained on ImageNet-1k, provided here, have several incompatibilities with the encoder architecture:

  • The MinkowskiLayerNorm layers have a self.ln submodule, which means the checkpoint keys should look like downsample_layers.1.0.ln.bias, whereas the provided weights have downsample_layers.1.0.bias.
  • Similarly, all the layers named pwconv<i> wrap their Linear in self.linear, which the checkpoint keys don't reflect.
  • The MinkowskiConvolution layers have a parameter kernel (e.g., downsample_layers.1.1.kernel), whereas the provided checkpoint has keys like downsample_layers.1.1.weight.
  • The MinkowskiConvolution biases have shape (1, c) (e.g., (1, 256)) in the model but shape (c,) in the checkpoint, and the weights have shape (p**2, in, out) in the model but (out, in, p, p) in the checkpoint.
  • The same problem affects the depthwise (DW) convolution weights and biases in the stages: biases need shape (1, c) but are (c,), and weights need shape (p**2, c) but are stored as (c, 1, p, p).

These can all be fixed with some code, but it would make life easier for everyone if you could upload the correct weights.
Thanks a lot!
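For anyone who wants to do the renaming and reshaping in code, here is a rough sketch of the conversion described in the bullets above. NumPy arrays stand in for tensors, and the key patterns (`.pwconv`, `.weight`/`.kernel` suffixes) are assumptions to verify against your own `model.state_dict()`:

```python
import numpy as np  # stands in for torch tensors in this sketch


def remap(ckpt):
    """Rename/reshape dense checkpoint entries toward the sparse
    (Minkowski) encoder's keys, per the mismatches listed above.
    Inserting `.ln` into the MinkowskiLayerNorm keys is omitted here,
    since identifying those modules needs the model's own module
    names; the rpartition pattern used for pwconv applies there too."""
    # Layers whose .weight is 4-D are convolutions; their biases become (1, c)
    conv_layers = {k[: -len(".weight")] for k, w in ckpt.items()
                   if k.endswith(".weight") and w.ndim == 4}
    remapped = {}
    for key, w in ckpt.items():
        new_key = key
        if ".pwconv" in key:
            # pwconv<i> wraps its Linear in self.linear
            stem, _, leaf = key.rpartition(".")
            new_key = f"{stem}.linear.{leaf}"
        if key.endswith(".weight") and w.ndim == 4:
            # dense conv (out, in, p, p) -> sparse kernel (p*p, in, out)
            new_key = new_key[: -len(".weight")] + ".kernel"
            o, i, p, _ = w.shape
            w = w.transpose(2, 3, 1, 0).reshape(p * p, i, o)
            if i == 1:
                # depthwise: (p*p, 1, c) -> (p*p, c)
                w = w.reshape(p * p, o)
        elif key.endswith(".bias") and key[: -len(".bias")] in conv_layers:
            w = w.reshape(1, -1)  # conv biases are (1, c) in the sparse model
        remapped[new_key] = w
    return remapped
```

This is only a starting point; after remapping, load with `model.load_state_dict(remapped, strict=False)` and inspect any remaining missing or unexpected keys.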

@LucFrachon LucFrachon changed the title Pre-trained weight incompatible with backbone Pre-trained weights incompatible with backbone Jan 31, 2024
@blackpearl1022

@LucFrachon I'm running into the same issue.
Do you have any updates on your side?
Thanks!

@LucFrachon (Author)

I wrote code to update the state dict, and it worked.
I haven't checked whether the provided checkpoints have been updated since; I've moved on to other things now...

@MarkoHaralovic

This issue is closely related to three others: #26, #33, and #47.

You correctly observed the mismatch in layer naming between the model and the checkpoint, as well as the shape mismatch. I also tried renaming and reshaping the weights, and it worked, but I would argue that the model still won't reconstruct masked images as intended, if used for that purpose.

If you inspect the weights more closely, you will see another issue: some weights present in the model are missing from the checkpoint state dictionary. This is because the pretrained weights for all the models on the main page of this repository contain only the encoder; the decoder was cut off from the .pt files.

For that reason, renaming and reshaping is not a complete solution if you use the model for a reconstruction task, which I'd argue is the main use case. Fine-tuning these weights still succeeds, because fine-tuning for any downstream task (classification, object detection, segmentation, etc.) constructs a new decoder head, so the pretrained weights remain useful there.
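As a quick sanity check for the missing decoder weights, you can diff the two key sets, which mirrors what PyTorch's `load_state_dict(strict=False)` reports as `missing_keys`/`unexpected_keys`. The key names below are hypothetical, just for illustration:

```python
def diff_state_dicts(model_keys, ckpt_keys):
    """Split key mismatches the way load_state_dict(strict=False) does:
    keys the model expects but the checkpoint lacks (e.g. the decoder
    head), and keys the checkpoint carries that the model doesn't."""
    model_keys, ckpt_keys = set(model_keys), set(ckpt_keys)
    missing = sorted(model_keys - ckpt_keys)     # e.g. decoder.* entries
    unexpected = sorted(ckpt_keys - model_keys)
    return missing, unexpected


# Hypothetical key names, just to show the shape of the output:
missing, unexpected = diff_state_dicts(
    ["encoder.stages.0.0.dwconv.kernel", "decoder.pred.weight"],
    ["encoder.stages.0.0.dwconv.kernel"],
)
# missing == ["decoder.pred.weight"]; unexpected == []
```

If `missing` contains only decoder-side keys, you know the encoder loaded cleanly and the reconstruction failure comes from the untrained decoder.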

The absence of the decoder head in the .pt files is the main cause of the visualization issues people often run into when trying to reconstruct a masked image, i.e., when using the pretrained weights in a reconstruction task. The visualization issues I found raised, which I'd say are covered by this comment, are #48 and #42.

The visualization code itself is probably not wrong in each of those cases. I tried visualizing the reconstructions myself using code from MAE (https://github.com/facebookresearch/mae/blob/main/demo/mae_visualize.ipynb), and all the images contained nothing but white noise in the masked areas. I verified that the pretrained weights output the same noise as a randomly initialized, un-pretrained model. That makes sense given that the decoder weights are absent from the .pt files: the reconstruction was effectively performed with randomly initialized decoder weights. So reconstruction won't succeed even if the weights are renamed and reshaped; the solution is to train a decoder head.

I pretrained a ConvNeXt-V2 Atto model myself and saved all the weights, both encoder and decoder. Going back to visualization, the reconstruction code then worked, and the weight mismatches were no longer a problem, as all the keys matched successfully. The visualization code, among other things, can be found in my fork of this repository (https://github.com/MarkoHaralovic/ConvNeXt-V2).

