Pre-trained weights incompatible with backbone #65
Comments
@LucFrachon I had the same idea.
I wrote code to update the state dict, and it worked.
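A minimal sketch of that kind of state-dict update, assuming the key mismatches described in the issue body below; the `RENAMES` table here is illustrative, not the exact set of rules used:

```python
# Sketch: rename mismatched checkpoint keys before loading them into the model.
# The RENAMES entries are illustrative examples; the full rename/reshape rules
# follow from the incompatibility list in the issue body below.
import torch

RENAMES = {
    # provided checkpoint key -> key the encoder actually expects
    "downsample_layers.1.0.weight": "downsample_layers.1.0.ln.weight",
    "downsample_layers.1.0.bias": "downsample_layers.1.0.ln.bias",
}

ckpt = torch.load("convnextv2_base_1k_224.pt", map_location="cpu")  # illustrative filename
state = ckpt.get("model", ckpt)
fixed = {RENAMES.get(k, k): v for k, v in state.items()}
# model.load_state_dict(fixed, strict=False)
```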
This issue is closely connected to three others: #26, #33, and #47. You correctly observed the mismatch in layer naming between the model weights and the checkpoint weights, as well as the shape mismatch. I did try reshaping and renaming the weights, and it worked, but I would argue the model still won't reconstruct masked images as intended if used for that purpose.

If you look at the weights more closely, you will notice another issue: some weights are present in the model but not found in the checkpoint state dictionary. This is because the pretrained weights of all the models available on the main page of this repository contain only the encoder in the .pt files; the decoder was cut off. For that reason, renaming and reshaping alone is not a proper solution if you are using the model for any reconstruction task, which I'd argue is the main use case. Fine-tuning those weights still succeeds, because fine-tuning for any downstream task (classification, object detection, segmentation, etc.) constructs a new head, so the pretrained weights remain useful there.

The absence of the decoder head in the .pt files is the main cause of the visualization issues people often come across when trying to reconstruct masked images, i.e., when using the pretrained weights for a reconstruction task. These are the visualization issues I found that I'd say are covered by this comment: #48, #42. The visualization code itself is probably not wrong in any of those cases. I tried visualizing the reconstructions using the code from MAE (https://github.com/facebookresearch/mae/blob/main/demo/mae_visualize.ipynb), and all the images contained nothing but white noise in the masked areas. I also verified that the pretrained weights output the same noise as a randomly initialized, non-pretrained model. That makes sense given that the decoder weights are not present in the .pt files: the reconstruction was effectively done with randomly initialized decoder weights. So reconstruction won't succeed even if the weights are renamed and reshaped.

The solution is to train a decoder head. I pretrained a ConvNeXt-V2 Atto model myself and saved all the weights, both encoder and decoder. When I went back to visualization, the reconstruction code worked and the weight mismatches were no longer a problem: all the keys matched successfully. The visualization code, among other things, can be found on my GitHub profile in my forked and modified version of this repository (https://github.com/MarkoHaralovic/ConvNeXt-V2).
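You can confirm the missing decoder weights yourself by loading the released checkpoint with `strict=False` and inspecting `missing_keys`. A minimal sketch, assuming the FCMAE model definition from this repository; the import path, factory name, and filename are assumptions to adjust for your checkout:

```python
# Sketch: confirm the released .pt files contain only the encoder.
# Assumes models/fcmae.py from this repo (the FCMAE pretraining model
# requires MinkowskiEngine to be installed); names are illustrative.
import torch
from models.fcmae import convnextv2_atto

model = convnextv2_atto()
ckpt = torch.load("convnextv2_atto_1k_224_fcmae.pt", map_location="cpu")
state = ckpt.get("model", ckpt)

result = model.load_state_dict(state, strict=False)
# On the released checkpoints, the decoder-side keys all show up as missing,
# so any reconstruction is run with a randomly initialized decoder.
print(result.missing_keys)
```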
The weights for ConvNeXt-V2-Base, pretrained on INet1k, provided here, have several incompatibilities with the encoder architecture:

- The downsample layers have a `self.ln` element, which means the checkpoint keys should look like `downsample_layers.1.0.ln.bias`, whereas the weights provided have `downsample_layers.1.0.bias`.
- The `.pwconv<i>` layers have a `self.linear`, which the checkpoints don't have.
- Convolution weights are expected under `kernel` keys (e.g., `downsample_layers.1.1.kernel`), whereas the provided checkpoint has keys like `downsample_layers.1.1.weight`.
- In `.stages`: biases need to be `(1, c)` but are `(c,)`, and weights need to be `(p ** 2, c)` but are `(c, -1, p, p)`.

These can all be fixed with some code (a sketch follows below), but it would make life easier for everyone if you could upload the correct weights.
Thanks a lot!
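For completeness, here is a hedged sketch of the renames and reshapes implied by the list above. It only handles the depthwise case of the weight reshape (`(c, 1, p, p)` to `(p ** 2, c)`); the regexes, the filename, and the handling of other conv shapes are assumptions to verify against your model definition:

```python
# Sketch: remap the released dense checkpoint keys/shapes to the sparse
# encoder naming described above. Rules and heuristics are assumptions,
# not an official utility from this repo.
import re
import torch

ckpt = torch.load("convnextv2_base_1k_224.pt", map_location="cpu")  # illustrative filename
state = ckpt.get("model", ckpt)

remapped = {}
for k, v in state.items():
    # downsample_layers.<i>.0.{weight,bias} -> downsample_layers.<i>.0.ln.{weight,bias}
    k = re.sub(r"(downsample_layers\.\d+\.0)\.(weight|bias)$", r"\1.ln.\2", k)
    # pwconv<i>.{weight,bias} -> pwconv<i>.linear.{weight,bias}
    k = re.sub(r"(pwconv\d)\.(weight|bias)$", r"\1.linear.\2", k)
    if k.endswith(".weight") and v.ndim == 4 and v.shape[1] == 1:
        # depthwise conv: reshape (c, 1, p, p) -> (p ** 2, c), rename .weight -> .kernel
        c, _, p, _ = v.shape
        v = v.reshape(c, p * p).t().contiguous()
        k = k[: -len(".weight")] + ".kernel"
    elif k.startswith("stages") and k.endswith(".bias") and v.ndim == 1:
        # biases inside stages: (c,) -> (1, c)
        v = v.unsqueeze(0)
    remapped[k] = v

# model.load_state_dict(remapped, strict=False)
# Note: per the comment above, the decoder keys will still be missing.
```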