
Fix Marian model conversion #30173

Merged: 4 commits merged into huggingface:main on May 1, 2024

Conversation

@zucchini-nlp (Member) commented Apr 11, 2024

What does this PR do?

Fixes #26216. After a bit of code exploration I found that the problem was in the "tie_weights" method. The issue is that "tie_weights" does not clone the weights; it simply re-points output_embeddings -> input_embeddings. And since we tie weights at least twice during loading (once before loading the state dict and once after), the weights were being loaded incorrectly.

Why does the issue appear only in MarianMT and not in other models that tie weights? Because in other models either the weights already hold the same/tied values, or the order of loading parameters differs from MarianMT's. When loading the state dict into MarianMT, the "output_embeddings" are loaded last, so the "input_embeddings" weight data is overridden to hold the same data as the "output_embeddings". That way we lose access to the actual "input_embeddings" weight data; see the sketch below. This seems related to an old PR I found.
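
A minimal sketch of the failure mode, using toy modules (the names below are illustrative stand-ins, not the actual transformers internals):

import torch
from torch import nn

# Toy stand-ins for the tied input/output embeddings.
embed_tokens = nn.Embedding(8, 4)      # "input_embeddings"
lm_head = nn.Linear(4, 8, bias=False)  # "output_embeddings"

# Tying re-points lm_head.weight at the very same Parameter object;
# it does not clone the values.
lm_head.weight = embed_tokens.weight
assert lm_head.weight is embed_tokens.weight

# Loading a state dict copies values in place, so whichever key is loaded
# last wins for *both* modules once they share storage.
lm_head.weight.data.copy_(torch.zeros(8, 4))
print(embed_tokens.weight.abs().sum())  # tensor(0.) -- original values lost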

I converted the weights with python src/transformers/models/marian/convert_marian_tatoeba_to_pytorch.py --models fin-eng --save_dir converted and checked the correctness of the results with the script below. Everything works as expected.

from transformers import AutoTokenizer, MarianMTModel

# Load the freshly converted Finnish->English checkpoint from the save_dir
tokenizer = AutoTokenizer.from_pretrained("/home/raushan/converted/opus-mt-fin-eng/")
model = MarianMTModel.from_pretrained("/home/raushan/converted/opus-mt-fin-eng/")

# A couple of Finnish inputs to sanity-check the translations
inputs = ["Hei siellä", "Miten aurinko sanotaan suomeksi?"]
batch_tokenized = tokenizer(inputs, return_tensors="pt", padding=True)
model_output = model.generate(**batch_tokenized, max_new_tokens=100)
batch_detokenized = tokenizer.batch_decode(
    model_output,
    skip_special_tokens=True,
)

print(batch_detokenized)

Marian tests, including the slow ones, are all passing on my end.

@@ -3840,7 +3840,6 @@ def _fix_key(key):
model_buffers = {".".join([prefix, key]) for key in model_buffers}
unexpected_keys = sorted(unexpected_keys - model_buffers)

model.tie_weights()
zucchini-nlp (Member, Author):

Probably we can remove this, given that a few lines above the weights are already tied before loading the state dict.

ArthurZucker (Collaborator):

This was introduced in #25107; however, there are indeed no changes to the model in the code above. Can you make sure the tests for the bug fixed in that PR are still passing?

@@ -34,7 +34,6 @@

DEFAULT_REPO = "Tatoeba-Challenge"
DEFAULT_MODEL_DIR = os.path.join(DEFAULT_REPO, "models")
LANG_CODE_URL = "https://datahub.io/core/language-codes/r/language-codes-3b2.csv"
zucchini-nlp (Member, Author):

This one gives a 404, so I uploaded the dataset to the Hub instead.

@@ -622,6 +622,12 @@ def load_marian_model(self) -> MarianMTModel:
bias_tensor = nn.Parameter(torch.FloatTensor(self.final_bias))
model.model.decoder.embed_tokens.weight = decoder_wemb_tensor

# handle tied embeddings, otherwise "from_pretrained" loads them incorrectly
if self.cfg["tied-embeddings"]:
zucchini-nlp (Member, Author):

And this is the actual fix, which gives equal weights to the tied parameters.
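
A hedged sketch of the idea behind the fix, with toy modules standing in for MarianMTModel (the class and attribute names here are assumptions for illustration, not a verbatim copy of the converter code):

import torch
from torch import nn

class ToyDecoder(nn.Module):
    def __init__(self, vocab, dim):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab, dim)

class ToyMarian(nn.Module):
    def __init__(self, vocab=8, dim=4):
        super().__init__()
        self.decoder = ToyDecoder(vocab, dim)
        self.lm_head = nn.Linear(dim, vocab, bias=False)

model = ToyMarian()
cfg = {"tied-embeddings": True}  # stands in for self.cfg in the converter

# Instead of relying on tie_weights() to re-point tensors after loading,
# write identical values into both tied parameters at conversion time, so
# "from_pretrained" sees consistent weights regardless of key load order.
if cfg["tied-embeddings"]:
    model.lm_head.weight.data = model.decoder.embed_tokens.weight.data.clone()

assert torch.equal(model.lm_head.weight, model.decoder.embed_tokens.weight)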

@ArthurZucker (Collaborator) left a review:

Thanks! We were recently pinged about that.
Great digging!

Comment on lines 627 to 628
wemb_tensor = nn.Parameter(torch.FloatTensor(self.wemb))
bias_tensor = nn.Parameter(torch.FloatTensor(self.final_bias))
ArthurZucker (Collaborator):

I don't see where these are used?

zucchini-nlp (Member, Author):

Oh yeah, makes sense. They are not used anymore.

zucchini-nlp (Member, Author):

Weird, I cannot reply to the comment above. Anyway, I tested FSDP with fsdp_cpu_ram_efficient_loading: true, and it looks like tying the weights does not have a big effect on CPU memory. To be sure, though, summoning @pacman100 to confirm, since he contributed that PR.

@zucchini-nlp (Member, Author) commented:

@ArthurZucker I reverted the change for tying weights, to stay consistent with main. It was not the actual fix for the Marian models, so keeping it should not hurt whatever the reason was for adding it. Requesting re-review :)

@ArthurZucker (Collaborator) left a review:

Super glad about these changes! Thanks for fixing!

zucchini-nlp merged commit 4bc9cb3 into huggingface:main on May 1, 2024
19 checks passed
itazap pushed a commit that referenced this pull request May 14, 2024
* fix marian model conversion

* uncomment that line

* remove unnecessary code

* revert tie_weights, doesn't hurt

Successfully merging this pull request may close this issue: Some MarianMT models broken and output garbage (#26216)