Issue with Llama conversion for new release #24
Comments
An MLP layer that is twice as large is actually equivalent to having two individual layers of half the size. If you use MegatronGPTModel as we have, we expect it to work as a llama module, which is why we haven't updated that llama module.
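As a concrete illustration, here is a minimal PyTorch sketch (not code from this repo; it assumes the merged projection stores the two halves concatenated along the output dimension and a SwiGLU activation, consistent with the discussion below):

```python
import torch
import torch.nn.functional as F

hidden, ffn = 16, 32
x = torch.randn(4, hidden)

# Two half-size projections, as in the old llama_module.py layout
# (dense_h_to_4h and dense_h_to_4h_2).
gate = torch.nn.Linear(hidden, ffn, bias=False)
up = torch.nn.Linear(hidden, ffn, bias=False)
out_two_layers = F.silu(gate(x)) * up(x)

# One projection twice as large, with its weight built as the
# concatenation of the two half-size weights along the output dim.
merged = torch.nn.Linear(hidden, 2 * ffn, bias=False)
with torch.no_grad():
    merged.weight.copy_(torch.cat([gate.weight, up.weight], dim=0))
g, u = merged(x).chunk(2, dim=-1)
out_one_layer = F.silu(g) * u

# Identical outputs up to floating-point tolerance.
assert torch.allclose(out_two_layers, out_one_layer, atol=1e-6)
```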
I understand that. My question is about how to fine-tune models converted using convert_hf_checkpoint_to_nemo_llama.py. The problem is that the script is translating:
Given that these are merged in MegatronGPTModel, I think they should also be merged during conversion. I just did the following changes in the conversion script:
And then:
Yes, you're right. Can you update the PR with that and we can take a look?
Just created a PR: #26
Tracked by PR #26
I noticed that in the latest release, llama_module.py was replaced with falcon_module.py, and in test_llama.sh you now rely on megatron_gpt_pretraining.py (which relies on MegatronGPTModel instead of llama_module.py).

The problem is that MegatronGPTModel at some point relies on transformer.py (instead of llama_module.py), and there, for SwiGLU, you've replaced the two separate MLP layers (dense_h_to_4h and dense_h_to_4h_2) with a single one, twice as large:

While in llama_module.py you had:

But then you would have to change the checkpoint conversion script for llama as well; it's currently:

This is currently causing a crash when I try to load a checkpoint converted from HF Llama, since it expects dense_h_to_4h to be a concatenation of gate_proj and up_proj (from the HF checkpoint):
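If that is the expectation, the conversion script would need to build the merged weight itself. Below is a minimal, hypothetical sketch of such a merge; the target key path and the [gate; up] concatenation order are assumptions based on the discussion above, not taken from the repo, while the source keys follow the standard HF LLaMA state-dict naming:

```python
import torch


def merge_gate_up(hf_state_dict, num_layers):
    """Build a merged dense_h_to_4h weight per layer by concatenating the
    HF LLaMA gate_proj and up_proj weights along the output dimension."""
    merged = {}
    for i in range(num_layers):
        gate = hf_state_dict[f"model.layers.{i}.mlp.gate_proj.weight"]  # (ffn, hidden)
        up = hf_state_dict[f"model.layers.{i}.mlp.up_proj.weight"]      # (ffn, hidden)
        # Single layer twice as large: (2 * ffn, hidden).
        # Target key is illustrative only; the real NeMo key path may differ.
        merged[f"layers.{i}.mlp.dense_h_to_4h.weight"] = torch.cat([gate, up], dim=0)
    return merged
```

The concatenation order and any tensor-parallel sharding would of course have to match whatever transformer.py assumes when it splits the merged projection back into the gate and up halves.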