Supervised Fine-tuning for HuggingFace pretrained weights #318
Conversation
There should be one more condition when processing `new_w` for padded tokens in the `_embedding_refactor` function, since the HF weight contains no padded tokens:

```python
new_w = torch.zeros((per_partition_vocab_size, hf_w.shape[1]), dtype=hf_w.dtype)
new_w[:real_partition_vocab_size, :] = hf_w[start_index:end_index, :]
if self.tp_rank == self.tp_size - 1 and self.more_padded > 0:
    new_w[-self.more_padded:] = hf_w[:self.token_vocab].mean(dim=0, keepdim=True)
```
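For readers following along, here is a self-contained sketch of how that padded-vocab bookkeeping could fit together. The helper function and the way `more_padded` is derived here are illustrative assumptions; only the last four lines mirror the snippet above.

```python
import torch

def refactor_embedding_shard(hf_w: torch.Tensor,
                             token_vocab: int,
                             padded_vocab_size: int,
                             tp_rank: int,
                             tp_size: int) -> torch.Tensor:
    """Hypothetical sketch: build one tensor-parallel shard of the word embedding
    from the full HF weight, filling the last rank's padded rows with the mean
    embedding, as in the snippet above."""
    per_partition_vocab_size = padded_vocab_size // tp_size
    start_index = tp_rank * per_partition_vocab_size
    end_index = min(start_index + per_partition_vocab_size, token_vocab)
    real_partition_vocab_size = max(end_index - start_index, 0)
    more_padded = per_partition_vocab_size - real_partition_vocab_size

    new_w = torch.zeros((per_partition_vocab_size, hf_w.shape[1]), dtype=hf_w.dtype)
    new_w[:real_partition_vocab_size, :] = hf_w[start_index:end_index, :]
    if tp_rank == tp_size - 1 and more_padded > 0:
        # Padded rows have no HF counterpart: fill them with the mean embedding.
        new_w[-more_padded:] = hf_w[:token_vocab].mean(dim=0, keepdim=True)
    return new_w
```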
Thank you for pointing that out, @LLLLLgq! Would it work for your case after this change?
We have also plotted the loss curves (figure omitted).
@inkcherry Hello, do you have plans to support llama2?
@inkcherry Thank you for the contribution. I have tested it and LGTM. The conflict in megatron/model/gpt_model.py is because I merged a newer PR (#341) that fixed the same issue. If you could help resolve it, I will then merge this PR. Thanks.
@inkcherry nevermind, it's a simple conflict and I just resolved it. Merging now.
@conglongli Thanks for the help!
Can you support GPTModel for llama2-7b with no pipeline parallelism, i.e. model conversion with pp=1, tp=1, dp=8 in hf2megads_weight_converter.py?
1. We noticed many users employing Megatron for fine-tuning Huggingface's pretrained weights, aiming for improved large-scale model performance or convergence. Megatron-LM currently supports weight conversion (https://github.com/NVIDIA/Megatron-LM/blob/main/docs/llama2.md), but this isn't available for non-CUDA devices. Here, we add weight conversion from HF llama to Megatron-DeepSpeed. (We noticed that another earlier PR, #246, was also related to weight conversion, but I failed to use it; its format appears to be Megatron-LM, and unfortunately we were unable to contact the author.)
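For a rough idea of what such a conversion involves, here is a heavily simplified sketch. The Megatron-side key names, the checkpoint path, and the omission of QKV/MLP fusion and tensor-parallel sharding are assumptions for illustration only, not the converter's actual logic.

```python
from transformers import AutoModelForCausalLM

# Hypothetical path to an HF llama checkpoint.
hf_model = AutoModelForCausalLM.from_pretrained("path/to/llama-hf")
hf_sd = hf_model.state_dict()

mds_sd = {}
# Token embedding and final norm map one-to-one (Megatron-side names assumed).
mds_sd["embedding.word_embeddings.weight"] = hf_sd["model.embed_tokens.weight"]
mds_sd["encoder.final_layernorm.weight"] = hf_sd["model.norm.weight"]

for i in range(hf_model.config.num_hidden_layers):
    hf_prefix, mds_prefix = f"model.layers.{i}.", f"encoder.layers.{i}."
    # Per-layer norms also map one-to-one.
    mds_sd[mds_prefix + "input_layernorm.weight"] = hf_sd[hf_prefix + "input_layernorm.weight"]
    mds_sd[mds_prefix + "post_attention_layernorm.weight"] = hf_sd[hf_prefix + "post_attention_layernorm.weight"]
    # q/k/v_proj must additionally be fused into a single query_key_value tensor
    # (interleaved per attention head), and gate/up_proj into the fused MLP input
    # projection; those steps and tensor-parallel splits are omitted here.
```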
2. Additionally, we add a fine-tuning script covering the SFT process, using an HF tokenizer and a prompt dataset (see https://github.com/tatsu-lab/stanford_alpaca) along with a dataloader. We use a repeating dataloader to address StopIteration problems caused by an insufficiently large dataset.
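A minimal sketch of the repeating-dataloader idea, assuming a simple wrapper class (the name and structure are illustrative, not necessarily the PR's implementation):

```python
from torch.utils.data import DataLoader

class RepeatingLoader:
    """Wrap a DataLoader so iteration restarts instead of raising StopIteration
    when the dataset is smaller than the requested number of training steps."""

    def __init__(self, loader: DataLoader):
        self.loader = loader
        self.iterator = iter(loader)

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self.iterator)
        except StopIteration:
            # Dataset exhausted: start a fresh pass over the same data.
            self.iterator = iter(self.loader)
            return next(self.iterator)
```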
Through steps 1 and 2, it's now feasible to run Alpaca fine-tuning (https://github.com/tatsu-lab/stanford_alpaca) with Megatron-DeepSpeed.
Furthermore, we've made some other changes:
1. fix an RMSNorm fallback-path issue,
2. handle invalid HF tokenizer arguments,
3. skip the 'cat' in the ROPE operation when the 'cat' dim has size 0 (see the sketch after this list).
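For item 3, a minimal sketch of what such a guard could look like, assuming a rotary-embedding helper in which the pass-through slice may be empty; the function name and signature are hypothetical.

```python
import torch

def apply_rotary_split(t: torch.Tensor, rot_dim: int) -> torch.Tensor:
    """Illustrative only: split `t` into a rotary part and a pass-through part,
    and skip torch.cat when the pass-through part has size 0 on the last dim."""
    t_rot, t_pass = t[..., :rot_dim], t[..., rot_dim:]
    # (rotary position embedding would be applied to t_rot here)
    if t_pass.shape[-1] == 0:
        # Nothing to concatenate: return the rotary part directly.
        return t_rot
    return torch.cat((t_rot, t_pass), dim=-1)
```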