Where can I download the 176B checkpoint in DeepSpeed format? #319
Hello, I used the 176B checkpoint of bloom-176B (https://huggingface.co/bigscience/bloom), but had a problem resolving the layer files. Should I download a different type of checkpoint to use with this repo, or what code should I use to run the evaluation based on the bloom-176B checkpoint?
Thanks a lot.

Comments
What are you after: the bf16 weights split across TPs, or the optim states? That's 2.3TB of data! I don't know what "had problem in resolving the layer files" means, or how you were loading the model - it works with just
I used this launch script (reconstructed below): `TP_SIZE=1`, `EVAL_MICRO_BATCH_SIZE=6`, `MEGATRON_REQUIRED_ARGS="..."`, `CMD="./tasks/eval_harness/evaluate.py ..."`, `N_GPUS=1`, then `$LAUNCHER $CMD`. I tried to evaluate the 176B model this way, but I got an error.
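For readability, here is that launch script reconstructed as shell. The comment above was truncated, so the checkpoint path, the contents of `MEGATRON_REQUIRED_ARGS`, and the remaining `evaluate.py` arguments are assumptions modeled on the repo's eval-harness usage, not the poster's actual values:

```bash
CHECKPOINT_PATH=/path/to/bloom-checkpoint  # placeholder, not in the original comment
TP_SIZE=1
EVAL_MICRO_BATCH_SIZE=6

# Assumed: evaluate.py reads the real model dimensions from the checkpoint,
# so the mandatory Megatron args can be passed as dummy values.
MEGATRON_REQUIRED_ARGS="--num-layers -1 --hidden-size -1 \
  --num-attention-heads -1 --seq-length -1 --max-position-embeddings -1"

# Task list, tokenizer settings, etc. were cut off in the original comment.
CMD="./tasks/eval_harness/evaluate.py \
  --load $CHECKPOINT_PATH \
  --tensor-model-parallel-size $TP_SIZE \
  --micro-batch-size $EVAL_MICRO_BATCH_SIZE \
  $MEGATRON_REQUIRED_ARGS"

N_GPUS=1
LAUNCHER="deepspeed --num_gpus $N_GPUS"
$LAUNCHER $CMD
```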
I would like to evaluate bloom-176B on MMLU, Big-Bench, and FewCLUE.
OK, so you do want the original weights - got it. We have a script that converts from Meg-DS to HF, but not the other way around. I will ask if we can release the Meg-DS weights on the hub.
Thanks a lot. It looks like I can only use the HF checkpoint for now. How do I use bloom-176B to generate or evaluate on multiple GPUs? Should I change the code in generate.py, or use other code?
And it shows: `load tokenizer`. I posted the same question at https://huggingface.co/bigscience/bloom/discussions/62
Please see: #308
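Not the scripts referenced in #308, but as a minimal sketch of multi-GPU generation with the HF checkpoint: this assumes `accelerate` is installed and that the combined GPU memory can hold the bf16 weights (roughly 350GB for the 176B model).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# device_map="auto" shards the layers across all visible GPUs
# (naive model placement handled by accelerate).
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom",
    device_map="auto",           # split weights across available GPUs
    torch_dtype=torch.bfloat16,  # native dtype of the BLOOM weights
)

inputs = tokenizer("BLOOM is a language model that", return_tensors="pt").to("cuda:0")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```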
The original Meg-DS checkpoint is here: https://huggingface.co/bigscience/bloom-optimizer-states
@xuyifanbupt if you are trying to deploy BLOOM 176B as a server, you can find it here
Hi @stas00, is there an alternative method, or does such a checkpoint exist publicly? Thanks
@asafkar, the DS-inference script is compatible with HF checkpoints.
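For anyone else reading, a rough sketch of the DS-inference API shape (shown with the smaller bloom-7b1 so it is runnable as-is; the 176B model needs the more careful loading done in the #308 scripts):

```python
# Launch with: deepspeed --num_gpus 2 ds_inference_sketch.py  (file name hypothetical)
import os
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-7b1"  # an HF-format checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# DS-inference shards each layer across the launched GPUs:
# tensor parallelism (mp_size), not pipeline parallelism.
model = deepspeed.init_inference(
    model,
    mp_size=int(os.getenv("WORLD_SIZE", "1")),
    dtype=torch.float16,
    replace_with_kernel_inject=True,  # swap in fused inference kernels
)

device = torch.device(f"cuda:{int(os.getenv('LOCAL_RANK', '0'))}")
inputs = tokenizer("DeepSpeed is", return_tensors="pt").to(device)
outputs = model.module.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```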
@mayank31398 I was actually referring to the other way around. For that to work (if I understand correctly), I would have 2 options -
Not sure which is easier to do, or which one would actually work...
https://huggingface.co/bigscience/bloom-optimizer-states is the full Meg-DS checkpoint.

Edit: hmm, I think you're correct, it's incomplete. I will push the rest of the files in - it will take a while, and I will update here when it's done. It will appear here once uploaded: https://huggingface.co/bigscience/bloom-megatron-deepspeed

Edit: uploaded
@asafkar, so it looks like I created the new repo for nothing: https://huggingface.co/bigscience/bloom-optimizer-states was already the full checkpoint. Why did you say it only had optim state files and not the rest? They should all be there. The file listing on the hub is currently limited to 50 entries, so one can't see the remaining files; I made a request to ameliorate that. Please confirm that https://huggingface.co/bigscience/bloom-optimizer-states has all you need and I will remove the second repo.
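For reference, a sketch of pulling the whole Meg-DS checkpoint programmatically with `huggingface_hub` (mind the size: with the optimizer states this is in the terabyte range):

```python
from huggingface_hub import snapshot_download

# Downloads every file in the repo into the local HF cache and returns
# the local path; make sure the cache disk has room (~2.3TB per above).
local_path = snapshot_download(repo_id="bigscience/bloom-optimizer-states")
print(local_path)
```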
Wait @asafkar, does DS-inference support pipeline parallelism? (plus DP on top of that)
@stas00 I haven't checked the https://huggingface.co/bigscience/bloom-optimizer-states repo yet; I was merely asking whether it will support what I'm trying to do - sorry if I wasn't clear about that.

@mayank31398 regarding DS-inference, https://arxiv.org/pdf/2207.00032.pdf clearly states that they support PP, so I assumed the actual package covers it as well. I don't recall ever seeing an example of PP with DS-inference, though, so I hope I'm not mistaken. Perhaps I will try it first with a toy model to make sure it is supported...