Run Code Llama on Mac? #11
Comments
+1, just went down this rabbit hole for a bit -- the closest thing I found to a fix: meta-llama/llama@9a5670b
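For context, the heart of that commit is swapping the CUDA-only pieces for CPU-friendly ones. A minimal sketch of that kind of change, assuming the usual torchrun env-var initialization (the exact file and variable names in the real commit may differ):

```python
import torch
import torch.distributed as dist

# NCCL is CUDA-only, and macOS builds of PyTorch ship without it.
# Use the gloo backend instead (torchrun supplies RANK/WORLD_SIZE via env vars):
dist.init_process_group("gloo")  # was: dist.init_process_group("nccl")

# Don't assume CUDA exists; fall back to CPU:
device = "cuda" if torch.cuda.is_available() else "cpu"

# The stock code also defaults tensors to torch.cuda.HalfTensor;
# on CPU a plain float type is needed (fp32 here is an assumption):
if device == "cpu":
    torch.set_default_tensor_type(torch.FloatTensor)
```

Any tensor the loader explicitly places on "cuda" then needs to target `device` instead.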
I've sent a PR for running CodeLlama on Mac: #18
David, does this work on M2 MacBooks? If so, I'll patch it. EDIT: Finally made it work with the Code Llama 34B model!!!! lol
yep, this got me a step further; there is still another problem: ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 55764) of binary: /opt/dev/miniconda3/envs/llama/bin/python3.10
I have no M2 on hand. I tested it on my Mac M1 Ultra, and it works. Not sure whether it works on M2, but as far as I know it should be compatible.
the PR does work on M2, at least with the 7B model. I was having trouble with the 13B and 34B models and the MP count and world_size settings; not sure what I was doing wrong
Can confirm the fix from @davideuler works on my M2 MacBook Air, running the 7b-Instruct model.
Verified that the solution provided by @davideuler works on my M1 MacBook Pro with the 7B model. However, performance is notably sluggish. Is it possible to run it with GPU acceleration? It runs fast with GPU acceleration under llama.cpp.
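PyTorch does expose the Apple-silicon GPU through its MPS backend, so device selection itself is straightforward; whether every op in this model runs on MPS is a separate question. A sketch, assuming a recent PyTorch build with MPS support:

```python
import torch

# Prefer the Apple GPU (Metal Performance Shaders) when available,
# otherwise fall back to CPU:
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")

# Hypothetical usage -- move the model and inputs onto the chosen device:
# model = model.to(device)
# tokens = tokens.to(device)
```

If some operators turn out to be unsupported on MPS, setting the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 lets them fall back to CPU at a performance cost.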
I had the same issue. Can anybody provide any help?
did you try the PR at #18? it should work for 7B at least
while 34B is useless at reasoning, 7B generates almost-relevant code. I could probably write a ten-line Python script that generates snippets with about the same success rate. It would have been cool to get 34B running, though. 7B is all but useless, so why won't 34B run on Mac?
34B freezes on my M1 Mac
Can you please guide me on how to run the 13B and 34B models on Windows? I have a single GPU and hence can only run the 7B model, whose model-parallel (MP) value is 1. The 13B model requires MP=2, but I have only one GPU on which I want to run inference. What changes should I make, and in which file, so that I can run the 13B model?
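There's no official single-GPU path for the 13B checkpoints, but a common community workaround is to merge the two model-parallel shards into one checkpoint and run with MP=1. A sketch of the idea; the shard-dimension mapping below is reconstructed from the fairscale parallel layers the model uses (ColumnParallelLinear splits dim 0, RowParallelLinear splits dim 1, ParallelEmbedding splits dim 1) and should be verified against model.py before relying on it:

```python
import torch

# Which dimension each weight was sharded along (None = replicated).
# Assumption: mapping inferred from the model's parallel layer types.
SHARD_DIM = {
    "tok_embeddings.weight": 1,                      # ParallelEmbedding
    "output.weight": 0,                              # ColumnParallelLinear
    "wq.weight": 0, "wk.weight": 0, "wv.weight": 0,  # column-parallel attention
    "w1.weight": 0, "w3.weight": 0,                  # column-parallel FFN
    "wo.weight": 1, "w2.weight": 1,                  # RowParallelLinear
}

def shard_dim(key: str):
    for suffix, dim in SHARD_DIM.items():
        if key.endswith(suffix):
            return dim
    return None  # norms, rope frequencies, etc. are replicated

# Load both MP=2 shards of the 13B checkpoint onto the CPU:
shards = [torch.load(f"consolidated.0{i}.pth", map_location="cpu") for i in range(2)]

merged = {}
for key in shards[0]:
    dim = shard_dim(key)
    # Replicated tensors: take one copy; sharded tensors: concatenate.
    merged[key] = shards[0][key] if dim is None else torch.cat([s[key] for s in shards], dim=dim)

torch.save(merged, "consolidated.00.merged.pth")
```

You would then point the loader at the merged file, launch with --nproc_per_node 1, and relax the check that the number of checkpoint files matches the MP value. Whether 13B then fits in a single GPU's memory is a separate constraint.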
@liqiang28 7B should work with that PR; I haven't been able to get any larger models to work
@foolyoghurt Out of curiosity, what's your tokens-per-second? I'm experiencing the sluggish performance as well.
Yes, it works after I changed the model to 7B, thanks a lot
I have a similar issue to the ones above; any fix?
Hi, on Mac I got the following error:
raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 80731) of binary: /opt/dev/miniconda3/envs/llama/bin/python3.10
I guess this is because CUDA is missing. Is there an option to run it on CPU?
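The root cause is less CUDA itself than the distributed backend: macOS wheels of PyTorch are built without NCCL, so init_process_group("nccl") raises exactly this error. A quick diagnostic, assuming a standard PyTorch install:

```python
import torch.distributed as dist

print(dist.is_nccl_available())  # False on macOS builds of PyTorch
print(dist.is_gloo_available())  # True -- gloo works on CPU
```

So yes, it can run on CPU once the process group is initialized with gloo, as in the sketch earlier in the thread.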