
Run code llama on mac? #11

Open
mauermbq opened this issue Aug 24, 2023 · 18 comments
Labels
compatibility: issues arising from specific hardware or system configs · model-usage: issues related to how models are used/loaded

Comments

@mauermbq

Hi,

on Mac I got the following error:

    raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 80731) of binary: /opt/dev/miniconda3/envs/llama/bin/python3.10

I guess this is because CUDA is missing. Is there an option to run it on the CPU?

@lostmygithubaccount

+1 just went down this rabbit hole for a bit -- closest thing I found to helping here: meta-llama/llama@9a5670b
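Roughly, the idea in that commit is to fall back to the gloo backend when CUDA (and hence NCCL) isn't available -- a minimal sketch of the pattern, not the exact diff:

```python
# Sketch: choose the torch.distributed backend based on CUDA availability, so
# the process group can still initialize on machines without NCCL (e.g. Macs).
import torch
import torch.distributed as dist

def init_distributed() -> None:
    backend = "nccl" if torch.cuda.is_available() else "gloo"
    dist.init_process_group(backend)
```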

@davideuler

I've sent a PR for running CodeLlama on mac: #18

@sdfgsdfgd

sdfgsdfgd commented Aug 25, 2023

David, does this work on M2 MacBooks? If so, I'll patch it.

EDIT:
I just applied that PR patch, since mine is M2 - I went with lostmygithubaccount's reference.
Also patched it so the WORLD_SIZE count matched the mp count.
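The constraint, roughly, is that torchrun's WORLD_SIZE must equal the number of checkpoint shards -- a sketch of the check, assuming the stock Code Llama shard counts:

```python
# Sketch: each Code Llama checkpoint is sharded by model-parallel (MP) size, so
# WORLD_SIZE (i.e. torchrun --nproc_per_node) has to match: 7b=1, 13b=2, 34b=4.
import os

MP_SIZE = {"7b": 1, "13b": 2, "34b": 4}

def assert_world_size(model: str) -> None:
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    expected = MP_SIZE[model]
    assert world_size == expected, (
        f"{model} needs torchrun --nproc_per_node {expected}, "
        f"got WORLD_SIZE={world_size}"
    )
```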

Finally made it work with the Code Llama 34B model!!!!
As soon as it began running, everything froze and my laptop crashed. I heard some weird noises from my dear computer.
I'm not coming back here again; GPT-4 is good for everything.

lol

@mauermbq
Author

> +1 just went down this rabbit hole for a bit -- closest thing I found to helping here: facebookresearch/llama@9a5670b

Yep, this brought me a step further; there is still another problem:

RuntimeError: ProcessGroupGloo::allgather: invalid tensor type at index 0 (expected TensorOptions(dtype=c10::Half, device=cpu, layout=Strided, requires_grad=false (default), pinned_memory=false (default), memory_format=(nullopt)), got TensorOptions(dtype=c10::Half, device=mps:0, layout=Strided, requires_grad=false (default), pinned_memory=false (default), memory_format=(nullopt)))

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 55764) of binary: /opt/dev/miniconda3/envs/llama/bin/python3.10
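Presumably gloo's collectives only accept CPU tensors, so the allgather input would have to be copied to the CPU and the results moved back -- an untested sketch of that workaround:

```python
# Untested sketch: route an all_gather through the CPU, since the gloo backend
# rejects tensors on device=mps. Results are moved back to the original device.
import torch
import torch.distributed as dist

def all_gather_via_cpu(t: torch.Tensor) -> list[torch.Tensor]:
    t_cpu = t.cpu()  # gloo expects device=cpu
    out = [torch.empty_like(t_cpu) for _ in range(dist.get_world_size())]
    dist.all_gather(out, t_cpu)
    return [x.to(t.device) for x in out]
```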

@davideuler

> David, does this work on M2 MacBooks? If so, I'll patch it.
>
> EDIT: I just applied that PR patch, since mine is M2 - I went with lostmygithubaccount's reference. Also patched it so the WORLD_SIZE count matched the mp count.
>
> Finally made it work with the Code Llama 34B model!!!! As soon as it began running, everything froze and my laptop crashed. I heard some weird noises from my dear computer. I'm not coming back here again; GPT-4 is good for everything.
>
> lol

I have no M2 on hand. I tested it on my Mac M1 Ultra, and it works. Not sure if it works on M2, but as far as I know it should be compatible.
I haven't tested the PR on CUDA; it would be great if anyone could help test it there.

@lostmygithubaccount

The PR does work on M2, at least for the 7b model. I was having trouble with the 13b and 34b models and the MP count / WORLD_SIZE setting; not sure what I was doing wrong.

@brianirish

Can confirm the fix from @davideuler works on my M2 MacBook Air, running the 7b-Instruct model.

@foolyoghurt

foolyoghurt commented Aug 29, 2023

Verified that the solution provided by @davideuler is effective on my M1 MacBook Pro using the 7b model. However, the performance is notably sluggish. Is it possible to run it with GPU acceleration? It runs so fast with GPU acceleration via llama.cpp.
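For comparison, this is the llama.cpp route I mean -- e.g. via the llama-cpp-python bindings, which offload to the Metal GPU (the model path here is a hypothetical local GGUF conversion):

```python
# Example with llama-cpp-python (pip install llama-cpp-python); a GGUF-converted
# Code Llama runs on the Apple-silicon GPU via Metal when layers are offloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="codellama-7b-instruct.Q4_K_M.gguf",  # hypothetical local file
    n_gpu_layers=1,  # any value > 0 enables Metal offload on macOS builds
)
out = llm("Write a Python function that reverses a string.", max_tokens=128)
print(out["choices"][0]["text"])
```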

@liqiang28

> > +1 just went down this rabbit hole for a bit -- closest thing I found to helping here: facebookresearch/llama@9a5670b
>
> Yep, this brought me a step further; there is still another problem:
>
> RuntimeError: ProcessGroupGloo::allgather: invalid tensor type at index 0 (expected TensorOptions(... device=cpu ...), got TensorOptions(... device=mps:0 ...))
>
> ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 55764) of binary: /opt/dev/miniconda3/envs/llama/bin/python3.10

I had the same issue. Can anybody provide any help?

@lostmygithubaccount

> I had the same issue. Can anybody provide any help?

Did you try the PR at #18? It should work for 7b at least.

@sdfgsdfgd

sdfgsdfgd commented Sep 1, 2023

While 34b is useless at reasoning, 7b generates almost-relevant code. I could probably write a 10-line Python script that generates snippets with about the same success. Would have been cool to get 34b running, though. 7b is extremely useless; why won't 34b run on Mac?

@binoculars

34b freezes on my M1 Mac.

@manoj21192

manoj21192 commented Sep 5, 2023

> 34b freezes on my M1 Mac.

Can you please guide me on how to run the 13B and 34B models on Windows? I have a single GPU and hence can run the 7B model, whose model-parallel (MP) value is 1. The 13B model requires MP=2, but I have only 1 GPU on which I want to run inference. What changes should I make in the code, and in which file, so that I can run the 13B model?

@liqiang28

liqiang28 commented Sep 6, 2023

> > I had the same issue. Can anybody provide any help?
>
> Did you try the PR at #18? It should work for 7b at least.

I tried the PR at #18, but with 13b-Instruct. Should I change the model to 7b?

@hijkw added the model-usage (issues related to how models are used/loaded) and compatibility (issues arising from specific hardware or system configs) labels on Sep 6, 2023
@lostmygithubaccount

@liqiang28 7b should work with that PR; I haven't been able to get any larger models to work.
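For reference, the stock 7b invocation from the repo README is along the lines of `torchrun --nproc_per_node 1 example_completion.py --ckpt_dir CodeLlama-7b/ --tokenizer_path CodeLlama-7b/tokenizer.model --max_seq_len 128 --max_batch_size 4` (note `--nproc_per_node 1`, matching 7b's single checkpoint shard).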

@DavidLuong98

> Verified that the solution provided by @davideuler is effective on my M1 MacBook Pro using the 7b model. However, the performance is notably sluggish. Is it possible to run it with GPU acceleration? It runs so fast with GPU acceleration via llama.cpp.

@foolyoghurt Out of curiosity, what's your tokens-per-second? I'm experiencing the sluggish performance as well.

@liqiang28

> @liqiang28 7b should work with that PR; I haven't been able to get any larger models to work.

Yes, it works after I changed the model to 7B. Thanks a lot!

@robinsonmhj

> > +1 just went down this rabbit hole for a bit -- closest thing I found to helping here: facebookresearch/llama@9a5670b
>
> Yep, this brought me a step further; there is still another problem:
>
> RuntimeError: ProcessGroupGloo::allgather: invalid tensor type at index 0 (expected TensorOptions(... device=cpu ...), got TensorOptions(... device=mps:0 ...))
>
> ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 55764) of binary: /opt/dev/miniconda3/envs/llama/bin/python3.10

I have a similar issue to the above. Any fix?
