Distributed package doesn't have NCCL built in #50
Comments
I have the same issue, running on a MacBook Pro.
Same here, running on a MacBook Pro M1.
Try changing
to
Running on a MacBook Pro M1; after making this change I got a new error:
I'm getting the same error. Have you been able to resolve it?
To run on an M1, you have to go through the repository and change every line that references CUDA to use the CPU instead. Even then, the sample prompt took over an hour to run for me on the smallest LLaMA model.
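To find every line that needs changing, a small scan of the repository helps. The sketch below is a hypothetical helper (not part of the llama repo); its regex assumes the CUDA-specific patterns that appear in this thread (`.cuda()` calls, `torch.cuda.*` APIs, and the `"nccl"` backend) and may miss others, such as `device="cuda"` strings.

```python
import re
from pathlib import Path

# Hypothetical pattern: .cuda() calls, torch.cuda.* APIs, and the "nccl"
# process-group backend (NCCL only runs on NVIDIA GPUs).
CUDA_PATTERN = re.compile(r"\.cuda\(|torch\.cuda\.|[\"']nccl[\"']")


def cuda_lines(text):
    """Return (line_number, stripped_line) for each CUDA-specific line in text."""
    return [
        (lineno, line.strip())
        for lineno, line in enumerate(text.splitlines(), start=1)
        if CUDA_PATTERN.search(line)
    ]


def scan_repo(root="."):
    """Print every CUDA-specific line in the .py files under root."""
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in cuda_lines(text):
            print(f"{path}:{lineno}: {line}")
```

Calling `scan_repo("path/to/llama")` lists each hit as `file:line: code`, which doubles as a checklist for the edits.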
I am trying to do the same. Can you explain exactly how you did it? Also, if you could share your example.py, that would be great.
Make the following changes if you want to use the CPU:
# torch.distributed.init_process_group("nccl")  # NCCL needs CUDA GPUs, which you don't have (or haven't set up properly)
torch.distributed.init_process_group("gloo")  # gloo runs on the CPU
# torch.cuda.set_device(local_rank)  # remove for the same reason
# torch.set_default_tensor_type(torch.cuda.HalfTensor)
torch.set_default_tensor_type(torch.FloatTensor)  # float32 CPU tensors instead of CUDA half precision
# tokens = torch.full((bsz, total_len), self.tokenizer.pad_id).cuda().long()
tokens = torch.full((bsz, total_len), self.tokenizer.pad_id).long()  # keep the token buffer on the CPU
self.cache_k = torch.zeros(
    (args.max_batch_size, args.max_seq_len, self.n_local_heads, self.head_dim)
)  # .cuda() removed: the key cache stays on the CPU
self.cache_v = torch.zeros(
    (args.max_batch_size, args.max_seq_len, self.n_local_heads, self.head_dim)
)  # .cuda() removed: the value cache stays on the CPU
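Instead of hard-coding `"gloo"`, the backend can be chosen at runtime so the same script works on both GPU and CPU machines. This is a sketch under assumptions: `pick_backend` and `init_distributed` are hypothetical names, and the script is assumed to be launched with `torchrun`, which sets the environment variables the default `env://` initialisation reads.

```python
import os


def pick_backend(cuda_available, nccl_available):
    """Choose a torch.distributed backend.

    NCCL needs NVIDIA GPUs and a CUDA-enabled PyTorch build; gloo runs on the
    CPU, so it is the fallback on machines like Apple-silicon Macs.
    """
    return "nccl" if cuda_available and nccl_available else "gloo"


def init_distributed():
    """Initialise the default process group with whichever backend is usable."""
    # Imported here so pick_backend() can be used without torch installed.
    import torch
    import torch.distributed as dist

    backend = pick_backend(torch.cuda.is_available(), dist.is_nccl_available())
    # With no init_method, PyTorch reads MASTER_ADDR, MASTER_PORT, RANK and
    # WORLD_SIZE from the environment, which torchrun sets for each process.
    dist.init_process_group(backend)
    if backend == "nccl":
        # Only pin a GPU when one is actually in use.
        torch.cuda.set_device(int(os.environ.get("LOCAL_RANK", 0)))
    return backend
```

The remaining `.cuda()` calls still need to be removed (or replaced with a device variable); this only covers the process-group setup.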
Thank you, @b0kch01!
After following the instructions from @b0kch01, I'm now stuck on the following issue. Any guidance?
After following @b0kch01's advice, I'm stuck here. Can anyone help out?
When the log says
Those errors both typically indicate that you haven't actually pointed at the model or the tokenizer. I noticed the tokenizer doesn't seem to get downloaded with the 7B model, so I'll upload it here to save you the trouble. Extract it into whatever directory you set as your download target (one level above the 7B model directory itself), or point to it manually with the command-line argument.
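A quick pre-flight check can catch a wrong `--ckpt_dir` or `--tokenizer_path` before `torchrun` starts. The sketch below is a hypothetical helper; it assumes the checkpoint layout the original example.py expects (one `*.pth` shard per process plus a `params.json`), where finding zero shards is what produces an "MP=0" message.

```python
from pathlib import Path


def check_llama_paths(ckpt_dir, tokenizer_path, nproc_per_node=1):
    """Return a list of problems that would stop the model from loading."""
    problems = []
    ckpt = Path(ckpt_dir)
    if not ckpt.is_dir():
        problems.append(f"checkpoint directory not found: {ckpt}")
    else:
        shards = sorted(ckpt.glob("*.pth"))
        # Each process loads one shard, so the counts must match;
        # zero shards usually means --ckpt_dir points at the wrong place.
        if len(shards) != nproc_per_node:
            problems.append(
                f"found {len(shards)} .pth file(s) in {ckpt}, expected {nproc_per_node}"
            )
        if not (ckpt / "params.json").is_file():
            problems.append(f"params.json missing from {ckpt}")
    if not Path(tokenizer_path).is_file():
        problems.append(f"tokenizer not found: {tokenizer_path}")
    return problems
```

For the command used in this issue, `check_llama_paths("models/7B", "models/tokenizer.model")` should return an empty list when both paths are correct.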
Thanks! Apparently, download.sh needs to be edited to work on macOS. Luckily, someone already made it work (https://github.com/facebookresearch/llama/pull/39/files). I applied the changes and am now downloading the weights.
I'm getting stuck in a different way after applying @b0kch01's changes. I'm not hitting the MP=0 issue, but I get warnings about being unable to initialize the client and server sockets. Otherwise it looks similar to @jaygdesai's issue, but with exit code 7 instead of 9.
If you'd like, you can try my llama CPU fork. It's tested and working on my MacBook Pro M1 Max.
Just in case someone asks about MPS (M1/M2 GPU support): the code uses
Unfortunately, I'm still unable to run anything using @b0kch01's
Here's the version on the cluster:
Edit: Never mind, it was an issue with the memory available on the interactive node. I got it working by adjusting the batch size and sequence length and submitting it as a job. Thank you, @b0kch01!
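Batch size and sequence length matter because each attention layer preallocates `cache_k` and `cache_v` with shape `(max_batch_size, max_seq_len, n_local_heads, head_dim)`, as in the CPU changes above. The sketch below estimates that memory; the 7B shape (32 layers, 32 heads of size 128) and float32 storage are assumptions, and `n_local_heads` equals `n_heads` when running a single process.

```python
GIB = 2 ** 30


def kv_cache_bytes(n_layers, max_batch_size, max_seq_len, n_heads, head_dim, bytes_per_elem):
    """Memory preallocated for cache_k and cache_v across all layers."""
    per_tensor = max_batch_size * max_seq_len * n_heads * head_dim
    # Two tensors (keys and values) per layer.
    return 2 * n_layers * per_tensor * bytes_per_elem


# Assumed 7B shape: 32 layers, 32 heads of size 128; float32 (4 bytes) on CPU.
for batch, seq in [(32, 512), (1, 512), (1, 256)]:
    size = kv_cache_bytes(32, batch, seq, 32, 128, 4)
    print(f"batch={batch:>2} seq={seq}: {size / GIB:.2f} GiB")
```

Under these assumptions the cache alone needs 16 GiB at batch 32 and sequence length 512, but only 0.5 GiB at batch 1, on top of the model weights.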
Hi, I'm stuck here after following @b0kch01's llama-cpu repo on my Mac. Any suggestions? Thanks.
I'm facing the same issue, but only with the 7B model. It seems the consolidated.01.pth file doesn't get downloaded when running download.sh: it's the only file for which I get a 403 status instead of 200. I opened a new issue here.
Thank you @Urammar
Doesn't PyTorch support the Apple silicon GPU? Then it wouldn't be as slow as a snail.
Where?
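On the Apple-silicon question: PyTorch 1.12 and later expose the M1/M2 GPU as the `mps` device. The sketch below (hypothetical helper names) only picks a device in order of preference; whether every operation in the model actually runs on MPS depends on the PyTorch version and is not checked here.

```python
def pick_device(cuda_available, mps_available):
    """Prefer an NVIDIA GPU, then the Apple-silicon GPU (MPS), then the CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device():
    """Ask the installed PyTorch build which devices are usable."""
    # Imported lazily; torch.backends.mps exists in PyTorch 1.12 and later.
    import torch

    return pick_device(torch.cuda.is_available(), torch.backends.mps.is_available())
```

Tensors would then be created with `device=detect_device()` or moved with `.to(device)` instead of `.cuda()`.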
Closing as author is inactive. If anyone has further questions, feel free to open a new issue. For future reference, check both llama and llama-recipes repos for getting started guides. |
This solution works for me: #947 |
Got the following error when executing:
torchrun --nproc_per_node 1 example.py --ckpt_dir models/7B --tokenizer_path models/tokenizer.model
Additional info:
cuda: 11.4
GPU: NVIDIA GeForce 3090
torch 1.12.1
Ubuntu 20.04.2 LTS
Does anyone know how to solve it?
Thanks in advance!
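One way to narrow this error down is to check what the installed PyTorch build reports about CUDA and NCCL. The sketch below uses hypothetical helper names and maps a few common causes to suggestions; the list is not exhaustive, and it does not diagnose this particular setup.

```python
def explain_nccl_error(torch_cuda_version, cuda_available, nccl_available):
    """Suggest a likely cause for "Distributed package doesn't have NCCL built in"."""
    if torch_cuda_version is None:
        return "CPU-only PyTorch build: install a CUDA-enabled build or use the gloo backend"
    if not nccl_available:
        return "this PyTorch build lacks NCCL (e.g. on Windows or macOS): use the gloo backend"
    if not cuda_available:
        return "PyTorch was built with CUDA but no GPU is visible: check the driver and CUDA_VISIBLE_DEVICES"
    return "NCCL and CUDA both look available; the error likely comes from elsewhere"


def collect():
    """Read the relevant facts from the installed PyTorch."""
    import torch
    import torch.distributed as dist

    # torch.version.cuda is None for CPU-only builds.
    return torch.version.cuda, torch.cuda.is_available(), dist.is_nccl_available()
```

Running `print(explain_nccl_error(*collect()))` in the same environment that `torchrun` uses shows which case applies.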