RuntimeError: Distributed package doesn't have NCCL built in #112
You will have to manually add NCCL. Make sure you have full privileges before choosing your install from NVIDIA. HPC-SDK is easiest, but downloading the tar and extracting it to /usr/local works the same.
I am on a Mac Pro with an M1 Max using 20 GPU cores - any idea how to resolve the NCCL issue given that no NVIDIA cards are installed?
I run
You can't resolve NCCL issues without NVIDIA hardware. There are other backends that can be used instead of NCCL, and there are also other libraries that allow a parallel workaround, but I haven't bothered with them yet.
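For what it's worth, here is a minimal sketch of that kind of backend swap: initializing torch.distributed with the Gloo backend, which runs on CPU and needs no NVIDIA card or NCCL. The single-process rank/world size and rendezvous values are my own illustrative assumptions, not something from this thread (torchrun normally sets them for you):

```python
import os
import torch.distributed as dist

# Illustrative single-process rendezvous values; torchrun sets these for you.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

# "gloo" works on CPU-only machines and on Macs, unlike the "nccl"
# backend requested by the Llama example scripts.
dist.init_process_group(backend="gloo", rank=0, world_size=1)
print("active backend:", dist.get_backend())
dist.destroy_process_group()
```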
Hello guys,
same issue
me too!
I'm on a MacBook Pro M1 2022 and have the same problem.
Did anyone find out how to solve this error?
For macOS, we have to use the C++ implementation: https://github.com/ggerganov/llama.cpp. Works like a charm on my side, with the 3 models that fit in my RAM ✌️
Is it utilizing MPS acceleration from the M1/M2 chip?
I also have the NCCL error:
I have the same problem... I use an M1 Pro.
Same issue here on a MacBook Pro M1 16GB.
It utilizes my iGPU to its fullest, and not much CPU, if that is your question.
There is a bit of customisation required to the newer code: you need to register the mps device (see the sketch below). There are also a number of other CUDA references in torch that have to change, including tensors.
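As a rough sketch of what that customisation can look like (my own illustration of device-agnostic PyTorch code, not a patch taken from this thread):

```python
import torch

# Pick the best available device: CUDA on NVIDIA machines, MPS on Apple
# Silicon, plain CPU otherwise.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

# Hard-coded CUDA calls then need device-agnostic replacements, e.g.:
x = torch.randn(2, 4, device=device)        # instead of torch.randn(2, 4).cuda()
layer = torch.nn.Linear(4, 4).to(device)    # instead of layer.cuda()
print(layer(x).device)
```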
I have the same error when running
I was able to run Llama 2 7B on a Mac M2 with https://github.com/aggiee/llama-v2-mps
Your code returns an error message indicating that the function torch.polar() is not implemented for Metal Performance Shaders (MPS)?? I'm also running on an M2 Mac.
I have the same problem... I use an M1 Pro.
If you are referring to the following message, it is expected. It's due to M1/M2/MPS not supporting the polar.out operator. It falls back to CPU for that specific operation, and the warning is there to inform the user about it. The solution is to set the PYTORCH_ENABLE_MPS_FALLBACK=1 env variable when running this code. That should make it work (you will still see the user warning about polar.out, but the code should run past that).
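A small sketch of one way to apply that variable from Python itself (just an illustration; the usual route is to export PYTORCH_ENABLE_MPS_FALLBACK=1 in the shell before launching torchrun). It assumes an Apple Silicon machine with the MPS backend available:

```python
import os

# Set before importing torch so the MPS backend is guaranteed to see it.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch

# Any op without an MPS kernel (such as polar.out in the Llama code) will now
# fall back to the CPU with a UserWarning instead of raising an error.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
print("running on:", device)
```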
This works, but performance is much too slow.
In case you run Windows 10 like me, I had the same git diff:
After that change I was able to run:
(base) H:\github\facebook\llama>torchrun --standalone --nnodes=1 example_text_completion.py --ckpt_dir llama-2-7b/ --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 4
NOTE: Redirects are currently not supported in Windows or MacOs.
master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified.
Cuda support: True : 1 devices
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Loaded in 13.40 seconds
I believe the meaning of life is
> to be happy. I believe we are all born with the potential to be happy. The meaning of life is to be happy, but the way to get there is not always easy.
The meaning of life is to be happy. It is not always easy to be happy, but it is possible. I believe that
==================================
Simply put, the theory of relativity states that
> 1) time, space, and mass are relative, and 2) the speed of light is constant, regardless of the relative motion of the observer.
Let’s look at the first point first.
Relative Time and Space
The theory of relativity is built on the idea that time and space are relative
==================================
A brief message congratulating the team on the launch:
Hi everyone,
I just
> wanted to say a big congratulations to the team on the launch of the new website.
I think it looks fantastic and I'm sure it will be a huge success.
I look forward to working with you all on the next project.
Best wishes
==================================
Translate English to French:
sea otter => loutre de mer
peppermint => menthe poivrée
plush girafe => girafe peluche
cheese =>
> fromage
fish => poisson
giraffe => girafe
elephant => éléphant
cat => chat
giraffe => girafe
elephant => éléphant
cat => chat
giraffe => gira
==================================
Make sure you have enough RAM and GPU RAM. My RAM consumption when the model is loaded, and GPU RAM:
If you get an OOM error like the one below, but you have enough GPU RAM:
Make sure that you actually have enough RAM. You can modify the PageFile to use disk as memory, see https://gist.github.com/REASY/567c48e021288df505140cad7e4562ab?permalink_comment_id=4650490#gistcomment-4650490. Note: I fixed
My env, gathered via
Just initialized with
change it to
Seems like the issue was resolved with the suggestions above. Feel free to re-open as needed. Closing.
Why do we still not have a solution to this error?
I've been able to start execution after applying changes similar to https://github.com/facebookresearch/codellama/pull/18/files
https://github.com/pianistprogrammer/llama3/tree/main - get this one and clone the repo; I have made changes to some files to make it work. You can find them in the commit tree.
Hey @pianistprogrammer 👋🏻 I tried your fork but got an error:
It's an M1 Pro. Any clue what the issue is? Full logs:
I'm sorry about that. I have made a blog post on how to get it running locally: https://questionbump.com/question/how-can-i-run-chatgpt-using-llms-locally/
I was able to download the 7B weights on macOS Monterey. I get the following errors when I try to call the example from the README in my Terminal:
torchrun --nproc_per_node 1 example.py --ckpt_dir download/model_size --tokenizer_path download/tokenizer.model