
Distributed package doesn't have NCCL built in #50

Closed
alescire94 opened this issue Mar 2, 2023 · 26 comments

Labels
compatibility (issues arising from specific hardware or system configs), documentation (Improvements or additions to documentation)

Comments

@alescire94 commented Mar 2, 2023

Got the following error when executing:
torchrun --nproc_per_node 1 example.py --ckpt_dir models/7B --tokenizer_path models/tokenizer.model

[screenshot of the error]

Additional info:
CUDA: 11.4
GPU: NVIDIA GeForce RTX 3090
torch: 1.12.1
Ubuntu 20.04.2 LTS

Does anyone know how to solve it?
Thanks in advance!

@sz85512678

Have the same issue, running on a MacBook Pro.

@christophelebrun

Same, running on MacBook Pro M1.

@Amadeus-AI

Try changing

torch.distributed.init_process_group("nccl")

to

torch.distributed.init_process_group("gloo")
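
For a version that works in both setups (my own sketch, not from the thread), you can pick the backend based on whether CUDA is actually available:

import torch
import torch.distributed as dist

# NCCL requires CUDA; gloo runs on the CPU, so fall back to it otherwise.
backend = "nccl" if torch.cuda.is_available() else "gloo"
dist.init_process_group(backend)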

@Elvincth commented Mar 2, 2023

Try changing

torch.distributed.init_process_group("nccl")

to

torch.distributed.init_process_group("gloo")

Running on a MacBook Pro M1; after changing to this, I got a new error: AttributeError: module 'torch._C' has no attribute '_cuda_setDevice'
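
That error comes from calling torch.cuda.set_device() in a CPU-only build of PyTorch. A minimal guard (my own sketch, assuming local_rank is already defined as in example.py) avoids it:

import torch

# Only touch the CUDA device selector when CUDA actually exists in this build.
if torch.cuda.is_available():
    torch.cuda.set_device(local_rank)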

@AmirFone commented Mar 2, 2023

Running on a MacBook Pro M1; after changing to this, I got a new error: AttributeError: module 'torch._C' has no attribute '_cuda_setDevice'

I'm getting the same error. Have you been able to resolve it?

@b0kch01 commented Mar 2, 2023

Running on a MacBook Pro M1; after changing to this, I got a new error: AttributeError: module 'torch._C' has no attribute '_cuda_setDevice'

I'm getting the same error. Have you been able to resolve it?

To run on M1, you have to go through the repository and modify every line that references CUDA to use the CPU instead. Even then, the sample prompt took over an hour to run for me on the smallest LLaMA model.

@jaygdesai commented Mar 2, 2023

Running on a MacBook Pro M1; after changing to this, I got a new error: AttributeError: module 'torch._C' has no attribute '_cuda_setDevice'

I'm getting the same error. Have you been able to resolve it?

To run on M1, you have to go through the repository and modify every line that references CUDA to use the CPU instead. Even then, the sample prompt took over an hour to run for me on the smallest LLaMA model.

I am trying to do the same. Can you explain exactly how you achieved this? Also, if you could share your example.py, that would be great.

@b0kch01 commented Mar 2, 2023

Make the following changes if you want to use the CPU:

example.py

# torch.distributed.init_process_group("nccl")  # removed: GPUs absent or not set up properly
torch.distributed.init_process_group("gloo")  # gloo backend runs on the CPU
# torch.cuda.set_device(local_rank)  # removed for the same reason


# torch.set_default_tensor_type(torch.cuda.HalfTensor)
torch.set_default_tensor_type(torch.FloatTensor)  # full-precision CPU tensors

generation.py

# tokens = torch.full((bsz, total_len), self.tokenizer.pad_id).cuda().long()
tokens = torch.full((bsz, total_len), self.tokenizer.pad_id).long()

model.py

self.cache_k = torch.zeros(
    (args.max_batch_size, args.max_seq_len, self.n_local_heads, self.head_dim)
)#.cuda()
self.cache_v = torch.zeros(
    (args.max_batch_size, args.max_seq_len, self.n_local_heads, self.head_dim)
)#.cuda()
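
One more place worth checking (an assumption on my part; some versions of example.py already do this): make sure the checkpoint tensors themselves are loaded onto the CPU rather than the GPU:

# Load the shard onto the CPU regardless of how it was saved.
checkpoint = torch.load(ckpt_path, map_location="cpu")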

@AmirFone commented Mar 2, 2023

Thank you! @b0kch01

@jaygdesai

I am now stuck on the following issue after following the instructions provided by @b0kch01:

[screenshot of the error]

Any guidance?

@paulutsch commented Mar 2, 2023

[screenshot of the error]

After following @b0kch01's advice, I'm stuck here. Can anyone help out?

@b0kch01 commented Mar 3, 2023

After following @b0kch01's advice, I'm stuck here. Can anyone help out?

When the log says MP=0, it means it cannot find any of the weights in the path you provided.
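
To see why: as I read example.py, the loader derives MP from how many *.pth shards it finds under --ckpt_dir, so an empty or wrong path yields MP=0. A rough sketch of that logic (assuming ckpt_dir holds the --ckpt_dir argument):

from pathlib import Path

# One consolidated.XX.pth shard per model-parallel rank.
checkpoints = sorted(Path(ckpt_dir).glob("*.pth"))
print(f"Found MP={len(checkpoints)} checkpoints")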

@Urammar commented Mar 3, 2023

Those errors both typically indicate that you haven't actually pointed at either the model or the tokenizer. I noticed the tokenizer doesn't seem to get downloaded with the 7B.

I'll upload it here to save you the trouble. This should be extracted to whatever directory you set your download to (one level above your 7B model directory itself), or you can just point to it manually on the command line:

tokenizer.zip

@paulutsch commented Mar 3, 2023

Those errors both typically indicate that you haven't actually pointed at either the model or the tokenizer. I noticed the tokenizer doesn't seem to get downloaded with the 7B.

I'll upload it here to save you the trouble. This should be extracted to whatever directory you set your download to (one level above your 7B model directory itself), or you can just point to it manually on the command line:

tokenizer.zip

Thanks! Apparently, download.sh needs to be edited to work on macOS. Luckily, this guy made it work: https://github.com/facebookresearch/llama/pull/39/files

I implemented the changes and am now downloading the weights.

@mawilson1234

I'm getting stuck in a different way after following @b0kch01's changes. I'm not running into the MP=0 issue, but I get some warnings about being unable to initialize the client and server sockets. Otherwise it looks similar to @jaygdesai's issue, but with exitcode 7 instead of 9.

[screenshot of the error]

@b0kch01 commented Mar 3, 2023

If you'd like, you can try my llama-cpu fork. Tested to work on my MacBook Pro M1 Max.

@turbo commented Mar 3, 2023

Just in case someone's going to ask for MPS (M1/M2 GPU support): the code uses view_as_complex, which is not supported on MPS and has no PYTORCH_ENABLE_MPS_FALLBACK path due to memory-sharing issues. Even modifying the code to use MPS does not enable GPU support on Apple Silicon until pytorch/pytorch#77764 is fixed. So it's CPU only for now, but @b0kch01's version works nicely 🙂
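
If you want to check your own machine, a quick probe (my own sketch) shows whether the MPS backend is built into your wheel and usable at runtime:

import torch

# is_built() covers the wheel; is_available() covers the runtime/OS.
print(torch.backends.mps.is_built())
print(torch.backends.mps.is_available())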

@mawilson1234 commented Mar 3, 2023

Unfortunately, I'm still unable to run anything using @b0kch01's llama-cpu repo on Linux.

[W socket.cpp:426] [c10d] The server socket cannot be initialized on [::]:29500 (errno: 97 - Address family not supported by protocol).
[W socket.cpp:601] [c10d] The client socket cannot be initialized to connect to [localhost]:29500 (errno: 97 - Address family not supported by protocol).
[W socket.cpp:601] [c10d] The client socket cannot be initialized to connect to [localhost]:29500 (errno: 97 - Address family not supported by protocol).
Locating checkpoints
Found MP=1 checkpoints
Creating checkpoint instance...
Grabbing params...
Loading model arguments...
Creating tokenizer...
Creating transformer...
-- Creating embedding
-- Creating transformer blocks (32)

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 0 (pid: 13822) of binary: /gpfs/gibbs/project/frank/maw244/conda_envs/llama/bin/python3.10
Traceback (most recent call last):
	File "/gpfs/gibbs/project/frank/maw244/conda_envs/llama/bin/torchrun", line 8, in <module>
		sys.exit(main())
	File "/gpfs/gibbs/project/frank/maw244/conda_envs/llama/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
		return f(*args, **kwargs)
	File "/gpfs/gibbs/project/frank/maw244/conda_envs/llama/lib/python3.10/site-packages/torch/distributed/run.py", line 762, in main
		run(args)
	File "/gpfs/gibbs/project/frank/maw244/conda_envs/llama/lib/python3.10/site-packages/torch/distributed/run.py", line 753, in run
		elastic_launch(
	File "/gpfs/gibbs/project/frank/maw244/conda_envs/llama/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
		return launch_agent(self._config, self._entrypoint, list(args))
	File "/gpfs/gibbs/project/frank/maw244/conda_envs/llama/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
		raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
======================================================
example.py FAILED
------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------
Root Cause (first observed failure):
[0]:
time      : 2023-03-03_17:31:31
host      : c14n02.grace.hpc.yale.internal
rank      : 0 (local_rank: 0)
exitcode  : -9 (pid: 13822)
error_file: <N/A>
traceback : Signal 9 (SIGKILL) received by PID 13822
======================================================

Here's the version on the cluster:

LSB Version:    :core-4.1-amd64:core-4.1-ia32:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-ia32:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-ia32:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: RedHatEnterpriseServer
Description:    Red Hat Enterprise Linux Server release 7.9 (Maipo)
Release:        7.9
Codename:       Maipo

Edit:

Nevermind, it was an issue with the memory available on the interactive node. Got it working by adjusting the batch size and seq len and submitting it as a job. Thank you, @b0kch01!
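
For anyone hitting the same SIGKILL (exitcode -9 is usually the OOM killer): the knobs in question are the defaults in example.py's main(). Something like this (values illustrative; signature based on the stock example.py as I recall it):

def main(
    ckpt_dir: str,
    tokenizer_path: str,
    temperature: float = 0.8,
    top_p: float = 0.95,
    max_seq_len: int = 256,    # smaller sequence cache = less memory
    max_batch_size: int = 1,   # one prompt at a time
):
    ...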

@jiyaos commented Mar 4, 2023

[screenshot of the error]

Hi, I'm stuck here, following @b0kch01's llama-cpu repo on my mac. Any suggestion? Thanks.

@paulutsch

[screenshot of the error]

Hi, I'm stuck here, following @b0kch01's llama-cpu repo on my mac. Any suggestion? Thanks.

I'm facing the same issue, but only with the 7B model. It seems like the consolidated.01.pth file doesn't get downloaded when running download.sh: it's the only file for which I get a 403 status instead of 200.

I opened a new issue here.

@jiyaos commented Mar 5, 2023

Thank you @Urammar

@astelmach01 commented Jul 19, 2023

Running on a MacBook Pro M1; after changing to this, I got a new error: AttributeError: module 'torch._C' has no attribute '_cuda_setDevice'

I'm getting the same error. Have you been able to resolve it?

To run on M1, you have to go through the repository and modify every line that references CUDA to use the CPU instead. Even then, the sample prompt took over an hour to run for me on the smallest LLaMA model.

Doesn't PyTorch support the Apple Silicon GPU, so it wouldn't be as slow as a snail?

@piex-1 commented Jul 26, 2023

When I was using my own Jetson AGX Orin developer kit, I also had this error. I checked online and found that Jetson does not seem to support NCCL. Is this normal? I don't want to run on the CPU.

[screenshot of the error]
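
If the thread is right that Jetson builds ship without NCCL, you can confirm it directly and pick a backend accordingly (a quick check, my own sketch):

import torch.distributed as dist

print(dist.is_nccl_available())  # False on builds compiled without NCCL
print(dist.is_gloo_available())  # gloo still works for single-device runs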

@aragon5956

Try changing

torch.distributed.init_process_group("nccl")

to

torch.distributed.init_process_group("gloo")

Where?

@albertodepaola added the documentation and compatibility labels Sep 6, 2023
@albertodepaola

Closing as the author is inactive. If anyone has further questions, feel free to open a new issue. For future reference, check both the llama and llama-recipes repos for getting-started guides.

@hua3721 commented Apr 7, 2024

This solution worked for me: #947
