
RuntimeError: Distributed package doesn't have NCCL built in #112

Closed
qsimeon opened this issue Mar 4, 2023 · 30 comments
Labels
model-usage issues related to how models are used/loaded

Comments

@qsimeon

qsimeon commented Mar 4, 2023

I was able to download the 7B weights on Mac OS Monterey. I get the following errors when I try to call the example from the README in my Terminal: torchrun --nproc_per_node 1 example.py --ckpt_dir download/model_size --tokenizer_path download/tokenizer.model

RuntimeError: Distributed package doesn't have NCCL built in
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 51512) of binary: /Users/username/opt/anaconda3/envs/pytorch/bin/python
.
.
.
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
example.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-03-04_14:30:38
  host      : COMPUTER.tld
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 51512)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
@Inserian

Inserian commented Mar 4, 2023

You will have to manually add NCCL. Make sure you have full privileges before choosing your install from NVIDIA. The HPC-SDK is easiest, but downloading the tar and extracting it to /usr/local works the same.
https://docs.nvidia.com/deeplearning/nccl/install-guide/index.html
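
For anyone unsure whether their build ships with NCCL at all, here is a quick check from Python (a minimal sketch; both helpers are part of torch.distributed):

    # Check which communication backends this PyTorch build supports.
    # macOS and Windows wheels are built without NCCL.
    import torch
    import torch.distributed as dist

    print("NCCL available:", dist.is_nccl_available())
    print("Gloo available:", dist.is_gloo_available())
    print("CUDA available:", torch.cuda.is_available())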

@tekspirit

I am on a Mac Pro with an M1 Max (20 GPU cores) - any idea how to resolve the NCCL issue given that no NVIDIA cards are installed?

@tekspirit

I am on a Mac Pro with an M1 Max (20 GPU cores) - any idea how to resolve the NCCL issue given that no NVIDIA cards are installed?

I ran print(torch.backends.mps.is_built()) and it returns True, but when I set torch.distributed.init_process_group("mps") in example.py and run it, it complains that mps cannot be found:
ValueError: Invalid backend: 'mps'
Any ideas for getting the backend to run on M1?

@Inserian

Inserian commented Mar 5, 2023

You can't resolve NCCL issues without NVIDIA hardware. There are other backends that can be used in place of NCCL, and there are also other libraries that allow parallel workarounds, but I haven't bothered with them yet.
As far as torchrun goes, it looks like you didn't input your MP value? If that still doesn't work, try python -m torch.distributed.run --nproc_per_node MP example.py --ckpt_dir $TARGET_FOLDER/model_size --tokenizer_path $TARGET_FOLDER/tokenizer.model (editing the MP value, of course).
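
For example, here is a minimal sketch of initializing with the gloo backend instead of NCCL (the TCP address is just a placeholder for a single local process; note that "mps" is a compute device, not a valid communication backend):

    import torch
    import torch.distributed as dist

    # gloo handles inter-process communication and works without NVIDIA hardware
    dist.init_process_group("gloo", init_method="tcp://127.0.0.1:29500",
                            rank=0, world_size=1)

    # mps is only where tensors live and where compute happens
    device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
    x = torch.ones(3, device=device)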

@andrewssobral

Hello guys,
I am also interested in how to run LLaMA (e.g. the 7B model) on a Mac M1 or M2. Any solution?

@Eurus-Holmes

same issue

@tekspirit

me too!

@bcouetil

I'm on a MacBook Pro M1 2022 and have the same problem.

@bdabykov

bdabykov commented Apr 5, 2023

Did anyone find out how to solve this error?
I am having the same issue here.

@bcouetil

bcouetil commented Apr 5, 2023

For macOS, we have to use the C++ implementation: https://github.com/ggerganov/llama.cpp

Works like a charm on my side, with the 3 models that fit in my RAM ✌️

@signalprime

For macOS, we have to use the C++ implementation: https://github.com/ggerganov/llama.cpp

Works like a charm on my side, with the 3 models that fit in my RAM ✌️

Is it utilizing MPS acceleration from the M1/M2 chip?

@AngelTs

AngelTs commented May 11, 2023

I also have the NCCL error:
raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in

@Sunjung-Dev

I have the same problem ... I'm using an M1 Pro.

@araby123

araby123 commented Jul 19, 2023

Same issue here on a MacBook Pro M1 with 16 GB:

raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in

@bcouetil

For macOS, we have to use the C++ implementation: https://github.com/ggerganov/llama.cpp
Works like a charm on my side, with the 3 models that fit in my RAM ✌️

Is it utilizing MPS acceleration from the M1/M2 chip?

It utilizes my iGPU to its fullest, and not much CPU, if that is your question.

@byronrode

There is a bit of customisation required, at minimum to the newer model.py and generation.py files.

You need to register the mps device (device = torch.device('mps')) and then reference that in a few places, as well as change .cuda() calls to .to(device).

torch.distributed.init_process_group("gloo") is another change to make, from nccl.

There are also a number of other cuda references in torch that have to change, including tensors.
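
A sketch of those edits (a hypothetical snippet illustrating the pattern, not the exact upstream code):

    import torch
    import torch.distributed

    # register the mps device once and reuse it everywhere .cuda() was used
    device = torch.device("mps")

    # before: torch.distributed.init_process_group("nccl")
    # (assumes torchrun has set MASTER_ADDR, MASTER_PORT, RANK and WORLD_SIZE)
    torch.distributed.init_process_group("gloo")

    # before: tokens = torch.full((bsz, total_len), pad_id).cuda().long()
    tokens = torch.full((4, 128), 0).to(device).long()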

@sixian-C

I have the same error when running torchrun --nproc_per_node 1 example.py --ckpt_dir download/model_size --tokenizer_path download/tokenizer.model in my Windows 11 conda environment. Any solution?

@aggiee

aggiee commented Jul 21, 2023

I was able to run Llama 2 7B on Mac M2 with (https://github.com/aggiee/llama-v2-mps)

@3zerevelt

3zerevelt commented Jul 21, 2023

I was able to run Llama 2 7B on Mac M2 with (https://github.com/aggiee/llama-v2-mps) @aggiee

Your code returns an error message indicating that the function torch.polar() is not implemented for the Metal Performance Shaders (MPS) backend. I'm also running on an M2 Mac.

@g8gg

g8gg commented Jul 22, 2023

I have the same problem ... I'm using an M1 Pro.

@aggiee

aggiee commented Jul 22, 2023

If you are referring to the following message, it is expected. It's due to M1/M2/MPS not supporting the polar.out operator. It falls back to the CPU for that specific operation, and the warning is to inform the user about it:
"UserWarning: The operator 'aten::polar.out' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /private/var/folders/nz/j6p8yfhx1mv_0grj5xl4650h0000gp/T/abs_1aidzjezue/croot/pytorch_1687856425340/work/aten/src/ATen/mps/MPSFallback.mm:11.)
freqs_cis = torch.polar(torch.ones_like(freqs), freqs) # complex64
"

The solution is to set the PYTORCH_ENABLE_MPS_FALLBACK=1 environment variable when running this code. That should make it work (you will still see the user warning about polar.out, but the code should run past that).
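
For example, reusing the command from the original report:

    PYTORCH_ENABLE_MPS_FALLBACK=1 torchrun --nproc_per_node 1 example.py --ckpt_dir download/model_size --tokenizer_path download/tokenizer.model

or, equivalently, at the very top of the script before torch is imported:

    import os
    os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
    import torch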

@araby123

If you are referring to the following message, it is expected. It's due to M1/M2/MPS not supporting polar.out operator. It falls back to CPU for that specific operation and the warning is to inform the user about it: "UserWarning: The operator 'aten::polar.out' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /private/var/folders/nz/j6p8yfhx1mv_0grj5xl4650h0000gp/T/abs_1aidzjezue/croot/pytorch_1687856425340/work/aten/src/ATen/mps/MPSFallback.mm:11.) freqs_cis = torch.polar(torch.ones_like(freqs), freqs) # complex64"

The solution is to set PYTORCH_ENABLE_MPS_FALLBACK=1 env variable to run this code. That should make it work (you will still see the user warning about polar.out, but the code should run past that)

This works, but performance is much too slow.

@REASY

REASY commented Jul 27, 2023

In case you run Windows 10 like me: I had the same RuntimeError: Distributed package doesn't have NCCL built in error. To fix it, I checked the code of the Llama class https://github.com/facebookresearch/llama/blob/6c7fe276574e78057f917549435a2554000a876d/llama/generation.py#L61-L62 and saw how torch.distributed is initialized. One can check all possible backends at distributed.html#torch.distributed.init_process_group. I changed the code to initialize it with the gloo backend: dist.init_process_group(backend="gloo")

git diff:

Index: example_text_completion.py
===================================================================
diff --git a/example_text_completion.py b/example_text_completion.py
--- a/example_text_completion.py	(revision 6c7fe276574e78057f917549435a2554000a876d)
+++ b/example_text_completion.py	(date 1690453793087)
@@ -5,6 +5,9 @@
 
 from llama import Llama
 
+import torch
+import torch.distributed as dist
+
 
 def main(
     ckpt_dir: str,
@@ -15,6 +18,8 @@
     max_gen_len: int = 64,
     max_batch_size: int = 4,
 ):
+    dist.init_process_group(backend="gloo")
+
     generator = Llama.build(
         ckpt_dir=ckpt_dir,
         tokenizer_path=tokenizer_path,
@@ -52,4 +57,5 @@
 
 
 if __name__ == "__main__":
+    print("Cuda support:", torch.cuda.is_available(),":", torch.cuda.device_count(), "devices")
     fire.Fire(main)

After that change I was able to run

(base) H:\github\facebook\llama>torchrun --standalone --nnodes=1 example_text_completion.py --ckpt_dir llama-2-7b/ --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 4
NOTE: Redirects are currently not supported in Windows or MacOs.
master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified.
Cuda support: True : 1 devices
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Loaded in 13.40 seconds
I believe the meaning of life is
> to be happy. I believe we are all born with the potential to be happy. The meaning of life is to be happy, but the way to get there is not always easy.
The meaning of life is to be happy. It is not always easy to be happy, but it is possible. I believe that

==================================

Simply put, the theory of relativity states that
> 1) time, space, and mass are relative, and 2) the speed of light is constant, regardless of the relative motion of the observer.
Let’s look at the first point first.
Relative Time and Space
The theory of relativity is built on the idea that time and space are relative

==================================

A brief message congratulating the team on the launch:

        Hi everyone,

        I just
> wanted to say a big congratulations to the team on the launch of the new website.

        I think it looks fantastic and I'm sure it will be a huge success.

        I look forward to working with you all on the next project.

        Best wishes



==================================

Translate English to French:

        sea otter => loutre de mer
        peppermint => menthe poivrée
        plush girafe => girafe peluche
        cheese =>
> fromage
        fish => poisson
        giraffe => girafe
        elephant => éléphant
        cat => chat
        giraffe => girafe
        elephant => éléphant
        cat => chat
        giraffe => gira

==================================

Make sure you have enough RAM and GPU RAM. My RAM consumption when the model is loaded: (screenshot omitted)

GPU RAM:

Thu Jul 27 18:48:46 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 536.67                 Driver Version: 536.67       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090      WDDM  | 00000000:0C:00.0  On |                  Off |
| 30%   40C    P2             151W / 450W |  15160MiB / 24564MiB |     53%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

If you get an OOM error like the one below, even though you have enough GPU RAM:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 MiB (GPU 0; 23.99 GiB total capacity; 7.55 GiB already allocated; 14.84 GiB free; 7.56 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 21068) of binary: c:\Users\User\miniconda3\python.exe

make sure that you actually have enough RAM. You can modify the PageFile to use disk as memory, see https://gist.github.com/REASY/567c48e021288df505140cad7e4562ab?permalink_comment_id=4650490#gistcomment-4650490
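
The error text itself also suggests tuning the allocator; a hedged example for Windows (the 128 MiB split size is only a starting point to tune for your GPU):

    set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
    torchrun --standalone --nnodes=1 example_text_completion.py --ckpt_dir llama-2-7b/ --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 4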

Note: I also had to fix torchrun; one can modify torchrun-script.py to make it work. In my case I use miniconda, the full path is c:\Users\User\miniconda3\Scripts\torchrun-script.py, and I had to fix the first line of that file to point to the full path of the Python shipped with miniconda:

#!c:\Users\User\miniconda3\python.exe

My env, gathered via python -m torch.utils.collect_env:

(base) H:\github\facebook\llama>python -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 2.0.1
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Pro N
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.11.4 | packaged by Anaconda, Inc. | (main, Jul  5 2023, 13:47:18) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19045-SP0
Is CUDA available: True
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 4090
Nvidia driver version: 536.67
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=3493
DeviceID=CPU0
Family=107
L2CacheSize=8192
L2CacheSpeed=
Manufacturer=AuthenticAMD
MaxClockSpeed=3493
Name=AMD Ryzen 9 3950X 16-Core Processor
ProcessorType=3
Revision=28928

Versions of relevant libraries:
[pip3] numpy==1.25.0
[pip3] torch==2.0.1
[pip3] torchaudio==2.0.2+cu117
[pip3] torchvision==0.15.2+cu117
[conda] blas                      1.0                         mkl
[conda] cudatoolkit               11.8.0               hd77b12b_0
[conda] mkl                       2023.1.0         h8bd8f75_46356
[conda] mkl-service               2.4.0           py311h2bbff1b_1
[conda] mkl_fft                   1.3.6           py311hf62ec03_1
[conda] mkl_random                1.2.2           py311hf62ec03_1
[conda] numpy                     1.25.1                   pypi_0    pypi
[conda] numpy-base                1.25.0          py311hd01c5d8_0
[conda] pytorch                   2.0.1           py3.11_cuda11.8_cudnn8_0    pytorch
[conda] pytorch-cuda              11.8                 h24eeafa_5    pytorch
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] torch                     2.0.1                    pypi_0    pypi
[conda] torchaudio                2.0.2+cu117              pypi_0    pypi
[conda] torchvision               0.15.2                   pypi_0    pypi

@MDFARHYN

MDFARHYN commented Jul 30, 2023

Just initialize with torch.distributed.init_process_group("gloo"): go to the generation.py file and find the following lines

 if not torch.distributed.is_initialized():
            if device == "cuda":
                torch.distributed.init_process_group("nccl")
            else:
                torch.distributed.init_process_group("gloo")

change them to

 if not torch.distributed.is_initialized():
            # always use gloo here, since this PyTorch build has no NCCL support
            torch.distributed.init_process_group("gloo")

@WuhanMonkey

Seems like the issue was resolved with the suggestions above. Feel free to re-open as needed. Closing.

@WuhanMonkey WuhanMonkey added the model-usage issues related to how models are used/loaded label Sep 6, 2023
@dunanyang

Why do we still not have a solution to this error?

@psmyrdek

I've been able to start execution after applying changes similar to https://github.com/facebookresearch/codellama/pull/18/files

@pianistprogrammer

https://github.com/pianistprogrammer/llama3/tree/main - get this one and clone the repo; I have made changes to some files to make it work. You can find them in the commit tree.

@haruelrovix

Hey @pianistprogrammer 👋🏻

I tried your fork but got an error:

RuntimeError: Placeholder storage has not been allocated on MPS device!

It's an M1 Pro. Any clue what the issue is?


Full logs:

(base) ➜  llama3-pianist git:(main) ✗ PYTORCH_ENABLE_MPS_FALLBACK=1 torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir Meta-Llama-3-8B-Instruct/ --tokenizer_path Meta-Llama-3-8B-Instruct/tokenizer.model --max_seq_len 128 --max_batch_size 4
W0513 11:17:12.135000 8470690496 torch/distributed/elastic/multiprocessing/redirects.py:27] NOTE: Redirects are currently not supported in Windows or MacOs.
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
/opt/miniconda3/lib/python3.12/site-packages/torch/__init__.py:747: UserWarning: torch.set_default_tensor_type() is deprecated as of PyTorch 2.1, please use torch.set_default_dtype() and torch.set_default_device() as alternatives. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/tensor/python_tensor.cpp:433.)
  _C._set_default_tensor_type(t)
Loaded in 37.49 seconds
[rank0]: Traceback (most recent call last):
[rank0]:   File "/llama3-pianist/example_text_completion.py", line 64, in <module>
[rank0]:     fire.Fire(main)
[rank0]:   File "/opt/miniconda3/lib/python3.12/site-packages/fire/core.py", line 143, in Fire
[rank0]:     component_trace = _Fire(component, args, parsed_flag_args, context, name)
[rank0]:                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/miniconda3/lib/python3.12/site-packages/fire/core.py", line 477, in _Fire
[rank0]:     component, remaining_args = _CallAndUpdateTrace(
[rank0]:                                 ^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/miniconda3/lib/python3.12/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
[rank0]:     component = fn(*varargs, **kwargs)
[rank0]:                 ^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/llama3-pianist/example_text_completion.py", line 51, in main
[rank0]:     results = generator.text_completion(
[rank0]:               ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/llama3-pianist/llama/generation.py", line 282, in text_completion
[rank0]:     generation_tokens, generation_logprobs = self.generate(
[rank0]:                                              ^^^^^^^^^^^^^^
[rank0]:   File "/opt/miniconda3/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/llama3-pianist/llama/generation.py", line 201, in generate
[rank0]:     logits = self.model.forward(tokens[:, prev_pos:cur_pos], prev_pos)
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/miniconda3/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/llama3-pianist/llama/model.py", line 291, in forward
[rank0]:     h = self.tok_embeddings(tokens)
[rank0]:         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/miniconda3/lib/python3.12/site-packages/fairscale/nn/model_parallel/layers.py", line 136, in forward
[rank0]:     output_parallel = F.embedding(
[rank0]:                       ^^^^^^^^^^^^
[rank0]:   File "/opt/miniconda3/lib/python3.12/site-packages/torch/nn/functional.py", line 2264, in embedding
[rank0]:     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: RuntimeError: Placeholder storage has not been allocated on MPS device!
E0513 11:17:57.237000 8470690496 torch/distributed/elastic/multiprocessing/api.py:826] failed (exitcode: 1) local_rank: 0 (pid: 5741) of binary: /opt/miniconda3/bin/python
Traceback (most recent call last):
  File "/opt/miniconda3/bin/torchrun", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/miniconda3/lib/python3.12/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/lib/python3.12/site-packages/torch/distributed/run.py", line 879, in main
    run(args)
  File "/opt/miniconda3/lib/python3.12/site-packages/torch/distributed/run.py", line 870, in run
    elastic_launch(
  File "/opt/miniconda3/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 263, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
example_text_completion.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-05-13_11:17:57
  host      : 1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 5741)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

@pianistprogrammer

I'm sorry about that; I have written a blog post on how to run it locally: https://questionbump.com/question/how-can-i-run-chatgpt-using-llms-locally/
