
TCPStore is not available #3334

Closed
Z-Diviner opened this issue Mar 12, 2024 · 19 comments

@Z-Diviner

Hello, when I use vLLM 0.3.2 with deepseek-coder-33b and start the service through Docker, the following error is reported. What is going on?
(screenshot of the "TCPStore is not available" error)

@DoctorKey

I have the same problem

@rkooo567
Collaborator

Can you try enforce_eager=True?

I think in general, your environment doesn't seem to play well with the cupy NCCL backend.
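
For reference, a minimal sketch of passing that option through the Python API (the model name and tensor_parallel_size below are placeholders, not taken from this issue):

    # Minimal sketch: disable CUDA graph capture so the cupy path is not used.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="deepseek-ai/deepseek-coder-33b-instruct",  # placeholder; use your own model
        tensor_parallel_size=4,                           # placeholder
        enforce_eager=True,  # skip CUDA graphs, which is what pulls in the cupy backend
    )
    outputs = llm.generate(["def quicksort(arr):"], SamplingParams(max_tokens=32))
    print(outputs[0].outputs[0].text)

When launching the OpenAI-compatible server instead, the corresponding CLI flag should be --enforce-eager.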

@wangzihe1996

wangzihe1996 commented Mar 15, 2024

I have the same problem when the version of vllm is greater than 0.3.0. Is there any possible solution?

(screenshot of the error attached)

@rkooo567
Collaborator

I think that's the version where the cupy backend was introduced. It is basically there to enable CUDA graphs (because they didn't work with the default NCCL backend).

@rkooo567
Collaborator

Do you have any other errors in the logs? For some reason, your environment cannot initialize the cupy backend, but it is difficult to tell from the information you posted.

One way to debug is to run this yourself:

        from vllm.model_executor.parallel_utils import cupy_utils
        cupy_utils.init_process_group(
            world_size=1,
            rank=0,
            host="localhost",
            port=<choose port>,
        )

And see why this fails.
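
For example, a minimal standalone script along these lines (the port value is an arbitrary choice, and a vLLM 0.3.x install where this module still exists is assumed) would surface the underlying exception:

    # Minimal sketch: run the cupy backend initialization directly and print the
    # real exception, since the normal code path may swallow it.
    import traceback

    from vllm.model_executor.parallel_utils import cupy_utils

    try:
        cupy_utils.init_process_group(
            world_size=1,
            rank=0,
            host="localhost",
            port=29501,  # arbitrary free port; pick any unused one
        )
        print("cupy process group initialized successfully")
    except Exception:
        traceback.print_exc()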

@Ethan-yt

I can reproduce it when I use a spawn process to run vLLM.
The TCPStore.run method raised an error: cannot pickle '_thread.lock' object.
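
For context, a rough sketch of that kind of reproduction (a sketch only: the model name is a placeholder, at least two GPUs are assumed, and whether it triggers the error depends on the environment):

    # Run vLLM inside a spawned child process, as described above. Spawn
    # requires pickling process state, which is where the
    # "cannot pickle '_thread.lock' object" error can show up.
    import multiprocessing as mp

    def run_vllm():
        from vllm import LLM, SamplingParams
        llm = LLM(model="facebook/opt-125m", tensor_parallel_size=2)  # placeholder model
        print(llm.generate(["Hello"], SamplingParams(max_tokens=8))[0].outputs[0].text)

    if __name__ == "__main__":
        ctx = mp.get_context("spawn")
        p = ctx.Process(target=run_vllm)
        p.start()
        p.join()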

There is another problem: the code here catches any exception without logging it:

@Z-Diviner
Author

Do you have any other errors in the logs? For some reason, your environment cannot initialize the cupy backend, but it is difficult to tell from the information you posted.

One way to debug is to run this yourself:

        from vllm.model_executor.parallel_utils import cupy_utils
        cupy_utils.init_process_group(
            world_size=1,
            rank=0,
            host="localhost",
            port=<choose port>,
        )

And see why this fails.

I used the following command to build the image and run it. Below is the specific configuration:
docker build -t harbor.4pd.io/mlsonar/vllm:cu118-vllm-0.3.2-zw-test3.1 -f /home/common-user/vllm-0.2.5/vllm_docker/Dockerfile . --network=host
(screenshot of the error attached)

@nkwangleiGIT

Same issue here with the latest vLLM 0.3.3 version.

@rkooo567
Collaborator

@Z-Diviner can you give me a copy-pasteable docker run command? I can try to repro it in my local env.

@CodeScriptum

Hello, I have the same issue with version 0.3.3 running on a Ray cluster in Kubernetes. Everything works fine on a single node with multiple GPUs and --tensor-parallel-size enabled, but running the same config with an additional worker node results in "TCPStore is not available".

I'll be happy to provide any information to help in the resolution of this issue.

Thank you
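
For reference, a rough sketch of the kind of multi-node run being described (a sketch only: the model name and parallel size are placeholders, and a Ray cluster with a head node and one worker node is assumed to be running already):

    # Attach to the existing Ray cluster instead of starting a local one, then
    # request more tensor-parallel workers than a single node can supply so
    # that some workers are placed on the remote node.
    import ray
    from vllm import LLM, SamplingParams

    ray.init(address="auto")  # connect to the running head node

    llm = LLM(
        model="facebook/opt-125m",  # placeholder model
        tensor_parallel_size=4,     # placeholder; spans both nodes
    )
    print(llm.generate(["Hello"], SamplingParams(max_tokens=8))[0].outputs[0].text)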

@rkooo567
Collaborator

@CodeScriptum it'd be great if you could provide me with a way to reproduce the issue! I'd like to make it clear that this doesn't seem like a vLLM or Ray issue. It must be that cupy is somehow not working with your environment. I can try to help debug in this case, but I'd need a reproduction since I think it only happens in some environments.

@Z-Diviner
Author

@CodeScriptum it'd be great if you could provide me with a way to reproduce the issue! I'd like to make it clear that this doesn't seem like a vLLM or Ray issue. It must be that cupy is somehow not working with your environment. I can try to help debug in this case, but I'd need a reproduction since I think it only happens in some environments.

Hello, I'm sorry, I just saw this. Below is the command for my docker run:

    --ipc=host \
    --name ds33b \
    -p 17778:8080 \
    --gpus '"device=3,4,5,6"' \
    -v /mnt/contest_ceph/zhangwei03/deepseek-coder-33b-instruct:/deepseek-coder-33b-instruct \
    harbor.4pd.io/mlsonar/vllm:cu118-vllm-0.3.2-zw-test3.1 \
    --host 0.0.0.0 \
    --port 8080 \
    --served-model-name deepseek-coder-33b-instruct \
    --model /deepseek-coder-33b-instruct \
    --trust-remote-code \
    --max-num-batched-tokens 16384 \
    --max-model-len 16384 \
    --tokenizer-mode auto \
    --tensor-parallel-size 4

Among them, harbor.4pd.io/mlsonar/vllm:cu118-vllm-0.3.2-zw-test3.1 is the vLLM image I built myself.

@wangzihe1996

@CodeScriptum it'd be great if you could provide me with a way to reproduce the issue! I'd like to make it clear that this doesn't seem like a vLLM or Ray issue. It must be that cupy is somehow not working with your environment. I can try to help debug in this case, but I'd need a reproduction since I think it only happens in some environments.

I think you can try running inference with multiple processes and tensor-parallel-size greater than 1. That reproduces the error.

This error occurs when the vLLM version is greater than 0.3.0, so it should be due to cupy.

The specific reasons can be found below.

I can reproduce it when I use a spawn process to run vLLM. The TCPStore.run method raised an error: cannot pickle '_thread.lock' object.

There is another problem: the code here catches any exception without logging it:

@rkooo567 rkooo567 self-assigned this Mar 26, 2024
@rkooo567
Collaborator

Btw, there's also an effort to remove cupy from the dependencies (#3625). I may not have time to tackle this for the next few days.

@gentleman-turk

I have the same issue. When I set enforce_eager=True, I get a new error:

INFO 03-26 13:20:22 llm_engine.py:87] Initializing an LLM engine with config: model='TheBloke/Llama-2-7b-Chat-AWQ', tokenizer='TheBloke/Llama-2-7b-Chat-AWQ', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=4096, download_dir=None, load_format=auto, tensor_parallel_size=2, disable_custom_all_reduce=True, quantization=awq, enforce_eager=True, kv_cache_dtype=auto, device_config=cuda, seed=0)
Dell-Dev-U:2803813:2803813 [0] NCCL INFO Bootstrap : Using eno1:192.168.1.37<0>
Dell-Dev-U:2803813:2803813 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation

Dell-Dev-U:2803813:2803813 [0] init.cc:1270 NCCL WARN Invalid config blocking attribute value -2147483648
Traceback (most recent call last):
  File "/home/rch/dev/ubiquitous-distributed-ai/ray-vllm-wsl/exampleawq.py", line 16, in <module>
    llm = LLM(model="TheBloke/Llama-2-7b-Chat-AWQ", quantization="AWQ", tensor_parallel_size=2, enforce_eager=True)
  File "/home/rch/miniconda3/envs/ray9/lib/python3.9/site-packages/vllm/entrypoints/llm.py", line 109, in __init__
    self.llm_engine = LLMEngine.from_engine_args(engine_args)
  File "/home/rch/miniconda3/envs/ray9/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 391, in from_engine_args
    engine = cls(*engine_configs,
  File "/home/rch/miniconda3/envs/ray9/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 126, in __init__
    self._init_workers_ray(placement_group)
  File "/home/rch/miniconda3/envs/ray9/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 304, in _init_workers_ray
    self._run_workers("init_model",
  File "/home/rch/miniconda3/envs/ray9/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 1041, in _run_workers
    driver_worker_output = getattr(self.driver_worker,
  File "/home/rch/miniconda3/envs/ray9/lib/python3.9/site-packages/vllm/worker/worker.py", line 94, in init_model
    init_distributed_environment(self.parallel_config, self.rank,
  File "/home/rch/miniconda3/envs/ray9/lib/python3.9/site-packages/vllm/worker/worker.py", line 283, in init_distributed_environment
    torch.distributed.all_reduce(torch.zeros(1).cuda())
  File "/home/rch/miniconda3/envs/ray9/lib/python3.9/site-packages/torch/distributed/c10d_logger.py", line 47, in wrapper
    return func(*args, **kwargs)
  File "/home/rch/miniconda3/envs/ray9/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 2050, in all_reduce
    work = group.allreduce([tensor], opts)
torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/NCCLUtils.hpp:219, invalid argument, NCCL version 2.14.3
ncclInvalidArgument: Invalid value for an argument.
Last error:
Invalid config blocking attribute value -2147483648

@youkaichao
Member

For people who encountered the problem, please try this docker image: docker pull us-central1-docker.pkg.dev/vllm-405802/vllm-ci-test-repo/vllm-test:983243ea9941b4178c53ebc4631c13f21de5a624 . It has a new backend without cupy.

We plan to remove cupy in the next release because of the many instability reports. Your feedback is warmly welcomed and would help our decision.

@CodeScriptum

Hi, I confirm that the new backend is working well. I did some quick tests using v0.4.0 built from source on a Ray cluster with head and worker nodes, and I haven't noticed any issues so far with distributed inference.

I will give more feedback if I encounter any instability. Thank you for the great work.


This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

@github-actions github-actions bot added the stale label Oct 29, 2024

This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!

@github-actions github-actions bot closed this as not planned Nov 28, 2024