[Bug]: NameError: name 'ncclGetVersion' is not defined #4294
Comments
run with |
After the command is executed, the complete log is as follows:
(vllm-test) $ export NCCL_DEBUG=TRACE
(vllm-test) $ CUDA_VISIBLE_DEVICES=2,3 python -m vllm.entrypoints.openai.api_server --model /data/zhaoxf4/pretrained/meta-llama/Meta-Llama-3-8B-Instruct --dtype half --tensor-parallel-size 2
INFO 04-24 09:39:20 api_server.py:151] vLLM API server version 0.4.1
INFO 04-24 09:39:20 api_server.py:152] args: Namespace(host=None, port=8000, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, served_model_name=None, lora_modules=None, chat_template=None, response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], model='/data/zhaoxf4/pretrained/meta-llama/Meta-Llama-3-8B-Instruct', tokenizer=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, download_dir=None, load_format='auto', dtype='half', kv_cache_dtype='auto', quantization_param_path=None, max_model_len=None, guided_decoding_backend='outlines', worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=2, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=16, enable_prefix_caching=False, use_v2_block_manager=False, num_lookahead_slots=0, seed=0, swap_space=4, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_seqs=256, max_logprobs=5, disable_log_stats=False, quantization=None, enforce_eager=False, max_context_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, enable_lora=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', max_cpu_loras=None, device='auto', image_input_type=None, image_token_id=None, image_input_shape=None, image_feature_size=None, scheduler_delay_factor=0.0, enable_chunked_prefill=False, speculative_model=None, num_speculative_tokens=None, model_loader_extra_config=None, engine_use_ray=False, disable_log_requests=False, max_log_len=None)
WARNING 04-24 09:39:20 config.py:948] Casting torch.bfloat16 to torch.float16.
2024-04-24 09:39:22,770 INFO worker.py:1749 -- Started a local Ray instance.
INFO 04-24 09:39:23 llm_engine.py:98] Initializing an LLM engine (v0.4.1) with config: model='/data/zhaoxf4/pretrained/meta-llama/Meta-Llama-3-8B-Instruct', speculative_config=None, tokenizer='/data/zhaoxf4/pretrained/meta-llama/Meta-Llama-3-8B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=8192, download_dir=None, load_format=auto, tensor_parallel_size=2, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO 04-24 09:39:27 utils.py:598] Found nccl from environment variable VLLM_NCCL_SO_PATH=/usr/lib/x86_64-linux-gnu/libnccl.so.2
ERROR 04-24 09:39:27 pynccl.py:44] Failed to load NCCL library from /usr/lib/x86_64-linux-gnu/libnccl.so.2 .It is expected if you are not running on NVIDIA/AMD GPUs.Otherwise, the nccl library might not exist, be corrupted or it does not support the current platform Linux-4.15.0-136-generic-x86_64-with-glibc2.27.One solution is to download libnccl2 version 2.18 from https://developer.download.nvidia.com/compute/cuda/repos/ and extract the libnccl.so.2 file. If you already have the library, please set the environment variable VLLM_NCCL_SO_PATH to point to the correct nccl library path.
INFO 04-24 09:39:27 pynccl_utils.py:17] Failed to import NCCL library: Failed to load NCCL library from /usr/lib/x86_64-linux-gnu/libnccl.so.2 .
INFO 04-24 09:39:27 pynccl_utils.py:18] It is expected if you are not running on NVIDIA GPUs.
(RayWorkerWrapper pid=1177) INFO 04-24 09:39:27 utils.py:598] Found nccl from environment variable VLLM_NCCL_SO_PATH=/usr/lib/x86_64-linux-gnu/libnccl.so.2
(RayWorkerWrapper pid=1177) ERROR 04-24 09:39:27 pynccl.py:44] Failed to load NCCL library from /usr/lib/x86_64-linux-gnu/libnccl.so.2 .It is expected if you are not running on NVIDIA/AMD GPUs.Otherwise, the nccl library might not exist, be corrupted or it does not support the current platform Linux-4.15.0-136-generic-x86_64-with-glibc2.27.One solution is to download libnccl2 version 2.18 from https://developer.download.nvidia.com/compute/cuda/repos/ and extract the libnccl.so.2 file. If you already have the library, please set the environment variable VLLM_NCCL_SO_PATH to point to the correct nccl library path.
(RayWorkerWrapper pid=1177) INFO 04-24 09:39:27 pynccl_utils.py:17] Failed to import NCCL library: Failed to load NCCL library from /usr/lib/x86_64-linux-gnu/libnccl.so.2 .
(RayWorkerWrapper pid=1177) INFO 04-24 09:39:27 pynccl_utils.py:18] It is expected if you are not running on NVIDIA GPUs.
INFO 04-24 09:39:27 selector.py:65] Cannot use FlashAttention backend for Volta and Turing GPUs.
INFO 04-24 09:39:27 selector.py:33] Using XFormers backend.
(RayWorkerWrapper pid=1177) INFO 04-24 09:39:27 selector.py:65] Cannot use FlashAttention backend for Volta and Turing GPUs.
(RayWorkerWrapper pid=1177) INFO 04-24 09:39:27 selector.py:33] Using XFormers backend.
ERROR 04-24 09:39:29 worker_base.py:153] Error executing method init_device. This might cause deadlock in distributed execution.
ERROR 04-24 09:39:29 worker_base.py:153] Traceback (most recent call last):
ERROR 04-24 09:39:29 worker_base.py:153] File "/data/zhaoxf4/API/llama3/vllm/vllm/worker/worker_base.py", line 145, in execute_method
ERROR 04-24 09:39:29 worker_base.py:153] return executor(*args, **kwargs)
ERROR 04-24 09:39:29 worker_base.py:153] File "/data/zhaoxf4/API/llama3/vllm/vllm/worker/worker.py", line 110, in init_device
ERROR 04-24 09:39:29 worker_base.py:153] init_worker_distributed_environment(self.parallel_config, self.rank,
ERROR 04-24 09:39:29 worker_base.py:153] File "/data/zhaoxf4/API/llama3/vllm/vllm/worker/worker.py", line 301, in init_worker_distributed_environment
ERROR 04-24 09:39:29 worker_base.py:153] pynccl_utils.init_process_group(
ERROR 04-24 09:39:29 worker_base.py:153] File "/data/zhaoxf4/API/llama3/vllm/vllm/distributed/device_communicators/pynccl_utils.py", line 46, in init_process_group
ERROR 04-24 09:39:29 worker_base.py:153] logger.info(f"vLLM is using nccl=={ncclGetVersion()}")
ERROR 04-24 09:39:29 worker_base.py:153] NameError: name 'ncclGetVersion' is not defined
Traceback (most recent call last):
File "/home/zhaoxf4/miniconda3/envs/vllm-test/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/zhaoxf4/miniconda3/envs/vllm-test/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/data/zhaoxf4/API/llama3/vllm/vllm/entrypoints/openai/api_server.py", line 159, in <module>
engine = AsyncLLMEngine.from_engine_args(
File "/data/zhaoxf4/API/llama3/vllm/vllm/engine/async_llm_engine.py", line 361, in from_engine_args
engine = cls(
File "/data/zhaoxf4/API/llama3/vllm/vllm/engine/async_llm_engine.py", line 319, in __init__
self.engine = self._init_engine(*args, **kwargs)
File "/data/zhaoxf4/API/llama3/vllm/vllm/engine/async_llm_engine.py", line 437, in _init_engine
return engine_class(*args, **kwargs)
File "/data/zhaoxf4/API/llama3/vllm/vllm/engine/llm_engine.py", line 148, in __init__
self.model_executor = executor_class(
File "/data/zhaoxf4/API/llama3/vllm/vllm/executor/executor_base.py", line 41, in __init__
self._init_executor()
File "/data/zhaoxf4/API/llama3/vllm/vllm/executor/ray_gpu_executor.py", line 44, in _init_executor
self._init_workers_ray(placement_group)
File "/data/zhaoxf4/API/llama3/vllm/vllm/executor/ray_gpu_executor.py", line 181, in _init_workers_ray
self._run_workers("init_device")
File "/data/zhaoxf4/API/llama3/vllm/vllm/executor/ray_gpu_executor.py", line 323, in _run_workers
driver_worker_output = self.driver_worker.execute_method(
File "/data/zhaoxf4/API/llama3/vllm/vllm/worker/worker_base.py", line 154, in execute_method
raise e
File "/data/zhaoxf4/API/llama3/vllm/vllm/worker/worker_base.py", line 145, in execute_method
return executor(*args, **kwargs)
File "/data/zhaoxf4/API/llama3/vllm/vllm/worker/worker.py", line 110, in init_device
init_worker_distributed_environment(self.parallel_config, self.rank,
File "/data/zhaoxf4/API/llama3/vllm/vllm/worker/worker.py", line 301, in init_worker_distributed_environment
pynccl_utils.init_process_group(
File "/data/zhaoxf4/API/llama3/vllm/vllm/distributed/device_communicators/pynccl_utils.py", line 46, in init_process_group
logger.info(f"vLLM is using nccl=={ncclGetVersion()}")
NameError: name 'ncclGetVersion' is not defined
(RayWorkerWrapper pid=1177) ERROR 04-24 09:39:29 worker_base.py:153] Error executing method init_device. This might cause deadlock in distributed execution.
(RayWorkerWrapper pid=1177) ERROR 04-24 09:39:29 worker_base.py:153] Traceback (most recent call last):
(RayWorkerWrapper pid=1177) ERROR 04-24 09:39:29 worker_base.py:153] File "/data/zhaoxf4/API/llama3/vllm/vllm/worker/worker_base.py", line 145, in execute_method
(RayWorkerWrapper pid=1177) ERROR 04-24 09:39:29 worker_base.py:153] return executor(*args, **kwargs)
(RayWorkerWrapper pid=1177) ERROR 04-24 09:39:29 worker_base.py:153] File "/data/zhaoxf4/API/llama3/vllm/vllm/worker/worker.py", line 110, in init_device
(RayWorkerWrapper pid=1177) ERROR 04-24 09:39:29 worker_base.py:153] init_worker_distributed_environment(self.parallel_config, self.rank,
(RayWorkerWrapper pid=1177) ERROR 04-24 09:39:29 worker_base.py:153] File "/data/zhaoxf4/API/llama3/vllm/vllm/worker/worker.py", line 301, in init_worker_distributed_environment
(RayWorkerWrapper pid=1177) ERROR 04-24 09:39:29 worker_base.py:153] pynccl_utils.init_process_group(
(RayWorkerWrapper pid=1177) ERROR 04-24 09:39:29 worker_base.py:153] File "/data/zhaoxf4/API/llama3/vllm/vllm/distributed/device_communicators/pynccl_utils.py", line 46, in init_process_group
(RayWorkerWrapper pid=1177) ERROR 04-24 09:39:29 worker_base.py:153] logger.info(f"vLLM is using nccl=={ncclGetVersion()}")
(RayWorkerWrapper pid=1177) ERROR 04-24 09:39:29 worker_base.py:153] NameError: name 'ncclGetVersion' is not defined
Looks like no new logs have been added.
(vllm-test) $ ll /usr/lib/x86_64-linux-gnu/libnccl.so.2
lrwxrwxrwx 1 root root 17 Sep 20 2022 /usr/lib/x86_64-linux-gnu/libnccl.so.2 -> libnccl.so.2.15.1*
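The log reports that the library at this path could not be loaded, while the listing shows the symlink exists. A quick way to check the load outside vLLM is to open the file with `ctypes` and call NCCL's `ncclGetVersion`. This is only a minimal sketch: the helper name `nccl_version` is made up for illustration, and it returns the raw integer version code without decoding it.

```python
import ctypes
from typing import Optional


def nccl_version(so_path: str) -> Optional[int]:
    """Return NCCL's raw version code, or None if the library is unusable.

    `nccl_version` is a hypothetical helper, not part of vLLM.
    """
    try:
        lib = ctypes.CDLL(so_path)
    except OSError as exc:
        # Missing file, broken symlink, or unresolved dependencies end up here.
        print(f"could not load {so_path}: {exc}")
        return None

    try:
        get_version = lib.ncclGetVersion
    except AttributeError:
        # The shared object loaded but does not export ncclGetVersion.
        print(f"{so_path} has no ncclGetVersion symbol")
        return None

    # C signature: ncclResult_t ncclGetVersion(int *version); 0 means success.
    get_version.restype = ctypes.c_int
    get_version.argtypes = [ctypes.POINTER(ctypes.c_int)]
    version = ctypes.c_int()
    status = get_version(ctypes.byref(version))
    return version.value if status == 0 else None


if __name__ == "__main__":
    print(nccl_version("/usr/lib/x86_64-linux-gnu/libnccl.so.2"))
```

If this prints `None` along with an `OSError` message, the message usually says why the dynamic loader rejected the file, which is more detail than the vLLM log line gives.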
If you have |
PR #4259 can avoid this problem, but should just ignore the error.
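The log suggests the import failure was only logged at INFO level, after which `init_process_group` still referenced `ncclGetVersion`, hence the `NameError`. The following sketch shows one general way to keep such a name defined after a failed optional import. It is not the code from PR #4259; the module name `_hypothetical_nccl_wrapper` and the function `describe_nccl` are invented for illustration.

```python
import logging

logger = logging.getLogger("nccl_demo")

try:
    # Hypothetical module; stands in for a wrapper that raises at import
    # time when libnccl cannot be loaded.
    from _hypothetical_nccl_wrapper import ncclGetVersion
except Exception as exc:
    logger.info("Failed to import NCCL library: %s", exc)
    # Bind the name anyway so later code can test it instead of
    # hitting a NameError.
    ncclGetVersion = None


def describe_nccl() -> str:
    """Report the NCCL version, or say that NCCL is unavailable."""
    if ncclGetVersion is None:
        return "NCCL unavailable"
    return f"nccl=={ncclGetVersion()}"


if __name__ == "__main__":
    print(describe_nccl())
```

With this pattern the caller can decide whether a missing library is fatal, rather than crashing on an undefined name in a log statement.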
Just chiming in to say I'm experiencing a similar issue. Perhaps this is just an issue with how my directories are set up. I have:
$ find . | grep libnccl
./usr/local/lib/python3.11/site-packages/nvidia/nccl/lib/libnccl.so.2
Digging through the latest version of find_nccl_library seemed to confirm the issue for me:
>>> find_nccl_library()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 18, in find_nccl_library
File "<stdin>", line 25, in find_library
ValueError: Cannot find libnccl.so.2 in the system.
I think manually passing in libnccl may be the only feasible solution. |
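For the case above, where the only copy of the library lives under `site-packages/nvidia/nccl/lib`, a small search script can locate it so the path can be passed in manually. This is a rough sketch under that directory-layout assumption; `find_pip_nccl` is a hypothetical helper and is unrelated to vLLM's own `find_nccl_library`.

```python
import glob
import os
import site
from typing import Iterable, Optional


def find_pip_nccl(roots: Optional[Iterable[str]] = None) -> Optional[str]:
    """Return the first libnccl.so.* found under <root>/nvidia/nccl/lib.

    Hypothetical helper: `roots` defaults to the interpreter's
    site-packages directories.
    """
    if roots is None:
        roots = list(site.getsitepackages()) + [site.getusersitepackages()]
    for root in roots:
        pattern = os.path.join(root, "nvidia", "nccl", "lib", "libnccl.so.*")
        matches = sorted(glob.glob(pattern))
        if matches:
            return matches[0]
    return None


if __name__ == "__main__":
    path = find_pip_nccl()
    if path is None:
        print("no pip-installed libnccl found")
    else:
        # VLLM_NCCL_SO_PATH is the variable named in the vLLM error message.
        print(f"export VLLM_NCCL_SO_PATH={path}")
```

The printed `export` line can then be run in the shell before starting the server, as the error message in the log recommends.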
Unless either NVIDIA/nccl#1234 or pypi/support#3792 is resolved, we have no choice but to bring in libnccl.so this way. Sorry for the trouble. This is not what we want, either. We also hope to manage dependencies in the standard pip way. |
Your current environment
🐛 Describe the bug
Error logs:
Reproduce commands:
By the way, this problem does not occur in single-card inference.
I have searched for similar issues and reinstalled the environment many times as described in #4257, but it did not take effect.