Failure to load the LLM model in vLLM on 8 ARC #11789

Open
oldmikeyang opened this issue Aug 14, 2024 · 4 comments

With the ipex-llm docker container
intelanalytics/ipex-llm-serving-vllm-xpu-experiment:2.1.0b2,
the model loads successfully on 4 ARC GPUs. When loading it on 8 ARC GPUs, however, the following error occurs.
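
For reference, here is a rough sketch of the launch command that start-vllm-service.sh presumably wraps, reconstructed from the args Namespace printed in the log below. The flag names are assumed from the usual argparse underscore-to-dash mapping; the actual script contents are not reproduced here.

# Hypothetical reconstruction from the logged args; the real start-vllm-service.sh may differ.
python -m ipex_llm.vllm.xpu.entrypoints.openai.api_server \
    --served-model-name Qwen1.5-7B-Chat \
    --model /llm/models/Qwen/Qwen1.5-7B-Chat \
    --device xpu \
    --dtype float16 \
    --load-in-low-bit fp6 \
    --tensor-parallel-size 8 \
    --max-model-len 4096 \
    --max-num-batched-tokens 10240 \
    --max-num-seqs 12 \
    --gpu-memory-utilization 0.75 \
    --enforce-eager \
    --trust-remote-code \
    --port 8000

The same setup with --tensor-parallel-size 4 (i.e. 4 ARC GPUs) serves the model without error; only the 8-GPU case fails.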

root@GPU-Xeon4410Y-ARC770:/llm# bash start-vllm-service.sh
/usr/local/lib/python3.11/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
warn(
2024-08-14 11:07:55,600 - INFO - intel_extension_for_pytorch auto imported
INFO 08-14 11:07:56 api_server.py:258] vLLM API server version 0.3.3
INFO 08-14 11:07:56 api_server.py:259] args: Namespace(host=None, port=8000, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, served_model_name='Qwen1.5-7B-Chat', lora_modules=None, chat_template=None, response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], load_in_low_bit='fp6', model='/llm/models/Qwen/Qwen1.5-7B-Chat', tokenizer=None, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=True, download_dir=None, load_format='auto', dtype='float16', kv_cache_dtype='auto', max_model_len=4096, worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=8, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=16, enable_prefix_caching=False, seed=0, swap_space=4, gpu_memory_utilization=0.75, max_num_batched_tokens=10240, max_num_seqs=12, max_paddings=256, max_logprobs=5, disable_log_stats=False, quantization=None, enforce_eager=True, max_context_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, enable_lora=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', max_cpu_loras=None, device='xpu', engine_use_ray=False, disable_log_requests=False, max_log_len=None)
WARNING 08-14 11:07:56 config.py:710] Casting torch.bfloat16 to torch.float16.
INFO 08-14 11:07:56 config.py:523] Custom all-reduce kernels are temporarily disabled due to stability issues. We will re-enable them once the issues are resolved.
2024-08-14 11:07:58,897 INFO worker.py:1788 -- Started a local Ray instance.
INFO 08-14 11:07:59 llm_engine.py:68] Initializing an LLM engine (v0.3.3) with config: model='/llm/models/Qwen/Qwen1.5-7B-Chat', tokenizer='/llm/models/Qwen/Qwen1.5-7B-Chat', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=4096, download_dir=None, load_format=auto, tensor_parallel_size=8, disable_custom_all_reduce=True, quantization=None, enforce_eager=True, kv_cache_dtype=auto, device_config=xpu, seed=0, max_num_batched_tokens=10240, max_num_seqs=12, max_model_len=4096)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
(RayWorkerVllm pid=32282) /usr/local/lib/python3.11/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
(RayWorkerVllm pid=32282) warn(
(RayWorkerVllm pid=32483) 2024-08-14 11:08:17,825 - INFO - intel_extension_for_pytorch auto imported
INFO 08-14 11:08:18 attention.py:71] flash_attn is not found. Using xformers backend.
(RayWorkerVllm pid=32094) INFO 08-14 11:08:18 attention.py:71] flash_attn is not found. Using xformers backend.
2024-08-14 11:08:19,069 - INFO - Converting the current model to fp6 format......
2024-08-14 11:08:19,069 - INFO - Only HuggingFace Transformers models are currently supported for further optimizations
[2024-08-14 11:08:20,124] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to xpu (auto detect)
(RayWorkerVllm pid=32483) 2024-08-14 11:08:20,271 - INFO - Converting the current model to fp6 format......
(RayWorkerVllm pid=32483) 2024-08-14 11:08:20,272 - INFO - Only HuggingFace Transformers models are currently supported for further optimizations
(RayWorkerVllm pid=32094) /usr/local/lib/python3.11/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source? [repeated 6x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)
(RayWorkerVllm pid=32094) warn( [repeated 6x across cluster]
2024-08-14 11:08:21,272 - INFO - Only HuggingFace Transformers models are currently supported for further optimizations
(RayWorkerVllm pid=32483) [2024-08-14 11:08:21,256] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to xpu (auto detect)
INFO 08-14 11:08:21 model_convert.py:249] Loading model weights took 1.0264 GB
(RayWorkerVllm pid=32349) 2024-08-14 11:08:18,290 - INFO - intel_extension_for_pytorch auto imported [repeated 6x across cluster]
(RayWorkerVllm pid=32551) 2024-08-14 11:08:20,708 - INFO - Converting the current model to fp6 format...... [repeated 6x across cluster]
(RayWorkerVllm pid=32483) 2024-08-14 11:08:25,761 - INFO - Only HuggingFace Transformers models are currently supported for further optimizations [repeated 7x across cluster]
(RayWorkerVllm pid=32483) INFO 08-14 11:08:26 model_convert.py:249] Loading model weights took 1.0264 GB
(RayWorkerVllm pid=32551) INFO 08-14 11:08:18 attention.py:71] flash_attn is not found. Using xformers backend. [repeated 6x across cluster]
(RayWorkerVllm pid=32551) [2024-08-14 11:08:21,778] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to xpu (auto detect) [repeated 6x across cluster]
2024:08:14-11:08:27:(28904) |CCL_WARN| could not get local_idx/count from environment variables, trying to get them from ATL
2024:08:14-11:08:27:(28904) |CCL_WARN| fallback to 'sockets' mode of ze exchange mechanism, to use CCL_ZE_IPC_EXHANGE=drmfd, set CCL_LOCAL_RANK/SIZE explicitly or use process launcher
(RayWorkerVllm pid=32094) 2024:08:14-11:08:28:(32094) |CCL_WARN| could not get local_idx/count from environment variables, trying to get them from ATL
(RayWorkerVllm pid=32094) 2024:08:14-11:08:28:(32094) |CCL_WARN| fallback to 'sockets' mode of ze exchange mechanism, to use CCL_ZE_IPC_EXHANGE=drmfd, set CCL_LOCAL_RANK/SIZE explicitly or use process launcher
2024:08:14-11:08:29:(33884) |CCL_WARN| no membind support for NUMA node 1, skip thread membind
2024:08:14-11:08:29:(33896) |CCL_WARN| no membind support for NUMA node 1, skip thread membind
(RayWorkerVllm pid=32094) 2024:08:14-11:08:29:(33886) |CCL_WARN| no membind support for NUMA node 1, skip thread membind
(RayWorkerVllm pid=32094) 2024:08:14-11:08:29:(33892) |CCL_WARN| no membind support for NUMA node 1, skip thread membind
2024:08:14-11:08:30:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices [repeated 56x]
(RayWorkerVllm pid=32094) 2024:08:14-11:08:30:(32094) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices [repeated 56x]
2024:08:14-11:08:32:(28904) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices [repeated 56x]
(RayWorkerVllm pid=32162) INFO 08-14 11:08:27 model_convert.py:249] Loading model weights took 1.0264 GB [repeated 6x across cluster]
Traceback (most recent call last):
File "", line 198, in _run_module_as_main
File "", line 88, in _run_code
File "/usr/local/lib/python3.11/dist-packages/ipex_llm/vllm/xpu/entrypoints/openai/api_server.py", line 267, in
engine = IPEXLLMAsyncLLMEngine.from_engine_args(engine_args,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/ipex_llm/vllm/xpu/engine/engine.py", line 57, in from_engine_args
engine = cls(parallel_config.worker_use_ray,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/ipex_llm/vllm/xpu/engine/engine.py", line 30, in init
super().init(*args, **kwargs)
File "/llm/vllm/vllm/engine/async_llm_engine.py", line 309, in init
self.engine = self._init_engine(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/llm/vllm/vllm/engine/async_llm_engine.py", line 409, in _init_engine
return engine_class(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/llm/vllm/vllm/engine/llm_engine.py", line 106, in init
self.model_executor = executor_class(model_config, cache_config,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/ipex_llm/vllm/xpu/ipex_llm_gpu_executor.py", line 77, in init
self._init_cache()
File "/usr/local/lib/python3.11/dist-packages/ipex_llm/vllm/xpu/ipex_llm_gpu_executor.py", line 249, in _init_cache
num_blocks = self._run_workers(
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/ipex_llm/vllm/xpu/ipex_llm_gpu_executor.py", line 347, in _run_workers
driver_worker_output = getattr(self.driver_worker,
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/llm/vllm/vllm/worker/worker.py", line 136, in profile_num_available_blocks
self.model_runner.profile_run()
File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/llm/vllm/vllm/worker/model_runner.py", line 645, in profile_run
self.execute_model(seqs, kv_caches)
File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/llm/vllm/vllm/worker/model_runner.py", line 581, in execute_model
hidden_states = model_executable(
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/llm/vllm/vllm/model_executor/models/qwen2.py", line 316, in forward
hidden_states = self.model(input_ids, positions, kv_caches,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/llm/vllm/vllm/model_executor/models/qwen2.py", line 257, in forward
hidden_states, residual = layer(
^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/llm/vllm/vllm/model_executor/models/qwen2.py", line 208, in forward
hidden_states, residual = self.input_layernorm(
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/llm/vllm/vllm/model_executor/layers/layernorm.py", line 52, in forward
ops.fused_add_rms_norm(
TypeError: fused_add_rms_norm(): incompatible function arguments. The following argument types are supported:
1. (arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: float) -> None

Invoked with: (tensor([[[-0.0239, 0.0522, 0.0044, ..., -0.0462, 0.1113, 0.0284],
[-0.0239, 0.0522, 0.0044, ..., -0.0462, 0.1113, 0.0284],
[-0.0239, 0.0522, 0.0044, ..., -0.0462, 0.1113, 0.0284],
...,
[-0.0240, 0.0522, 0.0044, ..., -0.0462, 0.1114, 0.0284],
[-0.0240, 0.0522, 0.0044, ..., -0.0462, 0.1114, 0.0284],
[-0.0240, 0.0522, 0.0044, ..., -0.0462, 0.1114, 0.0284]],

    [[-0.0240,  0.0522,  0.0044,  ..., -0.0462,  0.1114,  0.0284],
     [-0.0240,  0.0522,  0.0044,  ..., -0.0462,  0.1114,  0.0284],
     [-0.0240,  0.0522,  0.0044,  ..., -0.0462,  0.1114,  0.0284],
     ...,
     [-0.0239,  0.0521,  0.0044,  ..., -0.0462,  0.1113,  0.0284],
     [-0.0239,  0.0521,  0.0044,  ..., -0.0462,  0.1113,  0.0284],
     [-0.0239,  0.0521,  0.0044,  ..., -0.0462,  0.1113,  0.0284]],

    [[-0.0239,  0.0521,  0.0044,  ..., -0.0462,  0.1113,  0.0284],
     [-0.0239,  0.0521,  0.0044,  ..., -0.0462,  0.1113,  0.0284],
     [-0.0239,  0.0521,  0.0044,  ..., -0.0462,  0.1113,  0.0284],
     ...,
     [-0.0239,  0.0522,  0.0044,  ..., -0.0462,  0.1113,  0.0284],
     [-0.0239,  0.0522,  0.0044,  ..., -0.0462,  0.1113,  0.0284],
     [-0.0239,  0.0522,  0.0044,  ..., -0.0462,  0.1113,  0.0284]],

    ...,

    [[-0.0239,  0.0521,  0.0044,  ..., -0.0462,  0.1113,  0.0284],
     [-0.0239,  0.0521,  0.0044,  ..., -0.0462,  0.1113,  0.0284],
     [-0.0239,  0.0521,  0.0044,  ..., -0.0462,  0.1113,  0.0284],
     ...,
     [-0.0239,  0.0521,  0.0044,  ..., -0.0462,  0.1113,  0.0284],
     [-0.0239,  0.0521,  0.0044,  ..., -0.0462,  0.1113,  0.0284],
     [-0.0239,  0.0521,  0.0044,  ..., -0.0462,  0.1113,  0.0284]],

    [[-0.0239,  0.0521,  0.0044,  ..., -0.0462,  0.1113,  0.0284],
     [-0.0239,  0.0521,  0.0044,  ..., -0.0462,  0.1113,  0.0284],
     [-0.0239,  0.0521,  0.0044,  ..., -0.0462,  0.1113,  0.0284],
     ...,
     [-0.0239,  0.0522,  0.0045,  ..., -0.0461,  0.1113,  0.0284],
     [-0.0239,  0.0522,  0.0045,  ..., -0.0461,  0.1113,  0.0284],
     [-0.0239,  0.0522,  0.0045,  ..., -0.0461,  0.1113,  0.0284]],

    [[-0.0239,  0.0522,  0.0045,  ..., -0.0461,  0.1113,  0.0284],
     [-0.0239,  0.0522,  0.0045,  ..., -0.0461,  0.1113,  0.0284],
     [-0.0239,  0.0522,  0.0045,  ..., -0.0461,  0.1113,  0.0284],
     ...,
     [-0.0239,  0.0522,  0.0044,  ..., -0.0462,  0.1113,  0.0284],
     [-0.0239,  0.0522,  0.0044,  ..., -0.0462,  0.1113,  0.0284],
     [-0.0239,  0.0522,  0.0044,  ..., -0.0462,  0.1113,  0.0284]]],
   device='xpu:0', dtype=torch.float16), None), tensor([[[-0.0142, -0.0132,  0.0210,  ...,  0.0883,  0.0250,  0.0165],
     [-0.0142, -0.0132,  0.0210,  ...,  0.0883,  0.0250,  0.0165],
     [-0.0142, -0.0132,  0.0210,  ...,  0.0883,  0.0250,  0.0165],
     ...,
     [-0.0142, -0.0132,  0.0210,  ...,  0.0883,  0.0250,  0.0166],
     [-0.0142, -0.0132,  0.0210,  ...,  0.0883,  0.0250,  0.0166],
     [-0.0142, -0.0132,  0.0210,  ...,  0.0883,  0.0250,  0.0166]],

    [[-0.0142, -0.0132,  0.0210,  ...,  0.0883,  0.0250,  0.0166],
     [-0.0142, -0.0132,  0.0210,  ...,  0.0883,  0.0250,  0.0166],
     [-0.0142, -0.0132,  0.0210,  ...,  0.0883,  0.0250,  0.0166],
     ...,
     [-0.0142, -0.0132,  0.0210,  ...,  0.0883,  0.0250,  0.0166],
     [-0.0142, -0.0132,  0.0210,  ...,  0.0883,  0.0250,  0.0166],
     [-0.0142, -0.0132,  0.0210,  ...,  0.0883,  0.0250,  0.0166]],

    [[-0.0142, -0.0132,  0.0210,  ...,  0.0883,  0.0250,  0.0166],
     [-0.0142, -0.0132,  0.0210,  ...,  0.0883,  0.0250,  0.0166],
     [-0.0142, -0.0132,  0.0210,  ...,  0.0883,  0.0250,  0.0166],
     ...,
     [-0.0142, -0.0132,  0.0210,  ...,  0.0883,  0.0250,  0.0165],
     [-0.0142, -0.0132,  0.0210,  ...,  0.0883,  0.0250,  0.0165],
     [-0.0142, -0.0132,  0.0210,  ...,  0.0883,  0.0250,  0.0165]],

    ...,

    [[-0.0142, -0.0132,  0.0210,  ...,  0.0883,  0.0250,  0.0166],
     [-0.0142, -0.0132,  0.0210,  ...,  0.0883,  0.0250,  0.0166],
     [-0.0142, -0.0132,  0.0210,  ...,  0.0883,  0.0250,  0.0166],
     ...,
     [-0.0142, -0.0132,  0.0210,  ...,  0.0883,  0.0250,  0.0166],
     [-0.0142, -0.0132,  0.0210,  ...,  0.0883,  0.0250,  0.0166],
     [-0.0142, -0.0132,  0.0210,  ...,  0.0883,  0.0250,  0.0166]],

    [[-0.0142, -0.0132,  0.0210,  ...,  0.0883,  0.0250,  0.0166],
     [-0.0142, -0.0132,  0.0210,  ...,  0.0883,  0.0250,  0.0166],
     [-0.0142, -0.0132,  0.0210,  ...,  0.0883,  0.0250,  0.0166],
     ...,
     [-0.0142, -0.0132,  0.0210,  ...,  0.0883,  0.0250,  0.0165],
     [-0.0142, -0.0132,  0.0210,  ...,  0.0883,  0.0250,  0.0165],
     [-0.0142, -0.0132,  0.0210,  ...,  0.0883,  0.0250,  0.0165]],

    [[-0.0142, -0.0132,  0.0210,  ...,  0.0883,  0.0250,  0.0165],
     [-0.0142, -0.0132,  0.0210,  ...,  0.0883,  0.0250,  0.0165],
     [-0.0142, -0.0132,  0.0210,  ...,  0.0883,  0.0250,  0.0165],
     ...,
     [-0.0142, -0.0132,  0.0210,  ...,  0.0883,  0.0250,  0.0165],
     [-0.0142, -0.0132,  0.0210,  ...,  0.0883,  0.0250,  0.0165],
     [-0.0142, -0.0132,  0.0210,  ...,  0.0883,  0.0250,  0.0165]]],
   device='xpu:0', dtype=torch.float16), tensor([0.1367, 0.0952, 0.1030,  ..., 0.1338, 0.0845, 0.0928], device='xpu:0',
   dtype=torch.float16), 1e-06

(RayWorkerVllm pid=32551) 2024:08:14-11:08:28:(32551) |CCL_WARN| could not get local_idx/count from environment variables, trying to get them from ATL [repeated 6x across cluster]
(RayWorkerVllm pid=32551) 2024:08:14-11:08:28:(32551) |CCL_WARN| fallback to 'sockets' mode of ze exchange mechanism, to use CCL_ZE_IPC_EXHANGE=drmfd, set CCL_LOCAL_RANK/SIZE explicitly or use process launcher [repeated 6x across cluster]
(RayWorkerVllm pid=32551) 2024:08:14-11:08:29:(33894) |CCL_WARN| no membind support for NUMA node 0, skip thread membind [repeated 12x across cluster]
(RayWorkerVllm pid=32551) 2024:08:14-11:08:32:(32551) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices [repeated 728x across cluster]
(RayWorkerVllm pid=32162) 2024-08-14 11:08:27,278 - INFO - Only HuggingFace Transformers models are currently supported for further optimizations [repeated 6x across cluster]

@gc-fu (Contributor) commented Aug 15, 2024

Hi, I am currently investigating this issue and will update it once I have a fix.

@gc-fu (Contributor) commented Aug 15, 2024

Hi, this should have been fixed by PR #11817.

You can upgrade ipex-llm tomorrow and check whether it resolves the issue.
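
For reference, the upgrade is typically just a pip install of the latest nightly wheel; a minimal sketch is shown below (assuming the XPU wheel index documented in the ipex-llm installation guide; verify the URL for your environment, or pull the refreshed docker image instead):

# Sketch: upgrade ipex-llm to the latest pre-release build for Intel GPU (XPU).
# The --extra-index-url is the index documented for ipex-llm XPU wheels and is
# an assumption here; check the ipex-llm install guide before running.
pip install --pre --upgrade "ipex-llm[xpu]" \
    --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/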

@oldmikeyang (Author) commented:

With the latest IPEX-LLM, the following error occurs during inference:

INFO 08-16 10:12:59 metrics.py:217] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
INFO 08-16 10:12:59 async_llm_engine.py:494] Received request cmpl-a50bf7e6bc264357815b2c77018ec28e-0: prompt: 'San Francisco is a', sampling_params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=128, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True), prompt_token_ids: [23729, 12879, 374, 264], lora_request: None.
INFO 08-16 10:13:09 metrics.py:217] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
INFO 08-16 10:13:19 metrics.py:217] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
ERROR 08-16 10:13:19 async_llm_engine.py:41] Engine background task failed
ERROR 08-16 10:13:19 async_llm_engine.py:41] Traceback (most recent call last):
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/vllm-ipex-forked/vllm/engine/async_llm_engine.py", line 36, in _raise_exception_on_finish
ERROR 08-16 10:13:19 async_llm_engine.py:41] task.result()
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/vllm-ipex-forked/vllm/engine/async_llm_engine.py", line 467, in run_engine_loop
ERROR 08-16 10:13:19 async_llm_engine.py:41] has_requests_in_progress = await asyncio.wait_for(
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/usr/lib/python3.11/asyncio/tasks.py", line 489, in wait_for
ERROR 08-16 10:13:19 async_llm_engine.py:41] return fut.result()
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/vllm-ipex-forked/vllm/engine/async_llm_engine.py", line 441, in engine_step
ERROR 08-16 10:13:19 async_llm_engine.py:41] request_outputs = await self.engine.step_async()
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/vllm-ipex-forked/vllm/engine/async_llm_engine.py", line 211, in step_async
ERROR 08-16 10:13:19 async_llm_engine.py:41] output = await self.model_executor.execute_model_async(
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/ipex_llm/vllm/xpu/ipex_llm_gpu_executor.py", line 443, in execute_model_async
ERROR 08-16 10:13:19 async_llm_engine.py:41] all_outputs = await self._run_workers_async(
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/ipex_llm/vllm/xpu/ipex_llm_gpu_executor.py", line 433, in _run_workers_async
ERROR 08-16 10:13:19 async_llm_engine.py:41] all_outputs = await asyncio.gather(*coros)
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/usr/lib/python3.11/asyncio/tasks.py", line 694, in _wrap_awaitable
ERROR 08-16 10:13:19 async_llm_engine.py:41] return (yield from awaitable.__await__())
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] ray.exceptions.RayTaskError(RuntimeError): ray::RayWorkerVllm.execute_method() (pid=195136, ip=10.240.108.91, actor_id=b933b7411289683bf7fc97c201000000, repr=<vllm.engine.ray_utils.RayWorkerVllm object at 0x77b08879b6d0>)
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/vllm-ipex-forked/vllm/engine/ray_utils.py", line 37, in execute_method
ERROR 08-16 10:13:19 async_llm_engine.py:41] return executor(*args, **kwargs)
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 08-16 10:13:19 async_llm_engine.py:41] return func(*args, **kwargs)
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/vllm-ipex-forked/vllm/worker/worker.py", line 236, in execute_model
ERROR 08-16 10:13:19 async_llm_engine.py:41] output = self.model_runner.execute_model(seq_group_metadata_list,
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 08-16 10:13:19 async_llm_engine.py:41] return func(*args, **kwargs)
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/vllm-ipex-forked/vllm/worker/model_runner.py", line 581, in execute_model
ERROR 08-16 10:13:19 async_llm_engine.py:41] hidden_states = model_executable(
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
ERROR 08-16 10:13:19 async_llm_engine.py:41] return self._call_impl(*args, **kwargs)
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
ERROR 08-16 10:13:19 async_llm_engine.py:41] return forward_call(*args, **kwargs)
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/vllm-ipex-forked/vllm/model_executor/models/qwen2.py", line 316, in forward
ERROR 08-16 10:13:19 async_llm_engine.py:41] hidden_states = self.model(input_ids, positions, kv_caches,
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
ERROR 08-16 10:13:19 async_llm_engine.py:41] return self._call_impl(*args, **kwargs)
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
ERROR 08-16 10:13:19 async_llm_engine.py:41] return forward_call(*args, **kwargs)
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/vllm-ipex-forked/vllm/model_executor/models/qwen2.py", line 253, in forward
ERROR 08-16 10:13:19 async_llm_engine.py:41] hidden_states = self.embed_tokens(input_ids)
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
ERROR 08-16 10:13:19 async_llm_engine.py:41] return self._call_impl(*args, **kwargs)
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
ERROR 08-16 10:13:19 async_llm_engine.py:41] return forward_call(*args, **kwargs)
ERROR 08-16 10:13:19 async_llm_engine.py:41] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] File "/home/llm/vllm-ipex-forked/vllm/model_executor/layers/vocab_parallel_embedding.py", line 107, in forward
ERROR 08-16 10:13:19 async_llm_engine.py:41] output_parallel[input_mask, :] = 0.0
ERROR 08-16 10:13:19 async_llm_engine.py:41] ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
ERROR 08-16 10:13:19 async_llm_engine.py:41] RuntimeError: Allocation is out of device memory on current platform.
2024-08-16 10:13:19,835 - ERROR - Exception in callback functools.partial(<function _raise_exception_on_finish at 0x701317444040>, error_callback=<bound method AsyncLLMEngine._error_callback of <ipex_llm.vllm.xpu.engine.engine.IPEXLLMAsyncLLMEngine object at 0x7013133c7310>>)
handle: <Handle functools.partial(<function _raise_exception_on_finish at 0x701317444040>, error_callback=<bound method AsyncLLMEngine._error_callback of <ipex_llm.vllm.xpu.engine.engine.IPEXLLMAsyncLLMEngine object at 0x7013133c7310>>)>
Traceback (most recent call last):
File "/home/llm/vllm-ipex-forked/vllm/engine/async_llm_engine.py", line 36, in _raise_exception_on_finish
task.result()
File "/home/llm/vllm-ipex-forked/vllm/engine/async_llm_engine.py", line 467, in run_engine_loop
has_requests_in_progress = await asyncio.wait_for(
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/asyncio/tasks.py", line 489, in wait_for
return fut.result()
^^^^^^^^^^^^
File "/home/llm/vllm-ipex-forked/vllm/engine/async_llm_engine.py", line 441, in engine_step
request_outputs = await self.engine.step_async()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/vllm-ipex-forked/vllm/engine/async_llm_engine.py", line 211, in step_async
output = await self.model_executor.execute_model_async(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/ipex_llm/vllm/xpu/ipex_llm_gpu_executor.py", line 443, in execute_model_async
all_outputs = await self._run_workers_async(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/ipex_llm/vllm/xpu/ipex_llm_gpu_executor.py", line 433, in _run_workers_async
all_outputs = await asyncio.gather(*coros)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/asyncio/tasks.py", line 694, in _wrap_awaitable
return (yield from awaitable.__await__())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ray.exceptions.RayTaskError(RuntimeError): ray::RayWorkerVllm.execute_method() (pid=195136, ip=10.240.108.91, actor_id=b933b7411289683bf7fc97c201000000, repr=<vllm.engine.ray_utils.RayWorkerVllm object at 0x77b08879b6d0>)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/vllm-ipex-forked/vllm/engine/ray_utils.py", line 37, in execute_method
return executor(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/vllm-ipex-forked/vllm/worker/worker.py", line 236, in execute_model
output = self.model_runner.execute_model(seq_group_metadata_list,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/vllm-ipex-forked/vllm/worker/model_runner.py", line 581, in execute_model
hidden_states = model_executable(
^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/vllm-ipex-forked/vllm/model_executor/models/qwen2.py", line 316, in forward
hidden_states = self.model(input_ids, positions, kv_caches,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/vllm-ipex-forked/vllm/model_executor/models/qwen2.py", line 253, in forward
hidden_states = self.embed_tokens(input_ids)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/vllm-ipex-forked/vllm/model_executor/layers/vocab_parallel_embedding.py", line 107, in forward
output_parallel[input_mask, :] = 0.0
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
RuntimeError: Allocation is out of device memory on current platform.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
File "/home/llm/vllm-ipex-forked/vllm/engine/async_llm_engine.py", line 43, in _raise_exception_on_finish
raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
INFO 08-16 10:13:19 async_llm_engine.py:152] Aborted request cmpl-a50bf7e6bc264357815b2c77018ec28e-0.
INFO: 127.0.0.1:44858 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
result = await app( # type: ignore[func-returns-value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in call
return await self.app(scope, receive, send)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in call
await super().call(scope, receive, send)
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/starlette/applications.py", line 123, in call
await self.middleware_stack(scope, receive, send)
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in call
raise exc
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in call
await self.app(scope, receive, _send)
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/starlette/middleware/cors.py", line 85, in call
await self.app(scope, receive, send)
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in call
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/starlette/routing.py", line 754, in call
await self.middleware_stack(scope, receive, send)
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/starlette/routing.py", line 774, in app
await route.handle(scope, receive, send)
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/starlette/routing.py", line 295, in handle
await self.app(scope, receive, send)
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/starlette/routing.py", line 74, in app
response = await f(request)
^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/fastapi/routing.py", line 278, in app
raw_response = await run_endpoint_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
return await dependant.call(**values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/ipex_llm/vllm/xpu/entrypoints/openai/api_server.py", line 213, in create_completion
generator = await openai_serving_completion.create_completion(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/vllm-ipex-forked/vllm/entrypoints/openai/serving_completion.py", line 179, in create_completion
async for i, res in result_generator:
File "/home/llm/vllm-ipex-forked/vllm/entrypoints/openai/serving_completion.py", line 82, in consumer
raise item
File "/home/llm/vllm-ipex-forked/vllm/entrypoints/openai/serving_completion.py", line 67, in producer
async for item in iterator:
File "/home/llm/vllm-ipex-forked/vllm/engine/async_llm_engine.py", line 625, in generate
raise e
File "/home/llm/vllm-ipex-forked/vllm/engine/async_llm_engine.py", line 619, in generate
async for request_output in stream:
File "/home/llm/vllm-ipex-forked/vllm/engine/async_llm_engine.py", line 75, in anext
raise result
File "/home/llm/vllm-ipex-forked/vllm/engine/async_llm_engine.py", line 36, in _raise_exception_on_finish
task.result()
File "/home/llm/vllm-ipex-forked/vllm/engine/async_llm_engine.py", line 467, in run_engine_loop
has_requests_in_progress = await asyncio.wait_for(
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/asyncio/tasks.py", line 489, in wait_for
return fut.result()
^^^^^^^^^^^^
File "/home/llm/vllm-ipex-forked/vllm/engine/async_llm_engine.py", line 441, in engine_step
request_outputs = await self.engine.step_async()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/vllm-ipex-forked/vllm/engine/async_llm_engine.py", line 211, in step_async
output = await self.model_executor.execute_model_async(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/ipex_llm/vllm/xpu/ipex_llm_gpu_executor.py", line 443, in execute_model_async
all_outputs = await self._run_workers_async(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/ipex_llm/vllm/xpu/ipex_llm_gpu_executor.py", line 433, in _run_workers_async
all_outputs = await asyncio.gather(*coros)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/asyncio/tasks.py", line 694, in _wrap_awaitable
return (yield from awaitable.__await__())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ray.exceptions.RayTaskError(RuntimeError): ray::RayWorkerVllm.execute_method() (pid=195136, ip=10.240.108.91, actor_id=b933b7411289683bf7fc97c201000000, repr=<vllm.engine.ray_utils.RayWorkerVllm object at 0x77b08879b6d0>)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/vllm-ipex-forked/vllm/engine/ray_utils.py", line 37, in execute_method
return executor(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/vllm-ipex-forked/vllm/worker/worker.py", line 236, in execute_model
output = self.model_runner.execute_model(seq_group_metadata_list,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/vllm-ipex-forked/vllm/worker/model_runner.py", line 581, in execute_model
hidden_states = model_executable(
^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/vllm-ipex-forked/vllm/model_executor/models/qwen2.py", line 316, in forward
hidden_states = self.model(input_ids, positions, kv_caches,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/vllm-ipex-forked/vllm/model_executor/models/qwen2.py", line 253, in forward
hidden_states = self.embed_tokens(input_ids)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/llm/vllm-ipex-forked/vllm/model_executor/layers/vocab_parallel_embedding.py", line 107, in forward
output_parallel[input_mask, :] = 0.0
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
RuntimeError: Allocation is out of device memory on current platform.
INFO 08-16 10:13:29 metrics.py:217] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
INFO 08-16 10:13:39 metrics.py:217] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
2024:08:16-10:13:39:(196608) |CCL_ERROR| worker.cpp:353 ccl_worker_func: worker 6 caught internal exception: oneCCL: ze_call.cpp:43 do_call: EXCEPTION: ze error at zeCommandQueueExecuteCommandLists, code: ZE_RESULT_ERROR_OUT_OF_DEVICE_MEMORY
[2024-08-16 10:13:39,930 E 191503 196608] logging.cc:108: Unhandled exception: N3ccl2v19exceptionE. what(): oneCCL: ze_call.cpp:43 do_call: EXCEPTION: ze error at zeCommandQueueExecuteCommandLists, code: ZE_RESULT_ERROR_OUT_OF_DEVICE_MEMORY
[2024-08-16 10:13:39,938 E 191503 196608] logging.cc:115: Stack trace:
/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/ray/_raylet.so(+0x10b7bea) [0x7013082b7bea] ray::operator<<()
/home/llm/venv/ipex-llm-0816/lib/python3.11/site-packages/ray/_raylet.so(+0x10bae72) [0x7013082bae72] ray::TerminateHandler()
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae20c) [0x70128c4ae20c]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae277) [0x70128c4ae277]
/opt/intel/1ccl-wks/lib/libccl.so.1(+0x4c26e9) [0x6fe1a54c26e9] ccl_worker_func()
/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x70131d494ac3]
/lib/x86_64-linux-gnu/libc.so.6(+0x126850) [0x70131d526850]

*** SIGABRT received at time=1723774419 on cpu 41 ***
PC: @ 0x70131d4969fc (unknown) pthread_kill
@ 0x70131d442520 (unknown) (unknown)
[2024-08-16 10:13:39,938 E 191503 196608] logging.cc:440: *** SIGABRT received at time=1723774419 on cpu 41 ***
[2024-08-16 10:13:39,938 E 191503 196608] logging.cc:440: PC: @ 0x70131d4969fc (unknown) pthread_kill
[2024-08-16 10:13:39,939 E 191503 196608] logging.cc:440: @ 0x70131d442520 (unknown) (unknown)
Fatal Python error: Aborted

Extension modules: charset_normalizer.md, requests.packages.charset_normalizer.md, requests.packages.chardet.md, yaml._yaml, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, sentencepiece._sentencepiece, PIL._imaging, PIL._imagingft, markupsafe._speedups, psutil._psutil_linux, psutil._psutil_posix, msgpack._cmsgpack, google._upb._message, setproctitle, uvloop.loop, ray._raylet, regex._regex, scipy._lib._ccallback_c, numba.core.typeconv._typeconv, numba._helperlib, numba._dynfunc, numba._dispatcher, numba.core.runtime._nrt_python, numba.np.ufunc._internal, numba.experimental.jitclass._box, pyarrow.lib, pyarrow._json, httptools.parser.parser, httptools.parser.url_parser, websockets.speedups (total: 49)

LIBXSMM_VERSION: main_stable-1.17-3651 (25693763)
LIBXSMM_TARGET: spr [Intel(R) Xeon(R) Silver 4410Y]
Registry and code: 13 MB
Command: python -m ipex_llm.vllm.xpu.entrypoints.openai.api_server --served-model-name Qwen2-72B-Instruct --port 8000 --model /home/llm/local_models/Qwen/Qwen2-72B-Instruct --trust-remote-code --gpu-memory-utilization 0.90 --device xpu --dtype float16 --enforce-eager --load-in-low-bit fp8 --max-model-len 6656 --max-num-batched-tokens 6656 --tensor-parallel-size 8
Uptime: 3880.324215 s
start_vllm_arc.sh: line 28: 191503 Aborted (core dumped) python -m ipex_llm.vllm.xpu.entrypoints.openai.api_server --served-model-name $served_model_name --port 8000 --model $model --trust-remote-code --gpu-memory-utilization 0.90 --device xpu --dtype float16 --enforce-eager --load-in-low-bit fp8 --max-model-len 6656 --max-num-batched-tokens 6656 --tensor-parallel-size 8

(ipex-llm-0816) llm@GPU-Xeon4410Y-ARC770:~/ipex-llm-0816/python/llm/scripts$ bash env-check.sh

PYTHON_VERSION=3.11.9

Transformers is not installed.

PyTorch is not installed.

ipex-llm Version: 2.1.0b20240815

IPEX is not installed.

CPU Information:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 52 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Silver 4410Y
CPU family: 6
Model: 143
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
Stepping: 8
CPU max MHz: 3900.0000
CPU min MHz: 800.0000
BogoMIPS: 4000.00

Total CPU Memory: 755.542 GB

Operating System:
Ubuntu 22.04.4 LTS \n \l


Linux GPU-Xeon4410Y-ARC770 6.5.0-45-generic #45~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Jul 15 16:40:02 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

CLI:
Version: 1.2.27.20240626
Build ID: 7f002d24

Service:
Version: 1.2.27.20240626
Build ID: 7f002d24
Level Zero Version: 1.16.0

Driver UUID 32342e31-332e-3239-3133-382e37000000
Driver Version 24.13.29138.7
Driver UUID 32342e31-332e-3239-3133-382e37000000
Driver Version 24.13.29138.7
Driver UUID 32342e31-332e-3239-3133-382e37000000
Driver Version 24.13.29138.7
Driver UUID 32342e31-332e-3239-3133-382e37000000
Driver Version 24.13.29138.7
Driver UUID 32342e31-332e-3239-3133-382e37000000
Driver Version 24.13.29138.7
Driver UUID 32342e31-332e-3239-3133-382e37000000
Driver Version 24.13.29138.7
Driver UUID 32342e31-332e-3239-3133-382e37000000
Driver Version 24.13.29138.7
Driver UUID 32342e31-332e-3239-3133-382e37000000
Driver Version 24.13.29138.7

Driver related package version:
ii intel-fw-gpu 2024.17.5-32922.04 all Firmware package for Intel integrated and discrete GPUs
ii intel-i915-dkms 1.24.3.23.240419.26+i30-1 all Out of tree i915 driver.
ii intel-level-zero-gpu 1.3.29138.7 amd64 Intel(R) Graphics Compute Runtime for oneAPI Level Zero.
ii level-zero-dev 1.16.15-881~22.04 amd64 Intel(R) Graphics Compute Runtime for oneAPI Level Zero.

env-check.sh: line 167: sycl-ls: command not found
igpu not detected

xpu-smi is properly installed.

+-----------+--------------------------------------------------------------------------------------+
| Device ID | Device Information |
+-----------+--------------------------------------------------------------------------------------+
| 0 | Device Name: Intel(R) Arc(TM) A770 Graphics |
| | Vendor Name: Intel(R) Corporation |
| | SOC UUID: 00000000-0000-0019-0000-000856a08086 |
| | PCI BDF Address: 0000:19:00.0 |
| | DRM Device: /dev/dri/card1 |
| | Function Type: physical |
+-----------+--------------------------------------------------------------------------------------+
| 1 | Device Name: Intel(R) Arc(TM) A770 Graphics |
| | Vendor Name: Intel(R) Corporation |
| | SOC UUID: 00000000-0000-002c-0000-000856a08086 |
| | PCI BDF Address: 0000:2c:00.0 |
| | DRM Device: /dev/dri/card2 |
| | Function Type: physical |
+-----------+--------------------------------------------------------------------------------------+
| 2 | Device Name: Intel(R) Arc(TM) A770 Graphics |
| | Vendor Name: Intel(R) Corporation |
| | SOC UUID: 00000000-0000-0052-0000-000856a08086 |
| | PCI BDF Address: 0000:52:00.0 |
| | DRM Device: /dev/dri/card3 |
| | Function Type: physical |
+-----------+--------------------------------------------------------------------------------------+
| 3 | Device Name: Intel(R) Arc(TM) A770 Graphics |
| | Vendor Name: Intel(R) Corporation |
| | SOC UUID: 00000000-0000-0065-0000-000856a08086 |
| | PCI BDF Address: 0000:65:00.0 |
| | DRM Device: /dev/dri/card4 |
| | Function Type: physical |
+-----------+--------------------------------------------------------------------------------------+
| 4 | Device Name: Intel(R) Arc(TM) A770 Graphics |
| | Vendor Name: Intel(R) Corporation |
| | SOC UUID: 00000000-0000-009b-0000-000856a08086 |
| | PCI BDF Address: 0000:9b:00.0 |
| | DRM Device: /dev/dri/card5 |
| | Function Type: physical |
+-----------+--------------------------------------------------------------------------------------+
| 5 | Device Name: Intel(R) Arc(TM) A770 Graphics |
| | Vendor Name: Intel(R) Corporation |
| | SOC UUID: 00000000-0000-00ad-0000-000856a08086 |
| | PCI BDF Address: 0000:ad:00.0 |
| | DRM Device: /dev/dri/card6 |
| | Function Type: physical |
+-----------+--------------------------------------------------------------------------------------+
| 6 | Device Name: Intel(R) Arc(TM) A770 Graphics |
| | Vendor Name: Intel(R) Corporation |
| | SOC UUID: 00000000-0000-00d1-0000-000856a08086 |
| | PCI BDF Address: 0000:d1:00.0 |
| | DRM Device: /dev/dri/card7 |
| | Function Type: physical |
+-----------+--------------------------------------------------------------------------------------+
| 7 | Device Name: Intel(R) Arc(TM) A770 Graphics |
| | Vendor Name: Intel(R) Corporation |
| | SOC UUID: 00000000-0000-00e3-0000-000856a08086 |
| | PCI BDF Address: 0000:e3:00.0 |
| | DRM Device: /dev/dri/card8 |
| | Function Type: physical |
+-----------+--------------------------------------------------------------------------------------+
GPU0 Memory size=16M
GPU1 Memory size=16G
GPU2 Memory size=16G
GPU3 Memory size=16G
GPU4 Memory size=16G
GPU5 Memory size=16G
GPU6 Memory size=16G
GPU7 Memory size=16G
GPU8 Memory size=16G

03:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 52) (prog-if 00 [VGA controller])
DeviceName: Onboard VGA
Subsystem: ASPEED Technology, Inc. ASPEED Graphics Family
Flags: medium devsel, IRQ 16, NUMA node 0
Memory at 94000000 (32-bit, non-prefetchable) [size=16M]
Memory at 95000000 (32-bit, non-prefetchable) [size=256K]
I/O ports at 2000 [size=128]
Capabilities:
Kernel driver in use: ast
Kernel modules: ast

19:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller])
Subsystem: Shenzhen Gunnir Technology Development Co., Ltd Device 1334
Flags: bus master, fast devsel, latency 0, IRQ 130, NUMA node 0
Memory at 9e000000 (64-bit, non-prefetchable) [size=16M]
Memory at 5f800000000 (64-bit, prefetchable) [size=16G]
Expansion ROM at 9f000000 [disabled] [size=2M]
Capabilities:
Kernel driver in use: i915
Kernel modules: i915

2c:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller])
Subsystem: Shenzhen Gunnir Technology Development Co., Ltd Device 1334
Flags: bus master, fast devsel, latency 0, IRQ 133, NUMA node 0
Memory at a8000000 (64-bit, non-prefetchable) [size=16M]
Memory at 6f800000000 (64-bit, prefetchable) [size=16G]
Expansion ROM at a9000000 [disabled] [size=2M]
Capabilities:
Kernel driver in use: i915
Kernel modules: i915

52:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller])
Subsystem: Shenzhen Gunnir Technology Development Co., Ltd Device 1334
Flags: bus master, fast devsel, latency 0, IRQ 136, NUMA node 0
Memory at bc000000 (64-bit, non-prefetchable) [size=16M]
Memory at 8f800000000 (64-bit, prefetchable) [size=16G]
Expansion ROM at bd000000 [disabled] [size=2M]
Capabilities:
Kernel driver in use: i915
Kernel modules: i915

65:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller])
Subsystem: Shenzhen Gunnir Technology Development Co., Ltd Device 1334
Flags: bus master, fast devsel, latency 0, IRQ 139, NUMA node 0
Memory at c6000000 (64-bit, non-prefetchable) [size=16M]
Memory at 9f800000000 (64-bit, prefetchable) [size=16G]
Expansion ROM at c7000000 [disabled] [size=2M]
Capabilities:
Kernel driver in use: i915
Kernel modules: i915

9b:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller])
Subsystem: Shenzhen Gunnir Technology Development Co., Ltd Device 1334
Flags: bus master, fast devsel, latency 0, IRQ 142, NUMA node 1
Memory at d8000000 (64-bit, non-prefetchable) [size=16M]
Memory at cf800000000 (64-bit, prefetchable) [size=16G]
Expansion ROM at d9000000 [disabled] [size=2M]
Capabilities:
Kernel driver in use: i915
Kernel modules: i915

ad:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller])
Subsystem: Shenzhen Gunnir Technology Development Co., Ltd Device 1334
Flags: bus master, fast devsel, latency 0, IRQ 145, NUMA node 1
Memory at e0000000 (64-bit, non-prefetchable) [size=16M]
Memory at df800000000 (64-bit, prefetchable) [size=16G]
Expansion ROM at e1000000 [disabled] [size=2M]
Capabilities:
Kernel driver in use: i915
Kernel modules: i915

d1:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller])
Subsystem: Shenzhen Gunnir Technology Development Co., Ltd Device 1334
Flags: bus master, fast devsel, latency 0, IRQ 148, NUMA node 1
Memory at f1000000 (64-bit, non-prefetchable) [size=16M]
Memory at ff800000000 (64-bit, prefetchable) [size=16G]
Expansion ROM at f2000000 [disabled] [size=2M]
Capabilities:
Kernel driver in use: i915
Kernel modules: i915

e3:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller])
Subsystem: Intel Corporation Device 1020
Flags: bus master, fast devsel, latency 0, IRQ 151, NUMA node 1
Memory at f9000000 (64-bit, non-prefetchable) [size=16M]
Memory at 10f800000000 (64-bit, prefetchable) [size=16G]
Expansion ROM at fa000000 [disabled] [size=2M]
Capabilities:
Kernel driver in use: i915
Kernel modules: i915

@gc-fu (Contributor) commented Aug 16, 2024

Hi, this problem is caused by running out of GPU memory. You can reduce --gpu-memory-utilization or --max-num-batched-tokens.

Using the following command should fix the problem:

#!/bin/bash
model="/home/llm/local_models/Qwen/Qwen2-72B-Instruct"
served_model_name="Qwen2-72B-Instruct"
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export TORCH_LLM_ALLREDUCE=0
export CCL_DG2_ALLREDUCE=1
# Tensor parallel related arguments:
export CCL_WORKER_COUNT=2
export FI_PROVIDER=shm
export CCL_ATL_TRANSPORT=ofi
export CCL_ZE_IPC_EXCHANGE=sockets
export CCL_ATL_SHM=1
source /opt/intel/1ccl-wks/setvars.sh
python -m ipex_llm.vllm.xpu.entrypoints.openai.api_server \
--served-model-name $served_model_name \
--port 8000 \
--model $model \
--trust-remote-code \
--gpu-memory-utilization 0.85 \
--device xpu \
--dtype float16 \
--enforce-eager \
--load-in-low-bit fp8 \
--max-model-len 4000 \
--max-num-batched-tokens 4000 \
--tensor-parallel-size 8
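
Compared with the original command, this lowers --gpu-memory-utilization from 0.90 to 0.85 and caps --max-model-len / --max-num-batched-tokens at 4000 instead of 6656, which shrinks the KV cache and activation memory each of the eight ARC cards has to hold. Once the server is up, it can be sanity-checked with a plain OpenAI-style completion request; a minimal sketch (the endpoint and served model name are taken from the command above, the prompt is arbitrary):

# Sketch: query the OpenAI-compatible /v1/completions endpoint started above.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen2-72B-Instruct",
        "prompt": "San Francisco is a",
        "max_tokens": 128,
        "temperature": 0
      }'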
