Building VLLM from source and running inference: No module named 'vllm._C' #3061

Open
Lena-Jurkschat opened this issue Feb 27, 2024 · 12 comments

@Lena-Jurkschat

Hi,
after building vllm from source, the following error occurs when running multi-GPU inference using a local Ray instance:

File "vllm/vllm/model_executor/layers/quantization/awq.py", line 6, in <module>
    from vllm._C import ops
ModuleNotFoundError: No module named 'vllm._C'

I already checked issue #1814, which does not help: there is no additional vllm folder to delete that could be causing the confusion described there.

I ran the following to build vllm:

export VLLM_USE_PRECOMPILED=false
git clone https://github.com/vllm-project/vllm.git
cd vllm
pip install -e .
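
A quick way to check whether the compiled extension was actually produced (a suggested sanity check, not part of the original report) is to import it directly:

python -c "from vllm._C import ops; print('vllm._C import OK')"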

I run the inference using:

from langchain_community.llms import VLLM
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

llm = VLLM(model=model_name,
           trust_remote_code=True,  # mandatory for hf models
           max_new_tokens=100,
           top_k=top_k,
           top_p=top_p,
           temperature=temperature,
           tensor_parallel_size=2)

prompt = PromptTemplate(template=template, input_variables=["ques"])
llm_chain = LLMChain(prompt=prompt, llm=llm)
llm_chain.run(ques)

However, installing vllm via pip instead leads to an MPI error when running multi-GPU inference (probably due to a version incompatibility between MPI on my system and the prebuilt vllm binaries?), so I wanted to build it from source:

(RayWorkerVllm pid=3391490) *** An error occurred in MPI_Init_thread
(RayWorkerVllm pid=3391490) *** on a NULL communicator
(RayWorkerVllm pid=3391490) *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
(RayWorkerVllm pid=3391490) ***    and potentially your MPI job)
(RayWorkerVllm pid=3391490) [i8006:3391490] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!

Some Specs:

  • Python 3.10
  • CUDA 12.1
  • OpenMPI/4.1.4.
  • Torch 2.1.2
@george-kuanli-peng

I have the same problem (vllm built from source):

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/workspace/vllm/vllm/entrypoints/llm.py", line 109, in __init__
    self.llm_engine = LLMEngine.from_engine_args(engine_args)
  File "/workspace/vllm/vllm/engine/llm_engine.py", line 371, in from_engine_args
    engine = cls(*engine_configs,
  File "/workspace/vllm/vllm/engine/llm_engine.py", line 120, in __init__
    self._init_workers()
  File "/workspace/vllm/vllm/engine/llm_engine.py", line 143, in _init_workers
    from vllm.worker.worker import Worker
  File "/workspace/vllm/vllm/worker/worker.py", line 11, in <module>
    from vllm.model_executor import set_random_seed
  File "/workspace/vllm/vllm/model_executor/__init__.py", line 2, in <module>
    from vllm.model_executor.model_loader import get_model
  File "/workspace/vllm/vllm/model_executor/model_loader.py", line 10, in <module>
    from vllm.model_executor.weight_utils import (get_quant_config,
  File "/workspace/vllm/vllm/model_executor/weight_utils.py", line 18, in <module>
    from vllm.model_executor.layers.quantization import (get_quantization_config,
  File "/workspace/vllm/vllm/model_executor/layers/quantization/__init__.py", line 4, in <module>
    from vllm.model_executor.layers.quantization.awq import AWQConfig
  File "/workspace/vllm/vllm/model_executor/layers/quantization/awq.py", line 6, in <module>
    from vllm._C import ops
ModuleNotFoundError: No module named 'vllm._C'

@cocoderss

cocoderss commented Mar 3, 2024

I have the same problem (vllm built from source):

I had the same issue too; it turned out to be because I had a folder named vllm in my working directory.
Whenever I imported vllm from there, Python picked up that local folder instead of the installed package, so the solution is to run/import vllm from a directory that does not contain a folder named vllm.
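
A quick way to see which copy of vllm Python is actually importing (a generic diagnostic, not from the original comment):

python -c "import vllm; print(vllm.__file__)"

If this prints a path inside your working directory rather than inside site-packages, the local folder is shadowing the installed package.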

Unfortunately I then ran into another issue, so I have given up for now.

@Lena-Jurkschat
Author

Installing vllm==0.2.6 at least resolves the No module named 'vllm._C' error, but the downgrade is not ideal.

When building vllm without precompiled binaries (VLLM_USE_PRECOMPILED=false), vllm._C still cannot be found. Is something missing in setup.py, and is there a way to fix that?
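
For reference, the downgrade workaround mentioned above is simply (assuming a pip-managed environment):

pip install vllm==0.2.6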

@ghost

ghost commented Mar 30, 2024

I installed using "python setup.py install" and got this error. I fixed it with "python setup.py develop".
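
A sketch of that sequence, with stale build artifacts cleared first (the clean step is my assumption, not something the commenter mentioned):

python setup.py clean --all
python setup.py develop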

@MojHnd

MojHnd commented Apr 2, 2024

@Lena-Jurkschat @george-kuanli-peng
Did you find the solution?

@Lena-Jurkschat
Author

@Lena-Jurkschat @george-kuanli-peng Did you find the solution?

Unfortunately, not. "python setup.py develop" did not work either in combination with VLLM_USE_PRECOMPILED=false.

@MojHnd

MojHnd commented Apr 4, 2024

@liangfu

I successfully installed vllm-0.4.0.post1+neuron213.

In setup.py, there is this logic:

if not _is_neuron():
    ext_modules.append(CMakeExtension(name="vllm._C"))

and

cmdclass={"build_ext": cmake_build_ext} if not _is_neuron() else {},

So, on a Neuron build, vllm._C won't be created, which results in ModuleNotFoundError: No module named 'vllm._C'.

How to fix it?

@MojHnd

MojHnd commented Apr 5, 2024

@Lena-Jurkschat @george-kuanli-peng Did you find the solution?

Unfortunately, not. "python setup.py develop" did not work either in combination with VLLM_USE_PRECOMPILED=false.

I just solved it this way.

The problem is the import from vllm._C import ops while there is no vllm._C module.
The ops module we need exists in your_environment_name/lib/python3.10/site-packages/vllm/model_executor/layers/ (shown in a screenshot in the original comment).

So what we have to do is change from vllm._C import ops to from vllm.model_executor.layers import ops in every file of the package.
This solves the problem :)
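
One way to list the files that still contain the old import before editing them (the site-packages path is an example based on the commenter's setup; adjust it to your environment):

grep -rl "from vllm._C import ops" your_environment_name/lib/python3.10/site-packages/vllm/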

@dongreenberg

dongreenberg commented Apr 5, 2024

I too am getting this error and am not in a position to find and replace all the instances of 'vllm._C' in the code. Cc @liangfu

Hardware: inf2.8xlarge
AMI: Neuron DLAMI us-east-1 (ami-0e0f965ee5cfbf89b)
Versions:

aws-neuronx-runtime-discovery==2.9
libneuronxla==2.0.965
neuronx-cc==2.13.66.0+6dfecc895
torch-neuronx==2.1.2.2.1.0
transformers-neuronx==0.10.0.21
torch==2.1.2
torch-xla==2.1.2

@adamrb
Contributor

adamrb commented Apr 24, 2024

Was also seeing the same error on an inf2 instance with the latest release.

Running the following in the vLLM directory before installing with pip solved the issue for me.

find . -type f -exec sed -i 's/from vllm\._C import ops/from vllm.model_executor.layers import ops/g' {} +

I'm not sure if this is a solution for all distributions.

diff --git a/benchmarks/kernels/benchmark_aqlm.py b/benchmarks/kernels/benchmark_aqlm.py
index 9602d20..02c816b 100644
--- a/benchmarks/kernels/benchmark_aqlm.py
+++ b/benchmarks/kernels/benchmark_aqlm.py
@@ -6,7 +6,7 @@ from typing import Optional
 import torch
 import torch.nn.functional as F

-from vllm._C import ops
+from vllm.model_executor.layers import ops
 from vllm.model_executor.layers.quantization.aqlm import (
     dequantize_weight, generic_dequantize_gemm, get_int_dtype,
     optimized_dequantize_gemm)
diff --git a/vllm/_custom_ops.py b/vllm/_custom_ops.py
index e4b16ed..a7ae8b4 100644
--- a/vllm/_custom_ops.py
+++ b/vllm/_custom_ops.py
@@ -4,7 +4,7 @@ import torch

 try:
     from vllm._C import cache_ops as vllm_cache_ops
-    from vllm._C import ops as vllm_ops
+    from vllm.model_executor.layers import ops as vllm_ops
 except ImportError:
     pass

diff --git a/vllm/model_executor/layers/quantization/aqlm.py b/vllm/model_executor/layers/quantization/aqlm.py
index 6115b1d..566a9cf 100644
--- a/vllm/model_executor/layers/quantization/aqlm.py
+++ b/vllm/model_executor/layers/quantization/aqlm.py
@@ -8,7 +8,7 @@ import torch
 import torch.nn.functional as F
 from torch.nn.parameter import Parameter

-from vllm._C import ops
+from vllm.model_executor.layers import ops
 from vllm.model_executor.layers.linear import (LinearMethodBase,
                                                set_weight_attrs)
 from vllm.model_executor.layers.quantization.base_config import (

@scao0208

Hi guys, I have one way to solve this problem. I found that Python was importing the vllm module from the source checkout "/home/user/vllm/vllm" rather than from the env package "/home/user/miniconda3/envs/vllm/lib/pythonxx.xx/site-packages/vllm".
So I just copied the compiled .so files into "/home/user/vllm/vllm".
(Screenshots in the original comment show the copied .so files and the subsequent successful run.)
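
A sketch of that copy step, using the paths from the comment (pythonxx.xx stands for your actual Python version directory; it is an assumption that the compiled files live in the installed site-packages copy):

cp /home/user/miniconda3/envs/vllm/lib/pythonxx.xx/site-packages/vllm/*.so /home/user/vllm/vllm/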

@github-actions

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

@github-actions github-actions bot added the stale label Oct 30, 2024