
[WIP][Core] try to manage nccl version #3802

Closed · wants to merge 6 commits

Conversation

youkaichao (Member)

First step to manage the NCCL version.

@youkaichao (Member, Author)

I tried to manage the NCCL version in the framework under pip; it is very difficult.

  • Version management and specification is a disaster: the package at https://github.com/vllm-project/vllm-nccl itself has a version, and we also have an NCCL version and a CUDA version. It ends up with something like vllm-nccl==0.1.0.nccl2.18.3.cuda11, or with specifying via an environment variable, e.g. VLLM_INSTALL_NCCL=2.18+cu12 pip install vllm-nccl.
  • If we want pip to be able to remove the data, we have to use data_files. But that puts NCCL inside the wheel, which again hits the 100 MB size limit of PyPI.

In the end, I will go with the cupy approach, i.e. provide a script for users to download NCCL if they want. Then we can have as many args as we want:

python -m vllm.tools.install_nccl --cuda 11 --nccl 2.18.3
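
Below is a minimal sketch of what such a tool could look like. The module path vllm.tools.install_nccl comes from the command above; the URL template, mirror host, and ~/.config/vllm target directory are assumptions for illustration, not the real implementation.

```python
# vllm/tools/install_nccl.py -- hypothetical sketch, not the real tool.
# The URL template is a placeholder; the real script would point at
# NVIDIA's download site or a vLLM-owned storage bucket.
import argparse
import pathlib
import urllib.request

URL_TEMPLATE = ("https://example-bucket.invalid/nccl/"
                "cu{cuda}/libnccl.so.{nccl}")


def main() -> None:
    parser = argparse.ArgumentParser(
        description="Download a pinned NCCL build for vLLM.")
    parser.add_argument("--cuda", choices=["11", "12"], required=True,
                        help="CUDA major version")
    parser.add_argument("--nccl", default="2.18.3",
                        help="NCCL version to download")
    args = parser.parse_args()

    # Store the library outside site-packages so pip never has to track it.
    target_dir = (pathlib.Path.home() / ".config" / "vllm" / "nccl"
                  / f"cu{args.cuda}")
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"libnccl.so.{args.nccl}"

    url = URL_TEMPLATE.format(cuda=args.cuda, nccl=args.nccl)
    print(f"Downloading {url} -> {target}")
    urllib.request.urlretrieve(url, str(target))


if __name__ == "__main__":
    main()
```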

@simon-mo (Collaborator) commented Apr 3, 2024

Constraints:

  • We cannot vendor NCCL into the vLLM bdist because of the size issue.
  • We cannot install NVIDIA's NCCL package from PyPI because PyTorch already depends on it.
  • We cannot redistribute NCCL on PyPI because of the size issue.

Goal: the out-of-the-box experience for users should come with no memory increase.

Therefore, we can upload two packages to PyPI: vllm-nccl-cu11==2.18.3 and vllm-nccl-cu12==2.18.3.

The workflow:
vLLM (bdist) -> vllm-nccl (sdist) -> download the .so from the NVIDIA site (but in case NVIDIA breaks the downloads, we can upload the file to our own storage bucket).
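
As a rough sketch, the sdist could hook the download into pip install via a custom install command. This works precisely because an sdist executes setup.py at install time (a wheel would not); the URL and package metadata below are placeholders, not the actual vllm-nccl code.

```python
# setup.py for the hypothetical vllm-nccl-cu12 sdist.
import pathlib
import urllib.request

from setuptools import setup
from setuptools.command.install import install

# Placeholder mirror; in practice this would be NVIDIA's site with a
# vLLM-owned storage bucket as a fallback.
NCCL_URL = "https://example-bucket.invalid/nccl/cu12/libnccl.so.2.18.3"


class DownloadNccl(install):
    """Standard install, plus a post-install download of the NCCL .so."""

    def run(self):
        super().run()
        target_dir = pathlib.Path.home() / ".config" / "vllm" / "nccl" / "cu12"
        target_dir.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(NCCL_URL,
                                   str(target_dir / "libnccl.so.2.18.3"))


setup(
    name="vllm-nccl-cu12",
    version="2.18.3",
    cmdclass={"install": DownloadNccl},
)
```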

  • vLLM by default should pin to vllm-nccl-cu12==2.18.3 (and torch==2.2, which doesn't really matter now).
  • The vLLM cu11 distribution should pin to vllm-nccl-cu11 if possible; if not, just update the docs to add a new line:
    # Install vLLM with CUDA 11.8.
    export VLLM_VERSION=0.4.0
    export PYTHON_VERSION=39
    pip install https://github.com/vllm-project/vllm/releases/download/v${VLLM_VERSION}/vllm-${VLLM_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux1_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
    + pip install vllm-nccl-cu11==2.18.3

The sdist will just download the NCCL .so into ~/.config/vllm/nccl/cu{11,12}/*.so. Then, when vLLM is running, it knows the current CUDA version just through torch.cuda..., and it can append the right path to load the .so file for NCCL.
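
A minimal sketch of that runtime side, assuming the directory layout above (torch.version.cuda is a real PyTorch attribute; the helper name and everything else are illustrative):

```python
# Hypothetical runtime helper: find and load the downloaded NCCL library.
import ctypes
import pathlib

import torch


def load_pinned_nccl(nccl_version: str = "2.18.3") -> ctypes.CDLL:
    # torch.version.cuda looks like "12.1"; only the major version is
    # needed to pick the cu11 or cu12 directory.
    cuda_major = torch.version.cuda.split(".")[0]
    so_path = (pathlib.Path.home() / ".config" / "vllm" / "nccl"
               / f"cu{cuda_major}" / f"libnccl.so.{nccl_version}")
    # RTLD_GLOBAL exports the symbols so later lookups can resolve NCCL
    # from this library; this has to run before anything else pulls in
    # a different libnccl.
    return ctypes.CDLL(str(so_path), mode=ctypes.RTLD_GLOBAL)
```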

To incorporate the vLLM version into the wheel version (this feels a bit over-engineered to me): vllm-nccl-cu12==2.18.3.0.4.1, which is the combo of the NCCL version and the vLLM version. See https://packaging.python.org/en/latest/specifications/version-specifiers/#final-releases
