Cmake based build system #2830
Conversation
(force-pushed from 4cb0650 to e9ae52d)
@bnellnm I have access to some AMD GPUs (mi-210/mi-250) if we need to do any testing there, just lmk
(force-pushed from 4212328 to 1f1d8c3)
@bnellnm is there a build command for nvidia gpu with this new setup? I'd like to try the build on our a100 to confirm I can still build once this is merged, so I can continue to test and provide comments on open PRs. I'm currently able to build on tip of main. Do I need to set any specific environment variables, or should something like the usual `mkdir build && cd build && cmake ..` work?
You should be able to use cmake directly w/o going thru python. The cmake build should honor the same pytorch architecture environment variables. The only caveat is that cmake will try to build all the extensions.
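For illustration, honoring a pytorch-style architecture environment variable mostly means splitting its value into individual arch entries. This is a hedged Python sketch of that idea, not the PR's actual CMake parsing:

```python
# Hedged sketch: split a pytorch-style arch list such as "8.0 8.6+PTX"
# (space- or semicolon-separated) into individual entries. The real
# CMake logic in this PR may handle more edge cases.
def parse_arch_list(env_value: str) -> list[str]:
    return [arch for arch in env_value.replace(";", " ").split() if arch]
```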
(force-pushed from 85a76a5 to 97ca1cf)
Should we rename `supported` to `default`?
CMakeLists.txt (Outdated):
```cmake
set(VLLM_PUNICA_GPU_ARCHES)

# Process each `gencode` flag.
foreach(ARCH ${VLLM_CUDA_ARCH_FLAGS})
```
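The per-flag loop in the excerpt above filters `gencode` flags against a supported set. A hypothetical Python paraphrase of that filtering idea (the real CMake implementation may differ in detail):

```python
import re

def filter_gencode_flags(flags: list[str], supported: set[str]) -> list[str]:
    """Keep only `-gencode` flags whose compute capability is supported."""
    kept = []
    for flag in flags:
        match = re.search(r"arch=compute_(\d+)", flag)
        if match and match.group(1) in supported:
            kept.append(flag)
    return kept
```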
Thank you so much for supporting the existing workflow of `VLLM_CUDA_ARCH_FLAGS`. Greatly appreciated!
This looks great to me overall. @pcmoritz please do a detailed pass.
@pcmoritz, @simon-mo I had a question about the punica kernels setup. It looks like the existing ... I was trying to replicate the logic from ... The loop in setup.py that checks the device capabilities never gets run in this situation either.
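For context, the punica kernels require newer GPUs (compute capability 8.0 or higher). A minimal sketch of such a gate, with hypothetical names — not the PR's actual setup.py code:

```python
MIN_PUNICA_CAPABILITY = (8, 0)  # punica kernels need sm_80 or newer

def punica_arches(arch_list: list[str]) -> list[str]:
    """Filter entries like ["7.5", "8.0", "8.6+PTX"] down to punica-capable ones."""
    def capability(arch: str) -> tuple[int, int]:
        major, minor = arch.split("+")[0].split(".")
        return int(major), int(minor)
    return [a for a in arch_list if capability(a) >= MIN_PUNICA_CAPABILITY]
```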
@bnellnm Thanks for all the updates, I have one more question: have you looked into whether the hipification can be done without duplicating the logic from pytorch? How does the pytorch-based CMake build system do it? If we can do it the same way without poking through the abstractions, it will very likely be more maintainable going forward. On the punica question, I don't exactly know -- your logic seems simpler. I think we need to make sure that these two things keep working:
@pcmoritz Thanks for all the good feedback. The ... As for the punica kernels, I'm pretty sure they will work as expected since the build now follows the same process as the other kernels, with the exception of filtering out additional unsupported architectures. And manually setting ...
Here are some benchmarks -- these all look very reasonable:

- On this branch, ...
- On master, ...
- Incremental compilation on this branch: ...
- Incremental compilation with cmake: ...
- On master, ...
The changes in setup.py messed up ROCm/HIP support.
Hi @hliuca, this PR compiled on an AMD machine. Can you elaborate on the issue?
hon-310/vllm/model_executor/layers/fused_moe/configs |
`def _is_cuda() -> bool:` was changed to ...
The following function checks nvcc, and it doesn't seem to check HIP: `class cmake_build_ext(build_ext):` ...
I think a check for ...
Yes. Also, `_is_cuda` should be restored... the new version doesn't work :-) `def _is_cuda() -> bool:` ...
This PR didn't change the definition of ...
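One HIP-aware way to write such a platform check is sketched below. This is a hedged illustration, not the PR's code: `version_module` stands in for `torch.version` so the logic can be shown without importing torch. On ROCm builds of pytorch, `torch.version.hip` is a version string; on CUDA builds it is None.

```python
def is_hip_build(version_module) -> bool:
    # ROCm builds of pytorch expose a hip version string.
    return getattr(version_module, "hip", None) is not None

def is_cuda_build(version_module) -> bool:
    # CUDA builds expose a cuda version string and no hip version.
    return (getattr(version_module, "cuda", None) is not None
            and not is_hip_build(version_module))
```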
This PR replaces the setup.py extension build system with a cmake build system. I've tried to preserve and move all the existing logic about flags and supported architectures into the CMakeLists.txt file so that it is standalone, i.e. you should be able to build the C++/CUDA artifacts without going through python or setup.py. All the new setup.py does is select which extensions to build and the threading level for the nvcc compiler.

The cmake build should also honor most of the same environment variables as setup.py. Some notable exceptions are `NVCC_THREADS` and `VLLM_ENABLE_PUNICA_KERNELS`, which are replaced by command line cmake defines, e.g. `cmake ... -DNVCC_THREADS=8`.
The CMakeLists.txt defines three targets: `_C`, `_moe_C` and `_punica_C`, which correspond to the names of the object files that were previously generated by the old setup.py. There's also a `default` target which triggers the proper combination of `_C`, `_moe_C` and `_punica_C` based on the platform, architecture and `VLLM_ENABLE_PUNICA_KERNELS` variable. This makes standalone use of cmake easier since the user doesn't need to know which specific extension targets need to be built for the current platform. The CMakeLists.txt logic is a duplicate of the logic in setup.py that is used to select the extensions to build, and the two should be kept in sync.
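The target selection described above might look roughly like this — a Python paraphrase of the `default` target's logic with hypothetical names, based on the note later in this description that `_moe_C` and `_punica_C` aren't built for ROCm:

```python
def default_targets(is_hip: bool, punica_enabled: bool) -> list[str]:
    # _C is always built; _moe_C and _punica_C are CUDA-only in this PR,
    # and _punica_C additionally requires VLLM_ENABLE_PUNICA_KERNELS.
    targets = ["_C"]
    if not is_hip:
        targets.append("_moe_C")
        if punica_enabled:
            targets.append("_punica_C")
    return targets
```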
Some notable cmake variables for controlling versions:

- `PYTHON_SUPPORTED_VERSIONS` - lists the python versions supported by vllm. If a new version of python is needed (or an old one removed), this variable should be updated.
- `CUDA_SUPPORTED_ARCHS` - a semicolon-separated list (i.e. a cmake list) of supported NVIDIA architectures. (This used to be `NVIDIA_SUPPORTED_ARCHS` in setup.py.)
- `HIP_SUPPORTED_ARCHS` - a cmake list of supported AMD/ROCM architectures. (This used to be `ROCM_SUPPORTED_ARCHS` in setup.py.)

The source files for each extension are controlled by the following variables, which must be updated if the list of files changes (note that globs are not used since this is not recommended practice for cmake):

- `VLLM_EXT_SRC` for the `_C` extension.
- `VLLM_MOE_EXT_SRC` for the `_moe_C` extension.
- `VLLM_PUNICA_EXT_SRC` for the `_punica_C` extension.

Incremental builds should be supported "out of the box", but `ccache` will also be used if it is present. The build can also be done with traditional makefiles if ninja isn't installed. I've tried to balance the build parallelism with the number of nvcc threads, but this is a heuristic that might need to be tweaked.
The build flavor is controlled by the `CMAKE_BUILD_TYPE` environment variable, with the default being `RelWithDebInfo`. To change the default, edit the `cmake_build_ext` class in `setup.py`.

None of the build systems I looked at (cmake, meson, bazel) had any specific support for other GPU architectures, although the popularity of cmake probably makes it the most likely to get support for new platforms.
I've tried to comment the CMakeLists.txt file with a description of the configuration process. There were a few quirky bits that needed special handling. In particular, setting the CUDA architecture flags required a somewhat convoluted process due to how torch/cmake interact.
For a ROCm build, there is an additional "hipify" preprocessor step that is run on the CUDA source files. The `hipify.py` script is a command line wrapper around pytorch's hipify function that is used by `CMakeLists.txt`. For uniformity, I've hooked the preprocessor up for all targets (not just `_C`) even though `_moe_C` and `_punica_C` currently aren't compiled (or supported?) for ROCm.

resolves #2654
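As a toy illustration of what a hipify pass does — the real `torch.utils.hipify` machinery that `hipify.py` wraps performs far more substitutions, including header and kernel-launch rewrites:

```python
# A few representative CUDA -> HIP renames; purely illustrative, not the
# actual substitution table used by pytorch's hipify.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaStream_t": "hipStream_t",
}

def hipify_source(text: str) -> str:
    for cuda_name, hip_name in CUDA_TO_HIP.items():
        text = text.replace(cuda_name, hip_name)
    return text
```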