[Bug]: Hung on building docker image #3601

Closed
wizd opened this issue Mar 24, 2024 · 9 comments
Labels
bug Something isn't working

Comments

wizd commented Mar 24, 2024

Your current environment

$ python collect_env.py
Collecting environment information...
PyTorch version: 2.1.2+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.2 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.11.5 (main, Sep 11 2023, 13:54:46) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 11.5.119
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 4090
GPU 1: NVIDIA GeForce RTX 4090

Nvidia driver version: 546.17
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Address sizes:                      46 bits physical, 48 bits virtual
Byte Order:                         Little Endian
CPU(s):                             32
On-line CPU(s) list:                0-31
Vendor ID:                          GenuineIntel
Model name:                         Intel(R) Core(TM) i9-14900K
CPU family:                         6
Model:                              183
Thread(s) per core:                 2
Core(s) per socket:                 16
Socket(s):                          1
Stepping:                           1
BogoMIPS:                           6374.40
Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq vmx ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves avx_vnni umip waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize flush_l1d arch_capabilities
Virtualization:                     VT-x
Hypervisor vendor:                  Microsoft
Virtualization type:                full
L1d cache:                          768 KiB (16 instances)
L1i cache:                          512 KiB (16 instances)
L2 cache:                           32 MiB (16 instances)
L3 cache:                           36 MiB (1 instance)
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit:        Not affected
Vulnerability L1tf:                 Not affected
Vulnerability Mds:                  Not affected
Vulnerability Meltdown:             Not affected
Vulnerability Mmio stale data:      Not affected
Vulnerability Retbleed:             Mitigation; Enhanced IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:           Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Not affected

Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.24.4
[pip3] onnxruntime==1.16.3
[pip3] torch==2.1.2
[pip3] torchvision==0.15.2a0
[pip3] triton==2.1.0
[conda] blas                      1.0                         mkl    conda-forge
[conda] cudatoolkit               11.8.0               h6a678d5_0
[conda] magma                     2.5.4                h6103c52_2    conda-forge
[conda] mkl                       2023.1.0         h213fc3f_46344
[conda] mkl-service               2.4.0           py311h5eee18b_1
[conda] mkl_fft                   1.3.8           py311h5eee18b_0
[conda] mkl_random                1.2.4           py311hdb19cb5_0
[conda] numpy                     1.24.4                   pypi_0    pypi
[conda] torch                     2.1.2                    pypi_0    pypi
[conda] torchvision               0.15.2          cuda118py311h4cc2eb7_0
[conda] triton                    2.1.0                    pypi_0    pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.2.6
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0    GPU1    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      SYS                             N/A
GPU1    SYS      X                              N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

🐛 Describe the bug

Trying to build for CUDA:

# build docker image
DOCKER_BUILDKIT=1 docker build . --target vllm-openai --tag vllm/vllm-openai --build-arg max_jobs=8 --build-arg nvcc_threads=2

It gets stuck for a very long time:

[+] Building 3232.5s (30/33)                                                         docker:default
 => CACHED [build  1/10] COPY requirements-build.txt requirements-build.txt                    0.0s
 => CACHED [build  2/10] RUN --mount=type=cache,target=/root/.cache/pip     pip install -r re  0.0s
 => CACHED [build  3/10] COPY csrc csrc                                                        0.0s
 => CACHED [build  4/10] COPY setup.py setup.py                                                0.0s
 => CACHED [build  5/10] COPY cmake cmake                                                      0.0s
 => CACHED [build  6/10] COPY CMakeLists.txt CMakeLists.txt                                    0.0s
 => CACHED [build  7/10] COPY requirements.txt requirements.txt                                0.0s
 => CACHED [build  8/10] COPY pyproject.toml pyproject.toml                                    0.0s
 => CACHED [build  9/10] COPY vllm/__init__.py vllm/__init__.py                                0.0s
 => [build 10/10] RUN python3 setup.py build_ext --inplace                                  3230.1s
 => => # [16/20] Building CUDA object CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp16_fp16_bf16.cu.o
 => => # [17/20] Building CUDA object CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp16_bf16_bf16.cu.o
 => => # [18/20] Building CUDA object CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp32_fp16_fp16.cu.o
wizd added the bug label Mar 24, 2024

youkaichao (Member) commented

Hmm, that might be related to #3600. Try watching the CPU usage of the Docker build; it may be overloaded by the compilation task.
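One way to follow this suggestion is to sample the 1-minute load average from a shell on the machine doing the build and compare it with the number of logical CPUs. The sketch below uses only the standard library (`os.getloadavg` is Unix-only); the 1.5 threshold and the polling interval are arbitrary assumptions, not vLLM or Docker settings.

```python
import os
import time


def is_overloaded(load_1min: float, cpus: int, factor: float = 1.5) -> bool:
    """Return True when the 1-minute load average exceeds factor x CPU count.

    The default factor is a hypothetical threshold; tune it for your machine.
    """
    return load_1min > factor * cpus


def watch(samples: int = 10, interval_s: float = 5.0) -> None:
    """Print the load average a few times while a build runs elsewhere."""
    cpus = os.cpu_count() or 1
    for _ in range(samples):
        load_1min, _, _ = os.getloadavg()  # Unix-only; raises OSError if unavailable
        status = "OVERLOADED" if is_overloaded(load_1min, cpus) else "ok"
        print(f"load={load_1min:.1f} cpus={cpus} -> {status}")
        time.sleep(interval_s)
```

Call `watch()` from a second terminal while `docker build` is running; a load far above the CPU count suggests too many parallel compile jobs.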

youkaichao (Member) commented

BTW, I think environment variables are capitalized, e.g. MAX_JOBS and NVCC_THREADS
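To illustrate why the exact variable names matter, here is a minimal sketch of how a build script could derive its parallelism from upper-case environment variables, falling back to the CPU count when `MAX_JOBS` is unset. `pick_num_jobs` and `cmake_build_args` are hypothetical helpers, not vLLM's actual `setup.py`; the only thing taken from this thread is the shape of the `cmake --build ... -j N` command in the traceback.

```python
import os
from typing import List, Mapping, Optional


def pick_num_jobs(env: Optional[Mapping[str, str]] = None,
                  cpu_count: Optional[int] = None) -> int:
    """Hypothetical helper: pick how many compile jobs to run in parallel.

    - If MAX_JOBS is set, honor it exactly (at least 1).
    - Otherwise split the CPU count across NVCC_THREADS threads per nvcc
      process, so that jobs * threads stays close to the number of CPUs.
    Lookups are case-sensitive (as on Linux): "max_jobs" is ignored here.
    """
    env = os.environ if env is None else env
    explicit = env.get("MAX_JOBS")
    if explicit:
        return max(1, int(explicit))
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    nvcc_threads = max(1, int(env.get("NVCC_THREADS", "1")))
    return max(1, cpus // nvcc_threads)


def cmake_build_args(target: str, jobs: int) -> List[str]:
    """Build the same kind of argv the traceback shows, with an explicit -j."""
    return ["cmake", "--build", ".", "--target", target, "-j", str(jobs)]
```

With `{"max_jobs": "1"}` this helper silently falls back to the full CPU count, which is the kind of surprise a case mismatch can cause.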

wizd (Author) commented Mar 24, 2024

Set max_jobs to 1:

 > [build 10/10] RUN python3 setup.py build_ext --inplace:                                          
1.357 No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'                                   
1.372 running build_ext
1.438 -- The CXX compiler identification is GNU 11.4.0
1.459 -- Detecting CXX compiler ABI info
1.560 -- Detecting CXX compiler ABI info - done
1.567 -- Check for working CXX compiler: /usr/bin/c++ - skipped
1.567 -- Detecting CXX compile features
1.567 -- Detecting CXX compile features - done
1.568 -- Build type: RelWithDebInfo
1.618 -- Found Python: /usr/bin/python3 (found version "3.10.12") found components: Interpreter Development.Module 
1.619 -- Found python matching: /usr/bin/python3.
2.457 -- Found CUDA: /usr/local/cuda (found version "12.3") 
2.918 -- The CUDA compiler identification is NVIDIA 12.3.107
2.923 -- Detecting CUDA compiler ABI info
3.370 -- Detecting CUDA compiler ABI info - done
3.395 -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
3.415 -- Detecting CUDA compile features
3.415 -- Detecting CUDA compile features - done
3.417 -- Found CUDAToolkit: /usr/local/cuda/include (found version "12.3.107") 
3.419 -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
3.505 -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
3.506 -- Found Threads: TRUE  
3.516 -- Caffe2: CUDA detected: 12.3
3.516 -- Caffe2: CUDA nvcc is: /usr/local/cuda/bin/nvcc
3.516 -- Caffe2: CUDA toolkit directory: /usr/local/cuda
3.611 -- Caffe2: Header version is: 12.3
3.612 CMake Warning at /usr/local/lib/python3.10/dist-packages/torch/share/cmake/Caffe2/public/cuda.cmake:185 (message):
3.612   Failed to compute shorthash for libnvrtc.so
3.612 Call Stack (most recent call first):
3.612   /usr/local/lib/python3.10/dist-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:87 (include)
3.612   /usr/local/lib/python3.10/dist-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
3.612   CMakeLists.txt:64 (find_package)
3.612 
3.612 
3.612 -- USE_CUDNN is set to 0. Compiling without cuDNN support
3.612 -- USE_CUSPARSELT is set to 0. Compiling without cuSPARSELt support
3.612 CMake Warning at /usr/local/lib/python3.10/dist-packages/torch/share/cmake/Caffe2/public/utils.cmake:385 (message):
3.612   In the future we will require one to explicitly pass TORCH_CUDA_ARCH_LIST
3.612   to cmake instead of implicitly setting it as an env variable.  This will
3.612   become a FATAL_ERROR in future version of pytorch.
3.612 Call Stack (most recent call first):
3.612   /usr/local/lib/python3.10/dist-packages/torch/share/cmake/Caffe2/public/cuda.cmake:343 (torch_cuda_get_nvcc_gencode_flag)
3.612   /usr/local/lib/python3.10/dist-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:87 (include)
3.612   /usr/local/lib/python3.10/dist-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
3.612   CMakeLists.txt:64 (find_package)
3.612 
3.612 
3.612 -- Added CUDA NVCC flags for: -gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_89,code=sm_89;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_90,code=compute_90
3.615 CMake Warning at /usr/local/lib/python3.10/dist-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
3.615   static library kineto_LIBRARY-NOTFOUND not found.
3.615 Call Stack (most recent call first):
3.615   /usr/local/lib/python3.10/dist-packages/torch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
3.615   CMakeLists.txt:64 (find_package)
3.615 
3.615 
3.616 -- Found Torch: /usr/local/lib/python3.10/dist-packages/torch/lib/libtorch.so  
3.616 -- CUDA supported arches: 7.0;7.5;8.0;8.6;8.9;9.0
3.616 -- CUDA target arches: 70;75;80;86;89;90;90-virtual
4.757 -- Punica target arches: 80;86;89;90;90-virtual
4.758 -- Enabling C extension.
4.758 -- Enabling moe extension.
4.758 -- Enabling punica extension.
4.758 -- Configuring done (3.4s)
4.765 -- Generating done (0.0s)
4.794 -- Build files have been written to: /workspace/build/temp.linux-x86_64-3.10
17.50 [1/3] Building CXX object CMakeFiles/_moe_C.dir/csrc/moe/moe_ops.cpp.o
51.86 [2/3] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/topk_softmax_kernels.cu.o
52.20 [3/3] Linking CXX shared module /workspace/build/lib.linux-x86_64-3.10/vllm/_moe_C.cpython-310-x86_64-linux-gnu.so
70.25 [1/20] Building CUDA object CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp16_bf16_bf16.cu.o
70.25 FAILED: CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp16_bf16_bf16.cu.o 
70.25 /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DTORCH_EXTENSION_NAME=_punica_C -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -D_punica_C_EXPORTS -I/workspace/csrc -isystem /usr/include/python3.10 -isystem /usr/local/lib/python3.10/dist-packages/torch/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -DONNX_NAMESPACE=onnx_c2 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O2 -g -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[compute_80,sm_80]" "--generate-code=arch=compute_86,code=[compute_86,sm_86]" "--generate-code=arch=compute_89,code=[compute_89,sm_89]" "--generate-code=arch=compute_90,code=[compute_90,sm_90]" "--generate-code=arch=compute_90,code=[compute_90]" -Xcompiler=-fPIC --expt-relaxed-constexpr -DENABLE_FP8_E5M2 --threads=2 -D_GLIBCXX_USE_CXX11_ABI=0 -MD -MT CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp16_bf16_bf16.cu.o -MF CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp16_bf16_bf16.cu.o.d -x cu -c /workspace/csrc/punica/bgmv/bgmv_fp16_bf16_bf16.cu -o CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp16_bf16_bf16.cu.o
70.25 Segmentation fault
70.25 Segmentation fault
73.63 [2/20] Building CUDA object CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp16_fp16_bf16.cu.o
73.63 FAILED: CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp16_fp16_bf16.cu.o 
73.63 /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DTORCH_EXTENSION_NAME=_punica_C -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -D_punica_C_EXPORTS -I/workspace/csrc -isystem /usr/include/python3.10 -isystem /usr/local/lib/python3.10/dist-packages/torch/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -DONNX_NAMESPACE=onnx_c2 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O2 -g -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[compute_80,sm_80]" "--generate-code=arch=compute_86,code=[compute_86,sm_86]" "--generate-code=arch=compute_89,code=[compute_89,sm_89]" "--generate-code=arch=compute_90,code=[compute_90,sm_90]" "--generate-code=arch=compute_90,code=[compute_90]" -Xcompiler=-fPIC --expt-relaxed-constexpr -DENABLE_FP8_E5M2 --threads=2 -D_GLIBCXX_USE_CXX11_ABI=0 -MD -MT CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp16_fp16_bf16.cu.o -MF CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp16_fp16_bf16.cu.o.d -x cu -c /workspace/csrc/punica/bgmv/bgmv_fp16_fp16_bf16.cu -o CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp16_fp16_bf16.cu.o
73.63 Segmentation fault
74.49 [3/20] Building CUDA object CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp32_fp16_bf16.cu.o
74.49 FAILED: CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp32_fp16_bf16.cu.o 
74.49 /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DTORCH_EXTENSION_NAME=_punica_C -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -D_punica_C_EXPORTS -I/workspace/csrc -isystem /usr/include/python3.10 -isystem /usr/local/lib/python3.10/dist-packages/torch/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -DONNX_NAMESPACE=onnx_c2 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O2 -g -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[compute_80,sm_80]" "--generate-code=arch=compute_86,code=[compute_86,sm_86]" "--generate-code=arch=compute_89,code=[compute_89,sm_89]" "--generate-code=arch=compute_90,code=[compute_90,sm_90]" "--generate-code=arch=compute_90,code=[compute_90]" -Xcompiler=-fPIC --expt-relaxed-constexpr -DENABLE_FP8_E5M2 --threads=2 -D_GLIBCXX_USE_CXX11_ABI=0 -MD -MT CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp32_fp16_bf16.cu.o -MF CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp32_fp16_bf16.cu.o.d -x cu -c /workspace/csrc/punica/bgmv/bgmv_fp32_fp16_bf16.cu -o CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp32_fp16_bf16.cu.o
74.49 Segmentation fault
74.77 [4/20] Building CUDA object CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_bf16_bf16_bf16.cu.o
74.77 FAILED: CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_bf16_bf16_bf16.cu.o 
74.77 /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DTORCH_EXTENSION_NAME=_punica_C -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -D_punica_C_EXPORTS -I/workspace/csrc -isystem /usr/include/python3.10 -isystem /usr/local/lib/python3.10/dist-packages/torch/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -DONNX_NAMESPACE=onnx_c2 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O2 -g -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[compute_80,sm_80]" "--generate-code=arch=compute_86,code=[compute_86,sm_86]" "--generate-code=arch=compute_89,code=[compute_89,sm_89]" "--generate-code=arch=compute_90,code=[compute_90,sm_90]" "--generate-code=arch=compute_90,code=[compute_90]" -Xcompiler=-fPIC --expt-relaxed-constexpr -DENABLE_FP8_E5M2 --threads=2 -D_GLIBCXX_USE_CXX11_ABI=0 -MD -MT CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_bf16_bf16_bf16.cu.o -MF CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_bf16_bf16_bf16.cu.o.d -x cu -c /workspace/csrc/punica/bgmv/bgmv_bf16_bf16_bf16.cu -o CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_bf16_bf16_bf16.cu.o
74.77 Segmentation fault
75.07 [5/20] Building CUDA object CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_bf16_fp16_bf16.cu.o
75.07 FAILED: CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_bf16_fp16_bf16.cu.o 
75.07 /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DTORCH_EXTENSION_NAME=_punica_C -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -D_punica_C_EXPORTS -I/workspace/csrc -isystem /usr/include/python3.10 -isystem /usr/local/lib/python3.10/dist-packages/torch/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -DONNX_NAMESPACE=onnx_c2 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O2 -g -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[compute_80,sm_80]" "--generate-code=arch=compute_86,code=[compute_86,sm_86]" "--generate-code=arch=compute_89,code=[compute_89,sm_89]" "--generate-code=arch=compute_90,code=[compute_90,sm_90]" "--generate-code=arch=compute_90,code=[compute_90]" -Xcompiler=-fPIC --expt-relaxed-constexpr -DENABLE_FP8_E5M2 --threads=2 -D_GLIBCXX_USE_CXX11_ABI=0 -MD -MT CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_bf16_fp16_bf16.cu.o -MF CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_bf16_fp16_bf16.cu.o.d -x cu -c /workspace/csrc/punica/bgmv/bgmv_bf16_fp16_bf16.cu -o CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_bf16_fp16_bf16.cu.o
75.07 Segmentation fault
75.34 [6/20] Building CUDA object CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp16_bf16_fp16.cu.o
75.34 FAILED: CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp16_bf16_fp16.cu.o 
75.34 /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DTORCH_EXTENSION_NAME=_punica_C -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -D_punica_C_EXPORTS -I/workspace/csrc -isystem /usr/include/python3.10 -isystem /usr/local/lib/python3.10/dist-packages/torch/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -DONNX_NAMESPACE=onnx_c2 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O2 -g -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[compute_80,sm_80]" "--generate-code=arch=compute_86,code=[compute_86,sm_86]" "--generate-code=arch=compute_89,code=[compute_89,sm_89]" "--generate-code=arch=compute_90,code=[compute_90,sm_90]" "--generate-code=arch=compute_90,code=[compute_90]" -Xcompiler=-fPIC --expt-relaxed-constexpr -DENABLE_FP8_E5M2 --threads=2 -D_GLIBCXX_USE_CXX11_ABI=0 -MD -MT CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp16_bf16_fp16.cu.o -MF CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp16_bf16_fp16.cu.o.d -x cu -c /workspace/csrc/punica/bgmv/bgmv_fp16_bf16_fp16.cu -o CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp16_bf16_fp16.cu.o
75.34 Segmentation fault
75.77 [7/20] Building CUDA object CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_bf16_fp16_fp16.cu.o
75.77 FAILED: CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_bf16_fp16_fp16.cu.o 
75.77 /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DTORCH_EXTENSION_NAME=_punica_C -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -D_punica_C_EXPORTS -I/workspace/csrc -isystem /usr/include/python3.10 -isystem /usr/local/lib/python3.10/dist-packages/torch/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -DONNX_NAMESPACE=onnx_c2 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O2 -g -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[compute_80,sm_80]" "--generate-code=arch=compute_86,code=[compute_86,sm_86]" "--generate-code=arch=compute_89,code=[compute_89,sm_89]" "--generate-code=arch=compute_90,code=[compute_90,sm_90]" "--generate-code=arch=compute_90,code=[compute_90]" -Xcompiler=-fPIC --expt-relaxed-constexpr -DENABLE_FP8_E5M2 --threads=2 -D_GLIBCXX_USE_CXX11_ABI=0 -MD -MT CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_bf16_fp16_fp16.cu.o -MF CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_bf16_fp16_fp16.cu.o.d -x cu -c /workspace/csrc/punica/bgmv/bgmv_bf16_fp16_fp16.cu -o CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_bf16_fp16_fp16.cu.o
75.77 Segmentation fault
75.79 [8/20] Building CUDA object CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp32_fp16_fp16.cu.o
75.79 FAILED: CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp32_fp16_fp16.cu.o 
75.79 /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DTORCH_EXTENSION_NAME=_punica_C -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -D_punica_C_EXPORTS -I/workspace/csrc -isystem /usr/include/python3.10 -isystem /usr/local/lib/python3.10/dist-packages/torch/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -DONNX_NAMESPACE=onnx_c2 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O2 -g -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[compute_80,sm_80]" "--generate-code=arch=compute_86,code=[compute_86,sm_86]" "--generate-code=arch=compute_89,code=[compute_89,sm_89]" "--generate-code=arch=compute_90,code=[compute_90,sm_90]" "--generate-code=arch=compute_90,code=[compute_90]" -Xcompiler=-fPIC --expt-relaxed-constexpr -DENABLE_FP8_E5M2 --threads=2 -D_GLIBCXX_USE_CXX11_ABI=0 -MD -MT CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp32_fp16_fp16.cu.o -MF CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp32_fp16_fp16.cu.o.d -x cu -c /workspace/csrc/punica/bgmv/bgmv_fp32_fp16_fp16.cu -o CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp32_fp16_fp16.cu.o
75.79 Segmentation fault
76.25 [9/20] Building CUDA object CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp32_bf16_bf16.cu.o
76.25 FAILED: CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp32_bf16_bf16.cu.o 
76.25 /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DTORCH_EXTENSION_NAME=_punica_C -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -D_punica_C_EXPORTS -I/workspace/csrc -isystem /usr/include/python3.10 -isystem /usr/local/lib/python3.10/dist-packages/torch/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -DONNX_NAMESPACE=onnx_c2 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O2 -g -DNDEBUG -std=c++17 "--generate-code=arch=compute_80,code=[compute_80,sm_80]" "--generate-code=arch=compute_86,code=[compute_86,sm_86]" "--generate-code=arch=compute_89,code=[compute_89,sm_89]" "--generate-code=arch=compute_90,code=[compute_90,sm_90]" "--generate-code=arch=compute_90,code=[compute_90]" -Xcompiler=-fPIC --expt-relaxed-constexpr -DENABLE_FP8_E5M2 --threads=2 -D_GLIBCXX_USE_CXX11_ABI=0 -MD -MT CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp32_bf16_bf16.cu.o -MF CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp32_bf16_bf16.cu.o.d -x cu -c /workspace/csrc/punica/bgmv/bgmv_fp32_bf16_bf16.cu -o CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp32_bf16_bf16.cu.o
76.25 Segmentation fault
87.78 [10/20] Building CXX object CMakeFiles/_punica_C.dir/csrc/punica/punica_ops.cc.o
106.9 [11/20] Building CUDA object CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp32_bf16_fp16.cu.o
107.0 [12/20] Building CUDA object CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_bf16_fp32_fp16.cu.o
107.1 [13/20] Building CUDA object CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp16_fp32_fp16.cu.o
107.4 [14/20] Building CUDA object CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_bf16_bf16_fp16.cu.o
107.4 [15/20] Building CUDA object CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp32_fp32_fp16.cu.o
107.5 [16/20] Building CUDA object CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp16_fp16_fp16.cu.o
107.5 [17/20] Building CUDA object CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp16_fp32_bf16.cu.o
107.6 [18/20] Building CUDA object CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_bf16_fp32_bf16.cu.o
108.2 [19/20] Building CUDA object CMakeFiles/_punica_C.dir/csrc/punica/bgmv/bgmv_fp32_fp32_bf16.cu.o
108.2 ninja: build stopped: subcommand failed.
108.2 Traceback (most recent call last):
108.2   File "/workspace/setup.py", line 338, in <module>
108.2     setup(
108.2   File "/usr/lib/python3/dist-packages/setuptools/__init__.py", line 153, in setup
108.2     return distutils.core.setup(**attrs)
108.2   File "/usr/lib/python3.10/distutils/core.py", line 148, in setup
108.2     dist.run_commands()
108.2   File "/usr/lib/python3.10/distutils/dist.py", line 966, in run_commands
108.2     self.run_command(cmd)
108.2   File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
108.2     cmd_obj.run()
108.2   File "/usr/lib/python3/dist-packages/setuptools/command/build_ext.py", line 79, in run
108.2     _build_ext.run(self)
108.2   File "/usr/lib/python3.10/distutils/command/build_ext.py", line 340, in run
108.2     self.build_extensions()
108.2   File "/workspace/setup.py", line 167, in build_extensions
108.2     subprocess.check_call(['cmake', *build_args], cwd=self.build_temp)
108.2   File "/usr/lib/python3.10/subprocess.py", line 369, in check_call
108.2     raise CalledProcessError(retcode, cmd)
108.2 subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--target', '_punica_C', '-j', '64']' returned non-zero exit status 1.
------
Dockerfile:59
--------------------
  57 |     ENV VLLM_INSTALL_PUNICA_KERNELS=1
  58 |     
  59 | >>> RUN python3 setup.py build_ext --inplace
  60 |     #################### EXTENSION Build IMAGE ####################
  61 |     
--------------------
ERROR: failed to solve: process "/bin/sh -c python3 setup.py build_ext --inplace" did not complete successfully: exit code: 1
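The configure step in this log targets six architectures (`70;75;80;86;89;90`) even though both GPUs in this report are RTX 4090s (compute capability 8.9), and nvcc generates device code for each target. The CMake warning above mentions `TORCH_CUDA_ARCH_LIST`; narrowing it to the local GPUs may shorten the build. This is a minimal sketch, assuming `torch` with CUDA support is installed on the host, that turns device compute capabilities into a space-separated value for that variable. Whether the vLLM Dockerfile forwards such a value into the build stage (for example via a build arg) is an assumption to check against your checkout.

```python
from typing import Iterable, Tuple


def arch_list(capabilities: Iterable[Tuple[int, int]]) -> str:
    """Turn (major, minor) compute capabilities into a TORCH_CUDA_ARCH_LIST value.

    Duplicates are removed and entries are sorted, e.g. two RTX 4090s -> "8.9".
    """
    unique = sorted(set(capabilities))
    return " ".join(f"{major}.{minor}" for major, minor in unique)


def local_arch_list() -> str:
    """Query the visible GPUs via PyTorch (requires torch built with CUDA)."""
    import torch  # imported lazily so arch_list() works without torch

    count = torch.cuda.device_count()
    caps = [torch.cuda.get_device_capability(i) for i in range(count)]
    return arch_list(caps)
```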

youkaichao (Member) commented

#3600 just got merged. You need to pull the main branch first.

youkaichao (Member) commented

From the command output, you can see '-j', '64', i.e. it is launching 64 compilation jobs simultaneously 🤣 No wonder your docker image hangs.
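Each nvcc process may use a lot of memory, so many concurrent jobs can exhaust what a WSL2 VM has available even when the CPU count looks sufficient. A common approach is to cap the job count by memory as well as by CPUs. In this sketch the 4 GiB per-job budget is a placeholder assumption, not a measured figure for vLLM's kernels, and the `os.sysconf` names are Linux-specific.

```python
import os

GIB = 1024 ** 3
# Hypothetical budget per compile job; measure and adjust for your build.
ASSUMED_MEM_PER_JOB = 4 * GIB


def total_memory_bytes() -> int:
    """Physical memory visible to this (Linux) system, via sysconf."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def capped_jobs(cpus: int, mem_bytes: int, mem_per_job_bytes: int) -> int:
    """Largest job count that fits both the CPU count and the memory budget (>= 1)."""
    by_memory = mem_bytes // mem_per_job_bytes
    return max(1, min(cpus, by_memory))
```

For example, `capped_jobs(os.cpu_count() or 1, total_memory_bytes(), ASSUMED_MEM_PER_JOB)` gives a value you could pass as `MAX_JOBS`; with 64 CPUs but 32 GiB of RAM it returns 8 rather than 64.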

wizd (Author) commented Mar 25, 2024

> From the command output, you can see '-j', '64', i.e. it is launching 64 compilation jobs simultaneously 🤣 No wonder your docker image hangs.

lol, I was wondering why WSL crashed...

youkaichao (Member) commented

Please do give me an update if #3600 solves your problem 😉

wizd (Author) commented Mar 25, 2024

Sure! The Docker image builds fine now. I'll test the new image soon.

wizd (Author) commented Mar 25, 2024

> Please do give me an update if #3600 solves your problem 😉

Thank you for this excellent contribution! With your changes, the Docker image now builds successfully and runs smoothly.
