
[Bug] tp=4 tp=8 no response #1755

Open

zeroleavebaoyang opened this issue Jun 11, 2024 · 11 comments

@zeroleavebaoyang

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.

Describe the bug

I've run into an issue: on an RTX 4090 × 8 machine, serving qwen1.5-110b-awq with --tp 8 or qwen2-72b-awq with --tp 4 both hang with no response at all. It seems that whenever the tensor-parallel degree is set high, the server ends up hanging like this.

Reproduction

CUDA_VISIBLE_DEVICES=0,1,2,3 lmdeploy serve api_server /home/nlp/pretrain_models/Qwen2-72B-Instruct-AWQ \
    --model-name qwen \
    --server-name 0.0.0.0 \
    --server-port 23334 \
    --tp 4 \
    --cache-max-entry-count 0.1 \
    --quant-policy 4 \
    --model-format awq
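
The hang can be observed with a minimal request such as the one below; this is only a sketch and assumes the OpenAI-compatible /v1/chat/completions route that api_server exposes, together with the --model-name qwen and port set above.

# hedged probe request against the server started above
curl http://0.0.0.0:23334/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen", "messages": [{"role": "user", "content": "hello"}]}'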

Environment

sys.platform: linux
Python: 3.10.14 (main, May  6 2024, 19:42:50) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0,1,2,3,4,5,6,7,8,9: NVIDIA GeForce RTX 4090
CUDA_HOME: /usr/local/cuda-11.8
NVCC: Cuda compilation tools, release 11.8, V11.8.89
GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 2.2.2+cu118
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.3.2 (Git Hash 2dc95a2ad0841e29db8b22fbccaf3e5da7992b01)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 11.8
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_90,code=sm_90
  - CuDNN 8.7
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=8.7.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.2.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,

TorchVision: 0.17.2+cu118
LMDeploy: 0.4.2+
transformers: 4.41.2
gradio: Not Found
fastapi: 0.111.0
pydantic: 2.7.3
triton: 2.2.0

Error traceback

No response

@lvhan028 (Collaborator)

This may be the same class of problem as #1750.
Please try the following and see whether it resolves the issue:

export NCCL_P2P_DISABLE=1

If that does not help, please add --log-level INFO to the launch command and post the log here.
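
Combined with the reproduction command above, the suggestion would look roughly like this (a sketch only; the model path, port, and quantization flags are copied from the report):

export NCCL_P2P_DISABLE=1
CUDA_VISIBLE_DEVICES=0,1,2,3 lmdeploy serve api_server /home/nlp/pretrain_models/Qwen2-72B-Instruct-AWQ \
    --model-name qwen \
    --server-name 0.0.0.0 \
    --server-port 23334 \
    --tp 4 \
    --model-format awq \
    --quant-policy 4 \
    --log-level INFO   # so the log can be attached if the hang persists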

@zeroleavebaoyang (Author)

(two screenshots attached)

@zeroleavebaoyang (Author) commented Jun 11, 2024

This may be the same class of problem as #1750. Please try the following and see whether it resolves the issue:

export NCCL_P2P_DISABLE=1

If that does not help, please add --log-level INFO to the launch command and post the log here.

As shown in the screenshots, after adding export NCCL_P2P_DISABLE=1 the behavior is the same: the server still hangs, and the last GPU sits at 100% utilization.

lvhan028 self-assigned this on Jun 12, 2024
@lvhan028 (Collaborator)

I haven't reproduced this issue; my setup is 8× A100-80G.
Could you try the docker image openmmlab/lmdeploy:v0.4.2?
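
A minimal sketch of serving the same model from that image; the mount path and docker flags below are illustrative assumptions, not from this thread:

# hedged sketch: run the reported model inside the v0.4.2 image
docker run --rm --gpus all --ipc=host -p 23334:23334 \
    -v /home/nlp/pretrain_models:/models \
    openmmlab/lmdeploy:v0.4.2 \
    lmdeploy serve api_server /models/Qwen2-72B-Instruct-AWQ \
        --tp 4 --model-format awq --quant-policy 4 --server-port 23334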

@lvhan028 (Collaborator)

I think gdb is needed to find out where it is stuck.
After the server hangs, open another terminal and run the following:

gdb attach <pid> # pid is the server process id; you can find it via nvidia-smi
set logging on
thread apply all bt
# press c; all thread backtraces will be printed and written to the log file gdb.txt
set logging off
q

Once done, a gdb.txt file will appear in the current working directory. Please upload that file to this issue.
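
The same backtraces can also be collected non-interactively; the command below is a sketch using gdb's batch mode instead of the interactive session above, with <pid> again taken from nvidia-smi:

# hedged non-interactive equivalent: dump all thread backtraces to gdb.txt
gdb -p <pid> -batch -ex "thread apply all bt" > gdb.txt 2>&1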

@CocaColaKing

I think gdb is needed to find out where it is stuck. After the server hangs, open another terminal and run the following:

gdb attach <pid> # pid is the server process id; you can find it via nvidia-smi
set logging on
thread apply all bt
# press c; all thread backtraces will be printed and written to the log file gdb.txt
set logging off
q

Once done, a gdb.txt file will appear in the current working directory. Please upload that file to this issue.

gdb.txt

@LUXUS1

LUXUS1 commented Jul 8, 2024

I ran into this problem too. On 4× A800, running lmdeploy serve api_server models/Qwen2-72B-Instruct/ --tp 4 --log-level INFO to serve Qwen2-72B gives no response, and nvitop shows two GPUs at 0% utilization.
(screenshot)
On 2× A800, inference with Qwen2-72B-Instruct produces garbled output, while Qwen2-7B-Instruct works fine with the same parameters. Results:

(screenshot)
#############################
After setting --backend to pytorch, both tp=4 and tp=2 run and the output is normal.
(two screenshots)
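
For reference, the PyTorch-backend launch described above would look roughly like this (a sketch; the model path is the one from this comment):

# hedged sketch of the reported workaround: switch to the PyTorch backend
lmdeploy serve api_server models/Qwen2-72B-Instruct/ --backend pytorch --tp 4 --log-level INFO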

@yixuantt

yixuantt commented Nov 2, 2024

8× H800, same error, using llama3.1-70B-instruct. The system hangs after about 900 calls.

@lvhan028 (Collaborator)

lvhan028 commented Nov 5, 2024

Maybe related to #2706.
@LUXUS1 and @yixuantt, could you try building the source code of PR #2706 and verify whether it is still an issue?
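
A rough sketch of fetching that PR for a source build; the clone URL is the upstream lmdeploy repository, the local branch name pr-2706 is arbitrary, and the actual build steps should follow the project's build-from-source documentation:

git clone https://github.com/InternLM/lmdeploy.git
cd lmdeploy
git fetch origin pull/2706/head:pr-2706   # fetch the PR ref from GitHub
git checkout pr-2706
# then build and install per the official build-from-source instructions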

@lvhan028 (Collaborator)

lvhan028 commented Nov 5, 2024

cc @lzhangzz

@yixuantt

yixuantt commented Nov 5, 2024

@lvhan028 Hi, I tried that branch last night, but it still does not work.
