Error with 32k Long Text in chatglm2-6b-32k Model #1725
Comments
Does it fail consistently regardless of input, or only on specific inputs? It looks like a PyTorch / GPU memory-access issue.
Short text inputs work without any problem; long text triggers the error above. I used about 30000 tokens for the long-text test, since this model supports a 32k context.
What's your hardware configuration? I wonder whether this is an OOM issue in disguise...
Both A100 and 3090 hit the same error, with CUDA 12.2. I found slight differences in model inference between THUDM/chatglm2-6b-32k and THUDM/chatglm2-6b: the RotaryEmbedding and KV-cache logic differ. Currently vLLM supports chatglm2-6b but does not support chatglm2-6b-32k.
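For context, a minimal sketch of how to compare the two model configs and spot the relevant difference; the field names (seq_length, rope_ratio) are assumptions based on the chatglm2 config format and the fix discussed below:

```python
# Hedged sketch: compare the two chatglm2 configs to see where the 32k
# variant differs. Assumes both repos are reachable on the Hugging Face Hub
# and that the configs expose seq_length / rope_ratio fields.
from transformers import AutoConfig

base_cfg = AutoConfig.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
long_cfg = AutoConfig.from_pretrained("THUDM/chatglm2-6b-32k", trust_remote_code=True)

for field in ("seq_length", "rope_ratio"):
    # Fields absent from a config print as None.
    print(field, getattr(base_cfg, field, None), "->", getattr(long_cfg, field, None))
```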
I found the cause of this: the original logic has a bug, and two places need to be changed. First, in the _compute_inv_freq function in rotary_embedding.py, add base = base * self.rope_ratio, or simply hardcode base = base * 50, because the official code does not yet pass the rope_ratio parameter through. Second, change line 79 of the GLMAttention class to self.attn = PagedAttentionWithRoPE(
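A minimal sketch of the first change, assuming the function body matches vLLM's rotary_embedding.py at the time and that rope_ratio is either plumbed through from the model config or falls back to 50 (the value used by chatglm2-6b-32k):

```python
# Sketch of the suggested change in vllm/model_executor/layers/rotary_embedding.py.
# Assumption: self.rope_ratio is set from the model config, with 50 as a
# fallback for THUDM/chatglm2-6b-32k.
import torch

def _compute_inv_freq(self, base):
    # chatglm2-6b-32k scales the RoPE base by rope_ratio; without this,
    # positions beyond the original context window get the wrong rotary
    # frequencies and long prompts break.
    base = base * getattr(self, "rope_ratio", 50)
    inv_freq = 1.0 / (base ** (
        torch.arange(0, self.rotary_dim, 2, dtype=torch.float) / self.rotary_dim))
    return inv_freq
```

The second change (switching GLMAttention to PagedAttentionWithRoPE) is not sketched here, since its constructor arguments depend on the vLLM version in use.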
@junior-zsy can you submit a PR to address this so others won't run into the same issue? 🙇‍♂️
Ah, I think so.
Yes, #1841 has resolved this.
python3 api_server.py --model /hbox2dir/chatglm2-6b-32k --trust-remote-code --host 0.0.0.0 --port 7070 --tensor-parallel-size 2
Strangely, the inference process fails even on 8 GPUs, whereas the Hugging Face version of the model performs well on a 2-GPU setup.
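A hedged sketch of how the failing long-prompt request can be reproduced against the server launched above; the /generate endpoint and its JSON shape are assumptions based on vLLM's example api_server, and the prompt is only a crude stand-in for a roughly 30k-token input:

```python
# Reproduction sketch: send one very long prompt to the running api_server.
import requests

long_prompt = "hello " * 30000  # rough stand-in for a ~30k-token document

resp = requests.post(
    "http://localhost:7070/generate",
    json={"prompt": long_prompt, "max_tokens": 64, "temperature": 0.0},
    timeout=600,
)
print(resp.status_code, resp.json())
```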