ipex-llm Llama.cpp port inside ipex-llm Docker containers getting SIGBUS #10955

Closed
simonlui opened this issue May 7, 2024 · 4 comments

simonlui commented May 7, 2024

This might be compute-runtime or kernel related, but I am posting here first since I don't know. For the simplest reproduction, I pulled the recently published intelanalytics/ipex-llm-xpu:cpp-test Docker image, though I had previously been using another Docker container to run the llama.cpp fork included in the bigdl-core-cpp pip package and saw the same error there. I ran the same command as in the Quickstart guide and got the output below.
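For reference, the invocation was along these lines; the container name, model path, and prompt are illustrative stand-ins rather than the exact ones I used:

# start the published image with the GPU passed through (mounts/names illustrative)
sudo docker run -itd --net=host --device=/dev/dri \
    -v /path/to/models:/models --name=ipex-llm-cpp-test \
    intelanalytics/ipex-llm-xpu:cpp-test
# inside the container, run the bundled llama.cpp main against a GGUF model,
# offloading all 33 layers to the GPU
./main -m /models/llama-2-7b-chat.Q4_K_M.gguf -p "Once upon a time" -n 32 -e -ngl 33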

...
llm_load_tensors: ggml ctx size =    0.30 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors:      SYCL0 buffer size =  7605.33 MiB
llm_load_tensors:  SYCL_Host buffer size =   532.31 MiB
.
Thread 1 "main" received signal SIGBUS, Bus error.
...
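To capture a backtrace I re-ran the same command under gdb, roughly like this (arguments illustrative as above):

gdb --args ./main -m /models/llama-2-7b-chat.Q4_K_M.gguf -p "Once upon a time" -n 32 -e -ngl 33
# then `run`, and `bt` once the SIGBUS is raised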

The GDB backtrace shows the following.

Thread 1 "main" received signal SIGBUS, Bus error.
__memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:708
708	../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: No such file or directory.
(gdb) bt
#0  __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:708
#1  0x00007f72618ef82b in ?? () from /lib/x86_64-linux-gnu/libze_intel_gpu.so.1
#2  0x00007f72618fe3f8 in ?? () from /lib/x86_64-linux-gnu/libze_intel_gpu.so.1
#3  0x00007f7261824f14 in ?? () from /lib/x86_64-linux-gnu/libze_intel_gpu.so.1
#4  0x00007f727ef25bc4 in enqueueMemCopyHelper(ur_command_t, ur_queue_handle_t_*, void*, unsigned char, unsigned long, void const*, unsigned int, ur_event_handle_t_* const*, ur_event_handle_t_**, bool) ()
   from /opt/intel/oneapi/compiler/2024.0/lib/libpi_level_zero.so
#5  0x00007f727ef2bf48 in urEnqueueUSMMemcpy () from /opt/intel/oneapi/compiler/2024.0/lib/libpi_level_zero.so
#6  0x00007f727ef4ed9b in piextUSMEnqueueMemcpy () from /opt/intel/oneapi/compiler/2024.0/lib/libpi_level_zero.so
#7  0x00007f727fcb452f in _pi_result sycl::_V1::detail::plugin::call_nocheck<(sycl::_V1::detail::PiApiKind)97, _pi_queue*, unsigned int, void*, void const*, unsigned long, unsigned long, _pi_event**, _pi_event**>(_pi_queue*, unsigned int, void*, void const*, unsigned long, unsigned long, _pi_event**, _pi_event**) const () from /opt/intel/oneapi/compiler/2024.0/lib/libsycl.so.7
#8  0x00007f727fca854f in sycl::_V1::detail::MemoryManager::copy_usm(void const*, std::shared_ptr<sycl::_V1::detail::queue_impl>, unsigned long, void*, std::vector<_pi_event*, std::allocator<_pi_event*> >, _pi_event**, std::shared_ptr<sycl::_V1::detail::event_impl> const&) () from /opt/intel/oneapi/compiler/2024.0/lib/libsycl.so.7
#9  0x00007f727fcfc8f2 in sycl::_V1::detail::queue_impl::memcpy(std::shared_ptr<sycl::_V1::detail::queue_impl> const&, void*, void const*, unsigned long, std::vector<sycl::_V1::event, std::allocator<sycl::_V1::event> > const&) () from /opt/intel/oneapi/compiler/2024.0/lib/libsycl.so.7
#10 0x00007f727fda5146 in sycl::_V1::queue::memcpy(void*, void const*, unsigned long, sycl::_V1::detail::code_location const&) () from /opt/intel/oneapi/compiler/2024.0/lib/libsycl.so.7
#11 0x00000000006b334e in ggml_backend_sycl_buffer_set_tensor(ggml_backend_buffer*, ggml_tensor*, void const*, unsigned long, unsigned long) ()
#12 0x00000000005017ba in llm_load_tensors(llama_model_loader&, llama_model&, int, llama_split_mode, int, float const*, bool, bool (*)(float, void*), void*) ()
#13 0x00000000004ac199 in llama_model_load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, llama_model&, llama_model_params&) ()
#14 0x00000000004a8d91 in llama_load_model_from_file ()
#15 0x0000000000440456 in llama_init_from_gpt_params(gpt_params&) ()
#16 0x000000000042a8ce in main ()

If I run with SYCL_PI_TRACE=-1, this is the last snippet I see before the SIGBUS:

...
---> piextUSMEnqueueMemcpy(
	<unknown> : 0x8e294c0
	<unknown> : 0
	<unknown> : 0xffffd556cbe54000
	<unknown> : 0x8e91270
	<unknown> : 16384
	<unknown> : 0
	pi_event * : 0[ nullptr ]
	pi_event * : 0x8e8ede8[ 0 ... ]
UR ---> TmpWaitList.createAndRetainUrZeEventList( NumEventsInWaitList, EventWaitList, Queue, UseCopyEngine)
UR <--- TmpWaitList.createAndRetainUrZeEventList( NumEventsInWaitList, EventWaitList, Queue, UseCopyEngine)(UR_RESULT_SUCCESS)
UR ---> Queue->Context->getAvailableCommandList(Queue, CommandList, UseCopyEngine, OkToBatch)
UR ---> Queue->insertStartBarrierIfDiscardEventsMode(CommandList)
UR <--- Queue->insertStartBarrierIfDiscardEventsMode(CommandList)(UR_RESULT_SUCCESS)
UR <--- Queue->Context->getAvailableCommandList(Queue, CommandList, UseCopyEngine, OkToBatch)(UR_RESULT_SUCCESS)
UR ---> createEventAndAssociateQueue(Queue, Event, CommandType, CommandList, IsInternal)
UR ---> EventCreate(Queue->Context, Queue, HostVisible.value(), Event)
UR <--- EventCreate(Queue->Context, Queue, HostVisible.value(), Event)(UR_RESULT_SUCCESS)
UR ---> urEventRetain(*Event)
UR <--- urEventRetain(*Event)(UR_RESULT_SUCCESS)
UR <--- createEventAndAssociateQueue(Queue, Event, CommandType, CommandList, IsInternal)(UR_RESULT_SUCCESS)

I am using Linux kernel 6.8.8, and I was under the impression that the earlier kernel and compute-runtime issues had been fixed, given that sycl-ls lists the GPU correctly and the kernel no longer hangs under this workload. I hope this is enough information to track down the issue, but I can provide full logs upon request.
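(By "sycl-ls lists the GPU correctly" I mean a quick check like the one below inside the container; the exact device string will differ per system.)

sycl-ls
# expect a Level Zero GPU entry, e.g. a line beginning with "[ext_oneapi_level_zero:gpu:0]"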


hzjane commented May 8, 2024

This image is still under internal testing; we will update you with the latest image once development is complete.


simonlui commented May 8, 2024

I understand that, but I am getting the same problem regardless of whether I use this image or my custom Docker container running the llama.cpp fork inside bigdl-core-cpp. Is bigdl-core-cpp, or at least this pre-production version, not usable, given that it still goes by the bigdl name even though the project has been renamed to ipex-llm?


hzjane commented May 8, 2024

Maybe this issue is caused by a newer Linux kernel version. We have validated kernel versions 5.19.0-41-generic and 6.2.0, but not 6.8.8.


simonlui commented May 8, 2024

I found what the problem was. I checked a few other issues, and one of the troubleshooting steps was to run the utility scripts in ipex-llm. When I did, I noticed in the lspci output that the GPU's addressable memory was limited to 256MB, which meant Resizable BAR (ReBAR) was disabled on my system. It turned out I had forgotten to disable CSM after enabling it while troubleshooting something unrelated the other day. Re-enabling ReBAR fixed the SIGBUS and allowed the llama.cpp fork to proceed as normal after a prolonged warmup. I am not sure whether the utility script could be modified to detect whether ReBAR is enabled, but that may be worth adding to help diagnose issues like this.
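For example, a rough check along these lines might work; the device matching is only a sketch, not what the script actually does:

# with ReBAR / Above 4G Decoding enabled, the GPU's prefetchable BAR should be
# roughly VRAM-sized (e.g. 8G or 16G); a 256M cap means ReBAR is effectively off
lspci -v | grep -A10 -i 'VGA\|Display controller' | grep -i 'prefetchable'
# a result like "Memory at ... (64-bit, prefetchable) [size=256M]" -> ReBAR disabled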
There was an unrelated issue where the application got stuck at 100% on one core with the compute runtime on kernel 6.8.5 or higher, but I have reverted to kernel 6.8.4 for now until upstream figures out the issue and how to mitigate it without losing performance, and everything now works fine with the fork. Thanks!

simonlui closed this as completed May 8, 2024