Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ggml : become thread-safe #3960

Closed
ggerganov opened this issue Nov 5, 2023 · 14 comments
Closed

ggml : become thread-safe #3960

ggerganov opened this issue Nov 5, 2023 · 14 comments
Labels
refactoring Refactoring

Comments

@ggerganov
Copy link
Owner

ggerganov commented Nov 5, 2023

ref #499 (reply in thread)

We should be able to run inference on multiple graphs, backends and devices in parallel.
Currently, there are CUDA singletons that break this requirement and possibly there could be other problems.

@pseudotensor
Copy link

Any updates?

@pseudotensor
Copy link

FYI, I noticed with recent llama.cpp that while there are speed improvements, the thread safety has gotten worse. Now when I run in 2 threads a TTS model and a GGUF model, everything crashes in latest llama.cpp when did not used to.

I get:

CUDA error: an illegal memory access was encountered
  current device: 1, in function ggml_backend_cuda_buffer_cpy_tensor at /tmp/pip-install-r9_ew1og/llama-cpp-python_858d7004c05a49ac876ecf41f562f687/vendor/llama.cpp/ggml-cuda.cu:11578
  cudaDeviceSynchronize()
GGML_ASSERT: /tmp/pip-install-r9_ew1og/llama-cpp-python_858d7004c05a49ac876ecf41f562f687/vendor/llama.cpp/ggml-cuda.cu:255: !"CUDA error"
!!!! kernel execution error. (m: 3072, n: 77, k: 1024, error: 13) 

Worked perfectly with heavy use before.

@zsogitbe
Copy link

I am a bit confused. I thought that @slaren solved this problem with 'llama : add pipeline parallelism support (#6017)'? Or do you mean here something else?

@slaren said that when he was ready with 6017 he will fix the backend to release all CUDA memory, what is a big problem to many of us. I am not impatient just would like to understand what is happening.

@slaren
Copy link
Collaborator

slaren commented Mar 14, 2024

What I meant is that I will work on this after the pipeline parallelism is merged, which is what I am doing. It will still take a while to complete, as fixing this will require infrastructure changes in other parts of the code. Sorry for the confusion.

@zsogitbe
Copy link

I understand. Thank you that you care about this issue and that you will work on it! I have tried to solve it but I could not.

@DEVDEVIL007
Copy link

#5396

@slaren
Copy link
Collaborator

slaren commented Mar 20, 2024

#6170 should fix this issue in the CUDA backend.

@github-actions github-actions bot added the stale label Apr 20, 2024
Copy link
Contributor

github-actions bot commented May 5, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as completed May 5, 2024
@martindevans
Copy link
Contributor

Is this actually fixed?

@zsogitbe
Copy link

zsogitbe commented May 5, 2024

Some additional problems #6909 maybe because of this fix?

@slaren
Copy link
Collaborator

slaren commented May 8, 2024

@martindevans It should be fixed, please report any issues with thread safety. For example, using multiple llama contexts simultaneously each with a different CUDA GPU on different threads should now be possible. CPU and Metal also should be thread-safe, other backends probably not.

@martindevans
Copy link
Contributor

That's great to hear! I'll experiment with removing some of the locks we added into LLamaSharp and will report any bugs. Thanks.

@pseudotensor
Copy link

What about same GPU? Why isn't that thread safe too?

@slaren
Copy link
Collaborator

slaren commented May 8, 2024

It should also be thread-safe, but I don't expect that to be a very useful use case.

@ggerganov ggerganov moved this from Todo to Done in ggml : roadmap May 8, 2024
@ggerganov ggerganov removed the stale label Oct 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
refactoring Refactoring
Projects
Status: Done
Development

No branches or pull requests

6 participants