cuda : refactor to remove global resources #6170
There are now |
Is there a way to run the server tests with |
The CUDA backend no longer uses |
If you rebase on
cd examples/server/tests
N_GPU_LAYERS=99 LLAMA_SERVER_BIN_PATH=../../../build/bin/server ./tests.sh
Thanks. Unfortunately I am not able to complete all the server tests on my system because intermittent connection failures to huggingface cause the tests to abort. However, the tests that ran passed.
The server tests pass on V100
* cuda : refactor to remove global resources
Pools and other resources are tied to the ggml_backend instance and are freed along with it. It should also be thread-safe.
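The ownership pattern described above can be illustrated with a minimal, CPU-only sketch. This is not the code from this PR: `buffer_pool`, `backend_context`, and `backend_init` are hypothetical names, and `malloc`/`free` stand in for the device allocator. The sketch only shows a pool that lives inside a backend instance and is released by that instance's destructor, rather than sitting in a global.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <vector>

// Hypothetical stand-in for a device memory pool.
// malloc/free are used here in place of a real device allocator.
struct buffer_pool {
    std::vector<void *> buffers; // every buffer handed out by this pool

    buffer_pool() = default;
    buffer_pool(const buffer_pool &) = delete;            // a pool has exactly one owner
    buffer_pool & operator=(const buffer_pool &) = delete;

    // All buffers are released when the pool (and thus its owner) is destroyed.
    ~buffer_pool() {
        for (void * p : buffers) {
            std::free(p);
        }
    }

    void * alloc(size_t size) {
        void * p = std::malloc(size);
        if (p != nullptr) {
            buffers.push_back(p);
        }
        return p;
    }

    size_t n_buffers() const { return buffers.size(); }
};

// Hypothetical backend context: the pool is a member, not a global,
// so its lifetime is exactly the lifetime of the backend instance.
struct backend_context {
    int         device;
    buffer_pool pool;

    explicit backend_context(int device) : device(device) {}
};

// Hypothetical constructor function; each call returns an independent instance.
std::unique_ptr<backend_context> backend_init(int device) {
    return std::unique_ptr<backend_context>(new backend_context(device));
}
```

In this sketch two instances created for the same device share no mutable state, so allocating from one does not touch the other, and destroying an instance frees only its own buffers. That separation is what would let each thread work with its own instance without locking, under the assumptions stated above.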