Server exits abnormally on API call after long period of inactivity #1991

Closed
QuantiusBenignus opened this issue Mar 24, 2024 · 1 comment

Comments

@QuantiusBenignus

I have been using the whisper.cpp server on my local machine in lieu of the "main" program to serve Blurt and BlahST, because of the performance advantage (~90x real time, as described here).

I load the server at machine startup and keep it running. Many times now, after a long period of inactivity (say an hour), sending an API request makes the server blow up and exit abnormally. A ps call shows it as a zombie (i.e. defunct), and the logs say:

[Screenshot of the server log; image not preserved]
... ending with the curl API call being unsuccessful: "curl: (52) Empty reply from server"
(The timeline reads bottom to top. I do not understand why it repeats the request receipt and inference steps; a successful call does not look like that.)
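
For anyone who wants to detect the defunct state from a script rather than by eyeballing ps, here is a minimal sketch. It assumes Linux with /proc mounted; the helper names `parse_proc_state` and `is_zombie` are made up for illustration and are not part of whisper.cpp.

```python
from pathlib import Path


def parse_proc_state(stat_line):
    """Return the one-letter process state from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')' characters, so take everything after the last ')'.
    after_comm = stat_line[stat_line.rfind(")") + 1:]
    return after_comm.split()[0]


def is_zombie(pid):
    """Return True if the process exists and is in state Z (zombie/defunct)."""
    try:
        stat_line = Path(f"/proc/{pid}/stat").read_text()
    except (FileNotFoundError, ProcessLookupError):
        # No such process, or /proc is not available on this system.
        return False
    return parse_proc_state(stat_line) == "Z"
```

Calling `is_zombie(server_pid)` before sending a request would let a client script restart the server instead of failing with an empty reply; how the PID is obtained is left open here.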

Anecdotally, "unspecified launch failure" is most often a segfault, but could it be a thread synchronization issue in this case?
Or is it indeed some sort of out-of-bounds access to memory that was released on a timeout?
The GPU is an RTX 3060 with 12 GB.

I can probably test a server run without cuBLAS support to see if the issue persists.

@QuantiusBenignus
Author

I investigated this a bit and I think I found the root of the problem.
One significant event during that "long period" mentioned in the title is that my system goes into SUSPEND.
That seems to do two things, since the problem happens with or without GPU acceleration enabled.

  1. CUDA is disabled after RESUME, since one of the Nvidia modules (nvidia-uvm) does not handle suspend well.
    Running AI models with CUDA support (e.g. whisper.cpp, stable-diffusion etc.) then results in "no CUDA devices found".
  2. The whisper.cpp example server also breaks down as described above, irrespective of whether --no-gpu is used or not.

A solution is described here for those who want to preserve VRAM allocations beyond a SUSPEND/RESUME cycle.

I will probably write a pre-suspend script that sends a SIGINT to the server on systemd-sleep and then starts it anew on RESUME. That is a greener solution, with the only penalty being a slower first transcription.
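
The stop-before-suspend, start-after-resume idea could be sketched as a systemd-sleep hook like the one below. systemd-sleep runs every executable in /usr/lib/systemd/system-sleep/ with two arguments: "pre" or "post", then the sleep type (e.g. "suspend"). Everything else here is an assumption: the server is taken to run as a system-level systemd unit with the hypothetical name `whisper-server.service`, and the unit would need `KillSignal=SIGINT` for the stop to be delivered as a SIGINT rather than the default SIGTERM.

```python
#!/usr/bin/env python3
"""Sketch of a systemd-sleep hook that stops the server before suspend
and starts it again after resume."""
import subprocess
import sys

# Hypothetical unit name; replace with however the server is actually run.
UNIT = "whisper-server.service"


def command_for(phase):
    """Map the systemd-sleep phase ("pre" or "post") to a systemctl command."""
    if phase == "pre":
        # About to sleep: shut the server down cleanly.
        return ["systemctl", "stop", UNIT]
    if phase == "post":
        # Just resumed: bring up a fresh server instance.
        return ["systemctl", "start", UNIT]
    return None


def main(argv):
    """argv is [script, phase, sleep_type], as passed by systemd-sleep."""
    cmd = command_for(argv[1])
    if cmd is None:
        return 0
    return subprocess.run(cmd, check=False).returncode


# systemd-sleep always passes two arguments; do nothing otherwise.
if __name__ == "__main__" and len(sys.argv) >= 3:
    raise SystemExit(main(sys.argv))
```

Saved as an executable file in the system-sleep directory, this runs as root on every sleep and wake. A server started from a user session rather than a system unit would need a different way to locate and signal the process.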
