I have been using the whisper.cpp server on my local machine in lieu of the "main" program to serve Blurt and BlahST, because of the performance advantage (~90x real time, as described here).
I load the server on machine startup and keep it running, but many times now, after a long period of inactivity (say an hour), the server blows up and exits abnormally when I send an API request. ps shows it as a zombie (i.e. defunct) process, and the logs say:
... ending with the curl API call being unsuccessful: "curl: (52) Empty reply from server"
(The timeline reads bottom to top. I do not understand why it repeats the request receipt and inference steps; a successful call doesn't look like that.)
Anecdotally, "unspecified launch failure" is most often a segfault, but could it be a thread synchronization issue in this case?
Or is it indeed some sort of out-of-bounds memory access on memory that has been released on a timeout?
GPU is RTX3060, 12GB
I can probably test a server run without cuBLAS support to see if the issue persists.
I investigated this a bit and I think I found the root of the problem.
One significant event during that "long period" mentioned in the title is that my system goes into SUSPEND.
Suspend seems to do two things, since the problem happens with or without GPU acceleration enabled.
First, CUDA is disabled after RESUME, because one of the Nvidia modules (nvidia-uvm) does not handle suspend well.
Running AI models with CUDA support (e.g. whisper.cpp, stable-diffusion etc.) then results in "no CUDA devices found".
Second, the whisper.cpp example server also breaks down as described above, irrespective of whether --no-gpu is used.
A solution is described here for those who want to preserve VRAM allocations beyond a SUSPEND/RESUME cycle.
I will probably write a pre-suspend script, run as a systemd-sleep hook, that sends a SIGINT to the server before suspend and then starts it anew on RESUME. This is a greener solution, and the only penalty is a slower first transcription.
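For anyone who wants to try the same workaround, here is a rough sketch of what such a hook could look like. It is not tested against a real suspend cycle. It assumes the server runs as a systemd service, and the unit name whisper-server.service is hypothetical (adjust it to however you start the server). systemd-sleep runs executables placed in /usr/lib/systemd/system-sleep/ with two arguments: "pre" or "post", followed by the sleep action such as "suspend".

```python
#!/usr/bin/env python3
"""systemd-sleep hook sketch: stop the whisper.cpp server before suspend,
start a fresh one after resume."""
import subprocess
import sys

# Hypothetical unit name; assumes the server is managed as a systemd service.
UNIT = "whisper-server.service"


def commands_for(phase):
    """Map the hook's first argument ("pre" or "post") to the commands to run."""
    if phase == "pre":
        # Going to sleep: deliver SIGINT so the server exits before suspend.
        return [["systemctl", "kill", "--signal=SIGINT", UNIT]]
    if phase == "post":
        # Waking up: bring up a fresh server process (the model is reloaded).
        return [["systemctl", "restart", UNIT]]
    # Any other value: do nothing.
    return []


def main(argv):
    # argv[1] is "pre" or "post"; argv[2] (the sleep action) is not needed here.
    phase = argv[1] if len(argv) > 1 else ""
    for cmd in commands_for(phase):
        # check=False: a failed command should not abort the sleep hook.
        subprocess.run(cmd, check=False)


if __name__ == "__main__":
    main(sys.argv)
```

The script has to be executable, and it runs as root. If the server is started some other way (e.g. from a login script), the two command lists would need to be swapped for whatever stops and starts it on your setup.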