A server example for whisper.cpp, based on the LLaMa.cpp server example.
This is still a work-in-progress, but the very basics work.
Why would you want to run whisper as a server? If you need to process short clips often, for example in a voice-controlled application, the model you're using is loaded into RAM every time `./main` is invoked. For very short voice snippets of about 3-5 seconds, roughly 50% of the time is spent loading the base.en model on each invocation. With this server, you can simply send an HTTP request and your wav file gets transcribed immediately, with the model already resident in memory.
Progress

- `/convert` endpoint: send a .wav filename and get the transcription + metadata back as JSON

Note that it's been a while since I used C++; I made most of this happen with Frankenstein copy-pasting. If I'm doing something stupid, please let me know.