llama.cpp server / embeddings broken #156
Comments
@skadefro thanks for letting me know. I'll take a look. I think llama.cpp has also added grammar and image support in their API, so that should be fun to explore.
Ohh, that would be a dream come true ...
@skadefro I looked into whisper.cpp a little bit. Do you know if there are any projects that let you spin up whisper.cpp as a server?
For someone who just wants a web interface, Whisper-WebUI has worked fine for me, but I have not found any good APIs for hosting it (there are a few gists lying around). I have seen at least two attempts on whisper.cpp's issue tracker, and it looks like this one is coming out soon, so that looks promising. I had this one on my to-look-into list but haven't had time yet; if the issue above turns into the "official" API, I would probably bet on that.
@skadefro I just tried out the latest llama.cpp. If you encounter the error with the llama.cpp server, could you provide more details about your setup? Re ollama: it's been on my list for a while, I just need to get around to adding it. It seems simpler to use compared to llama.cpp and could be a good alternative.
Ah, sorry, forgot to mention which endpoint. It is the /embedding endpoint, and I am testing with the request below. This fails if the server is built from the latest version on master, but works if I check out a commit that is around two weeks old. This still works just fine on the latest version.
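(The original test snippet was not captured in the thread. As a rough sketch, a request against the endpoint would look something like the following, assuming a llama.cpp server running on the default port 8080 with embeddings enabled via its `--embedding` flag; the request/response field names follow the server API as of this thread and may have changed since.)

```typescript
// Rough sketch of a test against the llama.cpp server's /embedding endpoint.
// Assumes the server runs on the default port 8080 with embeddings enabled;
// the field names reflect the server API at the time of this thread.
const response = await fetch("http://127.0.0.1:8080/embedding", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ content: "Hello, world!" }),
});

// A non-JSON body here is what surfaces as an "Invalid JSON response" error.
const json = await response.json();
console.log(Array.isArray(json.embedding), json.embedding?.length);
```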
@skadefro just shipped a fix.
Wow, that was fast ... Tested and it works 😍 Thank you very much. |
This seems to be related to parallelization. Several calls are made in parallel, and one is rejected by llama.cpp because there are no free slots; instead of an embedding, an error is returned.
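(One way to avoid the slot exhaustion described above, sketched here with a hypothetical `mapWithConcurrency` helper rather than anything from this module: cap the number of in-flight embedding calls at the server's slot count, which the llama.cpp server configures via its `--parallel` option.)

```typescript
// Hypothetical helper: run async calls with at most `limit` in flight, so a
// llama.cpp server started with e.g. `--parallel 4` never rejects a request
// for lack of a free slot.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++; // claim the next index (single-threaded event loop, no race)
      results[i] = await fn(items[i]);
    }
  }

  // Start `limit` workers that drain the shared work queue.
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker)
  );
  return results;
}

// Usage sketch: embed many texts with at most 4 concurrent /embedding
// requests ("embed" is a hypothetical single-text embedding call).
// const embeddings = await mapWithConcurrency(texts, 4, (text) => embed(text));
```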
Add flag for parallelizable embedding model calls. #156
Hey,
I had an older git clone of llama.cpp, and your integration with the llama.cpp server was working perfectly.
I cloned the latest version onto a new server but kept getting an 'Invalid JSON response' error. After testing different git commits of llama.cpp, I found that cb33f43a2a9f5a5a5f8d290dd97c625d9ba97a2f was one of the last ones to still work (so one of those around that).
I know they have an issue open about implementing a new API, but it looks to me like they have not merged that yet, so I hope it's a simple fix to get this module to handle whatever they changed within the last two weeks. (Just a "nice to have", since they keep tweaking things and improving it, so it would be nice to be able to use the latest version.)
For anyone else having issues with that, you can go back to that version with `git checkout cb33f43a2a9f5a5a5f8d290dd97c625d9ba97a2f`.