
llama.cpp server / embeddings broken #156

Closed
skadefro opened this issue Nov 3, 2023 · 10 comments

skadefro commented Nov 3, 2023

Hey,
I had an older git clone of llama.cpp, and your integration with the llama.cpp server was working perfectly.
I cloned the latest version onto a new server but kept getting an 'Invalid JSON response' error:

RetryError: Failed after 1 attempt(s) with non-retryable error: 'Invalid JSON response'
    at _retryWithExponentialBackoff (/mnt/data/vscode/config/workspace/ai/jsagent/node_modules/modelfusion/core/api/retryWithExponentialBackoff.cjs:42:15)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async LlamaCppTextEmbeddingModel.doEmbedValues (/mnt/data/vscode/config/workspace/ai/jsagent/node_modules/modelfusion/model-provider/llamacpp/LlamaCppTextEmbeddingModel.cjs:73:26)
    at async Promise.all (index 1)
    at async generateResponse (/mnt/data/vscode/config/workspace/ai/jsagent/node_modules/modelfusion/model-function/embed/embed.cjs:44:31)
    at async runSafe (/mnt/data/vscode/config/workspace/ai/jsagent/node_modules/modelfusion/util/runSafe.cjs:6:35)
    at async executeStandardCall (/mnt/data/vscode/config/workspace/ai/jsagent/node_modules/modelfusion/model-function/executeStandardCall.cjs:45:20) {
  errors: [
    ApiCallError: Invalid JSON response
        at /mnt/data/vscode/config/workspace/ai/jsagent/node_modules/modelfusion/core/api/postToApi.cjs:8:15
        at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
        ... 6 lines matching cause stack trace ...
        at async executeStandardCall (/mnt/data/vscode/config/workspace/ai/jsagent/node_modules/modelfusion/model-function/executeStandardCall.cjs:45:20) {
      url: 'http://10.0.0.100:8080/embedding',
      requestBodyValues: [Object],
      statusCode: 200,
      cause: [ZodError],
      isRetryable: false
    }
  ],
  reason: 'errorNotRetryable'
}

After testing different git commits of llama.cpp, I found that cb33f43a2a9f5a5a5f8d290dd97c625d9ba97a2f was one of the last ones that still works (so it broke somewhere around there).
I know they have an issue open about implementing a new API, but as far as I can tell it has not been merged yet, so I hope it's a simple fix to get this module to handle whatever they changed within the last two weeks. (Just a "nice to have": they keep tweaking and improving things, so it would be nice to be able to use the latest version.)
For anyone else having this issue, you can go back to that version with

git checkout cb33f43a2a9f5a5a5f8d290dd97c625d9ba97a2f

lgrammel commented Nov 3, 2023

@skadefro thanks for letting me know. I'll take a look. I think llama.cpp has also added grammar and image support in their API, so that should be fun to explore.

@lgrammel lgrammel self-assigned this Nov 3, 2023
@lgrammel lgrammel added the bug Something isn't working label Nov 3, 2023

skadefro commented Nov 3, 2023

Ohh, that would be a dream come true: an easy-to-use framework that supports OpenAI, llama.cpp, and whisper.cpp for chat, embeddings, images, and voice.


lgrammel commented Nov 3, 2023

@skadefro I looked into whisper.cpp a little bit. Do you know if there are any projects that let you spin up whisper.cpp as a server?


skadefro commented Nov 3, 2023

For someone who just wants a web interface, Whisper-WebUI has worked fine for me, but I have not found any good APIs for hosting it (there are a few gists lying around). I have seen at least two attempts at a server on whisper.cpp's issue tracker, and it looks like one of them is coming out soon, so that is promising. I had it on my to-look-into list but haven't had time yet; if that issue turns into the "official" API, I would probably bet on it.
Speaking of new things: I just came across ollama. For a die-hard docker/kubernetes fan like me, it looks very promising.


lgrammel commented Nov 4, 2023

@skadefro I just tried out the latest llama.cpp (commit hash d9b33fe95bd257b36c84ee5769cc048230067d6f) with ModelFusion and it works for me. Have you started the llama.cpp server, e.g. like this (you need to have the model available)?

./server -m models/llama-2-7b-chat.GGUF.q4_0.bin -c 4096

If you encounter the error with the llama.cpp server, could you provide more details about your setup?

Re ollama: it's been on my list for a while, I just need to get around to adding it. It seems simpler to use than llama.cpp and could be a good alternative.


skadefro commented Nov 4, 2023

Ah, sorry, I forgot to mention which endpoint: it is the /embedding endpoint.
I'm running with this:

./server -m models/llama-7b/llama-2-7b-chat.Q4_0.gguf -c 2048 --host 10.0.0.161

and testing with this:

const Llamaapi = new LlamaCppApiConfiguration({
  baseUrl: "http://10.0.0.161:8080",
});
const embeddings = await embedMany(
  new LlamaCppTextEmbeddingModel({ api: Llamaapi }),
  [
    "At first, Nox didn't know what to do with the pup.",
    "He keenly observed and absorbed everything around him, from the birds in the sky to the trees in the forest.",
  ]
);
console.log(embeddings);

This fails if the server is built from the latest version on master, but works if I check out a commit that is around two weeks old.

This still works just fine on the latest version:

const text = await generateText(
  new LlamaCppTextGenerationModel({ api: Llamaapi }),
  "Write a short story about a robot learning to love:\n\n"
);
console.log(text);
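
If it helps to narrow this down, here is a sketch that embeds a single value at a time, using the single-value embed helper from the same package (the embed.cjs frames in the stack trace above suggest it exists; untested against this setup). If this succeeds while embedMany fails, the problem is likely in how parallel requests are handled rather than in the response format itself.

import {
  embed,
  LlamaCppApiConfiguration,
  LlamaCppTextEmbeddingModel,
} from "modelfusion";

// Same configuration as above; only one request is in flight at a time.
const api = new LlamaCppApiConfiguration({ baseUrl: "http://10.0.0.161:8080" });

const embedding = await embed(
  new LlamaCppTextEmbeddingModel({ api }),
  "At first, Nox didn't know what to do with the pup."
);
console.log(embedding);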

@lgrammel lgrammel changed the title [info] llama.cpp has made an breaking updated to the api within the last two weeks [info] llama.cpp server / embeddings has made an breaking updated to the api within the last two weeks Nov 4, 2023

lgrammel commented Nov 4, 2023

@skadefro I just shipped v0.55.0 with Ollama text generation & streaming support: https://github.com/lgrammel/modelfusion/releases/tag/v0.55.0
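
A minimal usage sketch, assuming the v0.55.0 API shape and a local Ollama instance with a pulled model (the model name "llama2" and the default port are assumptions):

import { generateText, OllamaTextGenerationModel } from "modelfusion";

// Ollama listens on http://127.0.0.1:11434 by default.
const story = await generateText(
  new OllamaTextGenerationModel({ model: "llama2" }),
  "Write a short story about a robot learning to love:\n\n"
);
console.log(story);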


skadefro commented Nov 4, 2023

Wow, that was fast ... Tested and it works 😍 Thank you very much.

@lgrammel lgrammel changed the title [info] llama.cpp server / embeddings has made an breaking updated to the api within the last two weeks llama.cpp server / embeddings broken Nov 4, 2023

lgrammel commented Nov 4, 2023

This seems to be related to parallelization. Several calls are made at once, and one is rejected by llama.cpp because there are no free slots; instead of an embedding, the body {"content":"slot unavailable"} is returned (with status 200).

❯ npx ts-node src/model-provider/llamacpp/llamacpp-embed-many-example.ts
{"content":"slot unavailable"}
RetryError: Failed after 1 attempt(s) with non-retryable error: 'Failed to process successful response'
    at _retryWithExponentialBackoff (/Users/lgrammel/repositories/modelfusion/dist/core/api/retryWithExponentialBackoff.cjs:42:15)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async LlamaCppTextEmbeddingModel.doEmbedValues (/Users/lgrammel/repositories/modelfusion/dist/model-provider/llamacpp/LlamaCppTextEmbeddingModel.cjs:73:26)
    at async Promise.all (index 1)
    at async generateResponse (/Users/lgrammel/repositories/modelfusion/dist/model-function/embed/embed.cjs:44:31)
    at async runSafe (/Users/lgrammel/repositories/modelfusion/dist/util/runSafe.cjs:6:35)
    at async executeStandardCall (/Users/lgrammel/repositories/modelfusion/dist/model-function/executeStandardCall.cjs:45:20)
    at async main (/Users/lgrammel/repositories/modelfusion/examples/basic/src/model-provider/llamacpp/llamacpp-embed-many-example.ts:4:22) {
  errors: [
    ApiCallError: Failed to process successful response
        at postToApi (/Users/lgrammel/repositories/modelfusion/dist/core/api/postToApi.cjs:94:19)
        at processTicksAndRejections (node:internal/process/task_queues:95:5)
        at async _retryWithExponentialBackoff (/Users/lgrammel/repositories/modelfusion/dist/core/api/retryWithExponentialBackoff.cjs:18:16)
        ... 3 lines matching cause stack trace ...
        at async runSafe (/Users/lgrammel/repositories/modelfusion/dist/util/runSafe.cjs:6:35)
        at async executeStandardCall (/Users/lgrammel/repositories/modelfusion/dist/model-function/executeStandardCall.cjs:45:20)
        at async main (/Users/lgrammel/repositories/modelfusion/examples/basic/src/model-provider/llamacpp/llamacpp-embed-many-example.ts:4:22) {
      url: 'http://127.0.0.1:8080/embedding',
      requestBodyValues: [Object],
      statusCode: 200,
      cause: TypeError: Body is unusable
          at specConsumeBody (node:internal/deps/undici/undici:4712:15)
          at _Response.json (node:internal/deps/undici/undici:4614:18)
          at /Users/lgrammel/repositories/modelfusion/dist/core/api/postToApi.cjs:7:66
          at processTicksAndRejections (node:internal/process/task_queues:95:5)
          at async postToApi (/Users/lgrammel/repositories/modelfusion/dist/core/api/postToApi.cjs:82:20)
          at async _retryWithExponentialBackoff (/Users/lgrammel/repositories/modelfusion/dist/core/api/retryWithExponentialBackoff.cjs:18:16)
          at async LlamaCppTextEmbeddingModel.doEmbedValues (/Users/lgrammel/repositories/modelfusion/dist/model-provider/llamacpp/LlamaCppTextEmbeddingModel.cjs:73:26)
          at async Promise.all (index 1)
          at async generateResponse (/Users/lgrammel/repositories/modelfusion/dist/model-function/embed/embed.cjs:44:31)
          at async runSafe (/Users/lgrammel/repositories/modelfusion/dist/util/runSafe.cjs:6:35),
      isRetryable: false
    }
  ],
  reason: 'errorNotRetryable'
}
{"embedding":[0.05569692328572273,-0.020548203960061073,0.27377715706825256,0.4976423382759094,0.16579614579677582,0.04679970443248749,0.19974836707115173,0.2295011579990387,-0.15478861331939697,0.3044094145298004,0.024075830355286598,-0.04952937737107277,0.1346544623374939,0.15864624083042145,-0.15292425453662872,-0.04481641948223114,0.07410169392824173,0.16139250993728638,0.013992399908602238,0.0525520034134388,0.17047853767871857,0.14821892976760864,-0.196890190243721,-0.34336787462234497,-0.03041764535009861,0.09776932001113892,0.2469785362482071,0.15258672833442688,-0.14246588945388794,0.03391014412045479,-0.20064757764339447,0.18357722461223602,-0.03650486096739769,-0.09382735937833786,-0.07598888128995895,-0.03402281180024147,-0.047186095267534256,-0.0483274981379509,-0.14382801949977875,0.17244981229305267,0.055998265743255615,-0.0007336181006394327,

lgrammel added a commit that referenced this issue Nov 4, 2023
Add flag for parallelizable embedding model calls. #156
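
The commit message points at the shape of the fix. A minimal sketch of the idea (illustrative names, not the actual ModelFusion code): when a model reports that it cannot handle parallel calls, embed the chunks sequentially instead of firing them all at once with Promise.all.

async function embedChunks(
  chunks: string[][],
  doEmbedValues: (chunk: string[]) => Promise<number[][]>,
  isParallelizable: boolean
): Promise<number[][]> {
  if (isParallelizable) {
    // All chunks in flight at once; on llama.cpp this fails when no slot is free.
    return (await Promise.all(chunks.map(doEmbedValues))).flat();
  }
  // One request at a time, so the server's single slot is never contended.
  const embeddings: number[][] = [];
  for (const chunk of chunks) {
    embeddings.push(...(await doEmbedValues(chunk)));
  }
  return embeddings;
}

On the server side, newer llama.cpp builds also accept a --parallel N option to run more slots, which may avoid the contention entirely (assuming your build includes it).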

lgrammel commented Nov 4, 2023

@lgrammel lgrammel closed this as completed Nov 4, 2023