Parse embedding error: expected all-numeric output, but the string is prefixed with "embedding 0:" #213

kbrisso · 2024-09-02T23:50:13Z

Describe the bug
When using llama-embedding.exe, the embedding comes back in the command output as a string formatted like this: "embedding 0:" 0.011059 -0.014784 -0.018492 0.020268 -0.027386 0.022915. The "embedding 0:" prefix is causing the issue.

To Reproduce
Steps to reproduce the behavior:
llama-embedding.exe -m mxbai-embed-large-v1-f16.gguf --pooling mean -p "Madam Speaker Madam Vice President our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans. Last year COVID-19 kept us apart. This year we are finally together again. Tonight we meet as Democrats Republicans and Independents. But most importantly as Americans. With a duty to one another to the American people to the Constitution. And with an u" >>log.text

Expected behavior
A space-separated string with no alphabetic text; every token should be numeric.

Screenshots
Posted issues in Discord.

Desktop (please complete the following information):

OS: Windows


kbrisso · 2024-09-03T00:25:38Z

I have a local branch with a fix in it. I can create a pull if needed.

henomis · 2024-09-07T09:55:59Z

Hi @kbrisso, I'm not sure this issue is related to Lingoose, the project doesn't have a tool called llama-embeddings.exe

kbrisso · 2024-09-07T19:03:23Z

> Hi @kbrisso, I'm not sure this issue is related to Lingoose, the project doesn't have a tool called llama-embeddings.exe

Here is the full code snippet. You call the llama.cpp embedder executable "llama-embedding.exe" through this method chain: llamacppembedder.New().WithModel(......).WithLlamaCppPath(.......)

This is how I fixed it in the llamacpp.go file in your project.

henomis · 2024-09-10T15:51:47Z

Could you try without --embd-output-format and --embd-separator?

kbrisso · 2024-09-10T16:12:27Z

> Could you try without --embd-output-format and --embd-separator?

I tried all the settings and none of them worked. I spent all weekend on this. Your code expects a clean, purely numeric string that can be converted to a slice. If you review the llama.cpp code, you will see that it returns a string like the one below:

"embedding 0:" 0.011059 -0.014784 -0.018492 0.020268 -0.027386 0.022915

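To make the problem concrete, here is a minimal Go sketch of a parser that tolerates the label. It is an illustration only, not the code in lingoose or the hotfix: `parseEmbeddingLine` is a hypothetical helper, and the sketch assumes the quotes in the sample above are just emphasis, so the raw line starts with `embedding 0:` followed by space-separated floats.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseEmbeddingLine converts one line of llama-embedding output into a
// slice of floats. It tolerates an optional "embedding N:" label in front
// of the numbers. Hypothetical helper, not lingoose's actual parser.
func parseEmbeddingLine(line string) ([]float64, error) {
	line = strings.TrimSpace(line)

	// Drop the label only when the line really starts with "embedding".
	if strings.HasPrefix(line, "embedding") {
		if i := strings.Index(line, ":"); i >= 0 {
			line = line[i+1:]
		}
	}

	// Split on any run of whitespace and parse every token as a float.
	fields := strings.Fields(line)
	values := make([]float64, 0, len(fields))
	for _, f := range fields {
		v, err := strconv.ParseFloat(f, 64)
		if err != nil {
			return nil, fmt.Errorf("parse embedding value %q: %w", f, err)
		}
		values = append(values, v)
	}
	return values, nil
}

func main() {
	out := "embedding 0: 0.011059 -0.014784 -0.018492 0.020268 -0.027386 0.022915"
	vec, err := parseEmbeddingLine(out)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(vec), vec)
}
```

A line without the label parses the same way, and any other non-numeric token still produces an error instead of being silently skipped.
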
henomis · 2024-09-10T16:51:24Z

Ok let's try the lingoose version v0.2.1-alpha.2 with the hotfix for that and remove --embd-output-format and --embd-separator args.

kbrisso · 2024-09-10T18:49:02Z

> Ok let's try the lingoose version v0.2.1-alpha.2 with the hotfix for that and remove --embd-output-format and --embd-separator args.

Works great! Thanks!

henomis · 2024-11-02T17:24:00Z

fixed here: #200

henomis mentioned this issue Sep 10, 2024

Hotfix llamacppembedder output parser #214

Closed

henomis closed this as completed Nov 2, 2024
