Parse embedding error, expected all numeric but string is prefixed with "embedding 0: #213

Closed
kbrisso opened this issue Sep 2, 2024 · 8 comments

kbrisso commented Sep 2, 2024

Describe the bug
When using llama-embedding.exe, the embedding is returned in the command output as a string formatted like this: "embedding 0:" 0.011059 -0.014784 -0.018492 0.020268 -0.027386 0.022915. The "embedding 0:" prefix is causing the issue.

To Reproduce
Steps to reproduce the behavior:
llama-embedding.exe -m mxbai-embed-large-v1-f16.gguf --pooling mean -p "Madam Speaker Madam Vice President our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans. Last year COVID-19 kept us apart. This year we are finally together again. Tonight we meet as Democrats Republicans and Independents. But most importantly as Americans. With a duty to one another to the American people to the Constitution. And with an u" >>log.text

Expected behavior
A space-separated string with no alphabetic text; every value should be numeric.

Screenshots
Posted issues in Discord.

Desktop (please complete the following information):

  • OS: Windows

Author

kbrisso commented Sep 3, 2024

I have a local branch with a fix in it. I can create a pull if needed.

Owner

henomis commented Sep 7, 2024

Hi @kbrisso, I'm not sure this issue is related to Lingoose, the project doesn't have a tool called llama-embeddings.exe

Author

kbrisso commented Sep 7, 2024

> Hi @kbrisso, I'm not sure this issue is related to Lingoose, the project doesn't have a tool called llama-embeddings.exe

Here is the full code snippet. Lingoose calls the llama.cpp embedder executable, "llama-embedding.exe", when the embedder is set up with llamacppembedder.New().WithModel(......).WithLlamaCppPath(.......)

Screenshot 2024-09-07 115142

This is how I fixed it in the llamacpp.go file in your project.

Screenshot 2024-09-07 115716
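
The fix is only visible in the screenshot, so the following is not that code. It is a minimal sketch of one way to tolerate the label, assuming the output line has the form `embedding N: v1 v2 ...` without surrounding quotes. The helper name `parseLabeledEmbedding` is hypothetical and does not come from Lingoose or llama.cpp.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLabeledEmbedding is a hypothetical helper: it drops an optional
// "embedding N:" label and parses the remaining fields as floats.
func parseLabeledEmbedding(line string) ([]float64, error) {
	line = strings.TrimSpace(line)
	if strings.HasPrefix(line, "embedding") {
		// Keep only what follows the first colon, if there is one.
		if _, rest, found := strings.Cut(line, ":"); found {
			line = rest
		}
	}

	fields := strings.Fields(line)
	values := make([]float64, 0, len(fields))
	for _, f := range fields {
		v, err := strconv.ParseFloat(f, 64)
		if err != nil {
			return nil, fmt.Errorf("token %q is not numeric: %w", f, err)
		}
		values = append(values, v)
	}
	return values, nil
}

func main() {
	out := "embedding 0: 0.011059 -0.014784 -0.018492 0.020268 -0.027386 0.022915"
	values, err := parseLabeledEmbedding(out)
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	fmt.Println(len(values), "values parsed")
}
```

Stripping only a leading label keeps the parser strict about everything else, so other non-numeric output still surfaces as an error instead of being silently skipped.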

Owner

henomis commented Sep 10, 2024

Could you try without --embd-output-format and --embd-separator?

Author

kbrisso commented Sep 10, 2024

> Could you try without --embd-output-format and --embd-separator?

I tried all the settings and none of them worked. I spent all weekend on this. Your code expects a purely numeric string that can be converted to a slice. If you review the llama.cpp code, you will see it returns the string with a label in front, like below:

"embedding 0:" 0.011059 -0.014784 -0.018492 0.020268 -0.027386 0.022915
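
To illustrate why the label breaks a strict parser, here is a minimal sketch. It assumes the parsing side splits the output on whitespace and converts every token with `strconv.ParseFloat`. The function `parseStrict` is a hypothetical stand-in, not the actual Lingoose code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseStrict is a hypothetical stand-in for a parser that assumes the
// command output contains only whitespace-separated numbers.
func parseStrict(output string) ([]float64, error) {
	fields := strings.Fields(output)
	values := make([]float64, 0, len(fields))
	for _, f := range fields {
		v, err := strconv.ParseFloat(f, 64)
		if err != nil {
			return nil, fmt.Errorf("token %q is not numeric: %w", f, err)
		}
		values = append(values, v)
	}
	return values, nil
}

func main() {
	// The first token is "embedding", so conversion fails immediately.
	out := "embedding 0: 0.011059 -0.014784 -0.018492 0.020268 -0.027386 0.022915"
	if _, err := parseStrict(out); err != nil {
		fmt.Println("parse failed:", err)
	}
}
```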

Owner

henomis commented Sep 10, 2024

Ok let's try the lingoose version v0.2.1-alpha.2 with the hotfix for that and remove --embd-output-format and --embd-separator args.

Author

kbrisso commented Sep 10, 2024

> Ok let's try the lingoose version v0.2.1-alpha.2 with the hotfix for that and remove --embd-output-format and --embd-separator args.

Works great! Thanks!

Owner

henomis commented Nov 2, 2024

fixed here: #200

henomis closed this as completed Nov 2, 2024