Apologies for not figuring out a better title. I've been banging my head against the code for the better part of a day trying to figure out what's going on.
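LangChain with ChromaDB is unable to retrieve relevant chunks when the embeddings come from the local openai extension's embeddings API. The same script retrieves relevant chunks when pointed at the official OpenAI API.
This is after applying the proposed pull request: Pull Request 4147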
Is there an existing issue for this?
I have searched the existing issues
Reproduction
I'm attaching a simple python script to demonstrate, and a profanity-free version of chapter 2 of a free online book.
Commenting out the relevant openai_api_base elements will route to either openai or your local embedding api.
chapter2.txt
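The script itself is attached rather than inlined, so the following is only a minimal sketch of the toggle described above. It assumes the local extension listens on localhost at the openai-port given under System Info (5007) with a /v1 prefix, and that the resulting keyword arguments are handed to LangChain's OpenAIEmbeddings. The helper name embedding_kwargs and the placeholder key are hypothetical.

```python
import os

# Assumption: the local openai extension is reachable on localhost at the
# openai-port listed under System Info (5007), under a /v1 prefix.
LOCAL_API_BASE = "http://127.0.0.1:5007/v1"


def embedding_kwargs(use_local: bool) -> dict:
    """Build keyword arguments for the embeddings client (hypothetical helper).

    With use_local=False no openai_api_base is set, so the client is expected
    to fall back to the official OpenAI endpoint and the key from the environment.
    """
    if use_local:
        # Assumption: the local server does not validate the key,
        # so any placeholder value is accepted.
        return {"openai_api_base": LOCAL_API_BASE, "openai_api_key": "dummy"}
    return {"openai_api_key": os.environ.get("OPENAI_API_KEY", "")}


if __name__ == "__main__":
    print(embedding_kwargs(use_local=True))
```

Keeping the switch in one place makes it harder to end up with documents embedded by one backend and the query embedded by the other.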
When you encode and bounce the search off of the official openai api, you get the following types of retrievals.
These retrievals are Caroline-related, as you'd expect from a simple vectordb search.
[
Document(page_content='the myriad threads of causality to find out which of the billions of chemicals, which errant cell, was responsible for this person\'s physiological collapse? One thing Prime Intellect knew: It had to figure it out.\nIt could not, through inaction, allow Caroline to die.\n"She\'s still in trouble. Look at her pupils."\n"It\'s the morphine."\nEveryone looked at the older nurse, whose name was Jill. "The chart must be wrong," she said. "I gave her what it said."\n"She has a tolerance," AnneMarie said, and', metadata={'source': 'chapter2.txt'}),
Document(page_content='The drops of residual solution within them were remarkably pure, and Prime Intellect easily singled out the large organic molecule they carried. Then it created an automatic process to scan Caroline\'s body molecule by molecule, eliminating each and every molecule of morphine that it found. This took three minutes, and created a faintly visible blue glow.\n\nThis was the human onlookers\' first clue, other than Caroline\'s miraculously restarted heart, as to what was happening.\n"What in tarnation!,"', metadata={'source': 'chapter2.txt'}),
Document(page_content="the pain, which had subsided for real for the first time in years, returned. Caroline moaned. But Prime Intellect didn't know about that part of it, not yet.\nThere was still a whole constellation of stuff wrong with Caroline Hubert's body, and emboldened by its success it set about correcting what it could. It found long chain molecules, which it would later learn were called collagens, cross-linked. It un-cross-linked them. It found damaged DNA, which it fixed. It found whole masses of cells", metadata={'source': 'chapter2.txt'}),
Document(page_content='she hoped that the impossibility would go away if she challenged it.\n"I need a drink," said the doctor who had come with the machine to re-start Caroline\'s heart.\nPrime Intellect stopped working. There were still huge differences between Caroline and the others. Prime Intellect did not yet realize the differences were due to Caroline\'s age. It needed more information, and it needed finer control to analyse the situation. But it was at a bottleneck; it could not stop monitoring Caroline, whose', metadata={'source': 'chapter2.txt'})
]
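For context, a simple vector search of the kind mentioned above amounts to embedding the query and ranking the stored chunks by similarity to it. The sketch below uses cosine similarity on made-up three-dimensional vectors; the metric is an assumption (ChromaDB's configured distance may differ), and the function names are hypothetical rather than anything from LangChain or ChromaDB.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=4):
    """Return the indices of the k chunks most similar to the query."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(chunk_vecs)]
    # Highest similarity first; ties broken by original position.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scored[:k]]


if __name__ == "__main__":
    # Toy vectors, made up for illustration only.
    query = [1.0, 0.0, 0.0]
    chunks = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
    print(top_k(query, chunks, k=2))
```

If the vectors returned by an embedding endpoint do not reflect the meaning of the text, this ranking step still runs without error and simply returns unrelated chunks.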
When you encode and bounce the search off of the local openai extension api, the results do not relate to the search term at all.
[
Document(page_content='found Lawrence sitting on one of ChipTecs\' park benches, watching some pigeons play. He wished very much that he could have fed the pigeons, but he had no food for them. They strutted up to him and cooed, not comprehending that a human could lack for something.\nThe pigeons scattered as the nation\'s designated military representatives marched up.\n"You have to turn it off," Blake said directly. His tone made it clear that he expected obedience.\n"Circuit breakers are in the basement," Lawrence', metadata={'source': 'chapter2.txt'}),
Document(page_content="most likely pulled the plug on this awesome new technology, a technology which might just vindicate Dr. Lawrence's nonviolent approach. Blake had stopped short, but only just short, of threatening to call the Strategic Air Command and have the building nuked. Privately, he still held that out as an option if Prime Intellect wasn't somehow neutralized. It would take some doing, but Blake was one of the few people in the country who could demand an air strike against Silicon Valley and, just", metadata={'source': 'chapter2.txt'}),
Document(page_content='was a tricky business; the words Lawrence used only had meaning through other associations within the GAT, and those meanings weren\'t always what Lawrence thought they were. But now he would try to plug the drain for good.\n"Force Association: Use of any technology to manipulate the environment of a human being without its permission shall be a violation of the First Law of severity two."\nThere was no immediate response.\nThen:\n\n*\tASSOCIATION REJECTED BY FIRST LAW ARBITRATOR DUE TO AN EXISTING', metadata={'source': 'chapter2.txt'}),
Document(page_content='from the TV, and words began to scroll across the screen:\n\n*\tJOHN TAYLOR IS IN THE ROOM WITH HIM. HE IS DIRECTING STEBBINS.\n\nLawrence read this as he talked. "Jail for what? I just borrowed the papers to see if Prime Intellect could expand on them."\nAnother pause. "What? It didn\'t come up with anything, did it?"\n"Well, it\'s..." (Why do you care if you\'ve just been fired? Lawrence wondered.)\n\n*\tSTEBBINS IS LYING. HE WENT TO TAYLOR AS SOON YOU LEFT AND TOLD HIM THAT YOU BROUGHT THEM TO', metadata={'source': 'chapter2.txt'})
]
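None of the chunks in this second result set mention the search term. A crude way to flag that automatically is a plain substring check over the retrieved text, sketched below. It is not a measure of semantic relevance, the helper name chunks_mentioning is hypothetical, and it takes plain strings (for example each Document's page_content).

```python
def chunks_mentioning(term, chunks):
    """Return the chunks whose text contains the term, ignoring case."""
    needle = term.lower()
    return [c for c in chunks if needle in c.lower()]


if __name__ == "__main__":
    # Short excerpts from the two result sets shown in this report.
    official = ["It could not, through inaction, allow Caroline to die."]
    local = ["The pigeons scattered as the nation's designated military representatives marched up."]
    print(len(chunks_mentioning("Caroline", official)))
    print(len(chunks_mentioning("Caroline", local)))
```

A result set with zero hits for a name that appears throughout the source text is a strong hint that the problem is in the embeddings rather than in the LLM step.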
Screenshot
No response
Logs
(textgen) alansrobotlab@goliath:~/Documents/brayden/langchain$ /home/alansrobotlab/anaconda3/envs/textgen/bin/python /home/alansrobotlab/Documents/brayden/langchain/brokenembeddings.py
[Document(page_content='found Lawrence sitting on one of ChipTecs\' park benches, watching some pigeons play. He wished very much that he could have fed the pigeons, but he had no food for them. They strutted up to him and cooed, not comprehending that a human could lack for something.\nThe pigeons scattered as the nation\'s designated military representatives marched up.\n"You have to turn it off," Blake said directly. His tone made it clear that he expected obedience.\n"Circuit breakers are in the basement," Lawrence', metadata={'source': 'chapter2.txt'}), Document(page_content="most likely pulled the plug on this awesome new technology, a technology which might just vindicate Dr. Lawrence's nonviolent approach. Blake had stopped short, but only just short, of threatening to call the Strategic Air Command and have the building nuked. Privately, he still held that out as an option if Prime Intellect wasn't somehow neutralized. It would take some doing, but Blake was one of the few people in the country who could demand an air strike against Silicon Valley and, just", metadata={'source': 'chapter2.txt'}), Document(page_content='was a tricky business; the words Lawrence used only had meaning through other associations within the GAT, and those meanings weren\'t always what Lawrence thought they were. But now he would try to plug the drain for good.\n"Force Association: Use of any technology to manipulate the environment of a human being without its permission shall be a violation of the First Law of severity two."\nThere was no immediate response.\nThen:\n\n*\tASSOCIATION REJECTED BY FIRST LAW ARBITRATOR DUE TO AN EXISTING', metadata={'source': 'chapter2.txt'}), Document(page_content='from the TV, and words began to scroll across the screen:\n\n*\tJOHN TAYLOR IS IN THE ROOM WITH HIM. HE IS DIRECTING STEBBINS.\n\nLawrence read this as he talked. "Jail for what? I just borrowed the papers to see if Prime Intellect could expand on them."\nAnother pause. "What? 
It didn\'t come up with anything, did it?"\n"Well, it\'s..." (Why do you care if you\'ve just been fired? Lawrence wondered.)\n\n*\tSTEBBINS IS LYING. HE WENT TO TAYLOR AS SOON YOU LEFT AND TOLD HIM THAT YOU BROUGHT THEM TO', metadata={'source': 'chapter2.txt'})]
[chain/start] [1:chain:RetrievalQA] Entering Chain run with input:
{
"query": "Who is Caroline?"
}
[chain/start] [1:chain:RetrievalQA > 3:chain:StuffDocumentsChain] Entering Chain run with input:
[inputs]
[chain/start] [1:chain:RetrievalQA > 3:chain:StuffDocumentsChain > 4:chain:LLMChain] Entering Chain run with input:
{
"question": "Who is Caroline?",
"context": "found Lawrence sitting on one of ChipTecs' park benches, watching some pigeons play. He wished very much that he could have fed the pigeons, but he had no food for them. They strutted up to him and cooed, not comprehending that a human could lack for something.\nThe pigeons scattered as the nation's designated military representatives marched up.\n\"You have to turn it off,\" Blake said directly. His tone made it clear that he expected obedience.\n\"Circuit breakers are in the basement,\" Lawrence\n\nmost likely pulled the plug on this awesome new technology, a technology which might just vindicate Dr. Lawrence's nonviolent approach. Blake had stopped short, but only just short, of threatening to call the Strategic Air Command and have the building nuked. Privately, he still held that out as an option if Prime Intellect wasn't somehow neutralized. It would take some doing, but Blake was one of the few people in the country who could demand an air strike against Silicon Valley and, just\n\nwas a tricky business; the words Lawrence used only had meaning through other associations within the GAT, and those meanings weren't always what Lawrence thought they were. But now he would try to plug the drain for good.\n\"Force Association: Use of any technology to manipulate the environment of a human being without its permission shall be a violation of the First Law of severity two.\"\nThere was no immediate response.\nThen:\n\n*\tASSOCIATION REJECTED BY FIRST LAW ARBITRATOR DUE TO AN EXISTING\n\nfrom the TV, and words began to scroll across the screen:\n\n*\tJOHN TAYLOR IS IN THE ROOM WITH HIM. HE IS DIRECTING STEBBINS.\n\nLawrence read this as he talked. \"Jail for what? I just borrowed the papers to see if Prime Intellect could expand on them.\"\nAnother pause. \"What? It didn't come up with anything, did it?\"\n\"Well, it's...\" (Why do you care if you've just been fired? Lawrence wondered.)\n\n*\tSTEBBINS IS LYING. 
HE WENT TO TAYLOR AS SOON YOU LEFT AND TOLD HIM THAT YOU BROUGHT THEM TO"
}
[llm/start] [1:chain:RetrievalQA > 3:chain:StuffDocumentsChain > 4:chain:LLMChain > 5:llm:OpenAI] Entering LLM run with input:
{
"prompts": [
"Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\nfound Lawrence sitting on one of ChipTecs' park benches, watching some pigeons play. He wished very much that he could have fed the pigeons, but he had no food for them. They strutted up to him and cooed, not comprehending that a human could lack for something.\nThe pigeons scattered as the nation's designated military representatives marched up.\n\"You have to turn it off,\" Blake said directly. His tone made it clear that he expected obedience.\n\"Circuit breakers are in the basement,\" Lawrence\n\nmost likely pulled the plug on this awesome new technology, a technology which might just vindicate Dr. Lawrence's nonviolent approach. Blake had stopped short, but only just short, of threatening to call the Strategic Air Command and have the building nuked. Privately, he still held that out as an option if Prime Intellect wasn't somehow neutralized. It would take some doing, but Blake was one of the few people in the country who could demand an air strike against Silicon Valley and, just\n\nwas a tricky business; the words Lawrence used only had meaning through other associations within the GAT, and those meanings weren't always what Lawrence thought they were. But now he would try to plug the drain for good.\n\"Force Association: Use of any technology to manipulate the environment of a human being without its permission shall be a violation of the First Law of severity two.\"\nThere was no immediate response.\nThen:\n\n*\tASSOCIATION REJECTED BY FIRST LAW ARBITRATOR DUE TO AN EXISTING\n\nfrom the TV, and words began to scroll across the screen:\n\n*\tJOHN TAYLOR IS IN THE ROOM WITH HIM. HE IS DIRECTING STEBBINS.\n\nLawrence read this as he talked. \"Jail for what? I just borrowed the papers to see if Prime Intellect could expand on them.\"\nAnother pause. \"What? 
It didn't come up with anything, did it?\"\n\"Well, it's...\" (Why do you care if you've just been fired? Lawrence wondered.)\n\n*\tSTEBBINS IS LYING. HE WENT TO TAYLOR AS SOON YOU LEFT AND TOLD HIM THAT YOU BROUGHT THEM TO\n\nQuestion: Who is Caroline?\nHelpful Answer:"
]
}
[llm/end] [1:chain:RetrievalQA > 3:chain:StuffDocumentsChain > 4:chain:LLMChain > 5:llm:OpenAI] [551ms] Exiting LLM run with output:
{
"generations": [
[
{
"text": "She is a woman who has worked closely with John Taylor since she first joined the company. She is his right-hand person, and her job is to help him keep track of all the details of the project.\n",
"generation_info": {
"finish_reason": "stop",
"logprobs": null
}
}
]
],
"llm_output": {
"token_usage": {
"total_tokens": 639,
"prompt_tokens": 594,
"completion_tokens": 45
},
"model_name": "text-davinci-003"
},
"run": null
}
[chain/end] [1:chain:RetrievalQA > 3:chain:StuffDocumentsChain > 4:chain:LLMChain] [552ms] Exiting Chain run with output:
{
"text": "She is a woman who has worked closely with John Taylor since she first joined the company. She is his right-hand person, and her job is to help him keep track of all the details of the project.\n"
}
[chain/end] [1:chain:RetrievalQA > 3:chain:StuffDocumentsChain] [552ms] Exiting Chain run with output:
{
"output_text": "She is a woman who has worked closely with John Taylor since she first joined the company. She is his right-hand person, and her job is to help him keep track of all the details of the project.\n"
}
[chain/end] [1:chain:RetrievalQA] [565ms] Exiting Chain run with output:
{
"result": "She is a woman who has worked closely with John Taylor since she first joined the company. She is his right-hand person, and her job is to help him keep track of all the details of the project.\n"
}
(textgen) alansrobotlab@goliath:~/Documents/brayden/langchain$
This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
System Info
Ubuntu 22.04
anaconda python 3.10
rtx 3090
launched via the following bash file and referenced yaml.

langchain7.sh
CUDA_VISIBLE_DEVICES=1 \
python server.py \
--listen \
--listen-port 7807 \
--verbose \
--loader exllamav2 \
--model 'TheBloke_CodeLlama-7B-Instruct-GPTQ' \
--max_seq_len 8192 \
--settings langchain7.yaml \
--extensions openai

langchain7.yaml
openai-port: 5007
openai-embedding_device: cuda
embedding_model: text-embedding-ada-002
openai-sd_webui_url: http://192.168.50.108:7807
openai-debug: 1
truncation_length: 8192