Here is where I need to query for unique metadata.
I am ingesting large document texts as embeddings into ChromaDB. Because of the embedding model's token limit, I split each text into chunks of 512 tokens.
I will generate an embedding for each chunk, but all chunks from one document share the same document ID, referred to as doc_id.
When I query and any chunk of a document matches, I do not want any other chunk from the same document returned. Once one chunk of a document has matched, the remaining chunks of that document should be skipped.
I am planning to store the doc_id as metadata on every chunk.
So I need a distinct query on the doc_id metadata. Currently I filter manually by keeping the doc_ids I have already seen in a set and checking each result against it, which is inefficient.
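For reference, the manual filtering described above might look roughly like the sketch below. It assumes the results have the nested-list shape that `collection.query` returns (`ids`, `metadatas`, `distances`, one inner list per query, ordered by ascending distance) and that the query was over-fetched, i.e. `n_results` was set larger than the number of distinct documents wanted. The function name `dedupe_by_doc_id` and the sample data are hypothetical.

```python
from typing import Any, Dict, List


def dedupe_by_doc_id(results: Dict[str, Any], k: int) -> List[Dict[str, Any]]:
    """Keep only the closest chunk per doc_id, up to k distinct documents.

    Assumes `results` is a Chroma-style query result for a single query,
    already sorted by ascending distance.
    """
    ids = results["ids"][0]
    metadatas = results["metadatas"][0]
    distances = results["distances"][0]

    seen = set()  # doc_ids that already contributed a chunk
    hits = []
    for chunk_id, meta, dist in zip(ids, metadatas, distances):
        doc_id = meta["doc_id"]
        if doc_id in seen:
            continue  # a closer chunk of this document was already kept
        seen.add(doc_id)
        hits.append({"id": chunk_id, "doc_id": doc_id, "distance": dist})
        if len(hits) == k:
            break
    return hits


# Hypothetical over-fetched result: five chunks from three documents.
sample = {
    "ids": [["a-0", "a-1", "b-3", "a-2", "c-1"]],
    "metadatas": [[{"doc_id": "A"}, {"doc_id": "A"}, {"doc_id": "B"},
                   {"doc_id": "A"}, {"doc_id": "C"}]],
    "distances": [[0.10, 0.12, 0.20, 0.25, 0.31]],
}
top = dedupe_by_doc_id(sample, k=2)
```

The drawback is the one already mentioned: there is no way to know in advance how far to over-fetch, so a document with many close chunks can crowd out the others.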
Hey @Mhsh, you are not alone in wanting to handle queries this way (i.e., avoiding multiple chunks from the same document; the reasoning being that if the document contains relevant info, I don't want further paragraphs that are less relevant and at a greater distance from the query).
I had some work done on this; let me try to dig it out. This will require a PR on core Chroma, as metadata filtering by itself won't help unless you can afford multiple queries.
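To illustrate the multiple-queries route, here is a minimal sketch that asks for one result at a time and excludes every doc_id already returned. It assumes your Chroma version supports the `$nin` operator in `where` filters; `query_distinct_docs` and `query_fn` are hypothetical names, with `query_fn` standing in for a call such as `collection.query(query_texts=[...], n_results=1, where=where)`.

```python
from typing import Any, Callable, Dict, List, Optional

# query_fn takes an optional `where` filter and returns a Chroma-style result.
QueryFn = Callable[[Optional[Dict[str, Any]]], Dict[str, Any]]


def query_distinct_docs(query_fn: QueryFn, k: int) -> List[Dict[str, Any]]:
    """Return the best chunk from each of up to k distinct documents.

    Issues one query per document, so it costs up to k round trips
    (plus one extra if the collection runs out of documents first).
    """
    seen: List[str] = []  # doc_ids to exclude from the next query
    hits: List[Dict[str, Any]] = []
    while len(hits) < k:
        # Assumption: `$nin` is accepted in metadata filters.
        where = {"doc_id": {"$nin": list(seen)}} if seen else None
        res = query_fn(where)
        if not res["ids"][0]:
            break  # no chunks left outside the excluded documents
        doc_id = res["metadatas"][0][0]["doc_id"]
        hits.append({
            "id": res["ids"][0][0],
            "doc_id": doc_id,
            "distance": res["distances"][0][0],
        })
        seen.append(doc_id)
    return hits
```

This trades latency for exactness: each returned chunk is the nearest one from a document not yet seen, but the number of queries grows with `k`, which is why support inside Chroma itself would be preferable.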