[Feature] Support small-to-big retrieval #179
Comments
Hey @synio-wesley, this makes a lot of sense, even more so now that the context windows of a lot of LLMs have increased by a lot.
@MaximeThoonsen yes, I have something running locally which seems to work OK for my purposes. But it might require some tweaking, and I'm not sure if it works as well with other vector DBs (I've been working/testing with Redis).
This feature would be really interesting. I'm not sure I understand how the particular kind of vector store could influence it, though. @synio-wesley, could you elaborate?
@f-lombardo because we somehow have to retrieve the related pieces, but I might not be doing it in the best way yet. Right now I have implemented it for myself without modifying LLPhant itself: I added a SlidingWindowTransformer, and after the normal vector search I also fetch the chunks that surround each retrieved candidate in the same document. After fetching them, I filter away duplicate chunks. I also group the fetched documents in order of importance (how they were retrieved originally) and by chunk number, so the resulting context makes the most sense. For my purposes, this works great. But I don't know if this approach is the best one for everyone, and I don't know if it's equally easy to find other chunks of the same document with other vector stores. I've only been working with RAG for a very short while, so it's new territory for me. I'm also not 100% happy with the API I have created for myself. But for the project I'm using this for, the approach works well: there's a commercial competitor I'm comparing against, and my results are consistently way better after all these modifications.
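For reference, here is roughly what that expansion step can look like in plain PHP. This is only a sketch with invented names (`expandHits`, the `$fetchChunk` callback, the `sourceId`/`chunkNumber` fields), not LLPhant's actual API; how `$fetchChunk` is implemented depends entirely on the vector store (with Redis it can be a simple key lookup).

```php
<?php

/**
 * Expand each retrieved small chunk with its neighbouring chunks,
 * remove duplicates, and order the result by the rank of the original
 * hit and then by chunk number.
 *
 * $hits        retrieved chunks, best first: ['sourceId' => string, 'chunkNumber' => int, 'content' => string]
 * $fetchChunk  callable(string $sourceId, int $chunkNumber): ?array
 * $window      how many chunks to pull before and after each hit
 */
function expandHits(array $hits, callable $fetchChunk, int $window = 1): string
{
    $collected = []; // "sourceId:chunkNumber" => ['rank' => int, 'chunk' => array]

    foreach ($hits as $rank => $hit) {
        for ($i = $hit['chunkNumber'] - $window; $i <= $hit['chunkNumber'] + $window; $i++) {
            $chunk = ($i === $hit['chunkNumber']) ? $hit : $fetchChunk($hit['sourceId'], $i);
            if ($chunk === null) {
                continue; // no chunk before the first / after the last one
            }
            $key = $chunk['sourceId'] . ':' . $chunk['chunkNumber'];
            if (!isset($collected[$key])) {
                $collected[$key] = ['rank' => $rank, 'chunk' => $chunk];
            }
        }
    }

    // Best original hit first, then in reading order within each document.
    uasort($collected, function (array $a, array $b): int {
        return [$a['rank'], $a['chunk']['chunkNumber']] <=> [$b['rank'], $b['chunk']['chunkNumber']];
    });

    return implode("\n\n", array_map(fn ($c) => $c['chunk']['content'], $collected));
}
```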
@synio-wesley thank you for the clarification.
Of course being vector-store agnostic is great. But in any case we will need to implement a new method for all vector stores so we can fetch related docs, right? Depending on how we store them, it could be simpler or harder with different vector stores. For Redis I didn't need any adjustment to how the docs are saved, at least not for SlidingWindowTransformer. But other types of small-to-big retrieval might be different, and other stores maybe as well. Maybe we could discuss the feature a little bit somewhere? But I'm on holiday for 2 weeks now, so not a lot of time.
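Just to make the idea of a per-store method concrete, it could be as small as one interface. The name and signature below are invented for illustration; nothing like this exists in LLPhant today.

```php
<?php

/**
 * Hypothetical capability a vector store would need to expose so that
 * small-to-big retrieval can stay store agnostic. With Redis this can be
 * a key lookup; other stores might need a metadata filter query instead.
 */
interface SiblingChunkFetcher
{
    /**
     * Return the chunks of the same source document whose chunk numbers
     * lie within $window of $chunkNumber (excluding the chunk itself).
     *
     * @return array<int, array{sourceId: string, chunkNumber: int, content: string}>
     */
    public function fetchSiblingChunks(string $sourceId, int $chunkNumber, int $window): array;
}
```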
The SlidingWindowTransformer makes sense only if you have very big documents, right? Or am I missing something? @synio-wesley
Yes, of course this is one option. Another one is to have a DocumentStore that differs from the VectorStore, as in LangChain. I'm not sure which one would be the best solution.
I think so, even if the concept of "big" may differ a lot based on various use cases.
I am not a RAG expert, but as far as I understand it, if you make smaller blocks/chunks of text (a few sentences), the vector calculated for each one makes more sense, because the chance of multiple different concepts ending up inside one chunk gets smaller. But if we then only gave this small chunk to the LLM as context, it would be too small and not contain enough information. That's why some chunks before it get prepended and some chunks after it get appended.

You could also grab the whole parent document (all chunks), but that would be a lot of content, especially if you retrieved multiple candidates, want to retrieve the parent documents for all of them, and they are different documents. For some queries, chunks in multiple documents might be needed to answer the query correctly. Then you would end up with multiple full documents (in my case scraped webpages) that might become quite large for the context, making it rather expensive as well (which is a concern in my application).

That's why the SlidingWindowTransformer approach seems to work well for my application. The calculated vectors are more aligned with the content of the smaller chunks, and then I make the chunk bigger by retrieving extra chunks around the 3 best chunk candidates. That result is given to the LLM as context to work with.
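To illustrate the transformer side of this idea: a sliding-window splitter only has to cut the text into small overlapping windows of sentences and remember each window's position, so the neighbours can be looked up again at query time. This is just a sketch with a deliberately naive sentence splitter, not the actual SlidingWindowTransformer code.

```php
<?php

/**
 * Split a text into small overlapping windows of sentences. Each window
 * gets a chunk number so neighbouring windows can be retrieved later.
 */
function slidingWindowChunks(string $sourceId, string $text, int $windowSize = 3, int $overlap = 1): array
{
    $sentences = preg_split('/(?<=[.!?])\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $chunks = [];
    $step = max(1, $windowSize - $overlap);

    for ($start = 0, $number = 0; $start < count($sentences); $start += $step, $number++) {
        $chunks[] = [
            'sourceId'    => $sourceId,
            'chunkNumber' => $number,
            'content'     => implode(' ', array_slice($sentences, $start, $windowSize)),
        ];
    }

    return $chunks;
}
```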
I only glanced at the page you linked to, but it looks like LangChain allows 2 different stores: one for retrieving the child docs/chunks and one for grabbing the parent docs. This might make a lot of sense, because different stores might be optimized for different things: a good vector retrieval store might be different from another type of DB that is good at fetching parent docs based on the ID of a child doc. In my current implementation everything is a bit simplified and tailored to my own use case, but I like the idea of allowing 2 different stores for these 2 different functionalities. And I guess you could still opt to use the same store for both, or at least the same underlying DB, as long as it supports both functionalities (like Redis would).
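A rough sketch of what the two-store variant could look like. Both interfaces and the `retrieveParents` helper are invented here for illustration; a Redis-backed class could implement both roles at once, or two different backends could be plugged in.

```php
<?php

/** Finds the best matching small chunks for a query (vector search). */
interface ChildChunkRetriever
{
    /** @return array<int, array{parentId: string, content: string}> */
    public function similaritySearch(string $query, int $k): array;
}

/** Fetches the full parent documents by id (can be a different backend). */
interface ParentDocumentStore
{
    public function getParent(string $parentId): ?string;
}

/** Retrieve with small chunks, answer with their parent documents. */
function retrieveParents(ChildChunkRetriever $children, ParentDocumentStore $parents, string $query, int $k = 3): array
{
    $result = [];
    foreach ($children->similaritySearch($query, $k) as $chunk) {
        $parentId = $chunk['parentId'];
        if (!isset($result[$parentId])) { // deduplicate parents
            $parent = $parents->getParent($parentId);
            if ($parent !== null) {
                $result[$parentId] = $parent;
            }
        }
    }

    return array_values($result);
}
```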
I think that this issue has been addressed by #193 |
What I want to achieve basically is something like parent document retrieval and/or sentence window retrieval.
Basically I want to create smaller chunks so that the LLM can create more targeted vectors for the smaller chunks. But because we lose a lot of context that way, I also want to save bigger parent chunks. The smaller chunks should then point to the bigger chunks using metadata or something.
So basically we retrieve the smaller chunks using the vector distance algorithm like before, but then we grab the bigger parent chunks, which contain more context, and feed those to the LLM instead.
I'm not sure if this is supported yet or if we can add this functionality ourselves easily without modifying LLPhant?
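For what it's worth, a minimal sketch of the indexing side of that idea (invented function and metadata names, not an existing LLPhant API): split the document into big parent chunks, split those into small child chunks, and let each child carry its parent's id in metadata so the parent can be swapped in after retrieval.

```php
<?php

/**
 * Build two levels of chunks: big parent chunks stored by id, and small
 * child chunks to be embedded/indexed, pointing back to their parent via
 * metadata. Splitting on fixed character counts purely to keep it short.
 */
function buildParentAndChildChunks(string $docId, string $text, int $parentSize = 2000, int $childSize = 300): array
{
    $parents = [];  // parentId => big chunk text
    $children = []; // small chunks with metadata, ready for embedding

    foreach (str_split($text, $parentSize) as $p => $parentText) {
        $parentId = $docId . '/parent-' . $p;
        $parents[$parentId] = $parentText;

        foreach (str_split($parentText, $childSize) as $c => $childText) {
            $children[] = [
                'content'  => $childText,
                'metadata' => ['parentId' => $parentId, 'childIndex' => $c],
            ];
        }
    }

    return [$parents, $children];
}
```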