Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RAGatouille: Dedicated ModelIndex #158

Merged
merged 12 commits into from
Mar 15, 2024

Conversation

jlscheerer
Copy link
Collaborator

@jlscheerer jlscheerer commented Feb 24, 2024

Starts to decouple a concrete ModelIndex from the LateInteractionModel.
The aim here is to be able to support a variety of different Indexes in the future (e.g., flat, or HNSW).
In particular, it should be possible to support lower depdenceny/CRUD-friendly indices in case of a small number of documents.

  • In case the concept looks good, I can proceed to move more functionality from the Model to the Index.
  • Once PLAID Indexes are supported, it should be rather straight-forward to add support for lower dependency indexes.

Relevant issue: #110

@bclavie
Copy link
Collaborator

bclavie commented Feb 24, 2024

Hey Luca. Thank you! This looks great, and very close to what I had in mind (decoupling Index and Model has been long overdue). I like the approach a lot.

@okhat
Copy link
Collaborator

okhat commented Feb 25, 2024

Wonderful things!! :D

@jlscheerer
Copy link
Collaborator Author

The issue mentioned in #168 should be fixed with these changes. However, it seems there is a similar issue for delete_from_index (also reproducible on main).

@jlscheerer jlscheerer marked this pull request as ready for review March 7, 2024 17:47
@jlscheerer
Copy link
Collaborator Author

@anirudhdharmarajan would you mind taking a look at these changes to make sure I have incorporated your fixes correctly? Thanks in advance.

@ksadov
Copy link

ksadov commented Mar 8, 2024

Can confirm that this appears to resolve #168, as well as the issue that I mentioned with the searcher not updating after add. I did have to modify my test script to add 64 new documents instead of 1, or I ran into the error:

RuntimeError: Error in void faiss::Clustering::train_encoded(faiss::idx_t, const uint8_t *, const faiss::Index *, faiss::Index &, const float *) at /Users/runner/work/faiss-wheels/faiss-wheels/faiss/faiss/Clustering.cpp:267: Error: 'nx >= k' failed: Number of training points (11) should be at least as large as number of clusters (32)

Not sure if this is considered an issue (tho kind of a pain for my use-case, since I do plan to be adding documents one at a time)

@bclavie
Copy link
Collaborator

bclavie commented Mar 8, 2024

This looks great so far, thank you @jlscheerer. I'll try and review it more thoroughly and have it merged Monday at the latest, along with a new release!

@bclavie
Copy link
Collaborator

bclavie commented Mar 14, 2024

Hey, back from the food poisoning, not quite recovered but not quite as bad either 🥲.

LGTM so far and the tests are passing. All seems good locally, there's just the one comment above (about emptying self.collection) which I'd like to clarify but happy to then merge & release. Thank you so much!

@jlscheerer jlscheerer requested a review from bclavie March 15, 2024 03:08
@bclavie bclavie enabled auto-merge (squash) March 15, 2024 10:48
@bclavie
Copy link
Collaborator

bclavie commented Mar 15, 2024

Fantastic, thank you! Will release this as 0.8.0 later today 😄

@bclavie bclavie merged commit 2d8265e into AnswerDotAI:main Mar 15, 2024
2 checks passed
@bclavie bclavie mentioned this pull request Mar 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants