Init ColBERTv2 managed index #9656
Do you mind also removing
Oh, or did it show up as a move/rename? I think the old notebook is still there, at least.
I wonder if this would make more sense as just a retriever? Similar to BM25Retriever? Thoughts?
ColBERTv2 does indeed build an index; it has its own indexer and retriever.
I removed the old notebook in this PR. The ColBERT files were renamed.
@@ -121,12 +124,9 @@ def _build_index_from_nodes(self, nodes: Sequence[BaseNode]) -> IndexDict:
kmeans_niters=self.kmeans_niters,
)
indexer = Indexer(checkpoint=self.model_name, config=config)
@logan-markewich their indexer
Yea I suppose. It just feels very similar to BM25 since you can't add or delete data from it? It doesn't fit our definition of index very well
Oh, they do have CRUD for index data: https://github.com/stanford-futuredata/ColBERT/blob/117098348e5196ede1c8e396c1c14f24d9a8754e/colbert/index_updater.py
The previous PR does not support it. I will add it in another PR.
This PR is mostly for testing retrieval performance.
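To make the index-versus-retriever question in this thread concrete, here is a rough sketch of what "supports add and delete" could look like for a late-interaction store. Everything in it (`ToyMutableIndex`, `maxsim`, the tiny hand-written vectors) is hypothetical and for illustration only; it is not the LlamaIndex or ColBERT API, and it is not how `index_updater.py` is implemented.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def maxsim(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: for each query token vector, take its best
    dot product against any document token vector, then sum those maxima."""
    return sum(
        max(sum(q * d for q, d in zip(qv, dv)) for dv in doc_vecs)
        for qv in query_vecs
    )


class ToyMutableIndex:
    """Hypothetical in-memory store that allows inserts and deletes.

    A retriever-only component would expose just `search`; the extra
    `add` and `remove` methods are what the thread means by CRUD.
    """

    def __init__(self) -> None:
        self._docs: Dict[str, List[Vector]] = {}

    def add(self, doc_id: str, token_vecs: Sequence[Vector]) -> None:
        if not token_vecs:
            raise ValueError("a document needs at least one token vector")
        self._docs[doc_id] = list(token_vecs)

    def remove(self, doc_id: str) -> None:
        # Removing an unknown id is a no-op in this sketch.
        self._docs.pop(doc_id, None)

    def search(self, query_vecs: Sequence[Vector], k: int = 1) -> List[str]:
        # Brute-force ranking over all stored documents.
        ranked = sorted(
            self._docs,
            key=lambda doc_id: maxsim(query_vecs, self._docs[doc_id]),
            reverse=True,
        )
        return ranked[:k]


index = ToyMutableIndex()
index.add("a", [[1.0, 0.0]])
index.add("b", [[0.0, 1.0]])
top_before = index.search([[1.0, 0.0]])
index.remove("a")
top_after = index.search([[1.0, 0.0]])
```

In this toy version a delete is a dictionary pop; a real compressed index would need more bookkeeping, which is presumably why update support is deferred to a follow-up PR.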
Description
Test the ColBERT v2 model as a managed index for benchmarking purposes.
https://arxiv.org/abs/2112.01488
Fixes # (issue)
Type of Change
Please delete options that are not relevant.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration
Suggested Checklist:
make format; make lint to appease the lint gods