Synthetic Data Generation for Retriever Evaluation #338
Open
vinay-raman wants to merge 95 commits into NVIDIA:main from vinay-raman:sdg_pipeline/retriever_evalset_generation_signoff_fixed
+12,286 −665
@ryantwolf I have added the tests and fixed the signoff checks as well. Thanks!
ryantwolf requested changes Oct 30, 2024
A few more pieces of feedback, more about the configuration this time.
tutorials/synthetic_retrieval_evaluation_customization/filters.py (outdated, resolved)
tutorials/synthetic_retrieval_evaluation_customization/requirements.txt (outdated, resolved)
tutorials/synthetic_retrieval_evaluation_customization/retriever_evalset_generator.py (outdated, resolved)
tutorials/synthetic_retrieval_evaluation_customization/README.md (outdated, resolved)
tutorials/synthetic_retrieval_evaluation_customization/README.md (outdated, resolved)
tutorials/synthetic_retrieval_evaluation_customization/notebooks/quickstart.ipynb (outdated, resolved)
tutorials/synthetic_retrieval_evaluation_customization/filters.py (outdated, resolved)
tutorials/synthetic_retrieval_evaluation_customization/retriever_evalset_generator.py (outdated, resolved)
vinay-raman force-pushed the sdg_pipeline/retriever_evalset_generation_signoff_fixed branch from 062cac6 to d5dc0ae on November 12, 2024 22:52
Signed-off-by: Ryan Wolf <[email protected]> Signed-off-by: Vinay Raman <[email protected]>
* Sign and add Signed-off-by: Praateek Mahajan <[email protected]>
* implement strtobool Signed-off-by: Praateek Mahajan <[email protected]>
* pre-commit Signed-off-by: Praateek Mahajan <[email protected]>
* Update readme Signed-off-by: Praateek Mahajan <[email protected]>
---------
Signed-off-by: Praateek Mahajan <[email protected]> Signed-off-by: Vinay Raman <[email protected]>

* Make embedding column flexible for semdedup Signed-off-by: Ryan Wolf <[email protected]>
* Fix embedding col in add_dist_to_cents Signed-off-by: Ryan Wolf <[email protected]>
* Add image curation tutorial Signed-off-by: Ryan Wolf <[email protected]>
* Address Sarah's feedback Signed-off-by: Ryan Wolf <[email protected]>
* Add output to image curation tutorial Signed-off-by: Ryan Wolf <[email protected]>
* Add punctuation Signed-off-by: Ryan Wolf <[email protected]>
* Address Vibhu's comments Signed-off-by: Ryan Wolf <[email protected]>
---------
Signed-off-by: Ryan Wolf <[email protected]> Signed-off-by: Vinay Raman <[email protected]>

* Fixing Model Address Signed-off-by: Chris Alexiuk <[email protected]>
* Fixing Model Address v1 Signed-off-by: Chris Alexiuk <[email protected]>
---------
Signed-off-by: Chris Alexiuk <[email protected]> Signed-off-by: Vinay Raman <[email protected]>
Fix NVIDIA#264. Scikit-learn is not expecting anymore the "affinity parameter" Signed-off-by: Miguel Martínez <[email protected]> Signed-off-by: Vinay Raman <[email protected]>
* start adding dask-expr support Signed-off-by: rjzamora <[email protected]>
* add query_planning_enabled util Signed-off-by: rjzamora <[email protected]>
* add global keyword Signed-off-by: rjzamora <[email protected]>
* Forgot to remove top level query-planning check Signed-off-by: rjzamora <[email protected]>
* fix other shuffle-arg problems that don't 'work' with dask-expr Signed-off-by: rjzamora <[email protected]>
* remove name arg usage for now Signed-off-by: rjzamora <[email protected]>
* fix bugs Signed-off-by: rjzamora <[email protected]>
---------
Signed-off-by: rjzamora <[email protected]> Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: Praateek Mahajan <[email protected]> Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: Praateek Mahajan <[email protected]> Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: Praateek Mahajan <[email protected]> Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: Praateek Mahajan <[email protected]> Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: Praateek Mahajan <[email protected]> Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: Praateek Mahajan <[email protected]> Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: Sarah Yurick <[email protected]> Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: Sarah Yurick <[email protected]> Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: Sarah Yurick <[email protected]> Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: Sarah Yurick <[email protected]> Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: Sarah Yurick <[email protected]> Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: Sarah Yurick <[email protected]> Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: Sarah Yurick <[email protected]> Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: Sarah Yurick <[email protected]> Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: Sarah Yurick <[email protected]> Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: Sarah Yurick <[email protected]> Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: Sarah Yurick <[email protected]> Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: Sarah Yurick <[email protected]> Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: Sarah Yurick <[email protected]> Signed-off-by: Vinay Raman <[email protected]>
…VIDIA#256)
* Improve performance in jsonl files Signed-off-by: miguelusque <[email protected]>
* Improve performance in jsonl files Signed-off-by: miguelusque <[email protected]>
* Shutdown Dask cluster at exit Signed-off-by: miguelusque <[email protected]>
* Remove unneeded persist() and wait() operations Signed-off-by: miguelusque <[email protected]>
* Display only Dask error messages or above Signed-off-by: miguelusque <[email protected]>
* Cancel any remaining futures Signed-off-by: miguelusque <[email protected]>
* Remove Dask warning message Signed-off-by: miguelusque <[email protected]>
* Rename new arguments Signed-off-by: miguelusque <[email protected]>
* Refactor separate_by_metadata Signed-off-by: miguelusque <[email protected]>
---------
Signed-off-by: miguelusque <[email protected]> Co-authored-by: miguelusque <[email protected]> Signed-off-by: Vinay Raman <[email protected]>

* Pin to spacy<3.8 temporarily to unblock CI Signed-off-by: Ayush Dattagupta <[email protected]>
* Update pin in rapids nightly dep as well Signed-off-by: Ayush Dattagupta <[email protected]>
---------
Signed-off-by: Ayush Dattagupta <[email protected]> Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: viraman <[email protected]> Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: viraman <[email protected]> Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: viraman <[email protected]> Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: viraman <[email protected]> Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: viraman <[email protected]> Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: viraman <[email protected]> Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: viraman <[email protected]> Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: viraman <[email protected]> Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: viraman <[email protected]> Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: viraman <[email protected]> Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: viraman <[email protected]> Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: viraman <[email protected]> Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: Vinay Raman <[email protected]>
…meters to filters constructor Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: Vinay Raman <[email protected]>
Signed-off-by: Vinay Raman <[email protected]>
vinay-raman force-pushed the sdg_pipeline/retriever_evalset_generation_signoff_fixed branch from d5dc0ae to 052044c on November 13, 2024 00:18
Description
Synthetic data generation for Retriever Evaluation
Usage
Checklist