Add batched decorator #18

Merged
merged 2 commits into from
Apr 2, 2024

ryantwolf
Collaborator

@ryantwolf ryantwolf commented Mar 27, 2024

Refactors batched to be a decorator. This frees the user from having to know whether the underlying DocumentFilter or DocumentModifier operates on batches when initializing ScoreFilter or Modify, respectively.
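
For illustration, here is a minimal sketch of how a decorator-based approach like this could work. It is not the code from this PR: the helper `is_batched`, the two example filter classes, and the `score_all` caller are hypothetical names used only to show the idea, and plain Python lists stand in for whatever batch type the real filters receive.

```python
from functools import wraps


def batched(function):
    """Mark a function as operating on a whole batch instead of one item."""

    @wraps(function)
    def wrapper(*args, **kwargs):
        return function(*args, **kwargs)

    # The marker lives on the function itself, so callers can inspect it.
    wrapper.batched = True
    return wrapper


def is_batched(function):
    """Hypothetical helper: True if `function` was marked with @batched."""
    return getattr(function, "batched", False)


class WordCountFilter:
    """Hypothetical filter that scores one document at a time."""

    def score_document(self, text):
        return len(text.split())


class BatchedWordCountFilter:
    """Hypothetical filter that scores a list of documents in one call."""

    @batched
    def score_document(self, texts):
        return [len(t.split()) for t in texts]


def score_all(filter_obj, texts):
    """Hypothetical caller: picks the code path without a `batched` argument."""
    if is_batched(filter_obj.score_document):
        return filter_obj.score_document(texts)
    return [filter_obj.score_document(t) for t in texts]


docs = ["a b c", "hello world"]
plain_scores = score_all(WordCountFilter(), docs)
batched_scores = score_all(BatchedWordCountFilter(), docs)
```

Under these assumptions, both filters are passed to `score_all` the same way, and the caller never has to state which variant it is using.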

Unit tests pass. The following scripts were manually tested and work properly.

examples/classifier_filtering.py
examples/find_pii_and_deidentify.py
nemo_curator/scripts/find_pii_and_deidentify.py
tutorials/tinystories/main.py

The examples and other scripts have been manually tested to work.

Signed-off-by: Ryan Wolf <[email protected]>
@ryantwolf ryantwolf requested a review from ayushdg April 1, 2024 20:37
Collaborator

@ayushdg ayushdg left a comment

Thanks for the changes! Just a couple of places with the stray batched argument but generally lgtm.
I like the idea of having batched be an attribute of the filters/modifiers. It means the caller doesn't need to know which variant should be used.

@ryantwolf ryantwolf merged commit ccf107a into main Apr 2, 2024
3 checks passed
@ryantwolf ryantwolf deleted the rywolf/fixed-batched-refactor branch April 24, 2024 18:54