Move preprocessing to base classes #1807

Merged: 5 commits, merged on Sep 5, 2024


mattdangerw (Member)

I think this will be a nice overall simplification for maintenance: push whatever logic we can down onto the base preprocessing classes, which saves a lot of code. To assist with this, I am adding a `special_tokens` property to tokenizers, which I think will be useful anyway.
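To illustrate the idea of a `special_tokens` property that lets shared preprocessing logic live in a base class, here is a minimal, self-contained sketch. It does not reproduce the actual keras_nlp code: `BaseTokenizer`, `_add_special_token`, `ToyWordTokenizer`, and `BasePreprocessor` are hypothetical names chosen for illustration, and the preprocessing (add start/end tokens, truncate, pad) is a simplified stand-in for what a real preprocessor does.

```python
class BaseTokenizer:
    """Hypothetical base class: subclasses register special tokens by name."""

    def __init__(self):
        # Names of attributes that hold special tokens, in registration order.
        self._special_token_attrs = []

    def _add_special_token(self, token, name):
        # Store the token under a named attribute and remember that name.
        setattr(self, name, token)
        self._special_token_attrs.append(name)

    @property
    def special_tokens(self):
        # Collect every registered special token, in registration order.
        return [getattr(self, name) for name in self._special_token_attrs]

    def token_to_id(self, token):
        raise NotImplementedError


class ToyWordTokenizer(BaseTokenizer):
    """Hypothetical whitespace tokenizer over a fixed vocabulary."""

    def __init__(self, vocabulary):
        super().__init__()
        self._index = {tok: i for i, tok in enumerate(vocabulary)}
        # The subclass only declares which special tokens it uses.
        self._add_special_token("[CLS]", "start_token")
        self._add_special_token("[SEP]", "end_token")
        self._add_special_token("[PAD]", "pad_token")

    def token_to_id(self, token):
        return self._index[token]

    def tokenize(self, text):
        return [self.token_to_id(t) for t in text.split()]


class BasePreprocessor:
    """Hypothetical shared logic that relies only on the base tokenizer API."""

    def __init__(self, tokenizer, sequence_length):
        self.tokenizer = tokenizer
        self.sequence_length = sequence_length

    def __call__(self, text):
        tok = self.tokenizer
        # Leave room for the start and end tokens, then wrap the sequence.
        ids = tok.tokenize(text)[: self.sequence_length - 2]
        start_id = tok.token_to_id(tok.start_token)
        end_id = tok.token_to_id(tok.end_token)
        ids = [start_id] + ids + [end_id]
        # Pad to a fixed length and mark which positions hold real tokens.
        num_pad = self.sequence_length - len(ids)
        padding_mask = [True] * len(ids) + [False] * num_pad
        ids = ids + [tok.token_to_id(tok.pad_token)] * num_pad
        return {"token_ids": ids, "padding_mask": padding_mask}


if __name__ == "__main__":
    vocab = ["[PAD]", "[CLS]", "[SEP]", "the", "quick", "fox"]
    tokenizer = ToyWordTokenizer(vocab)
    print(tokenizer.special_tokens)
    print(BasePreprocessor(tokenizer, sequence_length=8)("the quick fox"))
```

The design point is that `BasePreprocessor` never needs a model-specific subclass: as long as each tokenizer registers its special tokens, the packing and padding code can be written once.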

mattdangerw (Member, Author)

There are probably still some test breakages to work through, so I'm not mailing this out quite yet.

mattdangerw force-pushed the preprocessing-simplify branch 2 times, most recently from ce944da to 9250f79 on September 4, 2024 00:05
mattdangerw (Member, Author)

OK! Tests pass apart from the nightly failure, which is unrelated. Mailing this out.

mattdangerw (Member, Author)

The nightly breakage is unrelated btw.

divyashreepathihalli (Collaborator) left a comment


LGTM!

SamanehSaadat (Member) left a comment


Thanks, Matt! It's really nice to move all this common logic to the base classes!
Just left some nit comments!

keras_nlp/src/tokenizers/tokenizer.py (outdated, resolved)
keras_nlp/src/tokenizers/tokenizer.py (outdated, resolved)
keras_nlp/src/tokenizers/word_piece_tokenizer.py (outdated, resolved)
mattdangerw (Member, Author)

Thanks for the review! Will pull this in once tests pass.

mattdangerw added the kokoro:force-run (Runs Tests on GPU) label on Sep 4, 2024
mattdangerw merged commit 9707bb2 into keras-team:master on Sep 5, 2024
9 of 10 checks passed
Labels: kokoro:force-run (Runs Tests on GPU)

3 participants