Add option to get character indices #19

Merged 7 commits into main from character-ranges on Apr 5, 2023

Masboes commented Mar 14, 2023

  • Add an optional char_indices parameter (default: false)
  • When true, the token indices are replaced with character indices
  • When false, everything behaves as before

Implements #17
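
To illustrate the behaviour described in the list above, here is a minimal sketch of an opt-in conversion from token indices to character indices. It is not the code from this PR: the whitespace tokenizer, the helper names (`token_offsets`, `to_char_span`, `resolve_clusters`), the inclusive token spans, and the end-exclusive character spans are all assumptions made for the example.

```python
import re


def token_offsets(text):
    """Return (start_char, end_char) for each whitespace-delimited token.

    Assumption: a plain whitespace tokenizer stands in for the real one.
    """
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def to_char_span(token_span, offsets):
    """Map an inclusive [first_token, last_token] span to [start_char, end_char)."""
    first, last = token_span
    return [offsets[first][0], offsets[last][1]]


def resolve_clusters(text, token_clusters, char_indices=False):
    """Hypothetical wrapper: return token spans unless char_indices is True."""
    if not char_indices:
        # Default path: everything behaves as before.
        return token_clusters
    offsets = token_offsets(text)
    return [[to_char_span(span, offsets) for span in cluster]
            for cluster in token_clusters]


text = "Anna said she was tired"
token_clusters = [[[0, 0], [2, 2]]]  # "Anna" and "she" as token spans

print(resolve_clusters(text, token_clusters))
print(resolve_clusters(text, token_clusters, char_indices=True))
```

With character spans, a caller can slice the original string directly (for example `text[10:13]`) without needing to know how the text was tokenized.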

Masboes added the enhancement (New feature or request) label Mar 14, 2023
Masboes changed the title from "Character ranges" to "Add option to get character indices" Mar 14, 2023
davidberenstein1957 (Owner) commented Mar 15, 2023

Hi, the PR looks great. However, in this case I would opt for two more return keys, "clusters_char" and "cluster_heads_chars", or, even better, replace the old keys to deprecate this behaviour and go for a more tokenization-agnostic implementation.
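
As an illustration of the first option (separate return keys), the sketch below returns character-based spans alongside the token-based ones. Only the key names "clusters_char" and "cluster_heads_chars" come from the comment above. The existing key names ("clusters", "cluster_heads"), the `build_result` helper, the whitespace tokenizer, and the span conventions are assumptions for the example, not the library's confirmed API.

```python
import re


def build_result(text, token_clusters, token_heads):
    """Hypothetical result builder that keeps token spans and adds char spans."""
    # Assumption: whitespace tokens; (start_char, end_char) per token.
    offsets = [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]

    def to_chars(span):
        # Inclusive token span -> end-exclusive character span.
        first, last = span
        return [offsets[first][0], offsets[last][1]]

    return {
        # Assumed names for the existing token-based keys.
        "clusters": token_clusters,
        "cluster_heads": token_heads,
        # Key names suggested in the comment above.
        "clusters_char": [[to_chars(s) for s in c] for c in token_clusters],
        "cluster_heads_chars": [to_chars(h) for h in token_heads],
    }


result = build_result(
    "Anna said she was tired",
    token_clusters=[[[0, 0], [2, 2]]],
    token_heads=[[0, 0]],
)
print(result["clusters_char"])
print(result["cluster_heads_chars"])
```

Keeping both sets of keys avoids a breaking change for existing callers. Replacing the old keys outright, the second option, would instead make the character-based output the only one.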

Masboes (Author) commented Mar 16, 2023

Thanks for taking a look. Returning the chars in separate return keys makes sense to me. Wouldn't your second suggestion just be my PR but with char_indices forced to be on? I imagine that's quite the breaking change. Maybe I misunderstand?

davidberenstein1957 (Owner) commented Mar 16, 2023 via email

Masboes (Author) commented Mar 16, 2023

Sounds fine to me. Shall I then just remove the option and make it so it always returns char indices? I can clean the code up a bit doing that. I'll also update the readme to reflect this change.

Masboes (Author) commented Mar 20, 2023

@davidberenstein1957 I've implemented your suggestions. Let me know if this is what you had in mind; if so, I'll update the readme.

Masboes (Author) commented Apr 3, 2023

@davidberenstein1957 Any chance you can take a look at this PR?

davidberenstein1957 (Owner) commented Apr 3, 2023 via email

davidberenstein1957 (Owner) left a comment

LGTM

davidberenstein1957 merged commit 390dc71 into main Apr 5, 2023
davidberenstein1957 deleted the character-ranges branch April 5, 2023 14:12