NormalizedString.clear() broken? #1636

Open
lkurlandski opened this issue Sep 25, 2024 · 1 comment
Labels
bug Something isn't working

lkurlandski commented Sep 25, 2024

Hello. I think there are some problems with NormalizedString (tokenizers 0.15.2).

In the following example, append() works as expected.

from tokenizers import NormalizedString

s = NormalizedString("Hi.")  # NormalizedString(original="Hi.", normalized="Hi.")
s.append("Hello.") # NormalizedString(original="Hi.", normalized="Hi. Hello.")

After using clear(), append() no longer modifies the normalized attribute.

from tokenizers import NormalizedString

s = NormalizedString("Hi.")  # NormalizedString(original="Hi.", normalized="Hi.")
s.clear()  # NormalizedString(original="Hi.", normalized="")
s.append("Hello.")  # NormalizedString(original="Hi.", normalized="")

prepend() has the same problem: after clear(), it no longer modifies the normalized attribute either.
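For anyone who wants to check this on their own install, the reproductions above can be folded into one small script. This is only a sketch: `edit_applies_after_clear` is a hypothetical helper name, not part of tokenizers, and it assumes nothing beyond what the snippets above already use (the `NormalizedString` constructor, `clear()`, `append()`, `prepend()`, and the `normalized` attribute).

```python
def edit_applies_after_clear(make_string, method_name, text="Hello."):
    """Return True if calling `method_name` after clear() changes `normalized`.

    `make_string` is any callable that builds an object exposing clear(),
    append()/prepend() and a `normalized` attribute. This helper is
    hypothetical and is not part of the tokenizers library.
    """
    s = make_string("Hi.")
    s.clear()
    before = s.normalized          # expected to be "" after clear()
    getattr(s, method_name)(text)  # call append(text) or prepend(text)
    return s.normalized != before


if __name__ == "__main__":
    try:
        from tokenizers import NormalizedString
    except ImportError:
        print("tokenizers is not installed; nothing to check")
    else:
        for name in ("append", "prepend"):
            applied = edit_applies_after_clear(NormalizedString, name)
            status = "applied" if applied else "ignored"
            print(f"{name}() after clear(): {status}")
```

On an affected version I would expect both lines to report "ignored". The helper takes the constructor as an argument so the same check can be pointed at a patched build or a stand-in class.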

@ArthurZucker ArthurZucker added the bug Something isn't working label Sep 26, 2024
ArthurZucker (Collaborator) commented

Indeed. Would you like to have a go at it and open a PR? 🤗
