Bug in Word Segmentation demo #39

Open
Lucretiel opened this issue Feb 8, 2021 · 0 comments
Labels
from public: Feedback/bug report from the public, that is, not from a Unicode Tools/UCD contributor/maintainer.

Comments

@Lucretiel

Consider the string "abc를".

Under the formal word segmentation rules, this string should be treated as a single word (see the detailed discussions in unicode-rs/unicode-segmentation#90 and rust-lang/regex#743). However, the current demo site splits it into two words. The Notes section does permit implementations to split between different character sets, but the formal rules specify no boundary here, and the demo site should reflect the formal rules.
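
To make the expected result concrete, here is a minimal Python sketch of the behavior described above. It is not the demo site's code and not a full implementation of the segmentation rules: it only models the "no break between two letters" rule and breaks everywhere else. The helper names `is_aletter_approx` and `segment_words_sketch` are hypothetical, and treating every character with a Unicode general category of `L*` as a letter is an assumption that only approximates the real Word_Break property (it would, for example, wrongly join ideographs).

```python
import unicodedata


def is_aletter_approx(ch: str) -> bool:
    # ASSUMPTION: general category L* stands in for Word_Break=ALetter.
    # The real property is defined differently; this is only a rough proxy.
    return unicodedata.category(ch).startswith("L")


def segment_words_sketch(text: str) -> list:
    # Simplified model: never break between two letters, break everywhere else.
    if not text:
        return []
    segments = [text[0]]
    for prev, cur in zip(text, text[1:]):
        if is_aletter_approx(prev) and is_aletter_approx(cur):
            segments[-1] += cur  # letter followed by letter: no boundary
        else:
            segments.append(cur)  # any other pair: boundary
    return segments


print(segment_words_sketch("abc를"))
```

Under these assumptions the Latin letters and the Hangul syllable all count as letters, so the sketch keeps "abc를" together as one segment, which is the result the formal rules call for.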

@markusicu added the "from public" label on Sep 24, 2021
2 participants