Add "term grouping exceptions" to mandarin parser #430
Working on this currently, have a good handle on it. Lute will need to be launched for this capability, as there are changes required in the abstract parser.
Branch Now have to call the
Then test it out:
In
Launched in 3.4.2.
Per this discord thread - "Overriding the highlight"
Sometimes jieba groups things incorrectly. Users need to be able to get at the underlying characters. The mandarin fork had a text file of parsing exceptions, e.g.
which means: if you try to group "XY" together in a term, instead parse it as "X" and "Y". Note that the parts can still be grouped a bit, e.g. "X,YZ" means: if you try to group "XYZ" all together, instead parse it as "X" and "YZ".
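
A minimal sketch of how such exceptions could be applied, assuming they arrive as comma-separated lines and that the parser has already produced a list of tokens. The function names `load_exceptions` and `apply_exceptions` are hypothetical and are not taken from Lute or the mandarin fork; the real change lives in the parser code and may work differently.

```python
def load_exceptions(lines):
    """Map each joined term to the parts it should be split into.

    Hypothetical helper: the line "X,YZ" becomes {"XYZ": ["X", "YZ"]}.
    Lines with fewer than two parts are ignored.
    """
    exceptions = {}
    for line in lines:
        parts = [p.strip() for p in line.split(",") if p.strip()]
        if len(parts) > 1:
            exceptions["".join(parts)] = parts
    return exceptions


def apply_exceptions(tokens, exceptions):
    """Replace any token listed in the exceptions with its parts.

    Tokens that have no exception pass through unchanged.
    """
    result = []
    for token in tokens:
        result.extend(exceptions.get(token, [token]))
    return result


# Placeholder tokens stand in for whatever the segmenter returned.
rules = load_exceptions(["X,YZ", "A,B"])
print(apply_exceptions(["XYZ", "Q", "AB"], rules))
```

Keying the lookup on the joined term keeps the check to one dictionary access per token, so the exceptions only take effect when the segmenter produces exactly the grouping the user objected to.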