Add "term grouping exceptions" to mandarin parser #430

Closed
jzohrab opened this issue May 22, 2024 · 4 comments
Labels
enhancement New feature or request fixed Fixed in develop or master, to be launched.

Comments

@jzohrab
Collaborator

jzohrab commented May 22, 2024

Per this discord thread - "Overriding the highlight"

Sometimes jieba groups things incorrectly. Users need to be able to get at the underlying characters. The mandarin fork had a text file of parsing exceptions, e.g.

X,Y

which means: if you try to group "XY" together in a term, instead parse it as "X" and "Y". Note that the parts can still be grouped a bit, e.g. "X,YZ" means: if you try to group "XYZ" all together, instead parse it as "X" and "YZ".
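
A minimal sketch of how such an exceptions file could be applied to the tokens a segmenter returns. This is an illustration only, not the code from the mandarin fork or from Lute: the function names `load_exceptions` and `apply_exceptions` are hypothetical, and the token list stands in for jieba output so the example runs without jieba installed.

```python
def load_exceptions(lines):
    """Map each joined string (e.g. "XYZ") to its desired parts (["X", "YZ"]).

    Hypothetical helper: each line is a comma-separated exception like "X,YZ".
    """
    exceptions = {}
    for line in lines:
        parts = [p.strip() for p in line.split(",") if p.strip()]
        if len(parts) > 1:
            exceptions["".join(parts)] = parts
    return exceptions


def apply_exceptions(tokens, exceptions):
    """Replace every token that matches an exception with its parts."""
    result = []
    for token in tokens:
        # Tokens without an exception pass through unchanged.
        result.extend(exceptions.get(token, [token]))
    return result


exceptions = load_exceptions(["X,Y", "X,YZ"])
# The token list below stands in for whatever the segmenter produced.
print(apply_exceptions(["A", "XY", "XYZ", "B"], exceptions))
# → ['A', 'X', 'Y', 'X', 'YZ', 'B']
```

Keying the lookup on the joined string keeps the check to one dictionary access per token, and lets "X,Y" and "X,YZ" coexist because they describe different groupings.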

@jzohrab jzohrab added the enhancement New feature or request label May 22, 2024
@jzohrab jzohrab added this to Lute-v3 May 22, 2024
@jzohrab jzohrab moved this to Todo in Lute-v3 May 22, 2024
@jzohrab jzohrab moved this from Todo to In Progress in Lute-v3 May 26, 2024
@jzohrab
Collaborator Author

jzohrab commented May 26, 2024

Working on this currently, have a good handle on it. Lute will need to be launched for this capability, as there are changes required in the abstract parser.

@jzohrab
Collaborator Author

jzohrab commented May 26, 2024

Branch wip_issue_430_parser_exceptions pushed, tests with exceptions are working for mandarin parser.

Now have to call the init_data_dir for each parser and loaded plugin in the app_factory, should be straightforward.

*** MAYBE move code for init_plugins to app_factory ... seems like the right place, as the factory has to do some extra stuff for the plugins (?)
*** create the top-level `userparserdata` dir if any parser actually has a data dir, in app_factory
*** assign parser's directory for all parsers
*** if any parser needs a data dir, call top-level "create data dir" thing for all parsers
*** after parsers loaded, loop and call "set up data" method - parsers handle that - create files and dirs
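
The notes above could be sketched roughly as follows. This is only an illustration of the described flow, not the code on the branch: the classes, the `uses_data_directory` flag, the `init_data_directory` method, the `init_parser_data` function, and the `parser_exceptions.txt` file name are all hypothetical; only the `userparserdata` directory name comes from the notes.

```python
from pathlib import Path
import tempfile


class PlainParser:
    """Hypothetical parser that needs no user data."""
    uses_data_directory = False
    data_directory = None

    def init_data_directory(self):
        # Nothing to set up for this parser.
        pass


class MandarinParser(PlainParser):
    """Hypothetical parser that keeps an exceptions file."""
    uses_data_directory = True

    def init_data_directory(self):
        # The parser itself creates its files and dirs.
        self.data_directory.mkdir(parents=True, exist_ok=True)
        exceptions_file = self.data_directory / "parser_exceptions.txt"
        if not exceptions_file.exists():
            exceptions_file.write_text("", encoding="utf-8")


def init_parser_data(parsers, base_dir):
    """Create the top-level dir only if some parser needs data, then set up each parser."""
    if not any(p.uses_data_directory for p in parsers.values()):
        return None
    top = Path(base_dir) / "userparserdata"
    top.mkdir(parents=True, exist_ok=True)
    # Assign a directory to all parsers, then let each one set up its data.
    for name, parser in parsers.items():
        parser.data_directory = top / name
    for parser in parsers.values():
        parser.init_data_directory()
    return top


with tempfile.TemporaryDirectory() as tmp:
    # No parser needs data: no extra data dir is created.
    print(init_parser_data({"english": PlainParser()}, tmp))
    # With the mandarin parser loaded: the data dir and its file appear.
    top = init_parser_data(
        {"english": PlainParser(), "mandarin": MandarinParser()}, tmp
    )
    print(sorted(p.name for p in top.iterdir()))
```

The two calls at the end mirror the manual test plan: without a plugin there is no extra data dir, with the plugin there is one.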

Then test it out:

  • install lute only, no plugin
  • start it up -- no extra data dir
  • install mandarin plugin
  • start it up -- extra data dir
  • test it out - add some exceptions to the file, check with the demo story

@jzohrab
Collaborator Author

jzohrab commented May 28, 2024

In develop, seems to work fine.

@jzohrab jzohrab added the fixed Fixed in develop or master, to be launched. label May 28, 2024
@jzohrab jzohrab moved this from In Progress to Done in Lute-v3 May 28, 2024
@jzohrab
Collaborator Author

jzohrab commented May 30, 2024

Launched in 3.4.2.

@jzohrab jzohrab closed this as completed May 30, 2024