anysplit: Fix split to more than 2 parts #481
Merged
Main fixes:
MOR- regexes. The number of suffixes is still variable.
This can be fixed once there is a definition of how to perform the marking
when there are more than 3 tokens.
I noticed a bad interaction between the random splitting and the way sane-morphism is currently done:
A sentence may have many millions of parses, but only a few of them are displayed (sometimes as few as 1-3).
This happens due to a combination of several things:
Due to that, only very few "sane" linkages remain.
Possible fix:
Perform the sane-morphism check on the fly while filling the linkage array. This will also simplify the program.
A lower !limit can then be used to avoid generating the default 1000 linkages.
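
A toy sketch of the difference between the two orderings may help. It is not link-grammar code: `is_sane`, the integer "linkages", the 1-in-1000 sanity rate, and the function names are all assumptions made up for illustration. It only shows why filtering after sampling leaves few results, while filtering during the fill reaches the requested limit.

```python
import random

def is_sane(linkage: int) -> bool:
    # Hypothetical stand-in for the sane-morphism check:
    # here, only 1 in 1000 candidate linkages counts as "sane".
    return linkage % 1000 == 0

def sample_then_filter(total: int, limit: int, rng: random.Random) -> list[int]:
    # Current behaviour as described above: pick `limit` random
    # linkages first, then discard the insane ones afterwards.
    picked = rng.sample(range(total), limit)
    return [linkage for linkage in picked if is_sane(linkage)]

def filter_while_filling(total: int, limit: int, rng: random.Random) -> list[int]:
    # Proposed behaviour: visit candidates in random order and store
    # only sane ones, stopping as soon as `limit` slots are filled.
    order = list(range(total))
    rng.shuffle(order)
    result = []
    for linkage in order:
        if is_sane(linkage):
            result.append(linkage)
            if len(result) == limit:
                break
    return result

rng = random.Random(0)
post_filtered = sample_then_filter(100_000, 1000, rng)   # usually only a handful survive
on_the_fly = filter_while_filling(100_000, 50, rng)      # fills all 50 requested slots
```

With filtering done during the fill, the limit counts sane linkages only, which is why a limit well below 1000 becomes practical.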
A similar problem currently occurs in English as well, for some sentences with many linkages
(when a large portion of them are insane due to bogus unit-splitting attempts).
I can send a PR if this is a good fix.