Correct Tokenization #1707
Comments
INCEpTION internally uses the Java BreakIterator to tokenize documents that you import as plain text. Where did you find information about us using StanfordNLP tokenization? If you need your own tokenization, you have to convert your document into a pre-tokenized file format, e.g. WebAnno TSV or CoNLL 2002. The INCEpTION documentation lists the supported file formats along with their descriptions.
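For illustration, here is a minimal sketch of what word-level splitting with the JDK's `java.text.BreakIterator` looks like, plus a way to write externally produced tokens as one token per line in the style of CoNLL 2002. This is not INCEpTION's actual import code: the class name `TokenizeSketch`, both helper methods, the choice of `Locale.ROOT`, and the placeholder `O` tag are assumptions made for this example, and the exact column layout INCEpTION expects should be checked against its documentation.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class for illustration; not part of INCEpTION.
class TokenizeSketch {

    /** Splits text at BreakIterator word boundaries and drops whitespace-only segments. */
    static List<String> tokenize(String text) {
        BreakIterator words = BreakIterator.getWordInstance(Locale.ROOT);
        words.setText(text);
        List<String> tokens = new ArrayList<>();
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
            String segment = text.substring(start, end);
            if (!segment.trim().isEmpty()) {
                tokens.add(segment);
            }
        }
        return tokens;
    }

    /**
     * Writes one sentence as "token O" lines followed by a blank line.
     * The "O" tag is a placeholder (assumption): it marks every token as outside any entity.
     */
    static String toConll2002(List<String> tokens) {
        StringBuilder sb = new StringBuilder();
        for (String token : tokens) {
            sb.append(token).append(" O\n");
        }
        sb.append("\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("Hello, world!");
        System.out.println(tokens);
        System.out.print(toConll2002(tokens));
    }
}
```

To use a different tokenizer, the list passed to `toConll2002` could come from any external tool instead of `tokenize`; importing the resulting file in a pre-tokenized format means the token boundaries in the file are used rather than those computed at plain-text import.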
@jcklie Let's close this one and open a separate one for actually making tokenization editable in the brat view?
- Fixed layout if arc labels fall back to layer names
- Changed color of arc labels from the arc color to black to make them more readable, in particular when selected
…s-not-having-a-label-is-broken #1707 - Layout for relations not having a label is broken
* 3.6.x: #1707 - Layout for relations not having a label is broken
* commit '608c37c619cb677952908154d813f61ac2b34a1e':
  [maven-release-plugin] prepare for next development iteration
  [maven-release-plugin] prepare release webanno-4.0.0-beta-14
  Bump spring.security.version from 5.2.3.RELEASE to 5.3.3.RELEASE
  #1472 - Upgrade dependencies (4.0.0)
  #1472 - Upgrade dependencies (4.0.0)
  #1472 - Upgrade dependencies (4.0.0)
  #1707 - Layout for relations not having a label is broken
  #1712 Auto logout not succeeding
% Conflicts:
%   pom.xml
I am running INCEpTION to get some named entities, and it is doing a really good job of IOB2 tagging them. It says it is using StanfordNLP tokenization, but the tokenization of special characters and symbols could be better.
For example:
Token - As Is --> To Be
*True values masked
We tokenized the same text using the StanfordNLP tokenizer, and it gives the correct tokens (as listed under To Be).
Is there a setting through which we can achieve the correct tokenization, or is this feature not currently supported? Please let us know; that would be a great help.
~Sachin