Ability to specify token breaking zones when calling tokenizer #5043
Comments
reckart
added a commit
that referenced
this issue
Sep 7, 2024
- Added new signature to the tokenizer call - Added test - Consolidated existing code
reckart
added a commit
that referenced
this issue
Sep 7, 2024
…to-specify-token-breaking-zones-when-calling-tokenizer #5043 - Ability to specify token breaking zones when calling tokenizer
reckart
added a commit
that referenced
this issue
Sep 8, 2024
* main: (114 commits) #5047 - Clean up layer detail UI a bit #4949 - Showing the start and end points of relations in left side bar No issue: Minor cleaning up #5043 - Ability to specify token breaking zones when calling tokenizer [maven-release-plugin] prepare for next development iteration [maven-release-plugin] prepare release inception-34.0-beta-6 #5009 - Better handling of stacked annotations with link features in curation #5009 - Better handling of stacked annotations with link features in curation [maven-release-plugin] prepare for next development iteration [maven-release-plugin] prepare release inception-34.0-beta-6 [maven-release-plugin] prepare for next development iteration [maven-release-plugin] prepare release inception-33.6 #5040 - Improve feature form tab navigation #5037 - Show fraction of annotators that chose a certain label in curation sidebar mode #5035 - NBSPs should not be treated as tokens #5031 - ChatGPT recommender fails because format is not a supported parameter #5029 - Duplicate lines on the about page #5027 - Add more CSP configurations #4753 - Entity linker should skip already linked concepts #5007 - Lazy details on suggestions for multi-value concept features fail rendering ... % Conflicts: % pom.xml
reckart
added a commit
that referenced
this issue
Sep 8, 2024
…rs-interactively-on-the-annotation-page * main: #5047 - Clean up layer detail UI a bit #4949 - Showing the start and end points of relations in left side bar No issue: Minor cleaning up #5043 - Ability to specify token breaking zones when calling tokenizer [maven-release-plugin] prepare for next development iteration [maven-release-plugin] prepare release inception-34.0-beta-6 #5009 - Better handling of stacked annotations with link features in curation #5009 - Better handling of stacked annotations with link features in curation [maven-release-plugin] prepare for next development iteration [maven-release-plugin] prepare release inception-34.0-beta-6 [maven-release-plugin] prepare for next development iteration [maven-release-plugin] prepare release inception-33.6 #5040 - Improve feature form tab navigation #5037 - Show fraction of annotators that chose a certain label in curation sidebar mode #5035 - NBSPs should not be treated as tokens #4904 - Upgrade to RDF4J 5.x
reckart
added a commit
that referenced
this issue
Sep 8, 2024
…code * main: (42 commits) #5047 - Clean up layer detail UI a bit #4949 - Showing the start and end points of relations in left side bar No issue: Minor cleaning up #5043 - Ability to specify token breaking zones when calling tokenizer [maven-release-plugin] prepare for next development iteration [maven-release-plugin] prepare release inception-34.0-beta-6 #5009 - Better handling of stacked annotations with link features in curation #5009 - Better handling of stacked annotations with link features in curation [maven-release-plugin] prepare for next development iteration [maven-release-plugin] prepare release inception-34.0-beta-6 [maven-release-plugin] prepare for next development iteration [maven-release-plugin] prepare release inception-33.6 #5040 - Improve feature form tab navigation #5037 - Show fraction of annotators that chose a certain label in curation sidebar mode #5035 - NBSPs should not be treated as tokens #5031 - ChatGPT recommender fails because format is not a supported parameter #5029 - Duplicate lines on the about page #5027 - Add more CSP configurations #4753 - Entity linker should skip already linked concepts #5007 - Lazy details on suggestions for multi-value concept features fail rendering ...
reckart
added a commit
that referenced
this issue
Sep 8, 2024
* main: (359 commits) #5047 - Clean up layer detail UI a bit #4949 - Showing the start and end points of relations in left side bar No issue: Minor cleaning up #5043 - Ability to specify token breaking zones when calling tokenizer [maven-release-plugin] prepare for next development iteration [maven-release-plugin] prepare release inception-34.0-beta-6 #5009 - Better handling of stacked annotations with link features in curation #5009 - Better handling of stacked annotations with link features in curation [maven-release-plugin] prepare for next development iteration [maven-release-plugin] prepare release inception-34.0-beta-6 [maven-release-plugin] prepare for next development iteration [maven-release-plugin] prepare release inception-33.6 #5040 - Improve feature form tab navigation #5037 - Show fraction of annotators that chose a certain label in curation sidebar mode #5035 - NBSPs should not be treated as tokens #5031 - ChatGPT recommender fails because format is not a supported parameter #5029 - Duplicate lines on the about page #5027 - Add more CSP configurations #4753 - Entity linker should skip already linked concepts #5007 - Lazy details on suggestions for multi-value concept features fail rendering ...
reckart
added a commit
that referenced
this issue
Sep 29, 2024
- Fix case where there is only a single boundary given
reckart
added a commit
that referenced
this issue
Sep 30, 2024
- Fix case where there is only a single boundary given
reckart
added a commit
that referenced
this issue
Sep 30, 2024
- Fix case where there is only a single boundary given
reckart
added a commit
that referenced
this issue
Sep 30, 2024
…to-specify-token-breaking-zones-when-calling-tokenizer #5043 - Ability to specify token breaking zones when calling tokenizer
reckart
added a commit
that referenced
this issue
Sep 30, 2024
reckart
added a commit
that referenced
this issue
Oct 1, 2024
…ases * main: (110 commits) [maven-release-plugin] prepare for next development iteration [maven-release-plugin] prepare release inception-34.0 No issue: Mention MS SQL and PosgreSQL as experimental DB options No issue: Fix versions after merge #5043 - Ability to specify token breaking zones when calling tokenizer #5064 - Project template for PHI annotation #5071 - Better document which layers are supported by which formats #5061 - Multiple synchronous recommenders only the last one wins #5064 - Project template for PHI annotation #5068 - Show annotation sidebar by default #5033 - Ability to configure recommenders interactively on the annotation page #5066 - Increase default number of rows for brat editors #5056 - Ability to configure additional languages for knowledge bases #4909 - Upgrade dependencies #5056 - Ability to configure additional languages for knowledge bases #4909 - Upgrade dependencies No issue: Actually, document-metadata doesn't seem to be experimental after all... No issue: Set version to 35.0-SNAPSHOT [maven-release-plugin] prepare for next development iteration [maven-release-plugin] prepare release inception-33.7 ...
reckart
added a commit
that referenced
this issue
Oct 1, 2024
…-JSON * main: (308 commits) [maven-release-plugin] prepare for next development iteration [maven-release-plugin] prepare release inception-34.0 No issue: Mention MS SQL and PosgreSQL as experimental DB options No issue: Fix versions after merge #5043 - Ability to specify token breaking zones when calling tokenizer #5064 - Project template for PHI annotation #5071 - Better document which layers are supported by which formats #5061 - Multiple synchronous recommenders only the last one wins #5064 - Project template for PHI annotation #5068 - Show annotation sidebar by default #5033 - Ability to configure recommenders interactively on the annotation page #5066 - Increase default number of rows for brat editors #5056 - Ability to configure additional languages for knowledge bases #4909 - Upgrade dependencies #5056 - Ability to configure additional languages for knowledge bases #4909 - Upgrade dependencies No issue: Actually, document-metadata doesn't seem to be experimental after all... No issue: Set version to 35.0-SNAPSHOT [maven-release-plugin] prepare for next development iteration [maven-release-plugin] prepare release inception-33.7 ... % Conflicts: % inception/inception-schema/src/main/java/de/tudarmstadt/ukp/inception/schema/exporters/LayerExporter.java % inception/inception-schema/src/main/java/de/tudarmstadt/ukp/inception/schema/exporters/TagSetExporter.java % inception/inception-ui-project/src/main/java/de/tudarmstadt/ukp/clarin/webanno/ui/project/layers/LayerDetailForm.java % inception/inception-ui-project/src/main/java/de/tudarmstadt/ukp/clarin/webanno/ui/project/layers/ProjectLayersPanel.java
reckart
added a commit
that referenced
this issue
Oct 1, 2024
* main: (30 commits) [maven-release-plugin] prepare for next development iteration [maven-release-plugin] prepare release inception-34.0 No issue: Mention MS SQL and PosgreSQL as experimental DB options No issue: Fix versions after merge #5043 - Ability to specify token breaking zones when calling tokenizer #5064 - Project template for PHI annotation #5071 - Better document which layers are supported by which formats #5061 - Multiple synchronous recommenders only the last one wins #5064 - Project template for PHI annotation #5068 - Show annotation sidebar by default #5033 - Ability to configure recommenders interactively on the annotation page #5066 - Increase default number of rows for brat editors #5056 - Ability to configure additional languages for knowledge bases #4909 - Upgrade dependencies #5056 - Ability to configure additional languages for knowledge bases #4909 - Upgrade dependencies No issue: Actually, document-metadata doesn't seem to be experimental after all... No issue: Set version to 35.0-SNAPSHOT [maven-release-plugin] prepare for next development iteration [maven-release-plugin] prepare release inception-33.7 ...
reckart
added a commit
that referenced
this issue
Oct 1, 2024
…he-Annotator-editor * main: (330 commits) [maven-release-plugin] prepare for next development iteration [maven-release-plugin] prepare release inception-34.0 No issue: Mention MS SQL and PosgreSQL as experimental DB options No issue: Fix versions after merge #5043 - Ability to specify token breaking zones when calling tokenizer #5064 - Project template for PHI annotation #5071 - Better document which layers are supported by which formats #5061 - Multiple synchronous recommenders only the last one wins #5064 - Project template for PHI annotation #5068 - Show annotation sidebar by default #5033 - Ability to configure recommenders interactively on the annotation page #5066 - Increase default number of rows for brat editors #5056 - Ability to configure additional languages for knowledge bases #4909 - Upgrade dependencies #5056 - Ability to configure additional languages for knowledge bases #4909 - Upgrade dependencies No issue: Actually, document-metadata doesn't seem to be experimental after all... No issue: Set version to 35.0-SNAPSHOT [maven-release-plugin] prepare for next development iteration [maven-release-plugin] prepare release inception-33.7 ...
reckart
added a commit
that referenced
this issue
Oct 1, 2024
…causes-problems-in-search * main: (32 commits) [maven-release-plugin] prepare for next development iteration [maven-release-plugin] prepare release inception-34.0 No issue: Mention MS SQL and PosgreSQL as experimental DB options No issue: Fix versions after merge #5043 - Ability to specify token breaking zones when calling tokenizer #5064 - Project template for PHI annotation #5071 - Better document which layers are supported by which formats #5061 - Multiple synchronous recommenders only the last one wins #5064 - Project template for PHI annotation #5068 - Show annotation sidebar by default #5033 - Ability to configure recommenders interactively on the annotation page #5066 - Increase default number of rows for brat editors #5056 - Ability to configure additional languages for knowledge bases #4909 - Upgrade dependencies #5056 - Ability to configure additional languages for knowledge bases #4909 - Upgrade dependencies No issue: Actually, document-metadata doesn't seem to be experimental after all... No issue: Set version to 35.0-SNAPSHOT [maven-release-plugin] prepare for next development iteration [maven-release-plugin] prepare release inception-33.7 ... % Conflicts: % inception/inception-ui-project/src/main/java/de/tudarmstadt/ukp/clarin/webanno/ui/project/layers/ProjectLayersPanel.html
Is your feature request related to a problem? Please describe.
The tokenizer implicitly respects sentence boundaries. But if there is a token-breaking event inside a sentence, there is no way to communicate that to the tokenizer. Such an event could e.g. be a <br/> milestone element in an XML/HTML file.
Describe the solution you'd like
Allow zone boundaries on the tokenizer call just as we do on the sentence splitter call.
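To make the request concrete, here is a minimal sketch of what a zone-aware tokenizer call could look like. It is not the INCEpTION API: the class `ZoneTokenizerSketch`, the methods `tokenize` and `tokenTexts`, and the use of plain whitespace splitting are all assumptions made for illustration. The idea shown is only that each zone boundary (for example the offset where a `<br/>` was removed from the text) forces a token break, even when no whitespace is present at that offset.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only; names and behavior are assumptions,
// not the tokenizer signature used in the project.
class ZoneTokenizerSketch {

    /**
     * Returns tokens as {begin, end} offset pairs (end exclusive). Tokens are
     * whitespace-delimited and never span any of the given zone boundaries.
     */
    static List<int[]> tokenize(String text, int... zoneBoundaries) {
        // Zones are delimited by the text start, the text end, and every
        // boundary strictly in between. A single boundary yields two zones.
        TreeSet<Integer> cuts = new TreeSet<>();
        cuts.add(0);
        cuts.add(text.length());
        for (int boundary : zoneBoundaries) {
            if (boundary > 0 && boundary < text.length()) {
                cuts.add(boundary);
            }
        }

        List<int[]> tokens = new ArrayList<>();
        Integer zoneBegin = null;
        for (int zoneEnd : cuts) {
            if (zoneBegin != null) {
                tokenizeZone(text, zoneBegin, zoneEnd, tokens);
            }
            zoneBegin = zoneEnd;
        }
        return tokens;
    }

    // Simple whitespace tokenizer restricted to the zone [begin, end).
    private static void tokenizeZone(String text, int begin, int end, List<int[]> out) {
        int tokenBegin = -1;
        for (int i = begin; i < end; i++) {
            if (Character.isWhitespace(text.charAt(i))) {
                if (tokenBegin >= 0) {
                    out.add(new int[] { tokenBegin, i });
                    tokenBegin = -1;
                }
            } else if (tokenBegin < 0) {
                tokenBegin = i;
            }
        }
        if (tokenBegin >= 0) {
            out.add(new int[] { tokenBegin, end });
        }
    }

    /** Convenience helper returning the covered text of each token. */
    static List<String> tokenTexts(String text, int... zoneBoundaries) {
        List<String> result = new ArrayList<>();
        for (int[] span : tokenize(text, zoneBoundaries)) {
            result.add(text.substring(span[0], span[1]));
        }
        return result;
    }

    public static void main(String[] args) {
        // "foo<br/>bar baz" with the markup stripped; the <br/> sat at offset 3.
        String text = "foobar baz";
        System.out.println(tokenTexts(text));    // no zone boundaries
        System.out.println(tokenTexts(text, 3)); // break forced at offset 3
    }
}
```

Without the boundary, `foobar` stays a single token; with the boundary at offset 3 it is split into `foo` and `bar`. Boundaries at or beyond the ends of the text are ignored in this sketch, which is a design choice of the example rather than documented behavior.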