Make tokenization editable #1778

Open
12 of 19 tasks
jcklie opened this issue Jul 28, 2020 · 1 comment
jcklie commented Jul 28, 2020

Is your feature request related to a problem? Please describe.
The built-in tokenizer of INCEpTION sometimes makes errors. It would be nice if the tokenization could be edited.

Describe the solution you'd like
Make the tokenization somehow editable, e.g. in the brat view.

  • #2466 - Displayed Text changes as annotations are added / removed #2474
    • Decoupling of rendering in brat from existence of tokens
    • Option to enable token layer as editable layer
    • Option to enable sentence layer as editable layer
  • #1778 - Make tokenization editable #2989
    • Display sentences/tokens in the annotator views on the curation page if they are editable
    • If tokens/sentences are editable, run them through the merge process instead of obtaining them from the template
    • Use a line-oriented editor instead of a sentence-oriented editor while the sentence layer is not read-only
    • Add documentation for editable sentences
    • Disable changing anchoring mode and overlap mode for sentences / tokens
    • Disable coloring rules for sentences
  • #1778 - Make tokenization editable #5180
    • Splitting a Token by inserting a zero-width token annotation at the split location
    • Deleting a Token (resizes adjacent tokens as necessary)
    • When deleting a Token at the start/end of a sentence, the merged token does not cross the sentence boundary
    • Deleting a Token can be un-done (via undo functionality)
    • Annotations attached to the token will be appropriately resized (in particular Lemma)
    • Annotations with token-level granularity will be appropriately resized
  • Todo / Open questions
    • Deleting a token also removes attached annotations (e.g. POS and, transitively, dependency relations)
    • What to do with "cross sentence" behavior if a document contains no sentences?
    • Should it be allowed to delete the sentence / token layer?
    • What about tokens / sentences in the curation view?
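
To make the split/delete behavior from the task list concrete, here is a minimal sketch over plain character offsets. It is not INCEpTION's code (which is Java); `Span`, `split_token` and `delete_token` are hypothetical names. It also assumes that the preceding token is preferred when a deleted token's range is absorbed, which the list above does not specify.

```python
from dataclasses import dataclass


@dataclass
class Span:
    begin: int  # inclusive character offset
    end: int    # exclusive character offset


def split_token(tokens, offset):
    """Split the token that strictly contains `offset` into two tokens.

    This mirrors the idea of placing a zero-width annotation at the split
    location: only a single offset is needed to describe the split.
    """
    result = []
    for tok in tokens:
        if tok.begin < offset < tok.end:
            result.append(Span(tok.begin, offset))
            result.append(Span(offset, tok.end))
        else:
            result.append(Span(tok.begin, tok.end))
    return result


def delete_token(tokens, index, sentence):
    """Delete tokens[index] and let a neighbour in the same sentence grow.

    `tokens` must be sorted by offset; `sentence` is the span of the sentence
    containing the deleted token. A neighbour outside that sentence is never
    resized, so the merged token cannot cross the sentence boundary.
    """
    victim = tokens[index]
    remaining = [Span(t.begin, t.end) for i, t in enumerate(tokens) if i != index]
    prev_tok = tokens[index - 1] if index > 0 else None
    next_tok = tokens[index + 1] if index + 1 < len(tokens) else None
    if prev_tok is not None and prev_tok.begin >= sentence.begin:
        # Assumption: prefer growing the preceding token.
        remaining[index - 1].end = victim.end
    elif next_tok is not None and next_tok.end <= sentence.end:
        # After removal, the former successor sits at position `index`.
        remaining[index].begin = victim.begin
    return remaining
```

If the deleted token is the only token of its sentence, neither neighbour qualifies and the remaining tokens are returned unchanged.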
@jcklie jcklie added this to the Feature backlog milestone Jul 28, 2020
jcklie pushed a commit that referenced this issue Feb 2, 2021
…e-API-when-external-authentication-is-enabled-v4

#1776 - Unable to use remote API when external authentication is enabled (master)
reckart added a commit that referenced this issue Mar 27, 2021
- Remove redundant definition of getLayersToRender()
reckart added a commit that referenced this issue Mar 27, 2021
- Option to decouple brat rendering from the existence of Tokens
reckart added a commit that referenced this issue Mar 27, 2021
- Option to enable tokens and sentences as layers that appear in the layer list and that could in principle be edited
@reckart reckart self-assigned this Mar 27, 2021
@reckart reckart added the ⭐️ Enhancement New feature or request label Mar 27, 2021
reckart added a commit that referenced this issue Mar 28, 2021
- Fixed JavaDoc problem
reckart added a commit that referenced this issue Mar 28, 2021
- Remove unnecessary dependency
reckart added a commit that referenced this issue Mar 28, 2021
- Remove unnecessary dependency
reckart added a commit that referenced this issue May 8, 2021
- Added missing logger
reckart added a commit that referenced this issue May 8, 2021
- When deleting a Token, also recursively delete attached layers like POS and relations attached to these such as Dependency
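
As a rough illustration of the recursive deletion described in this commit message, the sketch below treats annotations as a graph of "attached to" references. The data model and the name `cascade_delete` are assumptions made for the example and do not reflect how INCEpTION stores layers.

```python
def cascade_delete(annotations, root_id):
    """Remove `root_id` and everything that is transitively attached to it.

    `annotations` maps an annotation id to the set of ids it is attached to,
    e.g. a POS tag is attached to a token and a dependency relation is
    attached to two POS tags. Returns a new dict; the input is not modified.
    """
    doomed = {root_id}
    changed = True
    while changed:
        changed = False
        for ann_id, targets in annotations.items():
            # Anything pointing at a doomed annotation is doomed as well.
            if ann_id not in doomed and targets & doomed:
                doomed.add(ann_id)
                changed = True
    return {i: t for i, t in annotations.items() if i not in doomed}


layers = {
    "tok1": set(),
    "tok2": set(),
    "pos1": {"tok1"},
    "pos2": {"tok2"},
    "dep": {"pos1", "pos2"},
}
remaining = cascade_delete(layers, "tok1")
```

In this example, deleting `tok1` also removes `pos1` and, through it, the `dep` relation, while `tok2` and `pos2` survive.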
reckart added a commit that referenced this issue Jul 11, 2021
- Move the properties to the proper location *cough*
reckart added a commit that referenced this issue Jul 11, 2021
- Drop reliance on tokens for rendering data in brat - instead use whitespace in the text to generate UI-only tokens
- This is a "backport" of a set of changes previously being introduced as part of #1778 Make tokenization editable
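
The idea of deriving UI-only tokens from whitespace can be sketched as follows. This is an illustrative Python snippet, not the actual brat rendering code; `ui_tokens` is a hypothetical name, and treating every run of non-whitespace characters as one token is an assumption.

```python
import re


def ui_tokens(text):
    """Return (begin, end) offsets for each run of non-whitespace characters.

    The offsets are only meant for laying out text in the UI; they are not
    stored as Token annotations.
    """
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]
```

Because the offsets come from the text alone, rendering does not depend on which Token annotations exist, which is the decoupling this commit describes.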
reckart added a commit that referenced this issue Jul 22, 2021
reckart added a commit that referenced this issue Jul 25, 2021
…enization-editable

#1778 Make tokenization editable
reckart added a commit that referenced this issue Jul 28, 2021
* main: (54 commits)
  #2517 - Avoid multiple property conditions for websocket-related features
  No issue. Update version.
  [maven-release-plugin] prepare for next development iteration
  [maven-release-plugin] prepare release inception-20.0
  #2511 - Progress info for indexing
  #2379 - Push recommender errors via websockets to user
  #2379 - Push recommender errors via websockets to user
  [maven-release-plugin] prepare for next development iteration
  [maven-release-plugin] prepare release inception-0.20.0-rc-8
  #2506 - Cross-platform docker build
  #1778 - Make tokenization editable
  #2492 - Deleting annotation with link features does not redraw
  #2493 - Switch to a JSASS version that runs on Apple Silicon
  #2503 - Reset admin password
  #2503 - Reset admin password
  #2493 - Switch to a JSASS version that runs on Apple Silicon
  #2493 - Switch to a JSASS version that runs on Apple Silicon
  #2379 - Push recommender errors via websockets to user
  #2499 - Stop bundling MySQL driver
  #2497 - Exceptions during task selection and accept best
  ...

% Conflicts:
%	inception/inception-search-mtas/src/test/java/de/tudarmstadt/ukp/inception/search/index/mtas/MtasDocumentIndexTest.java
reckart added a commit that referenced this issue Oct 18, 2021
- Sentences are currently expected to be the "first" unit - that is because we define that tokens respect sentence boundaries. Maybe that's going to change again, i.e. such that tokens do not respect sentence boundaries but sentences lock to token boundaries
- Add sentence initializer as a dependency to the token initializer
- Change the sentence anchoring mode to characters
reckart added a commit that referenced this issue Oct 18, 2021
…enization-editable

#1778 - Make tokenization editable
reckart added a commit that referenced this issue Oct 18, 2021
- Provide a clearer error message if the cross-sentence status of an annotation cannot be determined because the CAS is lacking sentences
reckart added a commit that referenced this issue Oct 18, 2021
- No need to merge tokens and sentences in CasMerge since they are preserved anyway
reckart added a commit that referenced this issue Oct 18, 2021
- Load SentenceLayerInitializer via auto-config instead of component scanning
reckart added a commit that referenced this issue Oct 19, 2021
* main:
  #2653 - ReferenceError: fragment is not defined
  #2655 - Curated CASes are exported to the wrong spot
  #2654 - Programmatically created AnnotationLayers are not curatable
  No issue. Mini-optimizatioin: load workload settings only once.
  No issue. Add JavaDoc.
  No issue. Fix potential race condition when creating workload management settings by synchronizing the loadOrCreateWorkloadManagerConfiguration method.
  #1778 - Make tokenization editable
  #1778 - Make tokenization editable
  No issue. Bunch of small fixes
  #2656 - Constructor injection for most project exporters
reckart added a commit that referenced this issue Apr 15, 2022
- Slight cleaning up
reckart added a commit that referenced this issue Apr 15, 2022
- Add documentation for editable sentences
- Disable changing anchoring mode and overlap mode for sentences / tokens
- Disable coloring rules for sentences
reckart added a commit that referenced this issue Apr 15, 2022
- Use a line-oriented editor instead of a sentence-oriented editor while the sentence layer is not read-only
reckart added a commit that referenced this issue Apr 15, 2022
…enization-editable

#1778 - Make tokenization editable
reckart added a commit that referenced this issue Apr 15, 2022
…t-view' into feature/2869-Hugging-Face-Recommender-Prototype

* feature/2872-Allow-calling-a-recommender-for-the-current-view: (62 commits)
  #1778 - Make tokenization editable
  #1778 - Make tokenization editable
  #1778 - Make tokenization editable
  #1778 - Make tokenization editable
  #1778 - Make tokenization editable
  #1778 - Make tokenization editable
  #1778 - Make tokenization editable
  #1778 - Make tokenization editable
  #1778 - Make tokenization editable
  #1778 - Make tokenization editable
  #1778 - Make tokenization editable
  #1778 - Make tokenization editable
  #2987 - Error when pressing "add" when no user is selected
  #2912 - Upgrade dependencies (24.0)
  #2912 - Upgrade dependencies (24.0)
  #2983 - Upgrade dependencies (23.2)
  #2942 - Multi-value concept feature
  #2942 - Multi-value concept feature
  #2942 - Multi-value concept feature
  #2942 - Multi-value concept feature
  ...
reckart added a commit that referenced this issue Apr 15, 2022
* main: (161 commits)
  #1778 - Make tokenization editable
  #1778 - Make tokenization editable
  #1778 - Make tokenization editable
  #1778 - Make tokenization editable
  #1778 - Make tokenization editable
  #1778 - Make tokenization editable
  #1778 - Make tokenization editable
  #1778 - Make tokenization editable
  #1778 - Make tokenization editable
  #1778 - Make tokenization editable
  #1778 - Make tokenization editable
  #1778 - Make tokenization editable
  #2987 - Error when pressing "add" when no user is selected
  #2912 - Upgrade dependencies (24.0)
  #2912 - Upgrade dependencies (24.0)
  #2983 - Upgrade dependencies (23.2)
  #2942 - Multi-value concept feature
  #2942 - Multi-value concept feature
  #2942 - Multi-value concept feature
  #2942 - Multi-value concept feature
  ...
reckart added a commit that referenced this issue Apr 16, 2022
- Disable sentence layer by default
- Better checking if sentence/token layer are editable
- Better handling of curation CAS initialization when sentence layer is present but not editable
reckart added a commit that referenced this issue Apr 16, 2022
- Disable token layer by default
reckart added a commit that referenced this issue Apr 16, 2022
- Make sentence layer depend on tokens because sentences have token-granularity.
reckart added a commit that referenced this issue Apr 16, 2022
- Make sentence layer depend on tokens because sentences have token-granularity.
reckart added a commit that referenced this issue Apr 16, 2022
…enization-editable

#1778 - Make tokenization editable
reckart added a commit that referenced this issue Apr 16, 2022
…take-stacked-annotations

* main:
  #1778 - Make tokenization editable
  #1778 - Make tokenization editable
  #1778 - Make tokenization editable
  #1778 - Make tokenization editable
reckart added a commit that referenced this issue Apr 16, 2022
- Allow auto-merging stacked annotations if stacking is enabled on the target layer
reckart added a commit that referenced this issue Jun 12, 2022
- Do not offer adding a token or sentence layer via the layer initializers menu on the project layers panel unless token/sentence editing is actually enabled
reckart added a commit that referenced this issue Jun 12, 2022
…ssion' of https://github.com/inception-project/inception into feature/3120-Improve-persistence-time-using-fast-compression

* 'feature/3120-Improve-persistence-time-using-fast-compression' of https://github.com/inception-project/inception:
  #1778 - Make tokenization editable
@reckart reckart added this to Kanban Aug 7, 2024
@reckart reckart moved this to 🔖 To do in Kanban Aug 7, 2024
reckart added a commit that referenced this issue Nov 23, 2024
- Allow splitting token (done as if doing a zero-width annotation)
- Allow deleting token (with other tokens expanding to take its place)
- Note that no adjustments of dependent tokens are performed yet!
reckart added a commit that referenced this issue Nov 23, 2024
- Pull segementation-handling code out into a separate class
reckart added a commit that referenced this issue Nov 23, 2024
reckart added a commit that referenced this issue Nov 24, 2024
- Modularize the context menu items
reckart added a commit that referenced this issue Nov 24, 2024
- Modularize the context menu items
reckart added a commit that referenced this issue Nov 24, 2024
…enization-editable

#1778 - Make tokenization editable
reckart added a commit that referenced this issue Nov 25, 2024
- Adjust token-attached annotations when a token is moved
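
A minimal sketch of what adjusting token-attached annotations could look like, assuming each attached annotation (for example a Lemma) records the id of the token it belongs to. The dict-based model, the `attached_to` key and the function name `move_attached` are hypothetical and are not taken from the INCEpTION code base.

```python
def move_attached(annotations, token_id, new_begin, new_end):
    """Return copies of `annotations` with those attached to `token_id` resized."""
    updated = []
    for ann in annotations:
        ann = dict(ann)  # copy so the caller's data stays untouched
        if ann.get("attached_to") == token_id:
            # The attached annotation follows the token's new span.
            ann["begin"], ann["end"] = new_begin, new_end
        updated.append(ann)
    return updated
```

For instance, if token `t1` grows from (0, 3) to (0, 7) after a neighbouring token is deleted, calling `move_attached(lemmas, "t1", 0, 7)` resizes the lemma attached to `t1` and leaves all other lemmas as they were.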
reckart added a commit that referenced this issue Nov 26, 2024
- Added tests for TokenAttachedSpanChangeListener
- Refactor a bit to facilitate writing the test
- Cleaning up a bit
reckart added a commit that referenced this issue Nov 26, 2024
- Added tests for TokenAttachedSpanChangeListener
- Refactor a bit to facilitate writing the test
- Cleaning up a bit
reckart added a commit that referenced this issue Nov 26, 2024
…enization-editable

#1778 - Make tokenization editable
Projects
Status: 🔖 To do
Development

No branches or pull requests

3 participants