Japanese: suggestion for simple Negation expansion (for Adjectival Verbs ない & its conjugated forms) #33
Comments
In IRIS, Entity Vectors are emitted as sentence attributes (see the RAW data output), and no Path information is present. I have changed that in iKnow standalone, for simplicity, and used the Path output for emitting Entity Vectors in Japanese. It would not be that hard to generate Path data, and the corresponding path-expansion mechanism, together with Entity Vectors. The latter would become sentence attributes; the former would replace the current EVs. This would mean an incompatible API change for Japanese, of course. |
Thanks, @JosDenysGitHub. That was my next question, i.e., is it possible to emit both EV and Path data, so it's great to hear that it is possible. EVs can still be used to calculate Proximity, correct? |
EVs are the basis for calculating proximity; I guess this should not change? |
Correct. Proximity calculation should stay as is. |
Getting Japanese back in line with the other languages, returning "regular" paths and emitting EVs through a separate mechanism, sounds like the desirable long-term thing to do. Neither the standalone engine nor the IRIS integration itself would be much impacted, but applications built on top of IRIS that were expecting EVs from the PathAPI will get something different until they adapt to the new channel for EVs. @JosDenysGitHub: how does the "sentence attribute" representation of an EV look? |
In the RAW output, the EV looks like: |
Adding @JosDenysGitHub as an assignee for required engine change - to be worked on after higher priority issues. |
Hi @makorin0315, I’m really sorry for the late posting, and I appreciate your kind support. I’m Rei Noguchi at Gunma University Hospital, and I am researching the analysis of mainly medical text with iKnow. As discussed above, I am strongly hoping for a “negation assignment” function (identification of the word modified by a negation), which would make iKnow more powerful. As introduced above, I have implemented a preliminary negation assignment algorithm using "iknowpy", as noted in the attached image. It is still at a hypothetical level and is a very simple algorithm, but on the surface it works well for my situation at this time.
Based on the preliminary results, proposal #1 from @makorin0315, “3 entities (C-R-C) w/ Negation Marker“, seems able to cover the above-mentioned cases and would be acceptable for my situation. I'm looking forward to an implementation, and please let me know if you need help with validation or discussion. |
Thank you @Rei-hub for your comment & response. The team has decided that the second approach, i.e., attribute expansion by enabling use of the CRC Path, would be doable and better. For this approach, we first require some code changes by our developer, after which I can start making the linguistic updates. It may take some time, but we will keep you posted on our progress. |
Thank you @makorin0315 for your quick response, and that is really good news for me and all users. The second approach sounds better and reasonable. That will be a major update and need much time, but I look forward to the release. I would be happy if there is anything I could help with. Thank you. |
The language model now outputs PathRelevant entities and simple spans for Negation attributes.
Entity Vectors are emitted as sentence attributes. Paths are supported like in other languages.
Paths with attributes are now emitted for Japanese, but there were compilation problems with Japanese paths on Linux.
PathConstruction is changed to PR in metadata.csv.
GenXML.py generates XML output for language model development. It uses iKnowXML.xsl for visualisation.
The updated script emits both Entity Vectors and Paths for Japanese.
Update of ref_testing.py to comply with new output for Japanese. Update of raw output.
Un-doing Close until unit test issue is resolved and fix is validated. |
@makorin0315 and all |
NOTE: this is a suggestion/request that came from Dr. Rei Noguchi @ Gunma University Hospital.
BACKGROUND
In iKnow, Negation expansion is normally done using the Path, which for non-Japanese languages follows the word order in the Sentence. Since we developed the Entity Vector as a special-case Path for Japanese, the order of entities within the Path is mostly different from how they appear within the Sentence. For this reason, we have not yet implemented Negation expansion beyond the boundaries of the entity that includes the Negation marker.
For example:
今週はレッスンはない。- There is no lesson this week.
Entity Vector - レッスン ない 今週
The two particles は are NonRelevant.
Because of the sentence structure, the word ない, which is the present form of the Adjectival Verb meaning "doesn't exist" and a Negation marker, does not expand beyond itself. This is a problem, since it's not possible to know "what" is being negated without reading the entire sentence.
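To make the mismatch concrete, here is a minimal sketch of the example above. It uses plain tuples rather than the actual iknowpy output format, and the `is_relevant` / `is_negation_marker` flags are hypothetical labels chosen for illustration (ない is assumed relevant because it appears in the Entity Vector).

```python
# Illustrative only: plain tuples, NOT the actual iknowpy output format.
# Each item is (text, is_relevant, is_negation_marker), in sentence order.
sentence = [
    ("今週", True, False),
    ("は", False, False),      # particle, NonRelevant
    ("レッスン", True, False),
    ("は", False, False),      # particle, NonRelevant
    ("ない", True, True),      # Negation marker
]

# Entity Vector order as given in the example above.
entity_vector = ["レッスン", "ない", "今週"]

texts = [text for text, _, _ in sentence]

# Sentence position of each Entity Vector element.
ev_positions = [texts.index(text) for text in entity_vector]

# What currently carries the Negation attribute: only the marker itself.
negated = [text for text, _, is_neg in sentence if is_neg]

print(ev_positions)
print(negated)
```

The positions are not in ascending order, so walking the Entity Vector does not correspond to walking the sentence, and the Negation attribute stays on ない alone.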
SIMPLE EXPANSION EXPERIMENT
Dr. Noguchi used the current iKnow Python interface to experiment with his medical data, which often uses simple sentence structures that closely resemble the format: XXX は (or が) ない (or なかった - past form of the same Adjectival Verb, meaning "didn't exist").
EXPERIMENT:
In cases like the above, expand Negation to the left, to the Concept before the particle は or が, i.e., "XXX" in the examples above.
In addition, there are some sentences where XXX is replaced by "XXX1やXXX2", meaning "XXX1 and/or XXX2". In such cases, expand Negation to the left, all the way to the Concept before the particle や, i.e., "XXX1" (the first Concept).
His experiment suggested that, at least for his data, such an expansion is normally semantically correct and would give more meaningful results for his machine learning work, since it is clearer what exists and what doesn't exist. (For example: There was no fever vs. Patient had fever.)
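The two expansion rules described above can be sketched roughly as follows. This is not Dr. Noguchi's actual code and does not call iknowpy; it assumes the sentence is already split into `(text, kind)` pairs, where the `kind` labels ("concept", "particle", "negation") and the function name `expand_negation` are hypothetical. The medical words 発熱 (fever) and 咳 (cough) are made-up sample input.

```python
NEGATION_MARKERS = {"ない", "なかった"}
SUBJECT_PARTICLES = {"は", "が"}
LIST_PARTICLE = "や"


def expand_negation(tokens):
    """Return one inclusive (start, end) index span per negation marker.

    tokens: list of (text, kind) pairs in sentence order (hypothetical format).
    """
    spans = []
    for i, (text, _kind) in enumerate(tokens):
        if text not in NEGATION_MARKERS:
            continue
        start = i  # default: the marker does not expand beyond itself
        # Rule 1: "XXX は/が ない" -> expand left to the Concept XXX.
        if (i >= 2
                and tokens[i - 1][0] in SUBJECT_PARTICLES
                and tokens[i - 2][1] == "concept"):
            start = i - 2
            # Rule 2: "XXX1 や XXX2 ..." -> keep expanding left across や.
            while (start >= 2
                   and tokens[start - 1][0] == LIST_PARTICLE
                   and tokens[start - 2][1] == "concept"):
                start -= 2
        spans.append((start, i))
    return spans


simple = [("発熱", "concept"), ("は", "particle"), ("なかった", "negation")]
listed = [("発熱", "concept"), ("や", "particle"), ("咳", "concept"),
          ("は", "particle"), ("ない", "negation")]
lesson = [("今週", "concept"), ("は", "particle"), ("レッスン", "concept"),
          ("は", "particle"), ("ない", "negation")]

for sample in (simple, listed, lesson):
    for start, end in expand_negation(sample):
        print("".join(text for text, _ in sample[start:end + 1]))
```

For the lesson example, the span stops at レッスン and does not reach 今週, because the particle to the left of レッスン is は rather than や. Anything that does not match the two patterns is left with the marker-only span, which matches the current behaviour described in the background.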
INITIAL DISCUSSION
TECHNICAL APPROACHES
There are two different ways Negation expansion can be implemented.
The first approach is quicker, but may not be as useful longer-term. Any comment or additional consideration that I'm missing? @ISC-SDE @bdeboe @JosDenysGitHub @woodfinisc