Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Exclusion tables #32

Merged
merged 8 commits into from
Jul 27, 2022
Merged

Feature: Exclusion tables #32

merged 8 commits into from
Jul 27, 2022

Conversation

joeflack4
Copy link
Contributor

@joeflack4 joeflack4 commented Jun 21, 2022

commit be60334abcc55cbee3308d34b1e87cbf8fcacf95
    Feature: Exclusion tables
    - Update: TODOs
    - Update: Utilizing `base_prefix_map` now instead of `prefix_map` from ontology metadata `.yml` files.
    - Update: get-terms-children.sparql.jinja2: Always using rdfs:subClassOf* instead of a + at the end. Updated Python code to reflect.
    - Update: Makefile target: Now explicitly passes params again.
    - Added: New exclusion files: ORDO, DO. Still pending NCIT after resolving Java heap space issue.
    - Update: Robot template output: 'AI rdfs:seeAlso' --> 'AI MONDOREL:has_exclusion_reason'
commit 6df216a9c319194f4e4faaf571d77fce8ecdfa2b
    Feature: Exclusion tables
    - Added new command: 'python-install-dependencies'
    - Updated makefile commands to use new dependency 'python-install-dependencies'
    General
    - Some changes to makefile and a docs file related to running 'update_repo'
commit 4ffa8a00b2eaf53f760fa36a310a0ced62c4a5af
    Feature: Exclusion tables
    - Optionalized params: Now any path params will get populated using the required --onto-name param should they be absent.
    - CodeStyle: PEP8 compliance update
    - Script location: Simplified
    - Ontology config .yml updates: Added missing prefix_map items
    - Script now utilizes config.yml files for each ontology in order to filter out relevant prefixes.
    - doid_relevant_signature.sparql: added missing prefix
    - added warning for excluded terms that appear in relevant_signature file
    - Now includes non-inclusions, utilizing mirror_signature and component_signat
ure.
commit 3f9edcb97816215b0e7924345442d09ccad21af0
    Feature: Exclusion tables
    - Add: ORDO exclusions
    - Update: refactor (i) comments, (ii) TODOs
    - Update: SPARQL: (i) removed the returning of term labels, (ii) utilized rdfs:label instead of skos:prefLabel, (iii) removed unnecessary prefixes.
    - Update: Used tmp/ dir for cache.

    Feature: ORDO Mapping annotations
    - Update: Use tmp/ dir for cache.
commit a22d82f60f8975760c517e72ce5b67032cb75192
    Feature: Exclusion tables
    - Update: Make goal: Fixed .owl file dependency using 'relevant signature' file.
    - Update: ICD10CM exclusion table TSV: Replaced arbitrarily expanded list back to original intensional list given by curators.
    - Update: Script now uses --relevant-signature param
    - Update: Script now reports raises an error when conflict in `exclude_children` between parent and child classes.
    - Bugfix: Trimmed whitespace programmatically from input exclusion file.
    - Bugfix: Trimmed whitespace manually in input exclusion file.
    - Comments/todo's: (i) added a todo for possible edge case issue w/ exclude_children
    - Update: Added param `--onto-name` to script. This is good for future proofing. Previously, I was
extracting the ontology name from one of the file names that was passed as a param. However, as our
 pipeline changes, those file names are also subject to change. As such, it is more stable to pass
the ontology name explicitly.
commit ae1c3950f909a8134cde331bda5694a44dc6929d
    Feature: Exclusion tables
    - Updates to existing exclusion query tables: Added header, including missing columns, but did not yet populate data in those columns.
    - Added new exclusion query tables for the rest of the ontologies.
    - Added exclusion results for ICD10CM.
    - Added Python script to work w/ these tables: term_range_expansion.py
    - Added SPARQL Jinja2 scripts to be called by Python script.
    - Added make goal to call Python script.
    - requirements: Updated Python requirements files to include OAK.

@joeflack4 joeflack4 marked this pull request as draft June 21, 2022 21:38
@joeflack4 joeflack4 requested a review from matentzn June 21, 2022 21:38
@joeflack4 joeflack4 self-assigned this Jun 21, 2022
@joeflack4 joeflack4 added the enhancement New feature or request label Jun 21, 2022
@joeflack4 joeflack4 force-pushed the exclusion-terms branch 3 times, most recently from 571b416 to d96a4d2 Compare June 22, 2022 00:09
src/ontology/mondo-ingest.Makefile Outdated Show resolved Hide resolved
src/ontology/mondo-ingest.Makefile Outdated Show resolved Hide resolved
src/scripts/term_range_expansion.py Outdated Show resolved Hide resolved
src/scripts/term_range_expansion.py Outdated Show resolved Hide resolved
src/scripts/term_range_expansion.py Outdated Show resolved Hide resolved
@joeflack4 joeflack4 force-pushed the exclusion-terms branch 2 times, most recently from 7b9b5c2 to b2c42d2 Compare June 22, 2022 01:38
src/ontology/mondo-ingest.Makefile Outdated Show resolved Hide resolved
src/sparql/get-terms-children.sparql.jinja2 Outdated Show resolved Hide resolved
src/sparql/get-terms.sparql.jinja2 Outdated Show resolved Hide resolved
@matentzn
Copy link
Member

Nice looking work. I think we should have a reports/term-metadata-%.tsv goal which has ids, labels & definitions. We need that for statistics later on anyways. If you feel its unnecessary now, its ok.

@joeflack4
Copy link
Contributor Author

Re: reports/term-metadata-%.tsv @matentzn : Hmm, yeah I don't mind working on this stuff, but this seems not necessarily related to exclusion tables. I'm assuming you want to maintain a full list of all of the term IDs, labels, and definitions in the TSV for every ontology. Can you make a new GitHub issue, list out the requirements for that, and assign it to me?

@joeflack4 joeflack4 force-pushed the exclusion-terms branch 4 times, most recently from 675cc90 to e9807bb Compare June 23, 2022 01:58
@joeflack4 joeflack4 marked this pull request as ready for review June 23, 2022 02:03
src/ontology/config/icd10cm_exclusion_reasons.tsv Outdated Show resolved Hide resolved
src/ontology/config/icd10cm_term_exclusions.txt Outdated Show resolved Hide resolved
src/ontology/config/icd10cm_exclusions.tsv Outdated Show resolved Hide resolved
src/sparql/get-terms-by-regex.sparql.jinja2 Outdated Show resolved Hide resolved
@joeflack4 joeflack4 force-pushed the exclusion-terms branch 5 times, most recently from b5da25f to dc40b8f Compare June 29, 2022 13:26
@matentzn
Copy link
Member

Looked at this today, @joeflack4 still working on adding the relevant signature logic discussed above.

@matentzn
Copy link
Member

matentzn commented Jun 29, 2022

@joeflack4 joeflack4 force-pushed the exclusion-terms branch 2 times, most recently from 6ba68b0 to c779ef4 Compare June 29, 2022 14:47
@joeflack4 joeflack4 force-pushed the exclusion-terms branch 3 times, most recently from 8765217 to 7598248 Compare July 22, 2022 16:58
src/ontology/Makefile Outdated Show resolved Hide resolved
- Updates to existing exclusion query tables: Added header, including missing columns, but did not yet populate data in those columns.
- Added new exclusion query tables for the rest of the ontologies.
- Added exclusion results for ICD10CM.
- Added Python script to work w/ these tables: term_range_expansion.py
- Added SPARQL Jinja2 scripts to be called by Python script.
- Added make goal to call Python script.
- requirements: Updated Python requirements files to include OAK.
- Update: Make goal: Fixed .owl file dependency using 'relevant signature' file.
- Update: ICD10CM exclusion table TSV: Replaced arbitrarily expanded list back to original intensional list given by curators.
- Update: Script now uses --relevant-signature param
- Update: Script now reports raises an error when conflict in `exclude_children` between parent and child classes.
- Bugfix: Trimmed whitespace programmatically from input exclusion file.
- Bugfix: Trimmed whitespace manually in input exclusion file.
- Comments/todo's: (i) added a todo for possible edge case issue w/ exclude_children
- Update: Added param `--onto-name` to script. This is good for future proofing. Previously, I was extracting the ontology name from one of the file names that was passed as a param. However, as our pipeline changes, those file names are also subject to change. As such, it is more stable to pass the ontology name explicitly.
- Update: Removed .owl -> .ttl hack, as now we have .owl files for ICD10CM and ICD10WHO / all ontologies.
- Add: ORDO exclusions
- Update: refactor (i) comments, (ii) TODOs
- Update: SPARQL: (i) removed the returning of term labels, (ii) utilized rdfs:label instead of skos:prefLabel, (iii) removed unnecessary prefixes.
- Update: Used tmp/ dir for cache.

Feature: ORDO Mapping annotations
- Update: Use tmp/ dir for cache.
- Optionalized params: Now any path params will get populated using the required --onto-name param should they be absent.
- CodeStyle: PEP8 compliance update
- Script location: Simplified
- Ontology config .yml updates: Added missing prefix_map items
- Script now utilizes config.yml files for each ontology in order to filter out relevant prefixes.
- doid_relevant_signature.sparql: added missing prefix
- added warning for excluded terms that appear in relevant_signature file
- Now includes non-inclusions, utilizing mirror_signature and component_signature.
- Added new command: 'python-install-dependencies'
- Updated makefile commands to use new dependency 'python-install-dependencies'

General
- Some changes to makefile and a docs file related to running 'update_repo'
@joeflack4 joeflack4 force-pushed the exclusion-terms branch 6 times, most recently from be60334 to 7574313 Compare July 25, 2022 22:41
@joeflack4 joeflack4 marked this pull request as ready for review July 25, 2022 22:42
@joeflack4
Copy link
Contributor Author

joeflack4 commented Jul 25, 2022

As of just now, I have went through and confirmed all open review threads have been resolved. I've now unmarked it as a draft.


NCIT Java heap space issue

There is a new issue, but it is not related to the code in this PR. As @matentzn mentioned, I might need to increase my Java memory to resolve this issue:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

make config/ncit_term_exclusions.txt
if [ true  = true ]; then robot --catalog catalog-v001.xml merge -I http://purl.obolibrary.org/obo/ncit.owl \
	annotate --ontology-iri http://purl.obolibrary.org/obo/mondo-ingest/component-download-ncit.owl annotate -V http://purl.obolibrary.org/obo/mondo-ingest/releases/2022-07-25/component-download-ncit.owl --annotation owl:versionInfo 2022-07-25 -o tmp/component-download-ncit.owl.owl; fi
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
make: *** [component-download-ncit.owl] Error 1

What I tried:
I looked through the mondo-ingest.makefile code to try to find anywhere to add something like java -Xmx2048m (or higher), but I couldn't find anywhere to do that. I think it will be more efficient if @matentzn can instruct me on this.

FYI for my future self:
If it says 'killed', that means I gave it too much Java space for too little Docker. In run.sh, ODK_JAVA_OPTS=-Xmx20G

Can fix this via:
a. ODK_JAVA_OPTS=-Xmx20G sh run.sh <command>
b. better: sh run.sh make IMP=false <command>

- Update: TODOs
- Update: Utilizing `base_prefix_map` now instead of `prefix_map` from ontology metadata `.yml` files.
- Update: get-terms-children.sparql.jinja2: Always using rdfs:subClassOf* instead of a + at the end. Updated Python code to reflect.
- Update: Makefile target: Now explicitly passes params again.
- Added: New exclusion files: ORDO, DO. Still pending NCIT after resolving Java heap space issue.
- Update: Robot template output: 'AI rdfs:seeAlso' --> 'AI MONDOREL:has_exclusion_reason'
@matentzn matentzn merged commit 070ca6e into main Jul 27, 2022
@matentzn matentzn deleted the exclusion-terms branch July 27, 2022 13:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move exclusion term logic to mondo-ingest repo
3 participants