Slurp pipeline: Initial design #36

Merged
merged 8 commits into from
Sep 30, 2022
Conversation

matentzn
Member

@matentzn matentzn commented Jul 17, 2022

Updates

commit e5575265457308e40626870dbdee9de791cdcac0

    Slurp pipeline
    - Update: direct_owned_parents methods: Temporarily commented out, pending OAK fixes. For now, this is done in batch using SPARQL.
    - Add: Param --onto-exclusions-path: Excluded terms are no longer considered possible slurp candidates.
    - Bugfix: Labels: Now they are successfully being fetched using OAK.
    - Add: New columns to output
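
For readers unfamiliar with the exclusion step, here is a minimal sketch of how excluded terms might be dropped from the slurp candidates. It is not the code in this PR: the function names are hypothetical, and it assumes the file passed via `--onto-exclusions-path` holds one term ID per line.

```python
import io
from typing import Iterable, List, Set


def read_excluded_terms(lines: Iterable[str]) -> Set[str]:
    """Collect one term ID per non-blank line (assumed file layout)."""
    return {line.strip() for line in lines if line.strip()}


def filter_slurp_candidates(candidates: Iterable[str], excluded: Set[str]) -> List[str]:
    """Keep the candidates, in their original order, that are not excluded."""
    return [term for term in candidates if term not in excluded]


# Illustrative IDs only; a real run would open the exclusions file instead.
excluded = read_excluded_terms(io.StringIO("OMIM:100050\n\nOMIM:100070\n"))
candidates = ["OMIM:100050", "OMIM:100100", "OMIM:100070", "OMIM:100200"]
remaining = filter_slurp_candidates(candidates, excluded)
```

Using a set for the exclusions keeps each membership check constant-time, which matters when filtering a whole ontology's worth of terms.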

    Exclusions
    - Bugfix: When exclude_children was left empty in the input config/%_exclusions.tsv, the generated exclusion tables were sometimes empty.
    - Update: Updated exclusion tables for: DOID, OMIM. These had problematic source files with no values for exclude_children, and needed to be re-run after bugfix.
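
A rough sketch of one way to guard against blank `exclude_children` cells, so that rows are kept rather than lost. This is an assumption about the shape of the fix, not the actual change: the `term_id` column name, the helper names, and the choice to treat a blank cell as "false" are all hypothetical.

```python
import csv
import io
from typing import Dict, List, Optional

# Spellings accepted as "true"; anything else, including blank, counts as False.
TRUE_VALUES = {"true", "t", "yes", "y", "1"}


def parse_exclude_children(raw: Optional[str]) -> bool:
    """Interpret a possibly blank or missing cell as a boolean."""
    return (raw or "").strip().lower() in TRUE_VALUES


def read_exclusion_config(tsv_text: str) -> List[Dict[str, object]]:
    """Read every row of the config, even when exclude_children is blank."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        {
            "term_id": row["term_id"],  # hypothetical column name
            "exclude_children": parse_exclude_children(row.get("exclude_children")),
        }
        for row in reader
    ]


# Second row has a blank cell, third row omits the column entirely.
sample = "term_id\texclude_children\nDOID:1\tTRUE\nDOID:2\t\nDOID:3\n"
rows = read_exclusion_config(sample)
```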

    Misc
    - Update: run.sh: Reverted to its previous state, undoing the deletion of --rf. That deletion had been made so that pip installs would persist within the ODK docker container between runs.
commit 5fd00c4d3eaa5741b1d2c229166a686ac4000076

    Slurp pipeline
    - Delete: Variables for 'all relationships' and 'term parent map': These had been initialized, but were removed in a more OAK-reliant refactor.
    - Add: Term class
    - Update: Big refactor to utilize Term class
    - Add: Slurp output TSV files
    - Add: utils.py, which includes Term class
    - Add: New param: max_id
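
To give a feel for what a `Term` class in `utils.py` could look like, here is a small hypothetical sketch. The field names, the `prefix` property, the `to_row` method, and the output column names are all assumptions for illustration; the real class may differ substantially.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Term:
    """A single ontology term and the direct parents known for it."""

    curie: str
    label: Optional[str] = None
    direct_parents: List[str] = field(default_factory=list)

    @property
    def prefix(self) -> str:
        """Ontology prefix, i.e. the part of the CURIE before the first colon."""
        return self.curie.split(":", 1)[0]

    def to_row(self) -> Dict[str, str]:
        """Flatten the term into one row of an output TSV (hypothetical columns)."""
        return {
            "term_id": self.curie,
            "term_label": self.label or "",
            "direct_parents": "|".join(self.direct_parents),
        }


# Placeholder values, not real ontology content.
term = Term("DOID:0000002", "example label", ["DOID:0000001", "DOID:0000000"])
row = term.to_row()
```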

    Misc
    - Update: run.sh: This change will allow any Python package updates to be retained.
    - Add: SPARQL jinja query to get parents. Used by slurp pipeline.
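
As an illustration of the templated parents query, the sketch below renders a SPARQL query for a batch of terms. Note the swap: it uses the standard library's `string.Template` instead of Jinja to stay dependency-free, and the query text, function name, and example IRIs are assumptions rather than the query added in this PR.

```python
from string import Template
from typing import Iterable

# $values is the only placeholder; SPARQL variables use "?" so they are left alone.
PARENTS_QUERY = Template("""\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?term ?parent WHERE {
  VALUES ?term { $values }
  ?term rdfs:subClassOf ?parent .
  FILTER(isIRI(?parent))
}
""")


def render_parents_query(term_iris: Iterable[str]) -> str:
    """Fill the VALUES clause so one query fetches parents for many terms."""
    values = " ".join(f"<{iri}>" for iri in term_iris)
    return PARENTS_QUERY.substitute(values=values)


query = render_parents_query(
    [
        "http://purl.obolibrary.org/obo/DOID_4",
        "http://purl.obolibrary.org/obo/DOID_7",
    ]
)
```

Batching terms through a single `VALUES` clause avoids issuing one query per term, which is the main appeal of doing this step in SPARQL.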
commit d3e1c743e15f0e76a060793fc075e799041561cf

Slurp pipeline
- Delete: Variables for 'all relationships' and 'term parent map': These had been initialized, but were removed in a more OAK-reliant refactor.
commit f50d5809e71cb0bb02147e1568ab443bb0ee7438

Slurp pipeline
- Delete: `mondo-ingest.Makefile`: (i) `python-install-dependencies` phony target. This has been removed in favor of expediently updating the dependencies in ODK directly. (ii) Removed unused, commented out mondo component target.
- Update: `.gitignore`: Simplified to ignoring whole dir `src/ontology/components/*`
- Update: Python requirements: Removed `semsql` w/ specific version, due to some recent dependency fixes in OAK.

Term exclusions
- Update: Rename: reports/term_exclusions.txt -> reports/excluded_terms.txt
commit e5fab07c1638f8055a5f73396aa38f575675980a

Some fixes to make files

1. Never depend on a PHONY goal, because they force the whole pipeline to be rebuilt. Just add them to each goal; installing a few Python dependencies that are already installed is fast enough.
2. PHONY goals cannot have wildcard dependencies: https://stackoverflow.com/questions/7887343/makefile-wildcard-static-rule-with-phony
3. Indentation is critical with Make targets: your `# TODO: Temporary debugging:` was wrongly indented to the level of the target name, not the body.

commit 1e7929bdef740e55223b4b32cb116ca5cbe800d4
Slurp pipeline
- Add: makefile: Missing make goals for dependencies for slurp goal.
- Add: Python: CLI
- Update: makefile: Slurp goal: (i) named keys/vals for all params, (ii) standardization in file/path params.
(before above commits)
- Add: makefile goals
- Add: initial python script

Related to

@joeflack4 joeflack4 force-pushed the slurpdraft branch 4 times, most recently from 5b87e9b to 384652c Compare July 28, 2022 21:16
@joeflack4 joeflack4 changed the title Skeleton for simple slurp Slurp pipeline: Initial design Aug 3, 2022
@joeflack4 joeflack4 marked this pull request as draft August 3, 2022 20:05
@joeflack4 joeflack4 self-assigned this Aug 3, 2022
@joeflack4 joeflack4 added the enhancement New feature or request label Aug 3, 2022
@joeflack4 joeflack4 linked an issue Aug 3, 2022 that may be closed by this pull request
@joeflack4 joeflack4 force-pushed the slurpdraft branch 2 times, most recently from c5e08e0 to 3356948 Compare August 3, 2022 21:00
@joeflack4 joeflack4 force-pushed the slurpdraft branch 3 times, most recently from 23d49b4 to e54f171 Compare August 3, 2022 21:40
@joeflack4 joeflack4 force-pushed the slurpdraft branch 2 times, most recently from de81b23 to 630d6d8 Compare August 5, 2022 01:26
@joeflack4 joeflack4 force-pushed the slurpdraft branch 2 times, most recently from 6d4460c to 04d5379 Compare September 30, 2022 02:12
@joeflack4 joeflack4 force-pushed the slurpdraft branch 3 times, most recently from 1f3ee92 to d2eee94 Compare September 30, 2022 18:29
matentzn and others added 7 commits September 30, 2022 15:56
- Update: Basic pseudo code in Python updated
- Update: Makefile: Updated formatting.
- Add: makefile: Missing make goals for dependencies for slurp goal.
- Add: Python: CLI
- Update: makefile: Slurp goal: (i) named keys/vals for all params, (ii) standardization in file/path params.
- Update: Python: Completed script, inspired by initial pseudocode. (WIP)
- Add: Makefile: reports/mirror_signature-mondo.tsv: This depends on different prereqs than reports/mirror_signature-%.tsv.
- Add: Makefile: .db creation command(s).

Misc updates
- Update: makefile: python-install-dependencies: Now upgrades all dependencies to latest version.
- .gitignore: (i) Added .db, (ii) src/ontology/components/*relation-graph*
- requirements*.txt: (i) Added semsql, (ii) added curies, (iii) Upgraded various dependencies / transitive dependencies
@joeflack4 joeflack4 force-pushed the slurpdraft branch 5 times, most recently from 62b153d to fd6f901 Compare September 30, 2022 21:56
@joeflack4
Contributor

@matentzn This afternoon I: (1) added genes to the OMIM exclusions and re-ran the exclusions pipeline and slurp, and (2) re-ran slurp for all ontologies, following our additions/changes to columns.

As you suggested, gonna merge this now. I imagine I'll be making continuous tweaks to the slurp pipeline very soon, but I'll make new (hopefully a lot smaller) PRs for that.

@joeflack4 joeflack4 merged commit e72fc9b into main Sep 30, 2022
@joeflack4 joeflack4 deleted the slurpdraft branch September 30, 2022 22:26
Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

Very basic slurping pipeline: initial design
2 participants