Slurp pipeline: Initial design #36

Merged 8 commits on Sep 30, 2022

Commits on Sep 30, 2022

  1. Draft skeleton pipeline

    matentzn authored and joeflack4 committed Sep 30, 2022
    2e119ec
  2. Draft skeleton pipeline

    matentzn authored and joeflack4 committed Sep 30, 2022
    e9a7d85
  3. Feature: Basic slurp pipeline

    - Update: Basic pseudo code in Python updated
    - Update: Makefile: Updated formatting.
    joeflack4 committed Sep 30, 2022
    3c7f908
  4. Slurp pipeline

    - Add: makefile: Missing make goals for dependencies for slurp goal.
    - Add: Python: CLI
    - Update: makefile: Slurp goal: (i) named keys/vals for all params, (ii) standardization in file/path params.
    - Update: Python: Completed script, inspired by initial pseudocode. (WIP)
    - Add: Makefile: reports/mirror_signature-mondo.tsv: This depends on different prereqs than reports/mirror_signature-%.tsv.
    - Add: Makefile: .db creation command(s).
    
    Misc updates
    - Update: makefile: python-install-dependencies: Now upgrades all dependencies to latest version.
    - .gitignore: (i) Added .db, (ii) src/ontology/components/*relation-graph*
    - requirements*.txt: (i) Added semsql, (ii) added curies, (iii) Upgraded various dependencies / transitive dependencies
    joeflack4 committed Sep 30, 2022
    273a10c
  5. Some fixes to make files

    1. Never depend on a PHONY goal, because doing so forces the whole pipeline to be rebuilt. Just add its commands to each goal instead; installing a few Python dependencies that are already installed is fast enough.
    2. PHONY goals cannot have wildcard dependencies: https://stackoverflow.com/questions/7887343/makefile-wildcard-static-rule-with-phony
    3. Indentation is critical in Make targets: your `# TODO: Temporary debugging:` was wrongly indented to the level of the target name, not the body.
    matentzn authored and joeflack4 committed Sep 30, 2022
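
Point 1 of this commit message can be illustrated with a toy model of how Make decides whether a goal is out of date. This is only a sketch of the general rule (a phony goal always counts as out of date, so anything that depends on it is rebuilt every run), not the project's Makefile. The `Goal` class, `is_stale`, `demo_goals`, and the file names `mondo.db` and `slurp/omim.tsv` are hypothetical; only `python-install-dependencies` comes from the commit log.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Goal:
    """Hypothetical, simplified model of a Make goal."""
    name: str
    mtime: Optional[float] = None  # None means the file does not exist yet
    phony: bool = False
    prereqs: List[str] = field(default_factory=list)


def is_stale(name: str, goals: Dict[str, Goal]) -> bool:
    """Return True if the goal would be rebuilt under the simplified rule."""
    goal = goals[name]
    # A phony goal, or a file that is missing, is always out of date.
    if goal.phony or goal.mtime is None:
        return True
    for prereq_name in goal.prereqs:
        # A stale prerequisite makes everything downstream stale.
        if is_stale(prereq_name, goals):
            return True
        # So does a prerequisite file that is newer than the goal.
        if goals[prereq_name].mtime > goal.mtime:
            return True
    return False


def demo_goals(depend_on_phony: bool) -> Dict[str, Goal]:
    """Build a tiny dependency graph with hypothetical file names."""
    prereqs = ["mondo.db"]
    if depend_on_phony:
        prereqs.append("python-install-dependencies")
    return {
        "python-install-dependencies": Goal("python-install-dependencies", phony=True),
        "mondo.db": Goal("mondo.db", mtime=100.0),
        "slurp/omim.tsv": Goal("slurp/omim.tsv", mtime=200.0, prereqs=prereqs),
    }
```

In this model, `slurp/omim.tsv` is up to date when it depends only on `mondo.db`, but it is always stale once `python-install-dependencies` is listed as a prerequisite, which matches the advice to run the install commands inside each goal instead.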
    83e5a10
  6. Slurp pipeline

    - Delete: `mondo-ingest.Makefile`: (i) `python-install-dependencies` phony target. This has been removed in favor of expediently updating the dependencies in ODK directly. (ii) Removed unused, commented out mondo component target.
    - Update: `.gitignore`: Simplified to ignoring whole dir `src/ontology/components/*`
    - Update: Python requirements: Removed `semsql` w/ specific version, due to some recent dependency fixes in OAK.
    
    Term exclusions
    - Update: Rename: reports/term_exclusions.txt -> reports/excluded_terms.txt
    joeflack4 committed Sep 30, 2022
    5d93feb
  7. Slurp pipeline

    - Delete: Initialized variables for 'all relationships' and 'term parent map', but then did a more OAK-reliant refactor and removed them.
    - Add: Term class
    - Update: Big refactor to utilize Term class
    - Add: Slurp output TSV files
    - Add: utils.py, which includes Term class
    - Add: New param: max_id
    
    Misc
    - Update: run.sh: This change will allow any Python package updates to be retained.
    - Add: SPARQL jinja query to get parents. Used by slurp pipeline.
    joeflack4 committed Sep 30, 2022
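
The `Term` class, the `max_id` param, and the TSV output from this commit could fit together roughly as follows. This is a minimal sketch under stated assumptions, not the code in `utils.py`: the fields of `Term`, the column names, the helpers `assign_ids` and `to_tsv`, and the reading of `max_id` as "the highest ID already in use, so new IDs start after it" are all assumptions made for illustration.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Dict, Iterable, List

# Hypothetical output columns; the real slurp TSV may differ.
COLUMNS = ["mondo_id", "source_id", "label", "parents"]


@dataclass
class Term:
    """Hypothetical stand-in for the Term class added in utils.py."""
    curie: str
    label: str = ""
    parents: List[str] = field(default_factory=list)


def assign_ids(terms: Iterable[Term], max_id: int,
               prefix: str = "MONDO", width: int = 7) -> List[Dict[str, str]]:
    """Pair each term with the next unused ID after max_id (assumed meaning)."""
    rows = []
    for offset, term in enumerate(terms, start=1):
        rows.append({
            "mondo_id": f"{prefix}:{max_id + offset:0{width}d}",
            "source_id": term.curie,
            "label": term.label,
            "parents": "|".join(term.parents),
        })
    return rows


def to_tsv(rows: List[Dict[str, str]]) -> str:
    """Serialize rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, `to_tsv(assign_ids([Term("OMIM:1", "example", ["OMIM:0"])], max_id=41))` yields a header line followed by one row whose first column is `MONDO:0000042`.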
    95552cf
  8. Slurp pipeline

    - Update: direct_owned_parents methods: Temporarily commented out, pending OAK fixes. For now, this is done in batch using SPARQL.
    - Add: Param --onto-exclusions-path: Excluded terms are no longer considered possible slurp candidates.
    - Bugfix: Labels: Now they are successfully being fetched using OAK.
    - Add: New columns to output
    
    Exclusions
    - Bugfix: Sometimes, leaving exclude_children empty in the input config/%_exclusions.tsv yielded empty generated exclusion tables.
    - Update: Updated exclusion tables for: DOID, OMIM. These had problematic source files with no values for exclude_children, and needed to be re-run after bugfix.
    
    Misc
    - Update: run.sh: Reverted back to what it was before, undoing deletion of --rf, which was done to address pip installs not being persistent within the ODK docker container between runs.
    joeflack4 committed Sep 30, 2022
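
The two exclusion-related changes in this commit can be sketched as follows: treating a blank `exclude_children` cell as a usable default rather than letting it drop rows, and removing excluded terms from the slurp candidates. This is an illustration under assumptions, not the project's implementation. The `term_id` column name, the choice of `False` as the default for blank cells, and the helper names are all hypothetical; only `exclude_children` comes from the commit log.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional

TRUE_VALUES = {"true", "t", "yes", "y", "1"}


def parse_exclude_children(raw: Optional[str]) -> bool:
    """Interpret a cell as a boolean; blank or missing is assumed to mean False."""
    return (raw or "").strip().lower() in TRUE_VALUES


def read_exclusion_config(tsv_text: str) -> List[Dict[str, object]]:
    """Parse exclusion rows, keeping rows whose exclude_children cell is empty."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for row in reader:
        term_id = (row.get("term_id") or "").strip()  # hypothetical column name
        if not term_id:
            continue  # skip rows with no term at all
        rows.append({
            "term_id": term_id,
            "exclude_children": parse_exclude_children(row.get("exclude_children")),
        })
    return rows


def drop_excluded(candidates: Iterable[str], excluded: Iterable[str]) -> List[str]:
    """Return the candidates that are not excluded, preserving input order."""
    excluded_set = set(excluded)
    return [curie for curie in candidates if curie not in excluded_set]
```

A caller might build the excluded set from `read_exclusion_config(...)` and pass it to `drop_excluded` before ranking slurp candidates; how children of excluded terms are expanded is left out here because the commit log does not describe it.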
    c27c449