Ingest #6

Draft: wants to merge 12 commits into master

Commits on Jun 20, 2023

  1. copy zika ingest

    j23414 authored and j23414 committed Jun 20, 2023
    Commit 55e13da
  2. adapt to ebola

    j23414 authored and j23414 committed Jun 20, 2023
    Commit 0233699
  3. update ebola build to pull new ingest data

    j23414 authored and j23414 committed Jun 20, 2023
    Commit 5600aa7
  4. refactor: flag intermediate files as temp

    Flags most intermediate sequence and metadata files as temporary, so that only the final compressed outputs are kept.

    Deleting the intermediate files reduced the size of the data directory from 221 MB to 584 KB.
    j23414 authored and j23414 committed Jun 20, 2023
    Commit 477979b
  5. refactor: serotypes

    Since serotype is annotated as a column in metadata, simplify intermediate filenames like `data/sequences_{serotype}.fasta` and `data/metadata_{serotype}.tsv` to `data/sequences.fasta` and `data/metadata.tsv`.
    j23414 authored and j23414 committed Jun 20, 2023
    Commit e80815f
  6. docs: ingest directory

    In ingest/README.md, assume the user is working from within the ingest directory.
    j23414 authored and j23414 committed Jun 20, 2023
    Commit a17b2f0
  7. edit: build accepts and decompresses zst inputs

    j23414 authored and j23414 committed Jun 20, 2023
    Commit fad2af9
  8. docs: included a data provisioning section

    j23414 authored and j23414 committed Jun 20, 2023
    Commit 5c060a0
  9. refactor: used explicit paths instead of references to the rules variables
    j23414 authored and j23414 committed Jun 20, 2023
    Commit a1f36e2
  10. edit: Use a permalink for each script

    Pinning each script to a permalink lets us version the software this workflow uses, so upstream changes do not affect it until we choose to bump the version. This adds maintenance to the workflow, but it protects users from unexpected issues that are outside their control.
    j23414 authored and j23414 committed Jun 20, 2023
    Commit a157df7
  11. fixup: change dengue to ebola

    j23414 authored and j23414 committed Jun 20, 2023
    Commit 5a2c5c4
  12. Pick curl or wget based on availability

    j23414 authored and j23414 committed Jun 20, 2023
    Commit 6ad1636
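The commit "refactor: flag intermediate files as temp" (477979b) keeps only the final compressed outputs. Assuming this is a Snakemake workflow (the commits refer to rules), that is usually done by wrapping an output path in `temp()`, after which the workflow manager deletes the file once no remaining job needs it. The sketch below models that bookkeeping in plain Python; the `Rule` class, rule names and file paths are hypothetical, not the ones in this PR.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """A hypothetical workflow step: the files it reads and the files it writes."""
    name: str
    inputs: list[str]
    outputs: list[str]
    temp_outputs: set[str] = field(default_factory=set)  # outputs flagged as temporary


def run_with_temp_cleanup(rules: list[Rule]) -> set[str]:
    """Simulate running `rules` in the given (already dependency-sorted) order.

    Returns the files left on disk, deleting each temporary file as soon as the
    last rule that reads it has run.
    """
    # Count how many rules still have to read each file.
    pending_readers: dict[str, int] = {}
    for rule in rules:
        for path in rule.inputs:
            pending_readers[path] = pending_readers.get(path, 0) + 1

    temp_files: set[str] = set().union(*(rule.temp_outputs for rule in rules))
    on_disk: set[str] = set()
    for rule in rules:
        on_disk.update(rule.outputs)  # pretend the rule wrote its outputs
        for path in rule.inputs:
            pending_readers[path] -= 1
            if path in temp_files and pending_readers[path] == 0:
                on_disk.discard(path)  # no remaining rule needs it
    return on_disk


# Hypothetical three-step pipeline: fetch -> transform -> compress.
rules = [
    Rule("fetch", [], ["data/genbank.ndjson"], {"data/genbank.ndjson"}),
    Rule("transform", ["data/genbank.ndjson"],
         ["data/sequences.fasta", "data/metadata.tsv"],
         {"data/sequences.fasta", "data/metadata.tsv"}),
    Rule("compress", ["data/sequences.fasta", "data/metadata.tsv"],
         ["results/sequences.fasta.zst", "results/metadata.tsv.zst"]),
]
print(sorted(run_with_temp_cleanup(rules)))
# ['results/metadata.tsv.zst', 'results/sequences.fasta.zst']
```

In a real Snakemake file only the `temp(...)` wrapper is written by hand; the reference counting above is roughly what the workflow manager does on your behalf.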
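The commit "refactor: serotypes" (e80815f) drops per-serotype filenames because serotype is already a metadata column. A minimal sketch of how a downstream step could still recover one serotype's records from a single `metadata.tsv`, by filtering on that column; the column name `serotype`, the helper name and the example values are assumptions for illustration.

```python
import csv
import io
from typing import TextIO


def filter_metadata_by_column(tsv: TextIO, column: str, value: str) -> list[dict[str, str]]:
    """Return the rows of a tab-separated metadata table whose `column` equals `value`."""
    reader = csv.DictReader(tsv, delimiter="\t")
    # Reading fieldnames consumes the header line, so a missing column fails early.
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"metadata has no {column!r} column")
    return [row for row in reader if row[column] == value]


# Hypothetical combined metadata.tsv with a serotype column.
metadata = io.StringIO(
    "accession\tserotype\n"
    "SEQ1\ttype_a\n"
    "SEQ2\ttype_b\n"
    "SEQ3\ttype_a\n"
)
rows = filter_metadata_by_column(metadata, "serotype", "type_a")
print([row["accession"] for row in rows])  # ['SEQ1', 'SEQ3']
```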
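The commit "edit: build accepts and decompresses zst inputs" (fad2af9) lets the build take Zstandard-compressed files. One way to do this, assuming the `zstd` command-line tool is installed, is to strip the `.zst` suffix to get the output path and stream the decompressed data to it with `zstd -d -c`. The helper below is a hypothetical sketch of that step, not the PR's code.

```python
import shlex
from pathlib import PurePosixPath


def zst_decompress_command(compressed: str) -> tuple[str, str]:
    """Return (shell command, output path) that decompresses a .zst file with the zstd CLI."""
    path = PurePosixPath(compressed)
    if path.suffix != ".zst":
        raise ValueError(f"expected a .zst file, got {compressed!r}")
    decompressed = str(path.with_suffix(""))  # drop only the final ".zst"
    # -d decompresses, -c writes to stdout; the redirect picks the output file.
    command = f"zstd -d -c {shlex.quote(compressed)} > {shlex.quote(decompressed)}"
    return command, decompressed


cmd, out = zst_decompress_command("data/metadata.tsv.zst")
print(cmd)  # zstd -d -c data/metadata.tsv.zst > data/metadata.tsv
print(out)  # data/metadata.tsv
```

Writing to stdout and redirecting keeps the output filename under the workflow's control rather than relying on `zstd`'s default naming.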
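The commit "edit: Use a permalink for each script" (a157df7) pins each script so upstream changes cannot leak in. On GitHub, a raw-file permalink puts a commit SHA in the `raw.githubusercontent.com` URL where a branch name would otherwise go, so the served file never changes. The sketch below builds such a URL; the owner, repository, path and SHA in the example are placeholders, not the ones this PR uses.

```python
import re

# A full Git commit SHA-1: 40 lowercase hex characters.
_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def raw_permalink(owner: str, repo: str, sha: str, path: str) -> str:
    """Build a raw.githubusercontent.com URL for `path` pinned to commit `sha`."""
    # A branch name such as "master" would track upstream changes, which is
    # what pinning is meant to avoid, so this sketch insists on a full SHA.
    if not _FULL_SHA.fullmatch(sha):
        raise ValueError(f"not a full 40-character commit SHA: {sha!r}")
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{sha}/{path.lstrip('/')}"


url = raw_permalink(
    "example-org", "example-repo",
    "0123456789abcdef0123456789abcdef01234567",
    "bin/fetch-script",
)
print(url)
```

Bumping a script then means replacing its SHA, which is the extra maintenance the commit message accepts in exchange for reproducibility.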
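The commit "Pick curl or wget based on availability" (6ad1636) chooses a download tool at run time. A minimal sketch of that selection, assuming `curl` is preferred when both are installed (the commit title does not say which wins): `shutil.which` looks each tool up on `PATH`. The function name and the exact flags are one reasonable choice, not necessarily the PR's.

```python
import shutil
from typing import Callable, Optional


def download_command(url: str, dest: str,
                     which: Callable[[str], Optional[str]] = shutil.which) -> list[str]:
    """Return an argv list that downloads `url` to `dest` using curl or wget."""
    if which("curl"):
        # -f fail on HTTP errors, -s silent, -S still report errors, -L follow redirects
        return ["curl", "-fsSL", "-o", dest, url]
    if which("wget"):
        # -q quiet, -O write the response to `dest`
        return ["wget", "-q", "-O", dest, url]
    raise RuntimeError("neither curl nor wget was found on PATH")


def only_wget(name: str) -> Optional[str]:
    """Simulate a machine that only has wget installed."""
    return "/usr/bin/wget" if name == "wget" else None


print(download_command("https://example.org/data.ndjson", "data.ndjson", which=only_wget))
# ['wget', '-q', '-O', 'data.ndjson', 'https://example.org/data.ndjson']
```

Injecting `which` keeps the choice testable without touching the real `PATH`; passing the returned list to `subprocess.run(cmd, check=True)` would perform the download.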