Moving the WDL for importing array manifest to BQ #6860
Conversation
just one change needed
# checking for != "build37Flag" skips the header row (we don't want that numbered)
# only process rows with 29 fields - this skips some header info fields
# also skip entries that are flagged, not matched or have index conflict
awk -F ',' 'NF==29 && ($29!="ILLUMINA_FLAGGED" && $29!="INDEL_NOT_MATCHED" && $29!="INDEL_CONFLICT" && $29!="build37Flag") { flag=$29; if ($29=="PASS") flag="null"; print id++","$2","$9","$23","$24","$25","$26","$27","flag }' $TMP_SORTED > $TMP_PROC
id++ should be changed to ++id
This was in a recent PR of mine, but the rest looks correct.
Eventually we should add a test that verifies (1) there is no probe 0 and (2) that two different runs of this code yield the same probe->id map (both of these were bugs in the past; don't feel like you have to do that for this PR).
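For illustration, a minimal demo of the off-by-one being pointed out here (plain awk; the input values are made up):

# Post-increment emits a probe 0; pre-increment starts the ids at 1.
printf 'a\nb\nc\n' | awk '{ print id++ "," $1 }'   # -> 0,a  1,b  2,c
printf 'a\nb\nc\n' | awk '{ print ++id "," $1 }'   # -> 1,a  2,b  3,c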
Definitely considering it a TODO to add tests that actually make sure the imported data is correct. I haven't figured out the best way to do this yet, though. Maybe I need a GATK tool that pulls down a table and compares it to an expected file? Or maybe looking at the extracted VCF from the full "end-to-end" test will give us enough coverage? I'm not sure that part of the pipeline pulls information from the manifest.
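One possible shape for that check, sketched with the bq CLI (the dataset/table name and the ORDER BY column below are placeholders, not necessarily what the WDL actually creates):

# Dump the ingested table in a deterministic order and diff it against
# a checked-in expected file. Table name and sort key are assumptions.
bq query --use_legacy_sql=false --format=csv \
  'SELECT * FROM `broad-dsde-dev.temp_tables.probe_info` ORDER BY probe_id' \
  > actual_probe_info.csv
diff expected_probe_info.csv actual_probe_info.csv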
{
  "ImportArrayManifest.extended_manifest_csv": "/home/travis/build/broadinstitute/gatk/src/test/resources/org/broadinstitute/hellbender/tools/variantdb/arrays/tiny_manifest.csv",
  "ImportArrayManifest.manifest_schema_json": "/home/travis/build/broadinstitute/gatk/scripts/variantstore_wdl/schemas/manifest_schema.json",
  "ImportArrayManifest.project_id": "broad-dsde-dev",
Is this just copied from elsewhere? Seems like there should be a gatk-test Google project...
This is the GATK test project AFAIK. It's used in the BigQueryUtils tests (in GATK).
String? docker
}

String docker_final = select_first([docker, "us.gcr.io/broad-gatk/gatk:4.1.7.0"])
Are you overriding this? Or are we always testing with this static GATK version?
I'm not overriding it in this case because we're not using any GATK tools in this WDL. But maybe we should still be testing the current branch regardless? I know we'll need to for future WDLs (ingest, calculate metrics, and extract will all use GATK tools that will need to be in a docker built from the current branch).
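If we do want the test to track the branch, one hedged sketch of wiring that up (jq, the image name, the BRANCH_TAG variable, and the inputs file name are all assumptions, not part of this PR):

# Patch the test inputs so docker_final's select_first picks up a
# branch-built image instead of the static 4.1.7.0 default.
jq --arg img "us.gcr.io/broad-dsde-methods/gatk:${BRANCH_TAG}" \
  '. + {"ImportArrayManifest.docker": $img}' \
  test_inputs.json > test_inputs.branch.json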
* Copying wdl from variantstore repo
* Adding tests and changes to WDL
* addressing comments
* adding readme
I moved the WDL for importing the array manifest over from the variantstore repo and added a test. The test here only checks that the WDL succeeded; it doesn't look at the results (yet). It ingests the manifest into a dataset with a 7-day TTL, so the tables eventually get cleaned up. That might be too long for this case, since a table is added each time the test is run (so on every push and PR).
I plan to add more of the "end-to-end" pipeline with more testing in the future using a similar scheme, so feedback on the structure is welcome.