-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SA support and consistent naming for all GVS WDLs #7205
Conversation
filters: | ||
branches: | ||
- master | ||
- ah_var_store | ||
- name: funcotator | ||
- mmt_SA_support_and_wdl_renaming |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: remove branch references before merging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually will wait for a future PR so it doesn't break for anyone using these WDLs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great -- very thorough
--local-sort-max-records-in-ram ~{local_sort_max_records_in_ram} \ | ||
--sample-table ~{fq_sample_table} \ | ||
--cohort-extract-table ~{fq_cohort_extract_table} \ | ||
-L ~{intervals} \ | ||
~{"-XL " + excluded_intervals} \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
just a few questions and items for discussion
df -h | ||
|
||
gatk --java-options "-Xmx9g" \ | ||
ExtractCohort \ | ||
--mode GENOMES --ref-version 38 --query-mode LOCAL_SORT \ | ||
-R "~{reference}" \ | ||
-O local.vcf.gz \ | ||
-O ~{output_file} \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just curious about the difference between using double quotes in the line above and this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think it doesn't matter - the same way you can enter command line string arguments either with or without quotes. but being consistent would be good!
"GvsExtractCallset.reference_index": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.fai", | ||
"GvsExtractCallset.reference_dict": "gs://gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.dict", | ||
"GvsExtractCallset.wgs_intervals": "gs://gcp-public-data--broad-references/hg38/v0/wgs_calling_regions.hg38.interval_list", | ||
"GvsExtractCallset.scatter_count": 50, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the filter extract uses a scatter count of 20. is the difference on purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the difference was not on purpose! 20 and 50 were defaults from what i found and left unchanged. would love to think deliberately about what these should be.
@@ -0,0 +1,17 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the name of the wdl is not plural. we should probably rename to GvsExtractCallset.example.inputs.json
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great catch! going to make a separate PR to fix this.
@@ -348,7 +349,9 @@ task CreateImportTsvs { | |||
~{for_testing_only} | |||
|
|||
if [ ~{has_service_account_file} = 'true' ]; then | |||
export GOOGLE_APPLICATION_CREDENTIALS=~{service_account_json} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think i briefly saw something about adding this on one of the slack channels. I'd like to have a brief discussion so we can set up some conventions. Doing this has an impact on being able to stream inputs. Briefly if the tool runs as the service account, it should be able to stream files in service account buckets, but it can't stream files from the workspace bucket. when we don't set the environment variable, we specifically set localization_optional to true for the files in the service account buckets and then copy them to the vm after auth-ing as the service account. if we are going to set the env var, we should test that it can stream from the service account bucket and then not do the explicit copy. (and we also need to make sure that it is not trying to stream anything from the workspace bucket)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for bringing this up, I wondered about this as well - I did test this WDL with this change and using a SA and it worked fine - https://app.terra.bio/#workspaces/broad-dsp-spec-ops-fc/gvs_sa_testing/job_history/78d0610c-256a-46e4-9f81-fbc10cc2be06 - that had the gvcfs in a separate (non-workspace) bucket, but I did keep your gsutil cp
step in there, which is likely why we didn't encounter a streaming issue?
GvsImportGenomes.wdl
- renamed WDL, fixed issue with poorly returned bq load stringGvsCreateFilterSet.wdl
- renamed WDL, added SA support, fixed a lot of issues in ExtractFeatures tool surrounding permissions - BQ projectIDs are now being properly passed through, freq_table UDF defined in repo rather than in BQGvsPrepareCallset.wdl
(done in previous PR)GvsExtractCallset.wdl
- renamed WDL, added SA supportall 4 tested with SA in this Terra workspace: https://app.terra.bio/#workspaces/broad-dsp-spec-ops-fc/gvs_sa_testing
testing also without SA in this workspace: https://app.terra.bio/#workspaces/broad-dsp-spec-ops-fc/gvs_testing_no_sa