-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incident VS 365 clinvar classification fix #7769
Conversation
c409998
to
0cf009a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I commented on the parts I felt I understood but there's a lot I still don't understand. 🙂
export GOOGLE_APPLICATION_CREDENTIALS=local.service_account.json | ||
gcloud auth activate-service-account --key-file=local.service_account.json | ||
|
||
gsutil cp ~{inputFileofFileNames} ~{updated_input_files} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks this line is meant to be outside the if
? If has_service_account_file
is not true
this command block will do nothing and the output expression for input_jsons
might not be happy about that. 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line 77 should save me in this scenario, but soon, no SA!
export GOOGLE_APPLICATION_CREDENTIALS=local.service_account.json | ||
gcloud auth activate-service-account --key-file=local.service_account.json | ||
|
||
gsutil cp ~{annotation_json} ~{updated_annotation_json} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar "should be outside if" situation here
String dataset_name | ||
File? vat_schema_json_file = "gs://broad-dsp-spec-ops/scratch/rcremer/Nirvana/schemas/vat_schema.json" | ||
File? variant_transcript_schema_json_file = "gs://broad-dsp-spec-ops/scratch/rcremer/Nirvana/schemas/vt_schema.json" | ||
File? genes_schema_json_file = "gs://broad-dsp-spec-ops/scratch/rcremer/Nirvana/schemas/genes_schema.json" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this a stable location for these files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope---we need to have a conversation about where all the Nirvana stuff and VAT schemas should live
# ------------------------------------------------ | ||
# Runtime settings: | ||
runtime { | ||
docker: "us.gcr.io/broad-dsde-methods/variantstore:rc_vat_update_2022_05_06" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a heads up this date is in the future 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well if I keep getting distracted by other work....
echo "Loading data into a pre-vat table ~{dataset_name}.~{variant_transcript_table}" | ||
echo ~{vt_path} | ||
echo ~{genes_path} | ||
bq --location=US load --project_id=~{project_id} --source_format=NEWLINE_DELIMITED_JSON ~{dataset_name}.~{variant_transcript_table} ~{vt_path} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it okay to proceed with loading data into the variant transcript table if it already existed? The seemingly similar logic for the vat table on line 248 rm
s the table if it already existed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes---the logic was originally created such that we could keep adding data to the two temp tables--the Variant-Transcript table and the Genes table--and then a fresh VAT would be created based on that.
Now that the usage has changed a bit, reworking the logic and failsafes here would all be helpful
fi | ||
|
||
echo "Loading data into a pre-vat table ~{dataset_name}.~{genes_table}" | ||
bq --location=US load --project_id=~{project_id} --source_format=NEWLINE_DELIMITED_JSON ~{dataset_name}.~{genes_table} ~{genes_path} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar question here with genes table
# note: tab delimiter and compression creates tsv.gz files | ||
bq query --nouse_legacy_sql --project_id=~{project_id} \ | ||
'EXPORT DATA OPTIONS( | ||
uri="~{export_path}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is going straight to GCS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes---and we'll need to create a new step that merges them all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(the specs have changed, so it looks like we wont need to loop over chrs either)
bq --location=US mk --project_id=~{project_id} ~{dataset_name}.~{vat_table} ~{nirvana_schema} | ||
else | ||
bq rm -t -f --project_id=~{project_id} ~{dataset_name}.~{vat_table} | ||
bq --location=US mk --project_id=~{project_id} ~{dataset_name}.~{vat_table} ~{nirvana_schema} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mk
line could be outside the if
/else
to avoid duplication
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes--I think this whole gate should be flipped actually
ff38131
to
f6f0d48
Compare
Codecov Report
@@ Coverage Diff @@
## ah_var_store #7769 +/- ##
================================================
Coverage ? 86.308%
Complexity ? 35194
================================================
Files ? 2170
Lines ? 164837
Branches ? 17775
================================================
Hits ? 142267
Misses ? 16248
Partials ? 6322 |
gcloud config set project ~{query_project_id} | ||
gsutil cp ~{service_account_json_path} local.service_account.json | ||
gcloud auth activate-service-account --key-file=local.service_account.json | ||
gcloud config set project ~{query_project_id} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
preexisting issue but it seems like we mix 2 and 4 space indents in the command blocks of the same WDLs
memory: "8 GB" | ||
preemptible: 5 | ||
cpu: "1" | ||
disks: "local-disk 250 SSD" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure you need SSD here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No---I'd ideally like to parameterize a bunch of these values
In the python file that transformed the annotation jsons into the jsons for big query ingest, there were two typos, preventing "pathogenic" and "likely pathogenic" from being used as values