WDL to extract Avro files for Hail import [VS-579] #7981
Codecov Report
@@ Coverage Diff @@
## ah_var_store #7981 +/- ##
================================================
Coverage ? 86.247%
Complexity ? 35205
================================================
Files ? 2173
Lines ? 165016
Branches ? 17792
================================================
Hits ? 142321
Misses ? 16368
Partials ? 6327
Looks great to me. Example run anywhere?
Successful run linked in the description.
Thanks - missed that.
I missed exporting the tranche data 🙈, fixes incoming
bq query --nouse_legacy_sql --project_id=~{project_id} "
EXPORT DATA OPTIONS(
uri='${avro_prefix}/vqsr_tranche/vqsr_tranche_*.avro', format='AVRO', compression='SNAPPY') AS
SELECT model, truth_sensitivity, min_vqslod, filter_name
I think we're going to want to drop filter_name here, but I'm adding it to the questions for Tim this afternoon
}

# Superpartitions have max size 4000. The inner '- 1' is so the 4000th (and multiples of 4000) sample lands in the
# appropriate partition, the outer '+ 1' is to iterate over the correct number of partitions.
This is the only thing that I'd want to see tested out (before the tie out), and I'm not sure of the best way to do this. Maybe this can wait until we do this with 10k samples.
Successful run here.
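For anyone wanting to sanity-check the superpartition arithmetic from the code comment above without a 10k-sample run, here is a minimal sketch in Python. It assumes sample IDs are 1-based and contiguous, and that the WDL computes the count as `(max_sample_id - 1) / 4000 + 1` with integer division; the actual expression is not shown in this excerpt, and the function names below are hypothetical.

```python
SUPERPARTITION_SIZE = 4000  # max samples per superpartition, per the WDL comment


def superpartition_index(sample_id: int) -> int:
    """Return the 0-based superpartition for a 1-based sample ID (assumed layout)."""
    # The inner '- 1' keeps sample 4000 (and every multiple of 4000)
    # in the partition it fills, rather than spilling into the next one.
    return (sample_id - 1) // SUPERPARTITION_SIZE


def num_superpartitions(max_sample_id: int) -> int:
    """Return how many superpartitions must be iterated for max_sample_id samples."""
    # The outer '+ 1' converts the last 0-based index into a count.
    return superpartition_index(max_sample_id) + 1


if __name__ == "__main__":
    # Boundary cases around multiples of 4000, plus the 10k case raised in review.
    for n in (1, 3999, 4000, 4001, 8000, 8001, 10000):
        print(n, num_superpartitions(n))
```

Under these assumptions the boundary cases behave as the comment describes: 4000 samples need one superpartition and 4001 need two. Dropping the inner `- 1` would give two partitions for exactly 4000 samples, which is the off-by-one the comment guards against.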