Callset statistics [VS-560] #8018
Conversation
Codecov Report
Additional details and impacted files

@@            Coverage Diff             @@
##           ah_var_store    #8018   +/- ##
=============================================
  Coverage              ?   86.226%
  Complexity            ?     35201
=============================================
  Files                 ?      2173
  Lines                 ?    165004
  Branches              ?     17792
=============================================
  Hits                  ?    142277
  Misses                ?     16393
  Partials              ?      6334
Where do the outputs go? I don't see anything in the ~{extract_prefix}_statistics table that got created in my test run.
Also, I kind of think a text file as output would be very useful for analysis/reporting.
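One possible shape for that (a sketch only — the dataset_name input and the statistics_csv output name below are placeholders, not necessarily what the workflow defines) would be to redirect the CSV-formatted query output to a local file and declare it as a task output:

    command <<<
        set -o errexit -o nounset -o xtrace -o pipefail

        # Dump the statistics table to a local CSV so it can be delivered alongside the callset.
        # Note: bq query caps returned rows unless --max_rows is raised above its default of 100.
        bq query --nouse_legacy_sql --project_id=~{project_id} --format=csv \
            'SELECT * FROM `~{dataset_name}.~{extract_prefix}_statistics`' > ~{extract_prefix}_statistics.csv
    >>>

    output {
        File statistics_csv = "~{extract_prefix}_statistics.csv"
    }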
    exit 1
fi

# Schemas extracted programmatically: https://stackoverflow.com/a/66987934
cool!
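(For context — and not necessarily the approach in the linked answer or in this WDL — one way to pull a table schema programmatically with the bq CLI is a one-liner like the following; the dataset and table names are placeholders:)

    # Print the table's schema as JSON for downstream tooling.
    bq show --schema --format=prettyjson ~{project_id}:some_dataset.some_table > some_table_schema.json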
    singleton,
    pass_qc
)
SELECT "~{filter_set_name}" filter_set_name,
I know that none of the explanations are in the code you're looking at to write this WDL, but I think getting Lee to add some context about what is being calculated would be really helpful. I'm fine with that being a future ticket.
Force-pushed from 9a22b7a to 99c60af
Reran it here and it succeeded; the data looks good as far as I can tell.
As far as I can tell, this workflow creates the statistics table but does not output the contents into a TSV or CSV, which is what we deliver along with the callset. Would it be possible to add an export to TSV to a specified GCS location to the CollectStatistics task (or a new one)?
On the good news front, I compared an export of the "statistics_table" to the callset stats file I generated for Beta and they matched! 👍🏻 (if you're curious, the run is https://app.terra.bio/#workspaces/allofus-drc-wgs-dev/AoU_DRC_WGS_12-6-21_beta_ingest/job_history/45a7764c-9f8f-49e3-b1f6-2bf28ac16b4b)
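A sketch of what that could look like with bq extract (the GCS path and dataset_name below are placeholders, not actual workflow inputs):

    # Export the statistics table to a tab-delimited file at a caller-specified GCS location.
    bq extract --project_id=~{project_id} \
        --destination_format=CSV --field_delimiter=tab \
        ~{dataset_name}.~{extract_prefix}_statistics \
        gs://some-delivery-bucket/~{extract_prefix}_statistics.tsv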
Force-pushed from 6052b1d to d8b7470
Now with export to CSV.
command <<<
    set -o errexit -o nounset -o xtrace -o pipefail

    bq query --nouse_legacy_sql --project_id=~{project_id} --format=csv '
You probably need to include a --max_rows with the number of samples; otherwise the file will be limited to 100 rows (see https://stackoverflow.com/questions/34215311/how-bq-query-can-get-10000-rows).
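Something along these lines, for example (num_samples is a hypothetical input carrying the callset's sample count; the SELECT is elided):

    # Raise the row cap above the 100-row default so every sample's row lands in the CSV.
    bq query --nouse_legacy_sql --project_id=~{project_id} --format=csv \
        --max_rows=~{num_samples} \
        'SELECT ...' > ~{extract_prefix}_statistics.csv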
Force-pushed from 513244f to 0c53018
Successful Quickstart run here; it has not yet been run on larger datasets.