Consolidate various docs for AoU callset generation into one to rule them all [VS-553] #7971

rsasch · 2022-08-02T20:16:15Z

No description provided.

codecov · 2022-08-02T20:36:25Z

Codecov Report

❗ No coverage uploaded for pull request base (ah_var_store@3e62331).
The diff coverage is n/a.

@@               Coverage Diff                @@
##             ah_var_store     #7971   +/-   ##
================================================
  Coverage                ?   86.247%           
  Complexity              ?     35200           
================================================
  Files                   ?      2173           
  Lines                   ?    165016           
  Branches                ?     17793           
================================================
  Hits                    ?    142321           
  Misses                  ?     16368           
  Partials                ?      6327

gbggrant · 2022-08-02T21:16:00Z

scripts/variantstore/AOU_DELIVERABLES.md

+   - To optimize the GVS internal queries, each sample must have a unique and consecutive integer ID assigned. Running the `GvsAssignIds` will create a unique GVS ID for each sample (`sample_id`) and update the BQ `sample_info` table (creating it if it doesn't exist). This workflow takes care of creating the BQ `vet_*`, `ref_ranges_*` and `cost_observability` tables needed for the sample IDs generated.
+   - Run at the `sample set` level ("Step 1" in workflow submission) with a sample set of all the new samples to be included in the callset (created by the "Fetch WGS metadata for samples from list" notebook mentioned above).
+   - You will want to set the `external_sample_names` input based on the column in the workspace Data table, e.g. "this.samples.research_id".
+   - If new controls are being added, they need to be done in a separate run, with the `samples_are_controls` input set to "true" (the referenced Data columns may also be different, e.g. "this.control_samples.control_sample_id" instead of "this.samples.research_id").


Slightly confused as to what this means - the `external_sample_names` values may be pulled from the `control_sample_id` column?

rsasch · 2022-08-03

The `external_sample_names` workflow input is an array of strings, usually pulled from the Terra Data tables in the workspace. The columns in those tables can be named anything, and in the past, control samples and participant samples were in different tables with different column names.
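To make the column-name point concrete, here is a minimal Python sketch of the two input configurations. The attribute expressions are the ones quoted in the hunk above; the `GvsAssignIds.` key prefix follows the usual `Workflow.input` naming convention and is an assumption, as is the helper name `assign_ids_inputs`.

```python
def assign_ids_inputs(samples_are_controls: bool) -> dict:
    """Return a hypothetical input mapping for one GvsAssignIds run."""
    # Controls and participant samples have lived in different Data tables
    # with different column names, so the expression depends on the run type.
    column = (
        "this.control_samples.control_sample_id"
        if samples_are_controls
        else "this.samples.research_id"
    )
    return {
        # Key names are assumed, not copied from the workflow itself.
        "GvsAssignIds.external_sample_names": column,
        "GvsAssignIds.samples_are_controls": samples_are_controls,
    }


participant_run = assign_ids_inputs(samples_are_controls=False)
control_run = assign_ids_inputs(samples_are_controls=True)
```

Either way the workflow receives an array of strings; only the table and column that supply it change.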

mcovarr · 2022-08-02T21:28:21Z

scripts/variantstore/AOU_DELIVERABLES.md

+  - **TBD VDS Extract WDL/notebook/??**
+- Run the "Fetch WGS metadata for samples from list" notebook after you have placed the file with the list of the new samples to ingest in a GCS location the notebook (running with your @pmi-ops account) will have access to.  This will grab the samples from the workspace where they were reblocked and bring them into this callset workspace.
+  - Set the `sample_list_file_path` variable in that notebook to the path of the file
+  - Run the "now that the data have been copied, you can make sample sets if you wish" step if you want to automatically break up the new samples into smaller sample sets.  Set the `SUBSET_SIZE` and `set_name` variables to customize.


Under what circumstances would we want to do this?

rsasch · 2022-08-03

If you don't want to throw all 100K (or whatever number) samples at `GvsImportGenomes` at once.
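A rough sketch of what breaking the new samples into smaller sample sets amounts to is below. `SUBSET_SIZE` and `set_name` are the notebook variables named in the hunk; the function `make_sample_sets`, the `<set_name>_<n>` naming scheme, and the sample names are assumptions for illustration, not the notebook's actual code.

```python
from typing import Dict, List

SUBSET_SIZE = 3            # samples per set; illustrative value only
set_name = "new_samples"   # prefix for the generated set names


def make_sample_sets(sample_ids: List[str], subset_size: int, name: str) -> Dict[str, List[str]]:
    """Split sample_ids into consecutive chunks of at most subset_size."""
    if subset_size < 1:
        raise ValueError("subset_size must be at least 1")
    sets: Dict[str, List[str]] = {}
    for start in range(0, len(sample_ids), subset_size):
        index = start // subset_size + 1  # 1-based set number
        sets[f"{name}_{index}"] = sample_ids[start:start + subset_size]
    return sets


sample_sets = make_sample_sets([f"s{i}" for i in range(7)], SUBSET_SIZE, set_name)
```

Each resulting set can then be submitted to the import workflow separately, with the last set holding whatever remainder is left over.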

mcovarr · 2022-08-02T21:43:28Z

scripts/variantstore/AOU_DELIVERABLES.md

+   - To optimize the GVS internal queries, each sample must have a unique and consecutive integer ID assigned. Running the `GvsAssignIds` will create a unique GVS ID for each sample (`sample_id`) and update the BQ `sample_info` table (creating it if it doesn't exist). This workflow takes care of creating the BQ `vet_*`, `ref_ranges_*` and `cost_observability` tables needed for the sample IDs generated.
+   - Run at the `sample set` level ("Step 1" in workflow submission) with a sample set of all the new samples to be included in the callset (created by the "Fetch WGS metadata for samples from list" notebook mentioned above).
+   - You will want to set the `external_sample_names` input based on the column in the workspace Data table, e.g. "this.samples.research_id".
+   - If new controls are being added, they need to be done in a separate run, with the `samples_are_controls` input set to "true" (the referenced Data columns may also be different, e.g. "this.control_samples.control_sample_id" instead of "this.samples.research_id").


How will we know if new controls are being added?

rsasch · 2022-08-03

In the past, Lee/AoU has let us know that we should add additional (or different) controls to a callset.
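For readers unfamiliar with the "unique and consecutive integer ID" requirement quoted in the hunk, the idea can be pictured with the sketch below. It is not the `GvsAssignIds` implementation: the function name, the choice to start numbering at 1, and the in-memory dictionary standing in for the `sample_info` table are all assumptions.

```python
from typing import Dict, List


def assign_sample_ids(existing: Dict[str, int], new_names: List[str]) -> Dict[str, int]:
    """Give each unseen sample name the next unused integer ID."""
    assigned = dict(existing)
    # Continue from the highest ID already in use (assumed to start at 1).
    next_id = max(assigned.values(), default=0) + 1
    for name in new_names:
        if name in assigned:
            continue  # already has an ID; keep it stable
        assigned[name] = next_id
        next_id += 1
    return assigned


sample_info = assign_sample_ids({"A": 1, "B": 2}, ["C", "B", "D"])
```

A separate run for controls would follow the same pattern, with the names coming from the control table instead.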

mcovarr · 2022-08-02T21:44:31Z

scripts/variantstore/AOU_DELIVERABLES.md

+   - Run at the `sample set` level ("Step 1" in workflow submission).  You can either run this on a sample_set of all the samples and rely on the workflow logic to break it up into batches (or manually set the `load_data_batch_size` input) or run it on smaller sample_sets created by the "Fetch WGS metadata for samples from list" notebook mentioned above.  
+   - You will want to set the `external_sample_names`, `input_vcfs` and `input_vcf_indexes` inputs based on the columns in the workspace Data table, e.g. "this.samples.research_id", "this.samples.reblocked_gvcf_v2" and "this.samples.reblocked_gvcf_index_v2".
+3. `GvsWithdrawSamples` workflow
+   - Run if there are any samples to withdraw from the last callset.


How do we know if there are samples to withdraw?

rsasch · 2022-08-03

We compare the sample list for the callset we are creating (which we get from Lee/AoU) with the samples already in the database.
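The comparison described here is essentially a set difference. A minimal sketch, assuming both lists are available as plain collections of sample names (the function names and example values are hypothetical):

```python
from typing import Iterable, Set


def samples_to_withdraw(callset_samples: Iterable[str], loaded_samples: Iterable[str]) -> Set[str]:
    """Samples already in the database that are absent from the new callset list."""
    return set(loaded_samples) - set(callset_samples)


def samples_to_add(callset_samples: Iterable[str], loaded_samples: Iterable[str]) -> Set[str]:
    """Samples in the new callset list that have not been loaded yet."""
    return set(callset_samples) - set(loaded_samples)


callset = ["s1", "s2", "s4"]   # list received for the new callset
loaded = ["s1", "s2", "s3"]    # samples already in the database

withdraw = samples_to_withdraw(callset, loaded)
add = samples_to_add(callset, loaded)
```

If `withdraw` is non-empty, `GvsWithdrawSamples` needs to be run; if it is empty, the step can be skipped.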

mcovarr

one nomenclatural thing otherwise lgtm 👍

scripts/variantstore/AOU_DELIVERABLES.md

rsasch added 9 commits August 1, 2022 17:00

adding workspace setup steps

4f81630

finished setup, started on workflows

6e040de

first stab at the rest

7bf65b7

initial cleanup

51cf590

GvsCalculatePrecisionAndSensitivity stuff

23f06d7

anchor

8b30bdf

spit and polish

345c84e

consistency, please

50918db

more

455d056

rsasch requested review from gbggrant and mcovarr August 2, 2022 20:17

gbggrant approved these changes Aug 2, 2022

mcovarr reviewed Aug 2, 2022

PR feedback

0e3059a

rsasch requested a review from mcovarr August 3, 2022 14:09

Merge branch 'ah_var_store' into rsa_vs553_aou_docs

ffaf9ec

mcovarr approved these changes Aug 3, 2022

PR feedback and dockstore yaml cleanup

674fd5c

rsasch merged commit 798d4e8 into ah_var_store Aug 3, 2022

rsasch deleted the rsa_vs553_aou_docs branch August 3, 2022 18:04

This was referenced Mar 17, 2023

lb merge gvs branch #8248

Closed

testing something, please ignore #8251

Closed
