Treat withdrawn samples in sub-cohort prepare correctly [VS-772] #8156

Merged: 10 commits, Jan 23, 2023
1 change: 0 additions & 1 deletion .dockstore.yml
@@ -245,7 +245,6 @@ workflows:
branches:
- master
- ah_var_store
- rsa_vs_749
- name: GvsCallsetStatistics
subclass: WDL
primaryDescriptorPath: /scripts/variantstore/wdl/GvsCallsetStatistics.wdl
@@ -264,7 +264,7 @@ task Add_AS_MAX_VQSLOD_ToVcf {
File input_vcf
String output_basename

- String docker = "us.gcr.io/broad-dsde-methods/variantstore:2022-11-17-alpine"
+ String docker = "us.gcr.io/broad-dsde-methods/variantstore:2023-01-23-alpine"
Int cpu = 1
Int memory_mb = 3500
Int disk_size_gb = ceil(2*size(input_vcf, "GiB")) + 50
2 changes: 1 addition & 1 deletion scripts/variantstore/wdl/GvsCallsetCost.wdl
@@ -62,7 +62,7 @@ task WorkflowComputeCosts {
>>>

runtime {
- docker: "us.gcr.io/broad-dsde-methods/variantstore:2022-11-17-alpine"
+ docker: "us.gcr.io/broad-dsde-methods/variantstore:2023-01-23-alpine"
}

output {
4 changes: 2 additions & 2 deletions scripts/variantstore/wdl/GvsCreateVATAnnotations.wdl
@@ -169,7 +169,7 @@ task ExtractAnAcAfFromVCF {
# ------------------------------------------------
# Runtime settings:
runtime {
- docker: "us.gcr.io/broad-dsde-methods/variantstore:2022-11-17-alpine"
+ docker: "us.gcr.io/broad-dsde-methods/variantstore:2023-01-23-alpine"
maxRetries: 3
memory: "16 GB"
preemptible: 3
@@ -291,7 +291,7 @@ task PrepAnnotationJson {
# ------------------------------------------------
# Runtime settings:
runtime {
- docker: "us.gcr.io/broad-dsde-methods/variantstore:2022-11-17-alpine"
+ docker: "us.gcr.io/broad-dsde-methods/variantstore:2023-01-23-alpine"
memory: "8 GB"
preemptible: 5
cpu: "1"
2 changes: 1 addition & 1 deletion scripts/variantstore/wdl/GvsCreateVATfromVDS.wdl
@@ -300,7 +300,7 @@ task RemoveDuplicatesFromSitesOnlyVCF {
# ------------------------------------------------
# Runtime settings:
runtime {
- docker: "us.gcr.io/broad-dsde-methods/variantstore:2022-10-25-alpine"
+ docker: "us.gcr.io/broad-dsde-methods/variantstore:2023-01-23-alpine"
maxRetries: 3
memory: "16 GB"
preemptible: 3
2 changes: 1 addition & 1 deletion scripts/variantstore/wdl/GvsExtractAvroFilesForHail.wdl
@@ -291,7 +291,7 @@ task GenerateHailScripts {
File hail_create_vat_inputs_script = 'hail_create_vat_inputs.py'
}
runtime {
- docker: "us.gcr.io/broad-dsde-methods/variantstore:2022-11-17-alpine"
+ docker: "us.gcr.io/broad-dsde-methods/variantstore:2023-01-23-alpine"
disks: "local-disk 500 HDD"
}
}
4 changes: 3 additions & 1 deletion scripts/variantstore/wdl/GvsExtractCallset.wdl
@@ -268,6 +268,8 @@ task ExtractTask {
String intervals_name = basename(intervals)
String cost_observability_line = if (write_cost_to_db == true) then "--cost-observability-tablename ~{cost_observability_tablename}" else ""

+ String inferred_reference_state = if (drop_state == "NONE") then "ZERO" else drop_state

command <<<
set -e
export GATK_LOCAL_JAR="~{default="/root/gatk.jar" gatk_override}"
@@ -292,7 +294,7 @@ task ExtractTask {
-O ~{output_file} \
--local-sort-max-records-in-ram ~{local_sort_max_records_in_ram} \
--sample-table ~{fq_samples_to_extract_table} \
- ~{"--inferred-reference-state " + drop_state} \
+ ~{"--inferred-reference-state " + inferred_reference_state} \
-L ~{intervals} \
--project-id ~{read_project_id} \
~{true='--emit-pls' false='' emit_pls} \
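
For readers less familiar with WDL conditionals, the fallback introduced in this hunk can be restated as a minimal Python sketch. The function names below are hypothetical and exist only for illustration; the actual logic is the WDL expression in the diff, and the sketch assumes `drop_state` is always a defined string.

```python
def inferred_reference_state(drop_state: str) -> str:
    """Mirror of the WDL: if (drop_state == "NONE") then "ZERO" else drop_state."""
    return "ZERO" if drop_state == "NONE" else drop_state


def inferred_reference_state_flag(drop_state: str) -> str:
    """Build the command-line fragment that the task interpolates into the command."""
    return "--inferred-reference-state " + inferred_reference_state(drop_state)


# "NONE" is replaced by "ZERO"; any other value is passed through unchanged.
flag_for_none = inferred_reference_state_flag("NONE")
flag_for_other = inferred_reference_state_flag("FORTY")  # illustrative value only
```

With this mapping the flag never carries the literal value `NONE`; every other value reaches the command unchanged.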
2 changes: 1 addition & 1 deletion scripts/variantstore/wdl/GvsPopulateAltAllele.wdl
@@ -243,7 +243,7 @@ task PopulateAltAlleleTable {
done
>>>
runtime {
- docker: "us.gcr.io/broad-dsde-methods/variantstore:2022-11-17-alpine"
+ docker: "us.gcr.io/broad-dsde-methods/variantstore:2023-01-23-alpine"
memory: "3 GB"
disks: "local-disk 10 HDD"
cpu: 1
3 changes: 2 additions & 1 deletion scripts/variantstore/wdl/GvsPrepareRangesCallset.wdl
@@ -111,10 +111,11 @@ task PrepareRangesCallsetTask {
>>>
output {
String fq_cohort_extract_table_prefix = "~{fq_destination_dataset}.~{destination_cohort_table_prefix}" # implementation detail of create_ranges_cohort_extract_data_table.py
String fq_cohort_extract_table_prefix = "~{fq_destination_dataset}.~{destination_cohort_table_prefix}" # implementation detail of create_ranges_cohort_extract_data_table.py
}

runtime {
- docker: "us.gcr.io/broad-dsde-methods/variantstore:2022-11-17-alpine"
+ docker: "us.gcr.io/broad-dsde-methods/variantstore:2023-01-23-alpine"
memory: "3 GB"
disks: "local-disk 100 HDD"
bootDiskSizeGb: 15
2 changes: 1 addition & 1 deletion scripts/variantstore/wdl/GvsUtils.wdl
@@ -352,7 +352,7 @@ task ScaleXYBedValues {
}

runtime {
- docker: "us.gcr.io/broad-dsde-methods/variantstore:2022-11-17-alpine"
+ docker: "us.gcr.io/broad-dsde-methods/variantstore:2023-01-23-alpine"
maxRetries: 3
memory: "7 GB"
preemptible: 3
@@ -89,12 +89,12 @@ def get_all_sample_ids(fq_destination_table_samples, only_output_vet_tables, fq_


def create_extract_samples_table(control_samples, fq_destination_table_samples, fq_sample_name_table,
- fq_sample_mapping_table, honor_withdrawn: bool):
+ fq_sample_mapping_table, honor_withdrawn):
Collaborator

Was there a reason to remove the type ascription?

Author

I didn't understand why it was there and nowhere else, and it worked without it.

Collaborator

Yeah, IMHO we should probably use these type hints liberally; I'm also unsure why this was the exception rather than the rule. Perhaps it's something to discuss for our coding standards going forward.

These type hints are not enforced by the Python runtime, so the code works the same with or without them. They're intended mostly for the benefit of developers: a Python-aware IDE can warn when an argument of an incompatible type is passed.
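
To make the point about runtime behavior concrete, here is a minimal sketch. The function `describe_mode` is hypothetical and not part of this PR; it only shows that an annotation is recorded as metadata and is not checked when the function is called.

```python
def describe_mode(honor_withdrawn: bool) -> str:
    # The annotation says bool, but Python performs no check at call time.
    return "filter withdrawn" if honor_withdrawn else "keep withdrawn"


# A str is passed where a bool is annotated; the call still succeeds,
# because a non-empty string is simply truthy.
result = describe_mode("yes")

# The hints are stored on the function object, where IDEs and type
# checkers can read them.
hints = describe_mode.__annotations__
```

A static checker or a type-aware IDE would be expected to flag the `describe_mode("yes")` call, while the interpreter runs it without complaint.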


sql = f"""

CREATE OR REPLACE TABLE `{fq_destination_table_samples}` AS (
- SELECT m.sample_id, m.sample_name, m.is_loaded, m.withdrawn, m.is_control FROM `{fq_sample_name_table}` s JOIN
+ SELECT m.sample_id, m.sample_name, m.is_loaded, {"m.withdrawn," if honor_withdrawn else "NULL as withdrawn,"} m.is_control FROM `{fq_sample_name_table}` s JOIN
`{fq_sample_mapping_table}` m ON (s.sample_name = m.sample_name) WHERE
m.is_loaded IS TRUE AND m.is_control = {control_samples}
{"AND m.withdrawn IS NULL" if honor_withdrawn else ""}
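
The effect of `honor_withdrawn` on the generated query can be seen in isolation with a small sketch. The helper `build_extract_samples_sql` is hypothetical: it only builds the SQL string and does not run it, the table names in the usage lines are placeholders, and because the hunk above is cut off before the statement ends, the closing parenthesis here is an assumption.

```python
def build_extract_samples_sql(control_samples, fq_destination_table_samples,
                              fq_sample_name_table, fq_sample_mapping_table,
                              honor_withdrawn: bool) -> str:
    # When withdrawn samples are honored, keep the real column and filter on it;
    # otherwise emit a NULL placeholder column and apply no filter.
    withdrawn_column = "m.withdrawn" if honor_withdrawn else "NULL AS withdrawn"
    withdrawn_filter = "AND m.withdrawn IS NULL" if honor_withdrawn else ""
    return f"""
    CREATE OR REPLACE TABLE `{fq_destination_table_samples}` AS (
      SELECT m.sample_id, m.sample_name, m.is_loaded, {withdrawn_column}, m.is_control
      FROM `{fq_sample_name_table}` s
      JOIN `{fq_sample_mapping_table}` m ON (s.sample_name = m.sample_name)
      WHERE m.is_loaded IS TRUE AND m.is_control = {control_samples}
      {withdrawn_filter}
    )
    """


# Placeholder table names, for illustration only.
sql_honoring = build_extract_samples_sql(False, "proj.ds.extract_samples",
                                         "proj.ds.sample_names", "proj.ds.sample_info", True)
sql_ignoring = build_extract_samples_sql(False, "proj.ds.extract_samples",
                                         "proj.ds.sample_names", "proj.ds.sample_info", False)
```

In both modes the destination table has the same columns, so downstream code that reads `withdrawn` would not need to branch on how the table was built.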