Revert gCNV WDLs to tar calls from all samples. #5225

Merged (2 commits), Sep 27, 2018
6 changes: 3 additions & 3 deletions scripts/cnv_wdl/cnv_common_tasks.wdl
@@ -287,8 +287,8 @@ task PostprocessGermlineCNVCalls {
calls_args=""
for index in ${dollar}{!gcnv_calls_tar_array[@]}; do
gcnv_calls_tar=${dollar}{gcnv_calls_tar_array[$index]}
- mkdir -p CALLS_$index/SAMPLE_${sample_index}
- tar xzf $gcnv_calls_tar -C CALLS_$index/SAMPLE_${sample_index}
+ mkdir CALLS_$index
+ tar xzf $gcnv_calls_tar -C CALLS_$index
cp ${dollar}{calling_configs_array[$index]} CALLS_$index/
Collaborator

You don't need to pass as input or untar any of these anymore:

   Array[File] calling_configs
   Array[File] denoising_configs
   Array[File] gcnvkernel_version
   Array[File] sharded_interval_lists

See the diff for cnv_common_tasks.wdl in f0dddbc#diff-f3a83ecdd107a7c42aee06dc41842fd2

Contributor Author @samuelklee, Sep 27, 2018

Thanks for catching this. I removed them from the PostprocessGermlineCNVCalls task but left them as outputs of the GermlineCNVCaller tasks. They might be useful for debugging, and we will need them when we address #4397.

Contributor Author

I also added the counts as outputs of the scattered case workflow. Perhaps at some point we can flatten the outputs there. This is trivial for all outputs except the ploidy calls, which need to be uncompressed, concatenated, and compressed again.
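
The uncompress, concatenate, and recompress step for the ploidy calls could be sketched as a small shell helper. This is only an illustration under stated assumptions: the function name `merge_ploidy_tars` is hypothetical, and the sketch assumes each ploidy-calls tarball holds top-level `SAMPLE_<n>` directories numbered from 0 (the layout the gCNV calls use in this diff); the real ploidy-calls layout may differ.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: merge several ploidy-call tarballs into one archive.
# Assumption: each input holds top-level SAMPLE_<n> directories numbered from 0.
merge_ploidy_tars() {
    local out_tar="$1"
    shift
    local merged staging tarball
    local next=0 i
    merged=$(mktemp -d)
    for tarball in "$@"; do
        staging=$(mktemp -d)
        # Uncompress one block of samples into its own staging directory.
        tar xzf "$tarball" -C "$staging"
        # Concatenate: move each sample over, renumbering so indices stay unique.
        i=0
        while [ -d "$staging/SAMPLE_$i" ]; do
            mv "$staging/SAMPLE_$i" "$merged/SAMPLE_$next"
            next=$((next + 1))
            i=$((i + 1))
        done
        rm -rf "$staging"
    done
    # Compress the merged directory again.
    tar czf "$out_tar" -C "$merged" .
    rm -rf "$merged"
}
```

Samples are renumbered in argument order, so a call such as `merge_ploidy_tars merged.tar.gz block-0.tar.gz block-1.tar.gz` would need the tarballs listed in the same order as the scatter blocks.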

cp ${dollar}{denoising_configs_array[$index]} CALLS_$index/
cp ${dollar}{gcnvkernel_version_array[$index]} CALLS_$index/
@@ -334,4 +334,4 @@ task PostprocessGermlineCNVCalls {
File genotyped_intervals_vcf = genotyped_intervals_vcf_filename
File genotyped_segments_vcf = genotyped_segments_vcf_filename
}
- }
+ }
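
The extraction step in PostprocessGermlineCNVCalls after this change can be sketched in plain shell as follows. It is an illustration rather than the task's actual command block: `unpack_shard_calls` is a hypothetical name, and it assumes each shard tarball contains one `SAMPLE_<n>` directory per sample. Inside the WDL `command <<< ... >>>` block the Bash array syntax is written as `${dollar}{...}`, presumably so that WDL does not try to interpolate it; the sketch sidesteps that by looping over positional arguments.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: unpack one calls tarball per shard into CALLS_<index>.
# Assumption: every tarball holds SAMPLE_<n> directories for all samples.
unpack_shard_calls() {
    local index=0
    local tarball
    for tarball in "$@"; do
        # Plain mkdir fails if the directory already exists, as in the diff.
        mkdir "CALLS_$index"
        tar xzf "$tarball" -C "CALLS_$index"
        index=$((index + 1))
    done
}
```

After unpacking, the calls for sample `i` in shard `k` would be found under `CALLS_k/SAMPLE_i`, so every scattered postprocessing job receives the same tarballs and selects its own sample by index.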
@@ -180,7 +180,7 @@ workflow CNVGermlineCaseScatteredWorkflow {

output {
Array[File] contig_ploidy_calls_tars = CNVGermlineCaseWorkflow.contig_ploidy_calls_tar
- Array[Array[Array[File]]] gcnv_calls_tars = CNVGermlineCaseWorkflow.gcnv_calls_tars
+ Array[Array[File]] gcnv_calls_tars = CNVGermlineCaseWorkflow.gcnv_calls_tars
Array[Array[File]] gcnv_tracking_tars = CNVGermlineCaseWorkflow.gcnv_tracking_tars
Array[Array[File]] genotyped_intervals_vcf = CNVGermlineCaseWorkflow.genotyped_intervals_vcf
Array[Array[File]] genotyped_segments_vcf = CNVGermlineCaseWorkflow.genotyped_segments_vcf
17 changes: 5 additions & 12 deletions scripts/cnv_wdl/germline/cnv_germline_case_workflow.wdl
@@ -211,8 +211,6 @@ workflow CNVGermlineCaseWorkflow {
}
}

- Array[Array[File]] call_tars_sample_by_shard = transpose(GermlineCNVCallerCaseMode.gcnv_call_tars)
-
scatter (sample_index in range(length(normal_bams))) {
call CNVTasks.PostprocessGermlineCNVCalls {
input:
@@ -221,7 +219,7 @@ workflow CNVGermlineCaseWorkflow {
gcnvkernel_version = GermlineCNVCallerCaseMode.gcnvkernel_version_json,
sharded_interval_lists = GermlineCNVCallerCaseMode.sharded_interval_list,
entity_id = CollectCounts.entity_id[sample_index],
- gcnv_calls_tars = call_tars_sample_by_shard[sample_index],
+ gcnv_calls_tars = GermlineCNVCallerCaseMode.gcnv_calls_tar,
gcnv_model_tars = gcnv_model_tars,
allosomal_contigs = allosomal_contigs,
ref_copy_number_autosomal_contigs = ref_copy_number_autosomal_contigs,
@@ -238,7 +236,7 @@ workflow CNVGermlineCaseWorkflow {
Array[File] read_counts_entity_id = CollectCounts.entity_id
Array[File] read_counts = CollectCounts.counts
File contig_ploidy_calls_tar = DetermineGermlineContigPloidyCaseMode.contig_ploidy_calls_tar
- Array[Array[File]] gcnv_calls_tars = GermlineCNVCallerCaseMode.gcnv_call_tars
+ Array[File] gcnv_calls_tars = GermlineCNVCallerCaseMode.gcnv_calls_tar
Array[File] gcnv_tracking_tars = GermlineCNVCallerCaseMode.gcnv_tracking_tar
Array[File] genotyped_intervals_vcf = PostprocessGermlineCNVCalls.genotyped_intervals_vcf
Array[File] genotyped_segments_vcf = PostprocessGermlineCNVCalls.genotyped_segments_vcf
@@ -364,7 +362,6 @@ task GermlineCNVCallerCaseMode {
# If optional output_dir not specified, use "out"
String output_dir_ = select_first([output_dir, "out"])

- Int num_samples = length(read_count_files)
command <<<
set -e
mkdir ${output_dir_}
@@ -415,11 +412,7 @@ task GermlineCNVCallerCaseMode {
--caller-external-admixing-rate ${default="1.00" caller_external_admixing_rate} \
--disable-annealing ${default="false" disable_annealing}

- CURRENT_SAMPLE=0
- while [ $CURRENT_SAMPLE -lt ${num_samples} ]; do
- tar czf case-gcnv-shard-${scatter_index}-sample-$CURRENT_SAMPLE-gcnv-calls.tar.gz -C ${output_dir_}/case-calls/SAMPLE_$CURRENT_SAMPLE .
- let CURRENT_SAMPLE=CURRENT_SAMPLE+1
- done
+ tar czf case-gcnv-calls-${scatter_index}.tar.gz -C ${output_dir_}/case-calls .
tar czf case-gcnv-tracking-${scatter_index}.tar.gz -C ${output_dir_}/case-tracking .
>>>

@@ -432,11 +425,11 @@ task GermlineCNVCallerCaseMode {
}

output {
+ File gcnv_calls_tar = "case-gcnv-calls-${scatter_index}.tar.gz"
+ File gcnv_tracking_tar = "case-gcnv-tracking-${scatter_index}.tar.gz"
File calling_config_json = "${output_dir_}/case-calls/calling_config.json"
File denoising_config_json = "${output_dir_}/case-calls/denoising_config.json"
File gcnvkernel_version_json = "${output_dir_}/case-calls/gcnvkernel_version.json"
File sharded_interval_list = "${output_dir_}/case-calls/interval_list.tsv"
- Array[File] gcnv_call_tars = glob("*-gcnv-calls.tar.gz")
- File gcnv_tracking_tar = "case-gcnv-tracking-${scatter_index}.tar.gz"
}
}
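
On the calling side, the single per-shard archive that replaces the per-sample loop can be sketched like this. `tar_shard_calls` and its two arguments are hypothetical names used only for illustration. Because `tar` runs with `-C <calls dir> .`, anything at the top level of the calls directory is archived alongside the `SAMPLE_<n>` directories; going by the output paths above, that would include `calling_config.json`, `denoising_config.json`, `gcnvkernel_version.json`, and `interval_list.tsv`, which may be why they no longer need to be passed to postprocessing separately.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: archive all samples' calls for one shard in one tarball.
# calls_dir     directory holding SAMPLE_<n> subdirectories and config files
# scatter_index shard index used in the archive name
tar_shard_calls() {
    local calls_dir="$1"
    local scatter_index="$2"
    # -C switches into calls_dir first, so archive paths are relative to it.
    tar czf "case-gcnv-calls-${scatter_index}.tar.gz" -C "$calls_dir" .
}
```

Compared with the removed loop, this produces one archive per shard instead of one per sample per shard, so the workflow output is a flat `Array[File]` and no `transpose` is needed.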
17 changes: 5 additions & 12 deletions scripts/cnv_wdl/germline/cnv_germline_cohort_workflow.wdl
@@ -259,8 +259,6 @@ workflow CNVGermlineCohortWorkflow {
}
}

- Array[Array[File]] call_tars_sample_by_shard = transpose(GermlineCNVCallerCohortMode.gcnv_call_tars)
-
scatter (sample_index in range(length(CollectCounts.entity_id))) {
call CNVTasks.PostprocessGermlineCNVCalls {
input:
@@ -269,7 +267,7 @@ workflow CNVGermlineCohortWorkflow {
gcnvkernel_version = GermlineCNVCallerCohortMode.gcnvkernel_version_json,
sharded_interval_lists = GermlineCNVCallerCohortMode.sharded_interval_list,
entity_id = CollectCounts.entity_id[sample_index],
- gcnv_calls_tars = call_tars_sample_by_shard[sample_index],
+ gcnv_calls_tars = GermlineCNVCallerCohortMode.gcnv_calls_tar,
gcnv_model_tars = GermlineCNVCallerCohortMode.gcnv_model_tar,
contig_ploidy_calls_tar = DetermineGermlineContigPloidyCohortMode.contig_ploidy_calls_tar,
allosomal_contigs = allosomal_contigs,
@@ -288,7 +286,7 @@ workflow CNVGermlineCohortWorkflow {
File contig_ploidy_model_tar = DetermineGermlineContigPloidyCohortMode.contig_ploidy_model_tar
File contig_ploidy_calls_tar = DetermineGermlineContigPloidyCohortMode.contig_ploidy_calls_tar
Array[File] gcnv_model_tars = GermlineCNVCallerCohortMode.gcnv_model_tar
- Array[Array[File]] gcnv_calls_tars = GermlineCNVCallerCohortMode.gcnv_call_tars
+ Array[File] gcnv_calls_tars = GermlineCNVCallerCohortMode.gcnv_calls_tar
Array[File] gcnv_tracking_tars = GermlineCNVCallerCohortMode.gcnv_tracking_tar
Array[File] genotyped_intervals_vcfs = PostprocessGermlineCNVCalls.genotyped_intervals_vcf
Array[File] genotyped_segments_vcfs = PostprocessGermlineCNVCalls.genotyped_segments_vcf
@@ -426,7 +424,6 @@ task GermlineCNVCallerCohortMode {

# If optional output_dir not specified, use "out"
String output_dir_ = select_first([output_dir, "out"])
- Int num_samples = length(read_count_files)

command <<<
set -e
@@ -487,11 +484,7 @@ task GermlineCNVCallerCohortMode {
--disable-annealing ${default="false" disable_annealing}

tar czf ${cohort_entity_id}-gcnv-model-${scatter_index}.tar.gz -C ${output_dir_}/${cohort_entity_id}-model .
- CURRENT_SAMPLE=0
- while [ $CURRENT_SAMPLE -lt ${num_samples} ]; do
- tar czf ${cohort_entity_id}-shard-${scatter_index}-sample-$CURRENT_SAMPLE-gcnv-calls.tar.gz -C ${output_dir_}/${cohort_entity_id}-calls/SAMPLE_$CURRENT_SAMPLE .
- let CURRENT_SAMPLE=CURRENT_SAMPLE+1
- done
+ tar czf ${cohort_entity_id}-gcnv-calls-${scatter_index}.tar.gz -C ${output_dir_}/${cohort_entity_id}-calls .
tar czf ${cohort_entity_id}-gcnv-tracking-${scatter_index}.tar.gz -C ${output_dir_}/${cohort_entity_id}-tracking .
>>>

@@ -505,11 +498,11 @@ task GermlineCNVCallerCohortMode {

output {
File gcnv_model_tar = "${cohort_entity_id}-gcnv-model-${scatter_index}.tar.gz"
+ File gcnv_calls_tar = "${cohort_entity_id}-gcnv-calls-${scatter_index}.tar.gz"
+ File gcnv_tracking_tar = "${cohort_entity_id}-gcnv-tracking-${scatter_index}.tar.gz"
File calling_config_json = "${output_dir_}/${cohort_entity_id}-calls/calling_config.json"
File denoising_config_json = "${output_dir_}/${cohort_entity_id}-calls/denoising_config.json"
File gcnvkernel_version_json = "${output_dir_}/${cohort_entity_id}-calls/gcnvkernel_version.json"
File sharded_interval_list = "${output_dir_}/${cohort_entity_id}-calls/interval_list.tsv"
- Array[File] gcnv_call_tars = glob("*-gcnv-calls.tar.gz")
- File gcnv_tracking_tar = "${cohort_entity_id}-gcnv-tracking-${scatter_index}.tar.gz"
}
}