Dev #47

Merged
merged 28 commits into from
Jan 13, 2020
Changes from 19 commits
733d7e5
Updated to gatk4.1, Update VariantRecal syntax
Feb 22, 2019
3f5f6e8
Merge branch 'master' of github.com:gatk-workflows/gatk4-germline-snp…
Feb 22, 2019
fe77d80
added samtools path variable
Feb 22, 2019
dd632f0
Updated wording
Feb 23, 2019
bff1267
add a version of joint-discovery that takes in an array for gvcfs
Feb 24, 2019
082df3b
Update README.md
Feb 28, 2019
05ce4ef
Merge branch 'master' into dev
Feb 28, 2019
093e478
increased mem for SNPsVariantRecalibrator for papiv2
May 3, 2019
329f152
increased mem for SNPsVariantRecalibrator for papiv2
May 3, 2019
7184b70
increased memory to run on papiv2
May 6, 2019
6304e56
correction to variable names in json
Jul 16, 2019
447d0b7
Merge branch 'master' into dev
Oct 30, 2019
411c494
corrected parameter syntax for SNPsVariantRecalibrator task
Oct 30, 2019
37b019e
Merge branch 'master' of github.com:gatk-workflows/gatk4-germline-snp…
Oct 31, 2019
043e443
Updated haplotypecaller to WDL 1.0, removed comments from haplotypeca…
Nov 1, 2019
c4ce8f6
Added WDL 1.0 version of JointGenotyping to soon replace joint discov…
Nov 1, 2019
e189a70
Update genotype2develop (#44)
Dec 2, 2019
0e2d9c2
Update genotype2develop (#45)
Dec 2, 2019
481a177
Update genotype2develop (#46)
Dec 2, 2019
c61c128
Update genotype2develop (#48)
Dec 3, 2019
cd1b265
removed unnecessary optional override variables
Dec 6, 2019
7da5824
replaced '$' with '~'
Dec 18, 2019
976988f
updated broad reference bucket path
Jan 8, 2020
30d95c1
Add important notes regarding JointGenotype workflow to Readme
Jan 9, 2020
dae5566
minor update to Readme, renamed input file name in json
Jan 9, 2020
87b6198
minor spelling
Jan 9, 2020
3416bbd
Added allele-specific annotations to HC command
Jan 13, 2020
fec5975
Updated import url to the next release tag
Jan 13, 2020
582 changes: 582 additions & 0 deletions JointGenotyping-terra.wdl

Large diffs are not rendered by default.

41 changes: 41 additions & 0 deletions JointGenotyping.hg38.wgs.inputs.json
@@ -0,0 +1,41 @@
{
"JointGenotyping.sample_name_map": "gs://gatk-test-data/joint_discovery/1kg_50_hg38/gvcf/hg38_1kg_50.sample_map",
"JointGenotyping.callset_name": "hg38_1kg_50",
"JointGenotyping.unbounded_scatter_count_scale_factor": 2.5,
"JointGenotyping.SplitIntervalList.scatter_mode": "INTERVAL_SUBDIVISION",

"JointGenotyping.unpadded_intervals_file": "gs://broad-references/hg38/v0/hg38.even.handcurated.20k.intervals",
"JointGenotyping.ref_fasta": "gs://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta",
"JointGenotyping.ref_fasta_index": "gs://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.fai",
"JointGenotyping.ref_dict": "gs://broad-references/hg38/v0/Homo_sapiens_assembly38.dict",
"JointGenotyping.eval_interval_list": "gs://broad-references/hg38/v0/wgs_evaluation_regions.hg38.interval_list",
"JointGenotyping.haplotype_database": "gs://broad-references/hg38/v0/Homo_sapiens_assembly38.haplotype_database.txt",

"JointGenotyping.axiomPoly_resource_vcf": "gs://broad-references/hg38/v0/Axiom_Exome_Plus.genotypes.all_populations.poly.hg38.vcf.gz",
"JointGenotyping.axiomPoly_resource_vcf_index": "gs://broad-references/hg38/v0/Axiom_Exome_Plus.genotypes.all_populations.poly.hg38.vcf.gz.tbi",
"JointGenotyping.dbsnp_vcf": "gs://broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf",
"JointGenotyping.dbsnp_vcf_index": "gs://broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.idx",
"JointGenotyping.hapmap_resource_vcf": "gs://broad-references/hg38/v0/hapmap_3.3.hg38.vcf.gz",
"JointGenotyping.hapmap_resource_vcf_index": "gs://broad-references/hg38/v0/hapmap_3.3.hg38.vcf.gz.tbi",
"JointGenotyping.mills_resource_vcf": "gs://broad-references/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz",
"JointGenotyping.mills_resource_vcf_index": "gs://broad-references/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi",
"JointGenotyping.omni_resource_vcf": "gs://broad-references/hg38/v0/1000G_omni2.5.hg38.vcf.gz",
"JointGenotyping.omni_resource_vcf_index": "gs://broad-references/hg38/v0/1000G_omni2.5.hg38.vcf.gz.tbi",
"JointGenotyping.one_thousand_genomes_resource_vcf": "gs://broad-references/hg38/v0/1000G_phase1.snps.high_confidence.hg38.vcf.gz",
"JointGenotyping.one_thousand_genomes_resource_vcf_index": "gs://broad-references/hg38/v0/1000G_phase1.snps.high_confidence.hg38.vcf.gz.tbi",

"JointGenotyping.SNP_VQSR_downsampleFactor": 10,
"JointGenotyping.snps_variant_recalibration_threshold": 20000,
"JointGenotyping.snp_filter_level": 99.7,
"JointGenotyping.snp_recalibration_annotation_values": ["QD", "MQRankSum", "ReadPosRankSum", "FS", "MQ", "SOR", "DP"],
"JointGenotyping.snp_recalibration_tranche_values": ["100.0", "99.95", "99.9", "99.8", "99.6", "99.5", "99.4", "99.3", "99.0", "98.0", "97.0", "90.0" ],

"JointGenotyping.indel_filter_level": 99.0,
"JointGenotyping.indel_recalibration_annotation_values": ["FS", "ReadPosRankSum", "MQRankSum", "QD", "SOR", "DP"],
"JointGenotyping.indel_recalibration_tranche_values": ["100.0", "99.95", "99.9", "99.5", "99.0", "97.0", "96.0", "95.0", "94.0", "93.5", "93.0", "92.0", "91.0", "90.0"],

"JointGenotyping.small_disk": 100,
"JointGenotyping.medium_disk": 200,
"JointGenotyping.large_disk": 1000,
"JointGenotyping.huge_disk": 2000
}
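
An inputs file like the one above can be sanity-checked before a run is submitted. The following is a minimal sketch and not part of this PR: the helper name `check_inputs` and the three checks it performs (key prefix, ascending disk tiers, a matching `_index` entry for every `_vcf` key) are assumptions chosen for illustration, not rules enforced by the WDL.

```python
import json


def check_inputs(inputs, workflow="JointGenotyping"):
    """Return a list of problems found in a Cromwell-style inputs dict.

    Hypothetical helper: the checks are illustrative assumptions,
    not constraints taken from JointGenotyping.wdl.
    """
    problems = []
    prefix = workflow + "."

    # Every key in the example file is qualified with the workflow name.
    for key in inputs:
        if not key.startswith(prefix):
            problems.append(f"key not prefixed with {prefix!r}: {key}")

    # Assumption: the four disk tiers are meant to be non-decreasing.
    tiers = ["small_disk", "medium_disk", "large_disk", "huge_disk"]
    sizes = [inputs.get(prefix + tier) for tier in tiers]
    if None not in sizes and sizes != sorted(sizes):
        problems.append(f"disk tiers are not in ascending order: {sizes}")

    # In the example file each *_vcf input has a matching *_vcf_index entry.
    for key in inputs:
        if key.endswith("_vcf") and key + "_index" not in inputs:
            problems.append(f"missing index entry for {key}")

    return problems


# A tiny inline example; a real file would be read with json.load().
example = json.loads("""
{
  "JointGenotyping.small_disk": 100,
  "JointGenotyping.medium_disk": 200,
  "JointGenotyping.large_disk": 1000,
  "JointGenotyping.huge_disk": 2000
}
""")
for problem in check_inputs(example):
    print(problem)
```

An empty result only means these three checks passed; it says nothing about whether the referenced `gs://` paths exist or are readable.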
502 changes: 502 additions & 0 deletions JointGenotyping.wdl

Large diffs are not rendered by default.

33 changes: 19 additions & 14 deletions README.md
@@ -4,11 +4,16 @@
Workflows for [germline short variant discovery](https://software.broadinstitute.org/gatk/best-practices/workflow?id=11145) with GATK4.

### haplotypecaller-gvcf-gatk :
The haplotypecaller-gvcf-gatk4 workflow runs the HaplotypeCaller tool
from GATK4 in GVCF mode on a single sample according to GATK Best Practices.
When executed the workflow scatters the HaplotypeCaller tool over a sample
using an intervals list file. The output file produced will be a
single gvcf file which can be used by the joint-discovery workflow.
The haplotypecaller-gvcf-gatk4 workflow runs the GATK4 HaplotypeCaller tool
in GVCF mode on a single sample according to GATK Best Practices. When
executed, the workflow scatters the HaplotypeCaller tool over the input BAM sample
using an interval list file. The output produced by the workflow will be a single GVCF
file, which can then be provided to the JointGenotyping workflow along with several other
GVCF files to call variants simultaneously, producing a multisample VCF.
The haplotypecaller-gvcf-gatk4 workflow's default GVCF mode is useful when calling variants
for several samples efficiently. However, when calling variants for only one or a
few samples, the workflow can call variants directly and output a VCF file if the
`make_gvcf` input variable is set to `false`.

#### Requirements/expectations
- One analysis-ready BAM file for a single sample (as identified in RG:SM)
@@ -17,22 +22,22 @@ single gvcf file which can be used by the joint-discovery workflow.
#### Outputs
- One GVCF file and its index

### joint-discovery-gatk :
### JointGenotyping.wdl :
This WDL implements the joint calling and VQSR filtering portion of the
GATK Best Practices for germline SNP and Indel discovery
in human whole-genome sequencing (WGS).
in human whole-genome sequencing (WGS). The workflow accepts a sample map
file with 50 or more GVCFs and produces a multisample VCF.
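
The README text does not spell out the sample map layout. The sketch below assumes the tab-separated `sample_name<TAB>gvcf_path` format that GATK's GenomicsDBImport reads through `--sample-name-map`, and assumes the workflow hands the file to that tool unchanged. The helper name `build_sample_map`, the sample names, and the bucket path are hypothetical; the 50-sample floor mirrors the requirement stated above.

```python
def build_sample_map(gvcfs, min_samples=50):
    """Render {sample_name: gvcf_path} as tab-separated sample-map text.

    Hypothetical helper; the one-pair-per-line, tab-separated layout is an
    assumption based on GenomicsDBImport's --sample-name-map format.
    """
    if len(gvcfs) < min_samples:
        raise ValueError(f"need at least {min_samples} GVCFs, got {len(gvcfs)}")

    lines = []
    for name, path in sorted(gvcfs.items()):
        # A stray tab would shift columns and corrupt the map.
        if "\t" in name or "\t" in path:
            raise ValueError(f"tab character in entry for {name!r}")
        lines.append(f"{name}\t{path}")
    return "\n".join(lines) + "\n"


# Hypothetical sample names and bucket path, for illustration only.
gvcfs = {
    f"sample_{i:03d}": f"gs://my-bucket/gvcfs/sample_{i:03d}.g.vcf.gz"
    for i in range(50)
}
sample_map_text = build_sample_map(gvcfs)
```

The resulting text can be written to a file and referenced from the `JointGenotyping.sample_name_map` input.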

*NOTE:*
*- joint-discovery-gatk4-local.wdl is a slightly modified version of the
original to support users interested in running the workflow locally.*
*- joint-discovery-gatk4-fc.wdl is a slightly modified version of the
original to support users interested in running the workflow firecloud with and
using an array of gvcfs as input.*
*- JointGenotyping-terra.wdl is a slightly modified version of the
original workflow to support users interested in running the
workflow on Terra. The changes include variables for Docker images and disk sizes, making
it easier to configure the workflow.*


#### Requirements/expectations
- One or more GVCFs produced by HaplotypeCaller in GVCF mode
- Bare minimum 1 WGS sample or 30 Exome samples. Gene panels are not supported.
- Bare minimum 50 samples. Gene panels are not supported.
- When determining disk size in the JSON, use the guideline below
- small_disk = (num_gvcfs / 10) + 10
- medium_disk = (num_gvcfs * 15) + 10
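
The two disk-size formulas visible in this hunk can be evaluated as follows. This is a minimal sketch: the function name `disk_sizes` is hypothetical, and rounding the division up to a whole number is an assumption (the example inputs JSON uses integer disk values). The guidelines for the remaining tiers are collapsed in this view and are not reproduced here.

```python
import math


def disk_sizes(num_gvcfs):
    """Evaluate the two disk-size guidelines shown in the README hunk."""
    if num_gvcfs < 1:
        raise ValueError("num_gvcfs must be positive")
    return {
        # small_disk = (num_gvcfs / 10) + 10; rounding up is an assumption.
        "small_disk": math.ceil(num_gvcfs / 10) + 10,
        # medium_disk = (num_gvcfs * 15) + 10
        "medium_disk": num_gvcfs * 15 + 10,
    }


print(disk_sizes(50))  # {'small_disk': 15, 'medium_disk': 760}
```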
@@ -45,7 +50,7 @@ using an array of gvcfs as input.*
in the FILTER field.

### Software version requirements :
- GATK 4.1
- GATK 4.1.4.0
- Samtools 1.3.1
- Python 2.7
- Cromwell version support
254 changes: 0 additions & 254 deletions haplotypecaller-gvcf-gatk4-nio.wdl

This file was deleted.
