Holmes

Pipeline for structural variation detection from liWGS libraries

Copyright (c) 2017 Ryan L. Collins, Harrison Brand, and the laboratory of Michael E. Talkowski
Massachusetts General Hospital, The Broad Institute, and Harvard Medical School
Contact: Ryan L. Collins [email protected] or Harrison Brand [email protected]

If you use this software, please cite:
Collins RL, Brand H, et al. Defining the spectrum of large inversions, complex structural variation, and chromothripsis in the morbid genome. Genome Biol. (2017)

Code development credits: Ryan L. Collins, Harrison Brand, Matthew R. Stone, Vamsee Pillalamarri, Joseph T. Glessner, Claire Redin, Colby Chiang, Ian Blumenthal, Adrian Heilbut

Overview
Modules
Module 1: Cohort QC
Module 2: Physical Depth Analysis
Module 3: Per-Sample Read Pair Clustering
Module 4: Physical Depth-Based CNV Discovery
Module 5: Cohort-Wide Joint Read Pair Clustering & Classification
Module 6: Consensus CNV Categorization
Module 7: Balanced & Complex SV Resolution
Module 8: Call Consolidation & Reformatting
Module 9: SV Annotation
Miscellanea & FAQs

Overview

Holmes is a pipeline for SV discovery and annotation in cohorts (n>20-30) of human liWGS libraries. This tool can detect, resolve, and classify all known forms of SV, from canonical copy number variants (CNVs) to balanced rearrangements (e.g. inversions, reciprocal translocations) to more complex chromosomal rearrangements. Currently, this tool only supports human samples and the GRCh37 reference assembly.

The pipeline is run in nine independent modules split into five sequential stages, as shown below:

Note: the codebase for this tool has hard-coded variable paths and other dependencies particular to the Partners Healthcare computing cluster. These dependencies are in the process of being resolved so the tool can be more easily deployed on other clusters. For now, please contact [email protected] with any questions or issues.

Running Holmes

Execution command:

Holmes is invoked by the top-level master script, runHolmes.sh:

runHolmes.sh samples.list parameters_info.sh

Input:

samples.list: three columns, tab delimited
col 1) sample ID
col 2) full path to sample bam
col 3) expected sex (M=XY, F=XX, O=other, U=unknown (will defer to predicted sex by sexcheck))
parameters_info.sh: shell script to export all parameters for pipeline run

Pre-module 1 preparation steps

Symlinks & indexes all bams
Creates working and output directory trees
Loads necessary modules

Module 1: QC

Runs the following:

Picard EstimateLibraryComplexity
Picard CollectAlignmentSummaryMetrics
Picard CollectInsertSizeMetrics
Picard CollectWgsMetrics
Samtools flagstat
Bamtools stats
Sex Check
WGS Dosage Bias Check
Checks for nominal QC values, reports errors to ${OUTDIR}/${COHORT_ID}_WARNINGS.txt
Writes master QC table to ${OUTDIR}/QC/cohort/${COHORT_ID}.QC.metrics

Module 2: Physical Depth Analysis

Runs binCov to generate 1kb binned physical depth for each library
BGZips & tabix indexes each coverage file (for classifier)

Module 3: Per-Sample Read Pair Clustering

If ${pre_bamstat} isn't set as TRUE, bamstat is run at min cluster size = 3 for each sample
If ${pre_bamstat}="TRUE", bamstat clusters and stats.file are copied from preexisting paths to ${WRKDIR}
Removes *pairs.txt and *pairs.sorted.txt to save space

Physical Depth-Based CNV Discovery

Runs cnMOPS on autosomes on all samples
Runs cnMOPS on allosomes on samples split by M/F. "Other" sex samples pooled with either M or F depending on value of ${other_assign}
Merges cnMOPS calls per sample

Module 5: Cohort-Wide Joint Read Pair Clustering & Classification

Runs classifier
Patches clusters
Reclassifies patched clusters
Applies final classification labels & sets coordinate reporting to be 1st or 3rd quartile of reads, respectively (to avoid overclustering/negative sizes)

Consensus CNV Categorization

Runs in one of two modes: with or without genotyping information
Mode chosen by parameter ${min_geno}, set in module6.sh, which corresponds to the minimum number of samples in the cohort to use genotyping

Consensus Groups with Genotyping:

A [HIGH]: Valid cluster, cnMOPS or genotyping support, <30% blacklist
B [HIGH]: cnMOPS call, ≥50kb, <30% blacklist, genotyping pass, no clustering overlap
C [MED]: cnMOPS call, <50kb, genotyping pass, <30% blacklist
D [MED]: valid cluster, genotyping or cnMOPS support, ≥30% blacklist
E [MED]: cnMOPS call, ≥50kb, genotyping pass, ≥30% blacklist
F [LOW]: cnMOPS call, ≥50kb, no clustering support, no genotyping support
G [LOW]: cnMOPS call, <50kb, genotyping pass, ≥30% blacklist
H [LOW]: valid cluster, <25kb, no cnMOPS or genotyping support

Consensus Groups without Genotyping:

A [HIGH]: Valid cluster, cnMOPS support, <30% blacklist
B [MED]: cnMOPS call, ≥50kb, <30% blacklist, no clustering overlap
C [MED]: valid cluster, cnMOPS support, ≥30% blacklist
D [LOW]: cnMOPS call, ≥50kb, ≥30% blacklist
E [LOW]: valid cluster, <25kb, no cnMOPS support

Returns single merged file each for consensus dels and consensus dups.

Module 7: Balanced & Complex SV Resolution

Runs inversion classification script
Runs translocation classification script
Runs complex linking script
Runs complex parsing script

Module 8: Call Consolidation & Reformatting

Outputs the following seven variant files:

Deletion
Duplication
Inversion
Insertion
Translocation
Complex
Unresolved

Module 9: SV Annotation

Annotates SVs for genic disruption and overlap with ENCODE enhancers, promoters, and TADs

Name		Name	Last commit message	Last commit date
Latest commit History 231 Commits
classifier		classifier
data		data
pycluster		pycluster
readpaircluster		readpaircluster
scripts		scripts
LICENSE		LICENSE
README.md		README.md
example_samples.list		example_samples.list
liWGS-SV_docs.txt		liWGS-SV_docs.txt
parameters_info.ex.sh		parameters_info.ex.sh
runHolmes.sh		runHolmes.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Holmes

Table of Contents

Overview

Running Holmes

Execution command:

Input:

Pre-module 1 preparation steps

Module 1: QC

Module 2: Physical Depth Analysis

Module 3: Per-Sample Read Pair Clustering

Physical Depth-Based CNV Discovery

Module 5: Cohort-Wide Joint Read Pair Clustering & Classification

Consensus CNV Categorization

Module 7: Balanced & Complex SV Resolution

Module 8: Call Consolidation & Reformatting

Module 9: SV Annotation

About

Releases

Packages

Languages

License

talkowski-lab/Holmes

Folders and files

Latest commit

History

Repository files navigation

Holmes

Table of Contents

Overview

Running Holmes

Execution command:

Input:

Pre-module 1 preparation steps

Module 1: QC

Module 2: Physical Depth Analysis

Module 3: Per-Sample Read Pair Clustering

Physical Depth-Based CNV Discovery

Module 5: Cohort-Wide Joint Read Pair Clustering & Classification

Consensus CNV Categorization

Module 7: Balanced & Complex SV Resolution

Module 8: Call Consolidation & Reformatting

Module 9: SV Annotation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages