Holmes (liWGS-SV) Documentation
Contact: Ryan Collins ([email protected])
Update: August 2015
Execution command:
runHolmes.sh samples.list parameters_info.sh
##INPUT##
samples.list: three tab-delimited columns: 1) sample ID, 2) full path to sample bam, 3) expected sex (M=XY, F=XX, O=other, U=unknown; U defers to the sex predicted by sexcheck)
parameters_info.sh: shell script to export all parameters for pipeline run
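A minimal sketch of how the samples.list format described above could be validated before a run. This is not part of Holmes; the function name parse_samples_list, the example paths, and the choice to skip blank lines are assumptions made for illustration.

```python
import csv
import io

# Sex codes listed in the samples.list description: M=XY, F=XX, O=other, U=unknown
VALID_SEX = {"M", "F", "O", "U"}


def parse_samples_list(handle):
    """Parse a three-column, tab-delimited samples.list into a list of dicts.

    Hypothetical helper: raises ValueError on the first malformed line.
    """
    samples = []
    for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row or all(not field.strip() for field in row):
            continue  # assumption: blank lines are ignored
        if len(row) != 3:
            raise ValueError(f"line {line_no}: expected 3 columns, found {len(row)}")
        sample_id, bam_path, sex = (field.strip() for field in row)
        if not bam_path.startswith("/"):
            raise ValueError(f"line {line_no}: bam path is not a full path: {bam_path}")
        if sex not in VALID_SEX:
            raise ValueError(f"line {line_no}: unknown sex code {sex!r}")
        samples.append({"id": sample_id, "bam": bam_path, "sex": sex})
    return samples


# Example input with made-up sample IDs and paths
example = io.StringIO("S1\t/data/bams/S1.bam\tM\nS2\t/data/bams/S2.bam\tU\n")
samples = parse_samples_list(example)
for sample in samples:
    print(sample["id"], sample["bam"], sample["sex"], sep="\t")
```

Checking the file up front catches a relative bam path or a mistyped sex code before the pre-module steps start symlinking and indexing.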
##PRE-MODULE STEPS##
Symlinks & indexes all bams
Creates working and output directory trees
Loads necessary modules
##MODULE 1: QC##
Runs the following:
Picard EstimateLibraryComplexity
Picard CollectAlignmentSummaryMetrics
Picard CollectInsertSizeMetrics
Picard CollectWgsMetrics
Samtools flagstat
Bamtools stats
Sex Check
WGS Dosage Bias Check
Checks QC values against nominal ranges and reports errors to ${OUTDIR}/${COHORT_ID}_WARNINGS.txt
Writes master QC table to ${OUTDIR}/QC/cohort/${COHORT_ID}.QC.metrics
##MODULE 2: PHYSICAL DEPTH ANALYSES##
Runs binCov to generate 1kb binned physical depth for each library
BGZips & tabix indexes each coverage file (for classifier)
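To illustrate what "1kb binned physical depth" means, here is a small sketch that counts, for each 1 kb bin, how many fragments overlap it. It is not binCov's implementation, and binCov's exact counting rule is not stated here; the function name, the 0-based half-open coordinates, and the example fragments are assumptions.

```python
from collections import Counter

BIN_SIZE = 1000  # 1 kb bins, as described for Module 2


def binned_physical_depth(fragments, bin_size=BIN_SIZE):
    """Count, per bin, the number of fragments overlapping that bin.

    fragments: iterable of (chrom, start, end), 0-based half-open, where each
    interval spans a whole read pair (outer coordinates of both mates).
    Returns a Counter keyed by (chrom, bin_start).
    """
    depth = Counter()
    for chrom, start, end in fragments:
        if end <= start:
            continue  # ignore empty or inverted intervals
        first_bin = start // bin_size
        last_bin = (end - 1) // bin_size
        for bin_index in range(first_bin, last_bin + 1):
            depth[(chrom, bin_index * bin_size)] += 1
    return depth


# Made-up fragments for illustration only
fragments = [("chr1", 100, 3600), ("chr1", 900, 4100), ("chr1", 5000, 5001)]
depth = binned_physical_depth(fragments)
for (chrom, bin_start), count in sorted(depth.items()):
    print(chrom, bin_start, bin_start + BIN_SIZE, count, sep="\t")
```

The printed rows are in BED-like order (chrom, start, end, count), the kind of layout that can be bgzipped and tabix indexed.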
##MODULE 3: PER-SAMPLE CLUSTERING##
**Rate-limiting step of entire pipeline**
If ${pre_bamstat} is not set to "TRUE", bamstat is run on each sample with a minimum cluster size of 3
If ${pre_bamstat}="TRUE", bamstat clusters and stats.file are copied from preexisting paths to ${WRKDIR}
Removes *pairs.txt and *pairs.sorted.txt to save space
##MODULE 4: PHYSICAL DEPTH CNV CALLING##
Runs cnMOPS on autosomes on all samples
Runs cnMOPS on allosomes with samples split by sex (M/F). "Other" sex samples are pooled with either M or F depending on ${other_assign}
Merges cnMOPS calls per sample
Runs Serkan's log2R DNAcopy large CNV caller. Allosomes are not split by sex here; this may be added in a future update
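The sex-based pooling used for the allosome cnMOPS runs can be sketched as below. This is an illustration, not the pipeline's code: it assumes ${other_assign} takes the value "M" or "F", that "U" samples are resolved from a hypothetical predicted_sex mapping (standing in for the sex check output), and that a "U" sample without a prediction is treated as "O".

```python
def split_by_sex(samples, other_assign, predicted_sex=None):
    """Return (male_ids, female_ids) for allosome CNV calling.

    samples: iterable of (sample_id, expected_sex) with codes M, F, O, U.
    other_assign: "M" or "F"; the group that "O" samples are pooled with
                  (assumed value set for ${other_assign}).
    predicted_sex: optional dict of sample_id -> predicted code, used for "U".
    """
    if other_assign not in ("M", "F"):
        raise ValueError(f"other_assign must be 'M' or 'F', got {other_assign!r}")
    predicted_sex = predicted_sex or {}
    groups = {"M": [], "F": []}
    for sample_id, sex in samples:
        if sex == "U":
            # assumption: with no prediction available, treat the sample as "other"
            sex = predicted_sex.get(sample_id, "O")
        if sex == "O":
            sex = other_assign
        if sex not in groups:
            raise ValueError(f"{sample_id}: unsupported sex code {sex!r}")
        groups[sex].append(sample_id)
    return groups["M"], groups["F"]


# Made-up cohort for illustration only
cohort = [("S1", "M"), ("S2", "F"), ("S3", "O"), ("S4", "U")]
males, females = split_by_sex(cohort, other_assign="F", predicted_sex={"S4": "M"})
print("M pool:", males)
print("F pool:", females)
```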
##MODULE 5: JOINT RECLUSTERING & CLASSIFICATION##
Runs classifier
Patches clusters
Reclassifies patched clusters
Applies final classification labels & sets coordinate reporting to be 1st or 3rd quartile of reads, respectively (to avoid overclustering/negative sizes)
##MODULE 6: CONSENSUS CNV CALLING##
Runs in one of two modes: with or without genotyping information
Mode is chosen by the parameter ${min_geno} (set in module6.sh), which is the minimum number of samples a cohort must contain for genotyping to be used
***NEED TO ADD GENOTYPING***
Consensus Groups with Genotyping:
A [HIGH]: valid cluster, cnMOPS or genotyping support, <30% blacklist
B [HIGH]: cnMOPS call, ≥50kb, <30% blacklist, genotyping pass, no clustering overlap
C [MED]: cnMOPS call, <50kb, genotyping pass, <30% blacklist
D [MED]: valid cluster, genotyping or cnMOPS support, ≥30% blacklist
E [MED]: cnMOPS call, ≥50kb, genotyping pass, ≥30% blacklist
F [LOW]: cnMOPS call, ≥50kb, no clustering support, no genotyping support
G [LOW]: cnMOPS call, <50kb, genotyping pass, ≥30% blacklist
H [LOW]: valid cluster, <25kb, no cnMOPS or genotyping support
Consensus Groups without Genotyping:
A [HIGH]: valid cluster, cnMOPS support, <30% blacklist
B [MED]: cnMOPS call, ≥50kb, <30% blacklist, no clustering overlap
C [MED]: valid cluster, cnMOPS support, ≥30% blacklist
D [LOW]: cnMOPS call, ≥50kb, ≥30% blacklist
E [LOW]: valid cluster, <25kb, no cnMOPS support
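The five groups used without genotyping can be read as a small decision function. The sketch below is one interpretation of the list above, not the logic in module6.sh: the Candidate fields and the function name are hypothetical, 1 kb is taken as 1,000 bp, blacklist overlap is expressed as a fraction, and candidates that match none of the listed groups return None.

```python
from typing import NamedTuple, Optional, Tuple


class Candidate(NamedTuple):
    """Hypothetical record for one candidate CNV."""
    valid_cluster: bool    # supported by a valid read-pair cluster
    cnmops_call: bool      # supported by a cnMOPS call
    size_bp: int           # size in bp
    blacklist_frac: float  # fraction overlapping blacklist, 0.0 to 1.0


def consensus_group(c: Candidate) -> Optional[Tuple[str, str]]:
    """Return (group, confidence) for the no-genotyping mode, or None."""
    low_blacklist = c.blacklist_frac < 0.30
    if c.valid_cluster and c.cnmops_call:
        # A: cluster + cnMOPS, <30% blacklist; C: same with >=30% blacklist
        return ("A", "HIGH") if low_blacklist else ("C", "MED")
    if c.cnmops_call:
        # cnMOPS call with no cluster: only calls >=50kb are grouped (B or D)
        if c.size_bp >= 50_000:
            return ("B", "MED") if low_blacklist else ("D", "LOW")
        return None
    if c.valid_cluster and c.size_bp < 25_000:
        # E: small cluster with no cnMOPS support
        return ("E", "LOW")
    return None


# Made-up candidates for illustration only
examples = [
    Candidate(True, True, 80_000, 0.10),
    Candidate(False, True, 60_000, 0.45),
    Candidate(True, False, 12_000, 0.00),
    Candidate(False, True, 20_000, 0.00),
]
for candidate in examples:
    print(candidate, "->", consensus_group(candidate))
```

Under this reading, a cnMOPS call under 50kb with no cluster, or a cluster of 25kb or more with no cnMOPS support, falls outside every listed group.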
Returns one merged file each for consensus deletions and consensus duplications
##MODULE 7: COMPLEX SV CATEGORIZATION##
Runs inversion classification script
Runs translocation classification script
Runs complex linking script
Runs complex parsing script
##MODULE 8: VARIANT CONSOLIDATION & REFORMATTING##
Outputs the following seven variant files:
-Deletion
-Duplication
-Inversion
-Insertion
-Translocation
-Complex
-Unresolved