DeCovA requires at least R and bedtools or GATK to be installed. Additionally, it can use picard-tools (for deduplication), samtools (for the mapping-quality filter), and GATK (as an alternative to bedtools; required if a base-quality filter is needed; GATK is also aware of read-pair overlap). The script first attempts to run programs installed system-wide (as root) under the following names: samtools, bedtools, picard-tools, GenomeAnalysisTK; if they are not found, it looks for them at the paths provided on the command line.
DeCovA also requires the Perl module IO::Compress::Gzip.
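Before a first run, it can help to check which of these tools are reachable. The snippet below is a minimal sketch and not part of DeCovA: the helper name `have_tool` is hypothetical, and it only searches $PATH, so programs supplied through the command-line path options are not covered.

```shell
#!/bin/sh
# Hypothetical helper (not part of DeCovA): report whether a program is in $PATH.
have_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
    fi
}

# Program names as listed above; which ones are needed depends on the options used.
for tool in R bedtools samtools picard-tools GenomeAnalysisTK; do
    have_tool "$tool"
done

# Check that the required Perl module can be loaded.
if perl -MIO::Compress::Gzip -e 1 2>/dev/null; then
    echo "found: IO::Compress::Gzip"
else
    echo "missing: IO::Compress::Gzip"
fi
```

A "missing" line for picard-tools, samtools or GenomeAnalysisTK is only a problem if the corresponding filter or coverage tool is requested.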
An annotation file must be provided (-r option) for all options that use gene coordinates: UCSC refGene.txt or Ensembl .gtf/.gff3 files are accepted.
For example:
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/refGene.txt.gz
wget ftp://ftp.ensembl.org/pub/grch37/release-92/gff3/homo_sapiens/Homo_sapiens.GRCh37.87.gff3.gz
wget ftp://ftp.ensembl.org/pub/grch37/release-92/gtf/homo_sapiens/Homo_sapiens.GRCh37.87.gtf.gz
DeCovA can be run from the command line by calling the main Perl script:
perl path/to/DeCovA/bin/DeCovA [options]
The script can be made executable:
chmod 755 path/to/DeCovA/bin/DeCovA
The command then becomes:
./path/to/DeCovA/bin/DeCovA [options]
The DeCovA script directory can be added to $PATH:
echo 'export PATH=$PATH:/home/me/path/to/DeCovA/bin' >> /home/me/.bashrc
The command is then simply:
DeCovA [options]
DeCovA can also be installed:
cd path/to/DeCovA/
perl Makefile.PL
make
then, as root:
sudo make install
Then simply enter:
DeCovA [options]
β -f / --file [file]: list of bam files (comma separated, or set several times)
β -F / --fList [file]: file with such a list of bam files (one bam per line)
β -d / --dir [dir]: directory(ies) where to find bam files (comma separated, or set several times)
β -s / --suffix [str]: suffix to add before opening bam files
β -r / --ref [file]: gene annotation file (can be .gz)
β --fmt [gtf/gff3/ucsc] : gene annotation file format (ucsc <=> UCSC refGene) ; if not provided, determined from extension (txt => UCSC refGene)
β -b / --bed [file]: bed file, used to analyse depth coverage
β -m / --mut [file]: mut file, used to plot known mutations ; format: "chr<TAB>pos(1-based)<TAB>info" (vcf files are ok ; can be .gz)
β -i / --id [str]: list of gene/transcript ids (comma separated, or set several times)
β -I / --idList [file]: file with a list of gene/transcript ids (one id per line)
β -g / --genome [file]: path to genome.fa file, if available (required if using GATK)
β --sex_file [file]: format: "patient<TAB>sex"
β --raw_cov [file]: use this coverage tool output .cov file (to skip bam analysis)
β --bed_cov [file]: use this DeCovA's output .cov.txt file (to skip cov bed analysis in CNV detect)
β -O / --outdir [dir]: out directory (default: folder named with date)
β -S / --graphSum : will perform graphSums (sum of covered samples by position)
β -A / --allSample : will perform graphAllSample (depthline by gene and by sample, all samples graph on same .png file)
β -X / --bySample : will perform graphBySample (depthline by gene and by sample, one sample by .png file)
β -M / --noDepthMut : does not print, for each file, depth at known mutations provided by opt -m (default: yes if opt -m)
β -P / --covPlot : will perform covPlots
β -B / --covBed : will output cov of bed intervals
β -C / --CNV : will output CNV foreach bed intervals
β --Reseq : [float,0-1] : print bed interval if cov < value (def: do not print)
β --geneReport : will print all uncovered genomic intervals (within gene region) in 1 txt file per sample (default: no)
β --bedReport : will print all uncovered intervals (within bed intervals) in 1 txt file per sample (default: no)
β --summary [Y/N] : to print summary txt file (default: yes if -S -A -X)
β -k / --keepCov : do not erase coverage file at the end of the process
β -K / --keepBed : do not erase the bed file inferred from the gene list at the end of the process (and possibly rename it)
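To make the --covBed and --Reseq outputs more concrete, here is a minimal sketch of per-interval statistics. It assumes that "cov" is the fraction of bases whose depth reaches a threshold, which the option list does not spell out; the helper name `interval_stats` and the one-depth-per-line input are illustrative choices, not DeCovA's internals.

```shell
#!/bin/sh
# Hypothetical helper: read one depth value per line (one line per base of an
# interval) and print "min mean cov", where cov is the fraction of bases whose
# depth is >= the threshold given as first argument (an assumed definition).
interval_stats() {
    LC_ALL=C awk -v thr="$1" '
        NR == 1 || $1 < min { min = $1 }
        { sum += $1; if ($1 >= thr) covered++ }
        END { if (NR > 0) printf "%d %.2f %.2f\n", min, sum / NR, covered / NR }
    '
}

# Demo: a 5-base interval evaluated at a depth threshold of 20.
printf '%s\n' 30 25 10 0 35 | interval_stats 20
```

Under this reading, an interval with cov 0.60 would be reported by --Reseq 0.9, since 0.60 < 0.9.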
*gene/transcript regions analysis param.:
β -N / --nonCoding : analyse also Non coding transcripts (default: no)
β -U / --noUTR : does not take into account UTR regions, for graphs (default: yes)
β -u / --noUTRinTxt : does not take into account UTR regions, for summary txt file and plots (default: yes)
β -t / --depthThreshold [int]: depth thresholds (comma separated, or set several times)
β -T / --printThreshold [int]: depth threshold used for txt outputs (must be one of those in opt -t; default : the smallest one)
β --noGraphThreshold : all graphs will be printed, whatever the coverage (default: only the genes not fully covered at the threshold set by opt -T will be drawn)
β --noAllTranscripts : does not print All transcripts on same file, in graphBySample (default: yes)
β --maxDepth [int]: max depth value when printing graph (optional)
β -l / --expand2val [int]: length to add at each end of exons, on graphs (default: 0) ; or [int1,int2] : lengths to add in 5' and 3'
β --UDstream [int]: length to add at each end of genes, on graphs ; or [int1,int2] : lengths to add upstream and downstream
β --splitBedFromId : if padding creates overlapping exons, take the mid between them (for report)
β --mergeBedFromId : merge overlapping exons
β -L / --expand2bed : expand length of gene analysed regions to bed coord, if -l < bed , on graphs (default: no)
β --Ltxt [+/-int]: does take into account expanded length (from -l and -L) for txt outputs (default: no), or add a different length
β --UDtxt [+/-int]: does take into account up/downStream length for txt outputs (default: no), or add a different length
β -R / --noReverse : does not reverse regions if the transcript is on the (-) strand (default: yes)
β --nGraph : max number of graphs per sheet (default : all samples or all transcripts)
*plot param:
β --binPlot [int]: bin width for covPlot (default=10)
β --maxPlot [int]: max depth for covPlot (default=100)
β --genePlot : will perform plots for regions extracted from genes coord, not only for bed intervals (default: no)
β --interPlot : will produce intersection covPlot (default: no)
*bam filters:
β --dedup : do not take duplicate reads into account (default: keep all reads; enter "do" to perform Picard deduplication)
β --mbq : minimum base quality (default 0; requires gatk)
β --mmq : minimum mapping quality (default 0)
*cov_bed param:
β --cov_fields [min/max/tot/mean/median/cov]: fields foreach intervals in covBed (comma separated) (default: min,mean,cov)
β --Lbed [int]: length added out of bed interval ends (default: 0)
β --split_bed : splits overlapping bed intervals for Cov and CNV analyses
β --no_overlap_bed : removes overlapping bed intervals for Cov and CNV analyses
β --cut_bed [+/-cutL:x,minL:y,maxL:z,keepLast:s]: cut bed intervals in shorter fragments:
β cutL : length of segmentation (def: 150)
β minL : min length required to keep the last interval, after segmentation (def: --cutL/2)
β maxL : length above which bed intervals will be segmented, in N segments of "cutL" length (def: as --cutL)
β keepLast : if last interval shorter than minL :
β enter m (merge) to simply merge the last two intervals
β enter h (half) to output the last two intervals with length = half of their sum
β enter n to throw it out
β --reAnnot_bed : removes and replaces 4th column of bed file with gene info (optional args: g,t,e,i,o : indicates to annotate with gene/transcript/exon/intron/intergenic infos; default: all)
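The --cut_bed parameters can be illustrated with a small sketch. This is one possible reading of the cutL/minL/maxL/keepLast description above, not DeCovA's actual code: the function name `cut_bed`, the 3-column BED input, and the exact handling of the last fragment are assumptions.

```shell
#!/bin/sh
# Hypothetical sketch (not DeCovA code): segment BED intervals read on stdin.
# Arguments: cutL minL maxL keepLast(m|h|n). Assumes 3-column BED and maxL >= cutL.
cut_bed() {
    LC_ALL=C awk -v cutL="$1" -v minL="$2" -v maxL="$3" -v keep="$4" '
    BEGIN { OFS = "\t" }
    {
        chr = $1; start = $2; stop = $3; len = stop - start
        # Intervals not longer than maxL are passed through unchanged.
        if (len <= maxL || len < cutL) { print chr, start, stop; next }
        n = int(len / cutL)         # number of full-length segments
        rest = len - n * cutL       # length of the trailing fragment
        # Print every full segment except the last one.
        for (i = 0; i < n - 1; i++) {
            print chr, start + i * cutL, start + (i + 1) * cutL
        }
        s = start + (n - 1) * cutL  # start of the last full segment
        if (rest == 0) {
            print chr, s, stop
        } else if (rest >= minL) {
            # Trailing fragment is long enough: keep it as its own interval.
            print chr, s, s + cutL
            print chr, s + cutL, stop
        } else if (keep == "m") {
            # Merge the short fragment into the last full segment.
            print chr, s, stop
        } else if (keep == "h") {
            # Split the last two pieces at their common midpoint.
            mid = s + int((cutL + rest) / 2)
            print chr, s, mid
            print chr, mid, stop
        } else {
            # keepLast n: drop the short fragment.
            print chr, s, s + cutL
        }
    }'
}

# Demo: a 400 bp interval, cutL=150, minL=120, maxL=150, short tail merged.
printf 'chr1\t100\t500\n' | cut_bed 150 120 150 m
```

With these demo values the 100 bp tail is shorter than minL, so "m" yields chr1:100-250 and chr1:250-500, while "n" would end the second interval at 400.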
*CNV_detect param:
β --level2 : "avg"/"med" : use average/median as center of depths of a region (def: med)(if spread2 is set, level2 is unset, unless set explicitly)
β --spread2 : "std"/"qtile" : use standard deviation/deviation from quartile as dispersion of depths of a region (def: none)(std forces avg, qtile forces med)
β --level_del [float [0-1]] (def: 0.8)
β --level_dup [float >1] (def: 1.2)
β --spread_del [float <0] (def: none)
β --spread_dup [float >0] (def: none)
β --range [float]: samples kept for avg-std calculation if within median+/-range*quartile (def: none, ie all samples used)
β --highQual [li:float/ls:float/si:float/ss:float/c:int]: flag as high qual if one of following criteria, comma separated : li=level inf, ls=level sup, si=spread inf, ss=spread sup, c=consecutive ; ex : li:0.25,ls:1.75,si:-5,ss:5,c:2
β --ex_region [float [0-1]] : region excluded from analysis if CNVs/N_samples >value (def: 1)
β --ex_sample [float [0-1]] : sample excluded from analysis if CNVs/N_regions >value (def: 1)
β --ex_cov [float [0-1]] : region excluded from analysis if none of the samples have cov >=value (def: 0)
β --ex_DP [int] : region excluded from analysis if avg depth <=value (def: 0)
β --max_nonCNVcons [int]: max number of non-CNV consecutive intervals tolerated within a CNV (def: 0)
β --max_nonCNVrate [int]: max rate of non-CNV intervals tolerated within a CNV (def: 0)
β --ratioByGender [a/g/no]: enter "a" : foreach region from all chrom, depth ratio computed separately for F and M ; enter "g" : foreach region from gonosomes only, depth ratio computed separately for F and M. def: no (depth ratio for F and M together)
β --normAllChr : total depth used to norm sample depths = sum on all chr, whatever the sex (def: double the depth for chrX if male, and skip chrY in the sum)
β --normDepth [mean/tot] : total depth used to norm sample depths = sum of total depths of each region or sum of mean depths of each region (def)
β --graph_byGene : to enable graph for gene affected by a CNV (def: no)
β --graph_byChr : to enable graph by chromosome (def: no)
β --graph_byCNV : to enable graph around each CNV (def: yes)
β --CNV_fields [min/max/med/avg/std/Q1/Q3]: list of fields foreach region (comma separated) (default: none)
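As an illustration of how --level2, --level_del and --level_dup could interact, the sketch below divides each sample's depth by the median across samples and flags ratios outside the two levels. It is a simplified assumption, not DeCovA's implementation: the depths are taken as already normalized per sample, the comparisons are strict, and the function name `call_cnv` and the `sampleN` labels are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch (not DeCovA code). Input on stdin: one region per line,
# "region depth_sample1 depth_sample2 ...", depths assumed already normalized.
# Arguments: level_del level_dup (e.g. 0.8 1.2).
call_cnv() {
    LC_ALL=C awk -v del="$1" -v dup="$2" '
    {
        n = NF - 1
        for (i = 1; i <= n; i++) { d[i] = $(i + 1) + 0; s[i] = d[i] }
        # Insertion sort of the copy s[] to obtain the median across samples.
        for (i = 2; i <= n; i++) {
            v = s[i]
            for (j = i - 1; j >= 1 && s[j] > v; j--) { s[j + 1] = s[j] }
            s[j + 1] = v
        }
        med = (n % 2) ? s[(n + 1) / 2] : (s[n / 2] + s[n / 2 + 1]) / 2
        if (med == 0) next   # no usable depth in this region
        for (i = 1; i <= n; i++) {
            r = d[i] / med
            flag = (r < del) ? "DEL" : ((r > dup) ? "DUP" : ".")
            printf "%s\tsample%d\t%.2f\t%s\n", $1, i, r, flag
        }
    }'
}

# Demo: five samples on one region, default levels 0.8 and 1.2.
printf 'exon1\t100\t98\t102\t50\t150\n' | call_cnv 0.8 1.2
```

In the demo the median is 100, so the fourth sample (ratio 0.50) is flagged DEL and the fifth (ratio 1.50) DUP.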
*external tools path:
β --bedtools [dir/file]: enter path to executable, if not installed as root or not in path
β --samtools [dir/file]: enter path to executable, if not installed as root
β --picard [dir/file]: enter path to executable .jar, if not installed as root
β --gatk [+/-dir/file]: cov analysis will be performed by gatk (default:bedtools; enter path to executable, if not installed as root)
*general:
β -x / --ram [int]: memory for gatk (in GB)
β --cpu [int]: multi-thread for gatk (def: 1)
β -v / --version : current version
β -h / --help : help
./path/to/DeCovA -d path/to/bam_dir/ -r path/to/Refseq.txt -b path/to/targets.bed -m path/to/mut.list -t 20,50,100 -A -S -P -B -C
./path/to/DeCovA -f path/to/file1.bam -f path/to/file2.bam -r path/to/Refseq.txt -b path/to/targets.bed -m path/to/mut.list -t 20,50,100 -A -S -P -B -C
./path/to/DeCovA -F path/to/file.list -r path/to/Refseq.txt -b path/to/targets.bed -m path/to/mut.list -t 20,50,100 -A -S -P -B -C
./path/to/DeCovA -d path/to/bam_dir/ -r path/to/Refseq.txt -i GENE1,GENE2,NM_xxx1,NM_xxx2 -m path/to/mut.list -t 20,50,100 -A -S -P -B -C
./path/to/DeCovA -d path/to/bam_dir/ -r path/to/Refseq.txt -I genes.list -b path/to/targets.bed -t 20,50,100 -A -S -P -B -C
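The -F / --fList option expects a file with one bam path per line. A minimal sketch for building such a file from a directory follows; the function name `make_bam_list` and the file names are placeholders, not DeCovA conventions.

```shell
#!/bin/sh
# Hypothetical helper: write the .bam files found directly in a directory,
# one path per line, in glob (alphabetical) order.
make_bam_list() {
    bam_dir=$1
    out_file=$2
    for f in "$bam_dir"/*.bam; do
        # Skip the unexpanded pattern when the directory holds no .bam file.
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
        fi
    done > "$out_file"
}

# Demo on a throwaway directory with empty placeholder files.
demo_dir=$(mktemp -d)
touch "$demo_dir/s1.bam" "$demo_dir/s2.bam" "$demo_dir/s1.bam.bai"
make_bam_list "$demo_dir" "$demo_dir/bam.list"
cat "$demo_dir/bam.list"
```

The resulting list can then be passed as -F path/to/bam.list; index files such as .bam.bai are left out because they do not end in .bam.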