All command line options

For instructions on using deepTools 2.0 or newer, please go here. This page only applies to deepTools 1.5

deepTools version: 1.5.9 (September 2014)

Oct 2014: bamCoverage now has two new options: --centerReads and --missingDataAsZero

Here you will find all the options available on the command line (almost all of them are also available in Galaxy, though some may be named slightly differently).

You can always see all available command-line options via --help:

$ /deepTools/bin/bamCoverage --help

A typical deepTools command could look like this:

$ /deepTools/bin/bamCoverage --bam myAlignedReads.bam \
--outFileName myCoverageFile.bigWig \
--outFileFormat bigwig \
--fragmentLength 200 \
--ignoreDuplicates \
--scaleFactor 0.5

general principles

the output format of plots is indicated by the file ending, e.g. MyPlot.pdf will return a PDF file, MyPlot.png a PNG file
all tools that produce plots can also output the underlying data - this can be useful in case you don’t like the deepTools visualization as you can then use the data matrices produced by deepTools with your favorite plotting module, e.g. R or Excel

Parameters to decrease the run time

numberOfProcessors
region - in case you're testing whether a certain plot works and gives you the output you're hoping for, you can speed things up by focusing on a certain genome region, e.g. chr4 or chr2:100000:200000
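The region string follows the chr:start:end layout used by the --region option throughout this page. As a rough illustration of how such a string can be split into its parts, here is a minimal Python sketch. The function name parse_region is hypothetical and not part of deepTools, and the validation rules are assumptions.

```python
def parse_region(region):
    """Split a region string such as 'chr4' or 'chr10:456700:891000'.

    Hypothetical helper for illustration only; not deepTools code.
    Returns (chromosome, start, end); start and end are None when
    only a chromosome name is given.
    """
    parts = region.split(":")
    chrom = parts[0]
    if len(parts) == 1:
        return chrom, None, None
    if len(parts) != 3:
        raise ValueError("expected chr or chr:start:end, got %r" % region)
    start, end = int(parts[1]), int(parts[2])
    if start >= end:
        raise ValueError("region start must be smaller than region end")
    return chrom, start, end


print(parse_region("chr4"))
print(parse_region("chr10:456700:891000"))
```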

filtering BAMs while processing

ignoreDuplicates
minMappingQuality

If you use bamCoverage or bamCompare on samples that have many duplicates or many reads of low quality, it might be better to filter the BAM files beforehand as the filtering by deepTools is done after the scaling factors are calculated. If you know that your files will be strongly affected by the filtering, this might lead to non-optimal scaling factors!

To tell a program to use a certain option (e.g. to ignore duplicate reads), you will have to give the option name preceded by two hyphens (e.g. --ignoreDuplicates). In the tables on this page, we try to list:

the option name as recognized by the program
the kind of value that is sometimes expected after the option name (see the annotated figure below)
a verbose explanation of what the option actually does

The texts here are adjusted for readability; they might not match the help text that you see on the command line word for word.

Table of Contents

bamCorrelate

Mandatory parameters
Optional parameters
Additional output parameters
Parameters for read processing
Options for the heatmap

bamFingerprint

Mandatory parameters
Output parameters
Optional parameters
Parameters for read processing

computeGCBias

Mandatory parameters
Optional parameters
Output parameters

correctGCBias

Mandatory parameters
Optional parameters
Output parameters

bamCoverage

Mandatory parameters
Output parameters
Optional parameters
Parameters for read processing

bamCompare

Mandatory parameters
Output parameters
Optional parameters
Parameters for read processing

computeMatrix

Mandatory parameters
Optional parameters
Output parameters

heatmapper

Mandatory parameters
Optional parameters
- Clustering options
Output parameters

bamCorrelate

bamCorrelate can be run in two modes: bins and BED-file.

A typical command would look like this:

$ /deepTools/bin/bamCorrelate BED-file \
--BED myRegionsOfInterest.bed \
--bamfiles myAlignedReads_Sample1.bam myAlignedReads_Sample2.bam \
--plotFile correlation_plot.png \
--corMethod spearman

For details on bamCorrelate, see the tool details.

Mandatory arguments

Command	Expected Input	Explanation
--bamfiles	FILENAMES	List of indexed BAM files separated by space (default: None)
--plotFile	FILENAME	File name to save the file containing the heatmap of the correlation. The file ending will be used to determine the image format, for example, if correlation.pdf is provided, the heatmap will be saved in pdf format. Available file format options are: .png, .emf, .eps, .pdf and .svg. (default: None)
--corMethod	{spearman, pearson}	Choose the method for the correlation calculation (default: None)
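The two choices for --corMethod react differently to extreme values. The following Python sketch is a plain re-implementation for illustration, not the code that bamCorrelate runs; the function names and the example counts are made up. It shows that a single very large bin count lowers the Pearson coefficient while the rank-based Spearman coefficient is unaffected.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up bin counts for two samples; the last bin of sample1 is an outlier.
sample1 = [1, 2, 3, 4, 1000]
sample2 = [2, 4, 6, 8, 9]
print("pearson :", round(pearson(sample1, sample2), 3))
print("spearman:", round(spearman(sample1, sample2), 3))
```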

bamCorrelate: optional arguments

only for BED-file mode:

Command	Expected Input	Explanation
--BED	FILENAME	If the comparison of read counts should be limited to certain regions, a BED file can be given. If this is the case, then the correlation is computed for the number of reads that overlap such regions. (default: None)

only for bins mode:

Command	Expected Input	Explanation
--binSize	INTEGER	Length in base pairs for a window used to sample the genome. (default: 10000)
--distanceBetweenBins	INTEGER	By default, bamCorrelate considers consecutive bins of the specified bin size (--binSize option). This means that the sampling is not random! However, to reduce the computation time, a larger distance between bins can be given. Larger distances result in fewer bins being considered. (default: 0)
--doNotRemoveOutliers		By default, bins with very large counts are removed. By setting this option, outliers will not be removed. Bins with abnormally high read counts artificially increase the Pearson correlation; that's why, by default, bamCorrelate tries to remove outliers using the median absolute deviation (MAD) method, applying a threshold of 200 to only consider extremely large deviations from the median. The ENCODE blacklist page contains useful information about regions with unusually high counts.
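To make the MAD-based outlier removal more concrete, here is a minimal Python sketch. It is not the bamCorrelate implementation: the function name, the choice to flag only unusually large counts, and the handling of a MAD of zero are assumptions. Only the threshold of 200 is taken from the description above.

```python
from statistics import median


def mad_outlier_mask(counts, threshold=200.0):
    """Flag bins whose count lies far above the median.

    Illustrative only. A bin is flagged when its distance above the
    median exceeds `threshold` times the median absolute deviation (MAD).
    """
    med = median(counts)
    mad = median([abs(c - med) for c in counts])
    if mad == 0:
        # Assumption: without any spread, nothing is called an outlier.
        return [False] * len(counts)
    return [(c - med) / mad > threshold for c in counts]


# Made-up counts: the last bin could be, e.g., a satellite repeat.
counts = [10, 12, 11, 9, 10, 13, 100000]
mask = mad_outlier_mask(counts)
kept = [c for c, is_outlier in zip(counts, mask) if not is_outlier]
print(kept)
```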

Command	Expected Input	Explanation
--includeZeros		If set, then zero counts that happen for all BAM files are included. The default behavior is to ignore those cases (default: False)
--region	CHR:START:END	Region of the genome to limit the operation to - this is useful when testing parameters to reduce the computing time. The format is chr:start:end, for example: --region chr10 or --region chr10:456700:891000. (default: None)
--numberOfProcessors	INTEGER	Number of processors to use. Type "max/2" to use half the maximum number of processors or "max" to use all available processors. (default: max/2)
--verbose		Set to see processing messages. (default: False)
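The special values "max" and "max/2" accepted by --numberOfProcessors can be pictured as follows. This is a sketch under the assumption that "max" means the CPU count reported by the operating system; resolve_processors is a hypothetical name, not a deepTools function.

```python
import os


def resolve_processors(value="max/2"):
    """Translate 'max', 'max/2' or a number into a processor count.

    Illustrative only; assumes 'max' is the CPU count reported by the OS.
    """
    available = os.cpu_count() or 1
    if value == "max":
        return available
    if value == "max/2":
        return max(1, available // 2)
    requested = int(value)
    if requested < 1:
        raise ValueError("the number of processors must be at least 1")
    return requested


print(resolve_processors("max/2"), resolve_processors("max"))
```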

bamCorrelate: optional arguments for processing of the reads

Command	Expected Input	Explanation
--fragmentLength	INTEGER	Length of the average fragment size. Reads will be extended to match this length unless they are paired-end, in which case they will be extended to match the fragment length reported in the BAM file. If this value is set to the read length or smaller, the read will not be extended. Warning: the fragment length affects the normalization to 1x (see --normalizeTo1x). The formula to normalize using the sequencing depth is genomeSize/(number of mapped reads * fragmentLength). NOTE: If the BAM files contain mated and unmated paired-end reads, unmated reads will be extended to match the --fragmentLength. (default: 200)
--doNotExtendPairedEnds		If set, reads are not extended to match the fragment length reported in the BAM file, instead they will be extended to match the --fragmentLength. Default is to extend the reads if paired-end information is available. (default: False)
--ignoreDuplicates		If set, reads that have the same orientation and start position will be considered only once. If reads are paired, the mate position also has to coincide to ignore a read. (default: False)
--minMappingQuality	INTEGER	If set, only reads that have a mapping quality score higher than --minMappingQuality are considered. (default: None)
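The interplay of --fragmentLength and --doNotExtendPairedEnds described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not deepTools code: the function and parameter names are hypothetical, coordinates are 0-based half-open intervals, and reverse-strand reads are assumed to be extended towards lower coordinates.

```python
def extended_interval(start, end, strand, fragment_length=200,
                      template_length=None, extend_paired_ends=True):
    """Return the (start, end) interval covered by a read after extension.

    template_length stands for the fragment length of a properly mated
    pair as reported in the BAM file; None means single-end or unmated.
    """
    read_length = end - start
    if template_length is not None and extend_paired_ends:
        target = abs(template_length)   # paired-end: use the BAM value
    else:
        target = fragment_length        # otherwise: use --fragmentLength
    if target <= read_length:
        return start, end               # never shorten a read
    if strand == "+":
        return start, start + target
    return max(0, end - target), end


print(extended_interval(1000, 1050, "+"))
print(extended_interval(1000, 1050, "-"))
print(extended_interval(1000, 1050, "+", template_length=300))
```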

bamCorrelate: optional arguments concerning the output

Command	Expected Input	Explanation
--outFileCorMatrix	FILENAME	Output file name for the correlation matrix (tabulated text file). (default: None)
--outRawCounts	FILENAME	Output file name to save the bin counts (tabulated text file). (default: None)
--plotFileFormat	{png, emf, eps, pdf, svg}	If given, this option overrides the image format based on the ending given in --plotFile. (default: None)

bamCorrelate: arguments for the heatmap display

Command	Expected Input	Explanation
--labels	NAMES	List of labels to use in the image. If no labels are given, the file names will be used instead. Separate the labels by space, e.g. --labels sample1 sample2 sample3 (default: None)
--zMin	NUMBER	Minimum value for the heatmap intensities. If not specified the value is set automatically (default: None) See this FAQ here for details.
--zMax	NUMBER	Maximum value for the heatmap intensities. If not specified the value is set automatically (default: None)
--colorMap	{Spectral, summer, coolwarm, Set1, Set2, Set3, Dark2, hot, RdPu, YlGnBu, RdYlBu, gist_stern, cool, gray, GnBu, gist_ncar, gist_rainbow, CMRmap, bone, RdYlGn, spring, terrain, PuBu, spectral, gist_yarg, BuGn, bwr, cubehelix, YlOrRd, Greens, PRGn, gist_heat, Paired, hsv, Pastel2, Pastel1, BuPu, copper, OrRd, brg, gnuplot2, jet, gist_earth, Oranges, PiYG, YlGn, Accent, gist_gray, flag, BrBG, Reds, RdGy, PuRd, Blues, Greys, autumn, pink, binary, winter, gnuplot, RdBu, prism, YlOrBr, rainbow, seismic, Purples, ocean, PuOr, PuBuGn, afmhot}	Color map to use for the heatmap. For exemplary plots for all the color schemes, click here (default: Reds)

bamFingerprint

Mandatory argument

Command	Expected Input	Explanation
--bamfiles	FILENAME	List of sorted BAM files (default: None)

bamFingerprint: output options

Command	Expected Input	Explanation
--plotFile	FILENAME	File name to save the image file containing a plot of the fingerprint, for example MyPlot.png (default: None)
--outRawCounts	FILENAME	Output file name to save the bin counts (default: None)
--plotFileFormat	{png, emf, eps, pdf, svg}	image format type. If given, this option overrides the image format based on the plotFile ending. (default: None)

bamFingerprint: optional arguments

Command	Expected Input	Explanation
--labels	LIST	List of labels to use in the output. If not given the file names will be used instead. Separate the labels by space. (default: None)
--binSize	INTEGER	Length in base pairs for a window used to sample the genome. (default: 500)
--fragmentLength	INTEGER	Length of the average fragment size. (default: 200)
--numberOfSamples	INTEGER	Number of bins, sampled from the genome to compute the average number of reads. (default: 500000)
--skipZeros		If set, then zero counts that happen for all BAM files given are ignored. This will result in fewer read counts than the number specified in --numberOfSamples. (default: False)
--region	CHR:START:END	Region of the genome to limit the operation to - this is useful when testing parameters to reduce the computing time. The format is chr:start:end, for example --region chr10 or --region chr10:456700:891000. (default: None)
--numberOfProcessors	INTEGER	Number of processors to use. Type "max/2" to use half the maximum number of processors or "max" to use all available processors. (default: max/2)
--verbose		Set to see processing messages. (default: False)

bamFingerprint: Processing options

Command	Expected Input	Explanation
--doNotExtendPairedEnds		If set, reads are not extended to match the fragment length reported in the BAM file, instead they will be extended to match the --fragmentLength. Default is to extend the reads if paired end information is available. (default: False)
--ignoreDuplicates		If set, reads that have the same orientation and start position will be considered only once. If reads are paired, the mate position also has to coincide to ignore a read. (default: False)
--minMappingQuality	INTEGER	If selected and accompanied by a number, only reads that have a mapping quality score higher than the given number are considered (e.g. --minMappingQuality 10). (default: None)
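The read filters --ignoreDuplicates and --minMappingQuality can be illustrated with a small Python sketch. Reads are represented as plain dictionaries with hypothetical keys (chrom, start, strand, mate_start, mapq); this is an assumption made for the example and does not reflect how deepTools reads BAM records.

```python
def filter_reads(reads, ignore_duplicates=False, min_mapping_quality=None):
    """Apply the two read filters described above to a list of reads."""
    seen = set()
    kept = []
    for read in reads:
        # Keep only reads with a mapping quality strictly higher than the cutoff.
        if min_mapping_quality is not None and read["mapq"] <= min_mapping_quality:
            continue
        if ignore_duplicates:
            # Same orientation and start position (and mate position, if paired)
            # count as one read.
            key = (read["chrom"], read["start"], read["strand"],
                   read.get("mate_start"))
            if key in seen:
                continue
            seen.add(key)
        kept.append(read)
    return kept


reads = [
    {"chrom": "chr1", "start": 100, "strand": "+", "mapq": 30},
    {"chrom": "chr1", "start": 100, "strand": "+", "mapq": 30},   # duplicate
    {"chrom": "chr1", "start": 100, "strand": "-", "mapq": 30},   # other strand
    {"chrom": "chr1", "start": 200, "strand": "+", "mapq": 5},    # low quality
    {"chrom": "chr1", "start": 100, "strand": "+", "mate_start": 400, "mapq": 30},
]
print(len(filter_reads(reads, ignore_duplicates=True, min_mapping_quality=10)))
```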

computeGCBias

Mandatory arguments

Command	Expected Input	Explanation
--bamfile	FILENAME	Sorted BAM file. (default: None)
--effectiveGenomeSize	INTEGER	The effective genome size is the portion of the genome that is mappable. Large fractions of the genome are stretches of NNNN that should be discarded. This value is needed to detect enriched regions that, if not discarded, can bias the results. If repetitive regions were not included in the mapping of reads, the effective genome size needs to be adjusted accordingly. Common values are: mm9: 2150570000, hg19:2451960000, dm3:121400000 and ce10:93260000. See Table 2 of this article or here for several effective genome sizes. (default: None)
--genome	FILENAME	Genome in 2bit format. Most genomes can be found here. Search for the .2bit ending. Otherwise, FASTA files can be converted to 2bit using the UCSC program called faToTwoBit, available for different platforms at http://hgdownload.cse.ucsc.edu/admin/exe/ (default: None)
--fragmentLength	INTEGER	Fragment length used for the sequencing. If paired-end reads are used, the fragment length is computed from the BAM file. (default: None)

computeGCBias: output parameters

Command	Expected Input	Explanation
--GCbiasFrequenciesFile	FILENAME	Indicate a file name where the observed and expected read frequencies per GC content can be saved. This file will be needed to run the correctGCBias tool. This is a text file. (default: None)
--plotFileFormat	{png, emf, eps, pdf, svg}	image format type. If given, this option overrides the image format based on the plotFile ending. (default: None)
--biasPlot	FILENAME	If given, a diagnostic image summarizing the GC bias found on the sample will be saved. (default: None)
--regionSize	INTEGER	To plot the reads per GC over a region, the size of the region is required. By default, the region size is set to 300 bp, which is close to the standard fragment size for Illumina machines. However, if the depth of sequencing is low, a larger bin size will be required, otherwise many bins will not overlap with any read. (default: 300)

computeGCBias: optional arguments

Command	Expected Input	Explanation
--sampleSize	INTEGER	Define how many regions in the genome should be sampled for the calculation of the read distributions. (default: 50000000)
--filterOut	FILENAME	In some cases, it will make sense to exclude certain regions from the calculation of the read distributions. To tell computeGCBias where not to sample, please provide the path to a BED file of regions that should be excluded. See the entry on computeGCBias for further details. (default: None)
--extraSampling	FILENAME	BED file containing genomic regions for which extra sampling is required because they are underrepresented in the genome. This could be regions of extreme GC contents. (default: None)
--version		Show the program's version number and exit.
--region	CHR:START:END	Region of the genome to limit the operation to - this is useful when testing parameters to reduce the computing time. The format is chr:start:end, for example --region chr10 or --region chr10:456700:891000. (default: None)
--numberOfProcessors	INTEGER	Number of processors to use. Type "max/2" to use half the maximum number of processors or "max" to use all available processors. (default: max/2)
--verbose		Set to see processing messages. (default: False)

correctGCBias

Mandatory arguments

Command	Expected Input	Explanation
--bamfile	FILENAME	Sorted BAM file whose read counts should be corrected for GCbias according to the expected GC profile of the reference genome (calculated with computeGCBias). (default: None)
--effectiveGenomeSize	INTEGER	The effective genome size is the portion of the genome that is mappable. Large fractions of the genome are stretches of NNNN that should be discarded. Also, if repetitive regions were not included in the mapping of reads, the effective genome size needs to be adjusted accordingly (i.e. reduce the mappable genome size). Common values are: mm9: 2150570000, hg19: 2451960000, dm3: 121400000 and ce10: 93260000. See Table 2 of http://www.plosone.org/article/info:doi/10.1371/journal.pone.0030377 or http://www.nature.com/nbt/journal/v27/n1/fig_tab/nbt.1518_T1.html for several effective genome sizes. This value is needed to detect enriched regions that, if not discarded, can bias the results. (default: None)
--genome	FILENAME	Genome sequence in 2bit format. Most genomes can be found here. Search for the .2bit ending. Otherwise, FASTA files can be converted to 2bit using the UCSC program called faToTwoBit, available for different platforms at http://hgdownload.cse.ucsc.edu/admin/exe/ (default: None)
--GCbiasFrequenciesFile	FILENAME	Indicate the output file from computeGCBias that contains the observed and expected read frequencies per GC content. (default: None)

correctGCBias: optional arguments

Command	Expected Input	Explanation
--binSize	INTEGER	If you choose to output a bedGraph or bigWig file instead of a BAM file, then here is the place to define the size of the genomic bins (in bp) for which the overlapping reads should be counted. The information about the fragment length is stored in the frequencies table.
--region	CHR:START:END	Region of the genome to limit the operation to - this is useful when testing parameters to reduce the computing time. The format is chr:start:end, for example --region chr10 or --region chr10:456700:891000. (default: None)
--numberOfProcessors	INTEGER	Number of processors to use. Type "max/2" to use half the maximum number of processors or "max" to use all available processors. (default: max/2)
--verbose		Set to see processing messages. (default: False)

correctGCBias: output parameters

Command	Expected Input	Explanation
--correctedFile	FILENAME	Name of the corrected file. The ending will be used to decide the output file format. The options are ".bam", ".bw" for a bigWig file, ".bg" for a bedGraph file. NOTE: If you choose to output a bigWig or bedGraph file, be aware that these files will not be normalized for sequencing depth! If you would like to further normalize your read coverage (in addition to GC), we recommend to obtain a BAM file. (default: None)

bamCoverage

Mandatory argument

Command	Expected Input	Explanation
--bam	FILENAME	BAM file to process (default: None)

bamCoverage: output options

Command	Expected Input	Explanation
--outFileName	FILENAME	Output file name. (default: None)
--outFileFormat	{bigwig,bedgraph}	Output file type.

bamCoverage: optional arguments

Command	Expected Input	Explanation
--bamIndex	FILENAME	Index for the BAM file. Default is to consider the path of the BAM file adding the .bai suffix. (default: None)
--scaleFactor	NUMBER	Indicate a number that you would like to use as a fixed scaling factor instead of the scaling factor calculated by the --normalizeTo1x option. (default: 1)
--normalizeTo1x	NUMBER	Report read coverage normalized to 1x sequencing depth (also known as Reads Per Genomic Content (RPGC)). Sequencing depth is defined as: (total number of mapped reads * fragment length) / effective genome size. To use this option, the effective genome size has to be indicated after the option name. The effective genome size is the portion of the genome that is mappable. Large fractions of the genome are stretches of NNNN that should be discarded. Also, if repetitive regions were not included in the mapping of reads, the effective genome size needs to be adjusted accordingly. Common values are: mouse/mm9: 2150570000, human/hg19: 2451960000, D. melanogaster/dm3: 121400000 and C. elegans/ce10: 93260000. See Table 2 of this article or here for several effective genome sizes. (default: None)
--normalizeUsingRPKM		Use Reads Per Kilobase per Million reads to normalize the number of reads per bin. The formula is: RPKM (per bin) = number of reads per bin / (number of mapped reads (in millions) * bin length (kb)). (default: False)
--ignoreForNormalization	LIST	A comma-separated list of chromosome names, enclosed in quotes, that you want to exclude when computing the normalization. For example, --ignoreForNormalization "chrX, chrM" (default: None)
--centerReads		By adding this option reads are centered with respect to the fragment length. For paired-end data the read is centered at the fragment length defined by the two fragment ends. For single-end data, the given fragment length is used. This option is useful to get a sharper signal around enriched regions. (default: False)
--missingDataAsZero	{yes,no}	This parameter determines if missing data should be treated as zeros. If set to "no", missing data will be ignored and not included in the output file. Missing data is defined as those bins for which no overlapping reads are found. (default: yes)
--binSize	INTEGER	Size of the bins in bp for the output of the bigWig/bedGraph file. (default: 50)
--region	CHR:START:END	Region of the genome to limit the operation to - this is useful when testing parameters to reduce the computing time. The format is chr:start:end, for example --region chr10 or --region chr10:456700:891000. (default: None)
--numberOfProcessors	INTEGER	Number of processors to use. Type "max/2" to use half the maximum number of processors or "max" to use all available processors. (default: max/2)
--verbose		Set to see processing messages. (default: False)
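The two normalization formulas for bamCoverage (--normalizeTo1x and --normalizeUsingRPKM) can be written out as a short Python sketch. The function names and the example numbers are invented for illustration; only the formulas themselves come from this page.

```python
def sequencing_depth(mapped_reads, fragment_length, effective_genome_size):
    """Sequencing depth = (mapped reads * fragment length) / effective genome size."""
    return mapped_reads * fragment_length / effective_genome_size


def scale_factor_1x(mapped_reads, fragment_length, effective_genome_size):
    """Factor that brings the coverage to 1x:
    genomeSize / (number of mapped reads * fragmentLength)."""
    return effective_genome_size / (mapped_reads * fragment_length)


def rpkm(reads_in_bin, mapped_reads, bin_length_bp):
    """RPKM (per bin) = reads per bin / (mapped reads in millions * bin length in kb)."""
    return reads_in_bin / ((mapped_reads / 1e6) * (bin_length_bp / 1000))


# Invented example: 5 million mapped reads, 200 bp fragments, 2 Gb effective genome.
print(sequencing_depth(5_000_000, 200, 2_000_000_000))
print(scale_factor_1x(5_000_000, 200, 2_000_000_000))
# Invented example: 50 reads in a 50 bp bin, 10 million mapped reads.
print(rpkm(50, 10_000_000, 50))
```

In this example the depth is 0.5x, so the coverage has to be doubled to reach 1x.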

bamCoverage: BAM to bedGraph/bigWig processing options

Command	Expected Input	Explanation
--fragmentLength	INTEGER	Length of the average fragment size. Reads will be extended to match this length unless they are paired-end, in which case they will be extended to match the fragment length reported in the BAM file. If this value is set to the read length or smaller, the read will not be extended. Warning: the fragment length affects the normalization to 1x (see --normalizeTo1x). The formula to normalize using the sequencing depth is genomeSize/(number of mapped reads * fragmentLength). NOTE: If the BAM files contain mated and unmated paired-end reads, unmated reads will be extended to match the --fragmentLength. (default: 200)
--smoothLength	INTEGER	The smooth length defines a window, larger than the binSize, to average the number of reads. For example, if the --binSize is set to 20 bp and the --smoothLength is set to 60 bp, then, for each binSize the average of it and its left and right neighbors is considered. Any value smaller than the --binSize will be ignored and no smoothing will be applied. (default: None)
--doNotExtendPairedEnds		If set, reads are not extended to match the fragment length reported in the BAM file, instead they will be extended to match the --fragmentLength. Default is to extend the reads if paired end information is available. (default: False)
--ignoreDuplicates		If set, reads that have the same orientation and start position will be considered only once. If reads are paired, the mate position also has to coincide to ignore a read. (default: False)
--minMappingQuality	INTEGER	If set, only reads that have a mapping quality score higher than --minMappingQuality are considered. (default: None)
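The --smoothLength example (bins of 20 bp smoothed over 60 bp, i.e. each bin averaged with its left and right neighbor) can be sketched in Python as follows. This is an illustration only: the function name is hypothetical, and the treatment of the first and last bins (a shorter window) as well as of window sizes that are not an odd multiple of the bin size are assumptions.

```python
def smooth_bins(values, bin_size, smooth_length=None):
    """Average each bin with its neighbors inside a window of smooth_length bp."""
    if smooth_length is None or smooth_length < bin_size:
        return list(values)               # no smoothing is applied
    half = (smooth_length // bin_size) // 2   # neighbors on each side
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half):min(len(values), i + half + 1)]
        smoothed.append(sum(window) / len(window))
    return smoothed


coverage = [0, 30, 60, 30, 0]             # made-up values for 20 bp bins
print(smooth_bins(coverage, bin_size=20, smooth_length=60))
```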

bamCompare

Mandatory arguments

Command	Expected Input	Explanation
--bamfile1	FILENAME	Sorted BAM file 1. Usually the BAM file for the treatment. (default: None)
--bamfile2	FILENAME	Sorted BAM file 2. Usually the BAM file for the control. (default: None)

bamCompare: output options

Command	Expected Input	Explanation
--outFileName	FILENAME	Output file name. (default: None)
--outFileFormat	{bigwig, bedgraph}	Output file type.

bamCompare: optional arguments

Command	Expected Input	Explanation
--bamIndex1	FILENAME	Index for the BAM file 1. Default is to consider the path of the BAM file and adding the .bai suffix. (default: None)
--bamIndex2	FILENAME	Index for the BAM file 2. Default is to consider the path of the BAM file adding the .bai suffix. (default: None)
--scaleFactorsMethod	{readCount,SES}	Method for scaling the samples. SES is only recommended when signal and noise are well separated which can be seen in the bamFingerprint plot. (default: readCount)
--sampleLength	INTEGER	Only relevant when SES is chosen for the scaleFactorsMethod. To compute the SES, specify the length of the regions (in bp) which will be randomly sampled to calculate the scaling factors. If you do not have a good sequencing depth for your samples, consider increasing the size of the sampling regions. This will minimize the probability that zero-coverage regions are used. (default: 1000)
--numberOfSamples	INTEGER	Only relevant when SES is chosen for the scaleFactorsMethod. How many times the genome should be sampled to compute the scaling factors (default: 100000)
--scaleFactors	NUMBER:NUMBER	Set this parameter to avoid the computation of scaleFactors. The format is scaleFactor1:scaleFactor2. For example 0.7:1 to scale the first BAM file by 0.7 while not scaling the second BAM file (default: None)
--pseudocount	NUMBER	small number to avoid log2(x/0) (default: 1)
--ratio	{log2, ratio, subtract, add, reciprocal_ratio}	The default is to output the log2 ratio between the two samples. The reciprocal ratio returns the negative of the inverse of the ratio if the ratio is less than 1. The resulting values are interpreted as negative fold changes. (default: log2)
--normalizeUsingRPKM	only to be set when --ratio subtract is selected	If you would like to get the difference between two BAM files (and hence set --ratio subtract), the final score will be normalized for sequencing depth so that the file can also be compared to samples from other sequencing runs. The default normalization method is Reads Per Kilobase per Million reads (RPKM). The formula is: RPKM (per bin) = number of reads per bin / (number of mapped reads (in millions) * bin length (kb)). Example usage: --normalizeUsingRPKM
--normalizeTo1x	NUMBER (only when --ratio subtract)	The difference will be reported normalized for sequencing depth. The default method is RPKM (see above). If you would like to report the normalized coverage to 1x sequencing depth, set this option and indicate the effective genome size. Sequencing depth is defined as (total number of mapped reads * fragment length) / effective genome size. To use this option, the effective genome size has to be given. Common values are: mouse/mm9: 2150570000, human/hg19: 2451960000, D. melanogaster/dm3: 121400000 and C. elegans/ce10: 93260000. Example usage: --normalizeTo1x 2150570000
--missingDataAsZero	{yes,no}	This parameter determines if missing data should be treated as zeros. If set to "no", missing data will be ignored and not included in the output file. Missing data is defined as those regions for which both BAM files have 0 reads. (default: yes)
--ignoreForNormalization	LIST	A list of chromosome names separated by comma and limited by quotes, containing those chromosomes that you want to be excluded for computing the normalization. For example, --ignoreForNormalization "chrX, chrM" (default: None)
--binSize	INTEGER	Size of the bins in bp for the output of the bigWig/bedGraph file. (default: 50)
--region	CHR:START:END	Region of the genome to limit the operation to - this is useful when testing parameters to reduce the computing time. The format is chr:start:end, for example --region chr10 or --region chr10:456700:891000. (default: None)
--numberOfProcessors	INTEGER	Number of processors to use. Type "max/2" to use half the maximum number of processors or "max" to use all available processors. (default: max/2)
--verbose		Set to see processing messages.
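How the --ratio choices combine the two per-bin values can be pictured with the following Python sketch. It is not the bamCompare implementation. The function names are hypothetical, and two points are assumptions: that the pseudocount is added to both values before any division, and that the reciprocal ratio switches sign when the ratio drops below 1. The scaleFactor1:scaleFactor2 format is taken from the --scaleFactors description.

```python
import math


def parse_scale_factors(text):
    """Parse 'scaleFactor1:scaleFactor2', e.g. '0.7:1' -> (0.7, 1.0)."""
    first, second = text.split(":")
    return float(first), float(second)


def compare_bins(value1, value2, operation="log2", pseudocount=1.0):
    """Combine the (already scaled) values of one bin from two BAM files."""
    if operation == "subtract":
        return value1 - value2
    if operation == "add":
        return value1 + value2
    # Assumption: the pseudocount guards every division against x/0.
    ratio = (value1 + pseudocount) / (value2 + pseudocount)
    if operation == "ratio":
        return ratio
    if operation == "log2":
        return math.log2(ratio)
    if operation == "reciprocal_ratio":
        # Ratios below 1 are reported as negative fold changes.
        return ratio if ratio >= 1 else -1.0 / ratio
    raise ValueError("unknown operation: %s" % operation)


print(parse_scale_factors("0.7:1"))
print(compare_bins(7, 1))                          # log2(8 / 2)
print(compare_bins(1, 7, "reciprocal_ratio"))      # 2 / 8 = 0.25 -> -4
```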

bamCompare: BAM to bedGraph/bigWig processing options

Command	Expected Input	Explanation
--fragmentLength	INTEGER	Length of the average fragment size. Reads will be extended to match this length unless they are paired-end, in which case they will be extended to match the fragment length reported in the BAM file. If this value is set to the read length or smaller, the read will not be extended. Warning: the fragment length affects the normalization to 1x (see --normalizeTo1x). The formula to normalize using the sequencing depth is genomeSize/(number of mapped reads * fragmentLength). NOTE: If the BAM files contain mated and unmated paired-end reads, unmated reads will be extended to match the --fragmentLength. (default: 200)
--smoothLength	INTEGER	The smooth length defines a window, larger than the binSize, to average the number of reads. For example, if the --binSize is set to 20 bp and the --smoothLength is set to 60 bp, then, for each binSize the average of it and its left and right neighbors is considered. Any value smaller than the --binSize will be ignored and no smoothing will be applied. (default: None)
--doNotExtendPairedEnds		If set, reads are not extended to match the fragment length reported in the BAM file, instead they will be extended to match the --fragmentLength. Default is to extend the reads if paired end information is available. (default: False)
--ignoreDuplicates		If set, reads that have the same orientation and start position will be considered only once. If reads are paired, the mate position also has to coincide to ignore a read. (default: False)
--minMappingQuality	INTEGER	If selected and accompanied by a number, only reads that have a mapping quality score higher than the given number are considered (e.g. --minMappingQuality 10). (default: None)

computeMatrix

computeMatrix can be run in two modes: scale-regions and reference-point.

For details on the differences between the two modes, please check the tool details: computeMatrix

Mandatory arguments

Command	Expected Input	Explanation
--regionsFileName	FILENAME	File name, in BED format, containing the regions to plot. (default: None)
--scoreFileName	FILENAME	bigWig file with the scores to be visualized. BigWig files can be obtained by using the bamCoverage or bamCompare tools.

computeMatrix: Optional arguments

only for scale-regions mode:

Command	Expected Input	Explanation
--regionBodyLength	INTEGER	Distance in bp to which all regions are going to be fitted. (default: 1000)
--startLabel	NAME	Label shown in the plot for the start of the region. Default is region start site, but this could be changed to anything, e.g. "peak start". (default: TSS)
--endLabel	NAME	Label shown in the plot for the region end. Default is the region end site. (default: TES)

only for reference-point mode:

Command	Expected Input	Explanation
--referencePoint	{TSS, TES, center}	The reference point for the plotting could be either the region start (TSS), the region end (TES) or the center of the region. (default: TSS)
--nanAfterEnd		If set, any values after the region end are discarded. This is useful to visualize the region end when not using the scale-regions mode and when the reference-point is set to the start of the region. (default: False)

Command	Expected Input	Explanation
--beforeRegionStartLength	INTEGER	Distance upstream of the reference-point selected. (default: 500)
--afterRegionStartLength	INTEGER	Distance downstream of the reference-point selected. (default: 1500)
--binSize	INTEGER	Length, in base pairs, of the non-overlapping bin for averaging the score over the regions length. (default: 10)
--sortRegions	{descend,ascend,no}	Whether the output file should present the regions sorted. If sorting is requested, the regions are ordered by the value selected with --sortUsing (the mean value per region unless changed). (default: no)
--sortUsing	{mean,median,max,min,sum,region_length}	Indicate which method should be used for sorting. The value is computed for each row. (default: mean)
--averageTypeBins	{mean,median,min,max,std,sum}	Define the type of statistic that should be used over the bin size range. The options are: "mean", "median", "min", "max", "sum" and "std". (default: mean)
--missingDataAsZero		If set, missing data should be indicated as zeros. Default is to ignore such cases, which will be depicted as black areas in the heatmap (see the --missingDataColor argument of the heatmapper for additional options). (default: False)
--skipZeros		Whether regions with only scores of zero should be included or not. Default is to include them. (default: False)
--minThreshold	NUMBER	Numeric value. Any region containing a value that is equal to or less than this numeric value will be skipped. This is useful to skip, for example, genes where the read count is zero for any of the bins. This could be the result of unmappable areas and can bias the overall results. (default: None)
--maxThreshold	NUMBER	Numeric value. Any region containing a value that is equal to or higher than this numeric value will be skipped. The maxThreshold is useful to skip those few regions with very high read counts (e.g. major satellites) that may bias the average values. (default: None)
--quiet		Set to remove any warning or processing messages. (default: False)
--scale	NUMBER	If set, all values are multiplied by this number. (default: 1)
--numberOfProcessors	INTEGER	Number of processors to use. Type "max/2" to use half the maximum number of processors or "max" to use all available processors. (default: max/2)
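
To make the wording of --minThreshold and --maxThreshold concrete, here is a minimal sketch that applies the same kind of filter to a made-up matrix with awk. It is not how computeMatrix is implemented; the region names, values and thresholds are invented for illustration only.

```shell
# Toy matrix: one region per row, one bin per column (made-up values).
# Keep a region only if every bin is above MIN and below MAX, mirroring the
# "equal to or less than" / "equal to or higher than" wording of the options.
KEPT=$(printf '%s\n' \
    'geneA 0 2 3' \
    'geneB 1 2 3' \
    'geneC 1 2 900' |
awk -v min=0 -v max=500 '{
    keep = 1
    for (i = 2; i <= NF; i++)
        if ($i <= min || $i >= max) keep = 0
    if (keep) print $1
}')

# geneA contains a zero bin and geneC an extreme bin, so only geneB is left.
echo "$KEPT"
```

With a minimum of 0 and a maximum of 500, a single offending bin is enough to drop the whole region, which is why thresholds should be chosen conservatively.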

computeMatrix: output options

Command	Expected Input	Explanation
--outFileName	FILENAME	File name to save the gzipped matrix file which will be used by heatmapper and profiler. (default: None)
--outFileNameData	FILENAME	Name to save the averages per matrix column into a text file. This corresponds to the underlying data used to plot a summary profile. Example: myProfile.tab (default: None)
--outFileNameMatrix	FILE	If this option is given, then the matrix of values underlying the heatmap will be saved using the indicated name, e.g. IndividualValues.tab. This matrix can easily be loaded into R or other programs. (default: None)
--outFileSortedRegions	BED	File name in which the regions are saved after skipping zeros or min/max threshold values. The order of the regions in the file follows the sorting order selected. This is useful, for example, to generate other heatmaps keeping the sorting of the first heatmap. Example: Heatmap1sortedRegions.bed (default: None)
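
As a rough illustration of how the options above combine, the following sketch assembles a computeMatrix call in reference-point mode and prints it. The file names are hypothetical, and the reference-point sub-command as well as the --regionsFileName and --scoreFileName options are assumptions that are not listed in the tables of this section; please verify them with computeMatrix --help.

```shell
# Hypothetical input files; replace with your own BED and bigWig files.
REGIONS="genes.bed"
SCORES="myCoverageFile.bigWig"

# Assemble the call. The sub-command and the two input options are
# assumptions (see the note above); all other options are from the tables.
set -- computeMatrix reference-point \
    --regionsFileName "$REGIONS" \
    --scoreFileName "$SCORES" \
    --referencePoint TSS \
    --beforeRegionStartLength 500 \
    --afterRegionStartLength 1500 \
    --binSize 10 \
    --skipZeros \
    --outFileName matrix.gz \
    --outFileSortedRegions sortedRegions.bed
CMD_LINE="$*"

# Show the command; run it only if the tool and both inputs are present.
echo "$CMD_LINE"
if command -v computeMatrix >/dev/null 2>&1 && [ -f "$REGIONS" ] && [ -f "$SCORES" ]; then
    "$@"
fi
```

The values given for --beforeRegionStartLength, --afterRegionStartLength and --binSize are simply the documented defaults, spelled out so they are easy to change.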

heatmapper

To run heatmapper in default mode, all you need is a gzipped table of values calculated with computeMatrix. You cannot change from reference-point to scale-regions mode within heatmapper, as the mode influences how the values are calculated and therefore has to be set in computeMatrix. However, there are numerous heatmapper options to optimize the graphical output.

To get a better understanding of the power of the different heatmapper parameters, do pay a visit to our Gallery.

Mandatory arguments

Command	Expected Input	Explanation
--matrixFile	FILENAME	Matrix file (gzipped table of values) from the computeMatrix tool. (default: None)
--outFileName	FILENAME	File name to save the image. The file ending will be used to determine the image format. The available options are: "png", "emf", "eps", "pdf" and "svg", e.g. MyHeatmap.png. (default: None)

heatmapper: Optional arguments

Command	Expected Input	Explanation
--sortRegions	{descend, ascend, no}	Whether (and how) the heatmap should present the regions in a particular order. The default is to sort in descending order based on the mean value per region. (default: descend)
--sortUsing	{mean, median, max, min, sum, region_length}	Indicate which method should be used for sorting. The value is computed for each row. (default: mean)
--averageTypeSummaryPlot	{mean, median, min, max, std, sum}	Define the type of statistic that should be visualized in the summary plot above the heatmap. (default: mean)
--missingDataColor	COLOR	Often, some regions in the genome will not have a score attached to them within the bigWig file used with computeMatrix. If --missingDataAsZero is not set, these cases will be colored in black by default. Depending on your intent, changing the color of these uncovered regions to a more (or less) compelling color might be useful (see this example from the Example workflows or this heatmap in the Gallery). Using this parameter, you can specify a different color in one of three ways: 1) a value between 0 and 1, which will be used for a gray scale (black is 0); 2) a color name listed [here](http://packages.python.org/ete2/reference/reference_svgcolors.html); or 3) a color in the #rrggbb notation. (default: black)
--colorMap	{Spectral, summer, coolwarm, Set1, Set2, Set3, Dark2, hot, RdPu, YlGnBu, RdYlBu, gist_stern, cool, gray, GnBu, gist_ncar, gist_rainbow, CMRmap, bone, RdYlGn, spring, terrain, PuBu, spectral, gist_yarg, BuGn, bwr, cubehelix, YlOrRd, Greens, PRGn, gist_heat, Paired, hsv, Pastel2, Pastel1, BuPu, copper, OrRd, brg, gnuplot2, jet, gist_earth, Oranges, PiYG, YlGn, Accent, gist_gray, flag, BrBG, Reds, RdGy, PuRd, Blues, Greys, autumn, pink, binary, winter, gnuplot, RdBu, prism, YlOrBr, rainbow, seismic, Purples, ocean, PuOr, PuBuGn, afmhot}	Choose the color scheme for the heatmap. Check them out here. (default: RdYlBu)
--zMin	NUMBER	Minimum value for the heatmap intensities. (default: None)
--zMax	NUMBER	Maximum value for the heatmap intensities. (default: None)
--heatmapHeight	NUMBER	Height of the heatmap in cm. The minimum value is 3 and the maximum is 100. (default: 25)
--heatmapWidth	NUMBER	Width of the heatmap in cm. The minimum value is 1 and the maximum is 100. (default: 7.5)
--whatToShow	{"plot and heatmap", "heatmap only", "colorbar only", "heatmap and colorbar", "plot, heatmap and colorbar"}	Configure the panels that should be included in the output figure. The default is to include a summary plot on top of the heatmap and a colorbar indicating which scores got which colors next to the heatmap. (default: plot, heatmap and colorbar)
--startLabel	LABEL	(only when scale-regions mode was used with computeMatrix) Label shown in the plot for the start of the region. Default is TSS (transcription start site), but could be changed to anything, e.g. "peak start". Same for the --endLabel option. See below. (default: TSS)
--endLabel	LABEL	(only when scale-regions mode was used with computeMatrix) Label shown in the plot for the region end. Default is TES (transcription end site). (default: TES)
--refPointLabel	LABEL	(only when reference-point mode was used with computeMatrix) Label shown in the plot for the reference-point. Default is the same as the reference point selected (e.g. TSS), but could be anything, e.g. "peak start". (default: TSS)
--regionsLabel	LABEL	Labels for the regions plotted in the heatmap. If more than one region is being plotted, a list of labels separated by commas and enclosed in quotes is required. For example, --regionsLabel "label1, label2". Default is "genes". (default: genes)
--plotTitle	LABEL	Title of the plot, to be printed on top of the generated image. Leave blank for no title. (default: )
--xAxisLabel	LABEL	Description for the x-axis label. (default: gene distance (bp))
--yAxisLabel	LABEL	Description for the y-axis label for the top panel. (default: )
--yMin	NUMBER	Minimum value for the Y-axis of the summary plot. (default: None)
--yMax	NUMBER	Maximum value for the Y-axis of the summary plot. (default: None)
--onePlotPerGroup		When the region file contains groups separated by "#", the default is to plot the averages for the distinct groups in one plot. If this option is set, each group will get its own plot, stacked on top of each other. (default: False)
--plotFileFormat	{png, emf, eps, pdf, svg}	Image format type. If given, this option overrides the image format based on the plotFile ending. (default: None)
--verbose		If set, warning messages and additional information are given. (default: False)
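
The sketch below shows how a handful of these optional arguments could be combined in one heatmapper call. File names and the chosen color are hypothetical; the point is mainly the quoting: values that contain spaces or a "#" (such as the --whatToShow choices or an #rrggbb color) need to be quoted so that the shell passes them as a single argument.

```shell
# Hypothetical file names; matrix.gz would come from computeMatrix.
MATRIX="matrix.gz"
PLOT="MyHeatmap.png"

# Assemble the call; every option is taken from the tables above.
set -- heatmapper \
    --matrixFile "$MATRIX" \
    --outFileName "$PLOT" \
    --sortRegions descend \
    --sortUsing mean \
    --colorMap RdYlBu \
    --missingDataColor "#cccccc" \
    --whatToShow "heatmap and colorbar" \
    --regionsLabel "genes"
N_ARGS=$#
CMD_LINE="$*"

# Show the command; run it only if heatmapper and the matrix are present.
echo "$CMD_LINE"
if command -v heatmapper >/dev/null 2>&1 && [ -f "$MATRIX" ]; then
    "$@"
fi
```

Because "heatmap and colorbar" is quoted, it counts as one argument; without the quotes, heatmapper would receive three separate words.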

heatmapper: clustering parameters

Please use the clustering only if you supplied just 1 group of regions to computeMatrix.

Command	Expected Input	Explanation
--kmeans	INTEGER	When this option is set, the matrix is split into the indicated number of clusters using the kmeans algorithm. This will only work for data that is not grouped, otherwise only the first group will be clustered. If more specific clustering methods are required, it is advisable to save the underlying matrix (--outFileNameMatrix) and run the clustering using other software. The plotting of the clustering may fail (Error: Segmentation fault) if a cluster has very few members compared to the total number of regions. (default: None)

heatmapper: Output parameters

Command	Expected Input	Explanation
--outFileNameData	FILENAME	File name to save the data underlying the values for the average profile, e.g. myProfile.tab. This could be used to recreate the summary plot that is usually shown on top of the heatmap (or produced via profiler). (default: None)
--outFileSortedRegions	FILENAME	File name in which the regions are saved after skipping zeros or min/max threshold values (will be a BED file). The order of the regions in the file is the same as in the heatmap. This is useful, for example, to generate other heatmaps that should have the same sorting as the first heatmap (you could run computeMatrix using this sorted BED file and another bigWig file). Example: Heatmap1sortedRegions.bed (default: None)
--outFileNameMatrix	FILE	If this option is given, then the matrix of values underlying the heatmap will be saved using this name, e.g. MyMatrix.tab. (default: None)
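
The reuse of --outFileSortedRegions described above could look roughly like the following three-step sketch. All file names are hypothetical. The reference-point sub-command and the --regionsFileName / --scoreFileName options of computeMatrix are assumptions not covered by the tables in this section, and switching the sorting off in steps 2 and 3 is our reading of how to preserve the saved order; please check the --help output of both tools.

```shell
# Print a command; run it only if the tool is installed and its input exists.
maybe_run() {
    input="$1"; shift
    echo "$@"
    if command -v "$1" >/dev/null 2>&1 && [ -f "$input" ]; then
        "$@"
    fi
}

# 1) First heatmap: sort by mean and save the resulting region order.
maybe_run matrix1.gz heatmapper \
    --matrixFile matrix1.gz \
    --outFileName Heatmap1.png \
    --sortRegions descend \
    --outFileSortedRegions Heatmap1sortedRegions.bed

# 2) Recompute the matrix for a second bigWig over the saved regions
#    (use the same mode as for the first matrix; the input options are assumed).
maybe_run Heatmap1sortedRegions.bed computeMatrix reference-point \
    --regionsFileName Heatmap1sortedRegions.bed \
    --scoreFileName sample2.bigWig \
    --sortRegions no \
    --outFileName matrix2.gz

# 3) Second heatmap: no re-sorting, so the rows should line up with Heatmap1.
maybe_run matrix2.gz heatmapper \
    --matrixFile matrix2.gz \
    --outFileName Heatmap2.png \
    --sortRegions no
```

On a machine without deepTools or without the input files, the helper only prints the three commands, which makes it easy to inspect them before running anything.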


deepTools is developed by the Bioinformatics Facility at the Max Planck Institute for Immunobiology and Epigenetics, Freiburg. For troubleshooting, see our FAQ and get in touch: [email protected]
