Skip to content

Build Signal Track

Tao Liu (τν) edited this page Aug 16, 2016 · 5 revisions

In this page, I describe how to use MACS v2 to generate signal track by comparing ChIP treatment against control, an essential step for calling peaks. Although MACS v2 main command 'macs2 callpeak' can generate signal tracks internally, for example p-value tracks if p-value is used as cutoff, or q-value track if q-value is used as cutoff, MACS v2 will only keep such signal tracks in memory instead of harddisk in order to save time and disk space. To generate appropriate signal track to profile transcription factor or histone modification enrichment levels over whole genome, users need to use another command 'macs2 bdgcmp' on the bedGraph files generated by 'macs2 callpeak'.

Table of Contents

Install

Currently, MACS2 is on PyPI: https://pypi.python.org/pypi/MACS2/. Installation or upgrade can be done by pip or easy_install command. The current version is 2.1.1.20160309.

Please notice that MACS2 requires Python2.7 and numpy.

Do this on *nix computer with internet connection. Make sure you have permission to write.

 $ easy_install MACS2

or if you have [SetUpVirtualEnv|VirtualEnv] set up:

 $ pip install MACS2

To update old installation:

 $ pip install -U MACS2

Here is a tutorial to process ChIP-seq data to generate signal tracks:

Run MACS2 main program

 $ macs2 callpeak -t H3K36me1_EE_rep1.bam -c Input_EE_rep1.bam  -B --nomodel --extsize 147 --SPMR -g ce -n H3K36me1_EE_rep1

Here:

 * --nomodel and --extsize 147 tell MACS2 use 147bp as fragment size to pileup sequencing reads.
 * -g ce lets MACS2 consider C elegans genome as background. Set it as 'dm' for fly or 'hs' for human
 * -B --SPMR ask MACS2 to generate pileup signal file of 'fragment pileup per million reads' in bedGraph format.

Then you will have these two bedGraph files in the same directory:

 H3K36me1_EE_rep1_treat_pileup.bdg H3K36me1_EE_rep1_control_lambda.bdg

Run MACS2 bdgcmp to generate fold-enrichment and logLR track

 $ macs2 bdgcmp -t H3K36me1_EE_rep1_treat_pileup.bdg -c H3K36me1_EE_rep1_control_lambda.bdg -o H3K36me1_EE_rep1_FE.bdg -m FE
 $ macs2 bdgcmp -t  H3K36me1_EE_rep1_treat_pileup.bdg -c H3K36me1_EE_rep1_control_lambda.bdg -o H3K36me1_EE_rep1_logLR.bdg -m logLR -p 0.00001

Here:

 * -m FE means to calculate fold enrichment. Other options can be logLR for log likelihood, subtract for subtracting noise from treatment sample.
 * -p sets pseudocount. This number will be added to 'pileup per million reads' value. You don't need it while generating fold enrichment track because control lambda will always >0. But in order to avoid log(0) while calculating log likelihood, we'd add pseudocount. Because I set precision as 5 decimals, here I use 0.00001.

Then you will have this bedGraph file for fold-enrichment and logLR.

 H3K36me1_EE_rep1_FE.bdg H3K36me1_EE_rep1_logLR.bdg

Fix the bedGraph and convert them to bigWig files.

You need UCSC tools from (if linux):

http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/bedGraphToBigWig

http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/bedClip

And bedTools:

http://code.google.com/p/bedtools/

Since sometimes, chromosome names in alignment and bedGraph files are not consistent with UCSC naming convention, you may need to fix the chromosomenames using awk or perl.

Then you need to convert bedGraph to bigWig using this simple script:https://gist.github.com/2469050

You have to download chromosome length file from UCSC too.

http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/chromInfo.txt.gz for human ( I renamed it as hg19.len )

http://hgdownload.cse.ucsc.edu/goldenPath/dm3/database/chromInfo.txt.gz for fly ( I renamed it as dm3.len )

Then:

 $ bdg2bw H3K36me1_EE_rep1_FE.bdg hg19.len $ bdg2bw H3K36me1_EE_rep1_logLR.bdg hg19.len

And you will have these bigwig files:

 H3K36me1_EE_rep1_FE.bw H3K36me1_EE_rep1_logLR.bw

Calculate correlation between replicates.

You need UCSC tool:

http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/wigCorrelate

Run it:

 $ wigCorrelate H3K36me1_EE_rep1_FE.bw H3K36me1_EE_rep2_FE.bw

If the correlation is good enough, you can combine replicates and run MACS2 again.

Run MACS2 combining replicates.

You don't need to use "samtools merge", just

 $ macs2 callpeak -t H3K36me1_EE_rep1.bam H3K36me1_EE_rep2.bam -c Input_EE_rep1.bam Input_EE_rep2.bam -B --nomodel --extsize 147 --SPMR -g ce -n H3K36me1_EE

Then use the similar approaches as step 2 and 3 to generate signal tracks in bigWig format.

You can easily write a script to build a pipeline from step1 to step5.