Download test data
For this tutorial we'll use experimental ChIP-seq data for the transcription factor CTCF in the K562 cell line, which is available on the ENCODE data portal. There are 5 such experiments in ENCODE; you can see them listed here: ChIP-seq CTCF K562. We'll restrict ourselves to one experiment, ENCSR000EGM.
Download the .bam files for the two replicates for the experiment ENCSR000EGM. The two replicates are isogenic replicates (biological). A more detailed explanation of the various types of replicates can be found here.
Links to the replicate BAM files are provided below.
wget https://www.encodeproject.org/files/ENCFF198CVB/@@download/ENCFF198CVB.bam -O rep1.bam
wget https://www.encodeproject.org/files/ENCFF488CXC/@@download/ENCFF488CXC.bam -O rep2.bam
Now download the BAM file for the control experiment ENCSR000EHI, which is available here:
wget https://www.encodeproject.org/files/ENCFF023NGN/@@download/ENCFF023NGN.bam -O control.bam
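Large downloads sometimes fail partway, so it can be worth confirming that all three BAM files arrived before moving on. The following is a minimal sketch, not part of the original tutorial: `check_downloads` is a hypothetical helper that only tests that each file exists and is non-empty.

```shell
# Hypothetical helper (an assumption, not part of the tutorial):
# report any expected file that is missing or empty.
check_downloads() {
    status=0
    for f in "$@"; do
        # -s is true only if the file exists and has a size greater than zero
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage, once the wget commands above have finished:
#   check_downloads rep1.bam rep2.bam control.bam
# If samtools is installed, a stricter check for truncated files is:
#   samtools quickcheck -v rep1.bam rep2.bam control.bam
```

The helper returns a non-zero status if any file fails the check, so it can gate later steps in a script.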
Finally, download the reference files. In the example below, some preprocessing is required to filter out unwanted chromosomes from the hg38.chrom.sizes file. Additionally, the blacklist file shown is specific to hg38, and should be replaced with a genome-specific blacklist if alternative genomes are used.
Available Blacklists:
For those interested in using the blacklists, current versions for dm3, dm6, ce10, ce11, mm10, hg19, and hg38 are available in the lists/ folder at https://github.com/Boyle-Lab/Blacklist/
Please cite:
Amemiya, H.M., Kundaje, A. & Boyle, A.P. The ENCODE Blacklist: Identification of Problematic Regions of the Genome. Sci Rep 9, 9354 (2019). https://doi.org/10.1038/s41598-019-45839-z
# download genome reference
wget https://www.encodeproject.org/files/GRCh38_no_alt_analysis_set_GCA_000001405.15/@@download/GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta.gz \
-O hg38.genome.fa.gz
gunzip hg38.genome.fa.gz
# index genome reference
samtools faidx hg38.genome.fa
# download chrom sizes
wget https://www.encodeproject.org/files/GRCh38_EBV.chrom.sizes/@@download/GRCh38_EBV.chrom.sizes.tsv
# exclude contigs with '_' in their name (e.g. unlocalized and unplaced scaffolds) and chrEBV
grep -v -e '_' -e 'chrEBV' GRCh38_EBV.chrom.sizes.tsv > hg38.chrom.sizes
rm GRCh38_EBV.chrom.sizes.tsv
# make file with chromosomes only
awk '{print $1}' hg38.chrom.sizes > chroms.txt
# download blacklist
wget https://www.encodeproject.org/files/ENCFF356LFX/@@download/ENCFF356LFX.bed.gz -O blacklist.bed.gz
gunzip blacklist.bed.gz
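To see what the chromosome filtering step keeps and drops, the same `grep` and `awk` commands can be run on a small stand-in file. This is only an illustrative sketch: the `toy.*` file names are hypothetical, the sizes are made up, and the contig names are examples in the style of GRCh38 scaffold names.

```shell
# Toy stand-in for GRCh38_EBV.chrom.sizes.tsv (sizes are made up for illustration).
printf 'chr1\t1000\nchr1_KI270706v1_random\t200\nchrUn_KI270302v1\t50\nchrM\t16\nchrEBV\t170\n' > toy.chrom.sizes.tsv

# Same filter as in the tutorial: drop names containing '_' and the chrEBV entry.
grep -v -e '_' -e 'chrEBV' toy.chrom.sizes.tsv > toy.filtered.sizes

# Keep only the first column, as is done for chroms.txt.
awk '{print $1}' toy.filtered.sizes > toy.chroms.txt
cat toy.chroms.txt
```

With this toy input only `chr1` and `chrM` remain. Note that the filter does not remove `chrM`, so if the mitochondrial chromosome is unwanted it would need its own `-e 'chrM'` pattern.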