Skip to content

Mapping RNA Seq Reads

Benilton Carvalho edited this page Apr 8, 2014 · 20 revisions

To map the RNA-Seq Reads as described in this document, you'll need access to a terminal where you can use samtools, bwa and tophat. For example, an SSH terminal or Shell in a Box.

As soon as you log in to your machine, go to the directory Mapping:

cd ~/Mapping

Setup files

All your configuration files are stored in the directory mentioned below:

config_dir=~/Mapping/config

The genome reference FASTA file:

genome=${config_dir}/dmel56.fa

The reference is indexed:

genome_index=${config_dir}/dmel56

If needed (you don't need), we can create the index for the genome reference (it is already created for you) as shown below:

bowtie-build ${genome} ${genome_index}

The transcript annotation:

transcript_annotation=${config_dir}/Drosophila_melanogaster.BDGP5.75.gtf

Two sample files are shown below:

reads1=data/S2-DRSC-14-PE/081121_S2-DRSC-14-1_1_sequence.txt

reads2=data/S2-DRSC-14-PE/081121_S2-DRSC-14-1_2_sequence.txt

We can operate on these files:

du -h ${reads1}

We will save the outputs at:

output=sample1

Mapping RNA-Seq Reads to the Genome

Now, we call TopHat. The call below saves the transcriptome index to ${config_dir}.

tophat -p 2 -G ${transcript_annotation} --transcriptome-index ${config_dir} -o ${output} --no-novel-juncs ${genome_index} ${reads1},${reads2}

Now, we sort the output BAM file.

samtools sort ${output}/accepted_hits.bam ${output}/accepted_hits.sorted

And create an index to speed up access.

samtools index ${output}/accepted_hits.sorted.bam

Mapping RNA-Seq Reads to the Transcriptome

The reference transcriptome:

transcriptome=${config_dir}/dmeltran56.fa

Again, if needed, create and index to the FASTA file associated to the transcriptome.

bwa index ${transcriptome}

Because we already set the output directory and input files, we can simply call BWA. We align each of the strands independently, as shown below. Then, combine the the SAI files into a SAM file. Here, we use bwa aln because our reads are short (37bp).

bwa aln -t 2 ${transcriptome} ${reads1} > ${output}/samplet1_1.sai

bwa aln -t 2 ${transcriptome} ${reads2} > ${output}/samplet1_2.sai

bwa sampe ${transcriptome} ${output}/samplet1_1.sai ${output}/samplet1_2.sai ${reads1} ${reads2} > ${output}/sample1t.sam

The BWA output is in the SAM format. We will now convert it to its binary version (BAM).

samtools view -bS ${output}/sample1t.sam > ${output}/sample1t.unsorted.bam

The BAM file is now sorted and indexed.

samtools sort ${output}/sample1t.unsorted.bam ${output}/sample1t.sorted

samtools index ${output}/sample1t.sorted.bam