Mapping RNA Seq Reads

All your configuration files are stored in the directory mentioned below:

config_dir=~/Mapping/config

The genome reference FASTA file:

reference=dmel56.fa

The reference is indexed:

reference_index=${config_dir}/dmel56

If needed, we can create the index for the genome reference as shown below:

bowtie-build ${reference} ${reference_index}

The transcript annotation:

transcript_annotation=${config_dir}/Drosophila_melanogaster.BDGP5.75.gtf

Two sample files are shown below:

reads1=data/S2-DRSC-14-PE/081121_S2-DRSC-14-1_1_sequence.txt

reads2=data/S2-DRSC-14-PE/081121_S2-DRSC-14-1_2_sequence.txt

We can operate on these files:

du -h ${reads1}

We will save the outputs at:

output=sample1

Mapping RNA-Seq Reads to the Genome

Now, we call TopHat. The call below saves the transcriptome index to ${config_dir}.

tophat -p 2 -G ${transcript_annotation} --transcriptome-index ${config_dir} -o ${output} --no-novel-juncs ${reference_index} ${reads1},${reads2}

Now, we sort the output BAM file.

samtools sort ${output}/accepted_hits.bam ${output}/accepted_hits.sorted

And create an index to speed up access.

samtools index ${output}/accepted_hits.sorted.bam

The reference transcriptome:

reference=${config_dir}/dmeltran56.fa

Again, if needed, create and index to the FASTA file associated to the transcriptome.

bwa index ${reference}

Because we already set the output directory and input files, we can simply call BWA:

bwa mem ${reference} ${reads1} ${reads2} > ${output}.sam

The BWA output is in the SAM format. We will now convert it to its binary version (BAM).

samtools import ${reference}.fai ${output}.sam ${output}.unsorted.bam

The BAM file is now sorted and indexed.

samtools sort ${output}.unsorted.bam ${output}

samtools index ${output}.bam