-
Notifications
You must be signed in to change notification settings - Fork 0
Mapping RNA Seq Reads
All your configuration files are stored in the directory mentioned below:
config_dir=~/Mapping/config
The genome reference FASTA file:
reference=dmel56.fa
The reference is indexed:
reference_index=${config_dir}/dmel56
If needed, we can create the index for the genome reference as shown below:
bowtie-build ${reference} ${reference_index}
The transcript annotation:
transcript_annotation=${config_dir}/Drosophila_melanogaster.BDGP5.75.gtf
Two sample files are shown below:
reads1=data/S2-DRSC-14-PE/081121_S2-DRSC-14-1_1_sequence.txt
reads2=data/S2-DRSC-14-PE/081121_S2-DRSC-14-1_2_sequence.txt
We can operate on these files:
du -h ${reads1}
We will save the outputs at:
output=sample1
Now, we call TopHat. The call below saves the transcriptome index to ${config_dir}
.
tophat -p 2 -G ${transcript_annotation} --transcriptome-index ${config_dir} -o ${output} --no-novel-juncs ${reference_index} ${reads1},${reads2}
Now, we sort the output BAM file.
samtools sort ${output}/accepted_hits.bam ${output}/accepted_hits.sorted
And create an index to speed up access.
samtools index ${output}/accepted_hits.sorted.bam
The reference transcriptome:
reference=${config_dir}/dmeltran56.fa
Again, if needed, create and index to the FASTA file associated to the transcriptome.
bwa index ${reference}
Because we already set the output directory and input files, we can simply call BWA:
bwa mem ${reference} ${reads1} ${reads2} > ${output}.sam
The BWA output is in the SAM format. We will now convert it to its binary version (BAM).
samtools import ${reference}.fai ${output}.sam ${output}.unsorted.bam
The BAM file is now sorted and indexed.
samtools sort ${output}.unsorted.bam ${output}
samtools index ${output}.bam