Construction of mitochondrial reference genomes of the Presbytis femoralis group using multiple sequence alignment for new species identification

ZB4171-Mandolin: Ng Jing Ting, Quek Zhi Heng, and Tay Hui Yi

This project was done under ZB4171, a module from the National University of Singapore.

Our project explores prior work done by Meier et. al (2020). Contrary to what was done in their work, we suggest the use of all available samples for P. f. percura and P. f. presbytis to construct multiple proposed reference genomes for each sample before using multiple sequence alignment to produce a final reference genome that chooses the most conserved sample instead of relying solely on coverage. A comparison between the results of this method and the paper could shed light on potential sample biases and serve to validate the results of the study.

This repository contains a the dataset and code files required to replicate the bioinformatics analysis done by Meier et. al. This workflow is conducted on AWS, using Nextflow. The detailed set-up may be found below.

EC2 specificiations and set up

Recommended instance type: c5.4xlarge
Recommended storage size: >= 250 GB

In the case where an extra volume is required, you may need to create and mount the new volume: https://devopscube.com/mount-ebs-volume-ec2-instance/

Downloading conda

Refer to the set-up here: https://docs.conda.io/projects/miniconda/en/latest/
To restart terminal session without quitting, type “source ~/.bashrc

Installing Nextflow

Refer to the set-up here: https://anaconda.org/bioconda/nextflow
To check if Nextflow is downloaded successfully, run $nextflow run hello.

Installing Docker

Refer to the set-up here: https://www.cyberciti.biz/faq/how-to-install-docker-on-amazon-linux-2/
Check if docker can pull image : $docker pull hello-world

Potential problems encountered: Configure docker (if permission denied in above)

sudo usermod -a -G docker $USER
reboot

Downloading fastq files using SRA tools

Refer to the set-up here: https://github.com/ncbi/sra-tools/wiki/01.-Downloading-SRA-Toolkit
Follow steps 1, 2, 3, and 8 (minimally).

To bypass exporting the path every time you enter the machine, do the following

vi ~/.bashrc
Go to the bottom of the file
Press “o” to insert a new line
Add the following command at the bottom of the file
export PATH=$PATH:$PWD/sratoolkit.3.0.7-ubuntu64/bin
Press “esc” and type “:x” to save file
source ~/.bashrc

Potential problems encountered: aws pkcs7 error when using prefetch

go to aws console under actions > instance settings > modify instance metadata options > change IMDSv2 from required to optional

Carry out prefetch and fastq-dump (eg):

$prefetch SRR10277272
$fastq-dump SRR10277272 --split-3 --skip-technical --gzip

To create and attach shared volume to your own EC2 instance (note to do this before proceeding to add other volumes):

Go to AWS Console and under EC2, click “Volumes” at the side and then “Create Volume”
Select “io2” for Volume Type, enable multi-attach and ensure that the availability zone is correct (same as your instance)
Click on the link of the volume you just created and copy paste the volume ID of the newly created volume and paste it below once you are done with all the steps
Under actions select “Attach Volume”
Select your instance that you are running (if you don't see it likely the ec2 instance is not in ap-southeast-1a,change it)
For device name, change where appropriate (eg. the last letter of device name)
Once the attaching and all is done, ssh into your ec2 instance
Type the command lsblk and check that there is a new disk is there (likely it's called nvme1n1)
Type the command: sudo mkfs -t xfs /dev/nvme1n1
Type the command sudo file -s /dev/nvme1n1 and check that the output should looks something like: “SGI XFS filesystem data (blksz 4096, inosz 512, v2 dirs)”
Type the command: sudo mkdir /shared_drive_{your initials} eg.(sudo mkdir /shared_drive_zh)
Type the command: sudo chmod 777 /shared_drive_{your initials}
Type the command: sudo mount /dev/nvme1n1 /shared_drive_{your initials}
Type sudo cp /etc/fstab /etc/fstab.orig to back up the original file in case there are errors
Type sudo blkid and copy the UUID of the /dev/nvme1n1 device
Type sudo vim /etc/fstab
Add the following line to the end of the file: UUID={copied UUID from step 11} /shared_drive_{your initials} xfs defaults,nofail 0 2
Save the changes on the file and exit Vim
Type the command: sudo umount /shared_drive_{your initials}
Type the command sudo mount -a
Reboot your ec2 instance and ssh in again
Type df -h and see if the /shared_drive_{your initials} path appears in the last column

To add other members volumes to your instance:

You would require the links of the volume IDs of other members
Repeat steps 1, 2, 3, 9. For each new volume that you attach, the nvme path will change so its likely to be nvme1n2/nvme1n3, for that just change the path accordingly in the respective steps.
Replace the initials (names) where appropriate

MITOS Annotation Results

Accession Number	Sample ID	Species	MITOS Annotation Link
SRR10277267	BLM5	Presbytis femoralis femoralis	BLM5
SRR10277268	BLM4	Presbytis femoralis femoralis	BLM4
SRR10277269	BLM3	Presbytis femoralis femoralis	BLM3
SRR10277270	BLM2	Presbytis femoralis femoralis	BLM2
SRR10277271	BLM1	Presbytis femoralis femoralis	BLM1
SRR10277272	Pres2	Presbytis siamensis cf. cana	Pres2
SRR10277273	ESBL_8b	Presbytis femoralis percura	No reconstructed reference
SRR10277274	ESBL_6a	Presbytis femoralis percura	ESBL_6a
SRR10277275	BLM6	Presbytis femoralis femoralis	BLM6
SRR10277276	ESBL_5	Presbytis femoralis percura	ESBL_5
SRR10277277	ESBL_1a	Presbytis femoralis percura	ESBL_1a

Name		Name	Last commit message	Last commit date
Latest commit History 45 Commits
config		config
reference		reference
src		src
LICENSE		LICENSE
README.md		README.md
TruSeq3-PE.fa		TruSeq3-PE.fa
env.yml		env.yml
nextflow.config		nextflow.config
pipeline_part_1.nf		pipeline_part_1.nf
pipeline_part_2.nf		pipeline_part_2.nf
pipeline_trace.txt		pipeline_trace.txt
samplesheet_all_v2.csv		samplesheet_all_v2.csv
trace.txt		trace.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Construction of mitochondrial reference genomes of the Presbytis femoralis group using multiple sequence alignment for new species identification

ZB4171-Mandolin: Ng Jing Ting, Quek Zhi Heng, and Tay Hui Yi

EC2 specificiations and set up

Downloading conda

Installing Nextflow

Installing Docker

Downloading fastq files using SRA tools

To create and attach shared volume to your own EC2 instance (note to do this before proceeding to add other volumes):

To add other members volumes to your instance:

MITOS Annotation Results

About

Releases

Packages

Contributors 3

Languages

License

TayHuiYi/Bioinformatics-for-mitochondria-reconstruction

Folders and files

Latest commit

History

Repository files navigation

Construction of mitochondrial reference genomes of the Presbytis femoralis group using multiple sequence alignment for new species identification

ZB4171-Mandolin: Ng Jing Ting, Quek Zhi Heng, and Tay Hui Yi

EC2 specificiations and set up

Downloading conda

Installing Nextflow

Installing Docker

Downloading fastq files using SRA tools

To create and attach shared volume to your own EC2 instance (note to do this before proceeding to add other volumes):

To add other members volumes to your instance:

MITOS Annotation Results

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages