MGJW Problem Set 2

Part 1

Try this command to list all conda environments installed on your computer (conda info --envs).
Try this command to list all packages installed in the mash environment (conda list -n mash).
Let's try FastQC. You need to activate its environment first. You will use the files from the folder that you downloaded for Problem Set 1.

conda activate fastqc
cd ~/MGJW/problem_set1/fastq
gunzip *.gz
fastqc genome1_R1.fastq genome1_R2.fastq

Let's explore the html file. Windows users may need to copy the file to another directory so they can open it in a browser:

cp /home/Ubuntu_user_name/MGJW/problem_set1/fastq/file.html /mnt/c/Users/Windows_username/Desktop/

You will notice that the number of reads is really low. Some of our samples may fail quite a few of the quality metrics used by FastQC. However, this does not mean that the samples should be discarded, as a failed metric may or may not be a problem for your downstream application.
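The read count that FastQC reports can be cross-checked from the command line. The sketch below assumes a FASTQ file where every record spans exactly four lines (no wrapped sequences). The file toy.fastq and its two reads are made up for illustration and are not part of the problem set.

```shell
# Toy FASTQ with two reads (hypothetical data, only for illustration).
workdir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' > "$workdir/toy.fastq"

# Each FASTQ record spans four lines, so reads = lines / 4.
line_count=$(wc -l < "$workdir/toy.fastq")
read_count=$((line_count / 4))
echo "toy.fastq contains $read_count reads"
```

On the real data, the same arithmetic applies to the uncompressed files, for example genome1_R1.fastq.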

Let's check the mockdna sample in the fastq folder from Problem Set 1. Mash takes one file for the sample, so we will have to concatenate our forward (R1) and reverse (R2) reads.

conda activate mash
cd ~/MGJW/problem_set1/fastq
cat mockdna_R1.fastq mockdna_R2.fastq > mockdna_reads.fastq
gunzip ../RefSeqSketches.msh.gz
mash screen -w -p 8 ../RefSeqSketches.msh mockdna_reads.fastq > mockdna_screen_winning.tab
less mockdna_screen_winning.tab
sort -gr mockdna_screen_winning.tab > mockdna_screen_winning_sorted.tab
less mockdna_screen_winning_sorted.tab
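Paging through the full table with less can be slow when there are many hits. As a rough sketch, the tab-separated mash screen output (identity, shared-hashes, median-multiplicity, p-value, query-ID, query-comment) can be filtered with awk. The rows, reference names, and the 0.99 identity cutoff below are all made up for illustration and were not produced by Mash.

```shell
workdir=$(mktemp -d)
# Toy rows in the six-column mash screen layout (values and names are hypothetical).
printf '0.9950\t900/1000\t8\t0\trefC.fna\tHypothetical organism C\n' >  "$workdir/toy_screen.tab"
printf '0.9573\t400/1000\t2\t0\trefB.fna\tHypothetical organism B\n' >> "$workdir/toy_screen.tab"
printf '0.9995\t990/1000\t12\t0\trefA.fna\tHypothetical organism A\n' >> "$workdir/toy_screen.tab"

# Sort by identity (column 1), highest first, then keep hits with identity >= 0.99
# and print only identity, shared hashes, and the query comment.
strong_hits=$(sort -gr "$workdir/toy_screen.tab" |
  awk -F'\t' '$1 >= 0.99 { print $1 "\t" $2 "\t" $6 }')
printf '%s\n' "$strong_hits"
```

The cutoff is a judgment call. A lower threshold keeps more distant matches, so choose it according to how closely related you expect the references to be.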

Let's check another sample from Problem Set 1 and see how it compares to the previous results.

cd ~/MGJW/problem_set1/fasta
mash screen -w -p 8 ../RefSeqSketches.msh genome2.fasta > genome2_screen_winning.tab
sort -gr genome2_screen_winning.tab > genome2_screen_winning_sorted.tab
less genome2_screen_winning_sorted.tab

Part 2

cd ~/MGJW/problem_set1/fasta
Try mash dist genome3.fasta genome4.fasta
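The output of mash dist includes a distance column and a shared-hashes column. To see how the two relate, the sketch below applies the Mash distance formula, D = -(1/k) * ln(2j / (1 + j)), where j is the Jaccard estimate (shared hashes divided by the number of hashes compared). The shared-hash count of 950 is a made-up value, not a result for genome3 and genome4. The sketch size of 1000 and k-mer size of 21 match the Mash defaults.

```shell
# Hypothetical values: 950 of 1000 sketch hashes shared, k-mer size 21.
shared=950
sketch_size=1000
kmer=21

# Jaccard estimate j = shared / sketch_size;
# Mash distance D = -(1/k) * ln(2j / (1 + j)).
mash_distance=$(awk -v x="$shared" -v s="$sketch_size" -v k="$kmer" \
  'BEGIN { j = x / s; printf "%.6f", -log(2 * j / (1 + j)) / k }')
echo "Estimated Mash distance: $mash_distance"
```

Substituting the shared-hashes value from your own mash dist output should give approximately the distance that Mash printed.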
Let's make our own Mash database for the three genomes in the fasta directory (~/MGJW/problem_set1/fasta) by sketching each genome and then combining the sketches with mash paste:
- mash sketch -s 1000 -k 21 genome2.fasta
- mash sketch -s 1000 -k 21 genome3.fasta
- mash sketch -s 1000 -k 21 genome4.fasta
- mash paste genomes.msh genome2.fasta.msh genome3.fasta.msh genome4.fasta.msh
- mash info genomes.msh | head -n 20
Save your history of commands in a file called commands_notes_PS2.txt (in an interactive bash session: history > commands_notes_PS2.txt).
