added tutorials to _episodes
waltercostamb committed Jun 12, 2024
1 parent e3021db commit 3f443e6
Showing 11 changed files with 238 additions and 0 deletions.
44 changes: 44 additions & 0 deletions _episodes/1.1.0_introduction.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
# Introduction to the module

In this module, you will learn how to analyse bacteriophage sequences with computational approaches. The theoretical part is covered in the mornings and includes a series of questions. Document the questions and your answers in a lab book; these also count towards your final evaluation, which is composed of:

- Performance in the dry lab and lab book (60%)
- Presentation of the final project (40%)

To write a good lab book:

- Write in it every day, documenting your tasks;
- Structure it in chronological order, so that each section corresponds to a different day.

To stay organized in your practical experiments, name your folders in a logical order, for instance: 0_download, 1_quality_evaluation, 2_assembly. Always record the folder names in your lab book, so that you can find your files when reading it in the future.
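
The numbered naming scheme can be sketched in Python. This is only an illustration: the function name `make_step_folders` and the use of a temporary directory are assumptions made for the example, and the step names are the ones given as examples in the text.

```python
from pathlib import Path
import tempfile

# Step names taken from the example in the text; extend the list as the project grows.
STEPS = ["download", "quality_evaluation", "assembly"]

def make_step_folders(project_dir, steps):
    """Create one numbered folder per step (0_download, 1_quality_evaluation, ...)."""
    created = []
    for index, step in enumerate(steps):
        folder = Path(project_dir) / f"{index}_{step}"
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder.name)
    return created

# Demonstration in a temporary directory (hypothetical location), so nothing
# is written into a real project folder by accident.
with tempfile.TemporaryDirectory() as tmp:
    folder_names = make_step_folders(tmp, STEPS)
    print(folder_names)
```

Because the number comes first, the folders sort in the order in which the steps were carried out.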

# Lab Book

Documenting your work is crucial in Computational Biology/Bioinformatics. It makes your work reproducible,
lets you transfer text to other documents, such as reports or papers, and lets you share it with colleagues, so that they can access your work.

You are required to write a Lab Book in markdown for this module, which will count as part of your evaluation. If you want, you can start a GitHub repository for the course and write your lab book there. References:

- https://commonmark.org/help/
- https://www.markdownguide.org/basic-syntax/

You should divide the lab book into sections (days) and add subsections for different tasks.
Each subsection should have a meaningful title, e.g. "Documenting module *Viromics-Bioinformatics*", "Downloading phage sequences", "Evaluating sequence quality". Include any relevant information, such as websites used in searches, folders
where you can find files, tool names and versions, and bibliographic references. An example can be seen at:
https://github.com/waltercostamb/course_viral-microbiology/blob/main/tutorials/lab-book.pdf. Start your lab book today and write in it
whenever necessary during the module. On the last day, send a copy to your advisor.

The paper "Ten Simple Rules for a Computational Biologist’s Laboratory Notebook", by Santiago Schnell (2015), offers interesting insights: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004385. Most relevant are rules 4, 6 and 10. Keeping a hard copy of your lab book is not necessary; however, make sure you have it backed up.

## Supplementary reference

- "Ten Simple Rules for Reproducible Computational Research", by Geir Sandve and collaborators, 2013: https://app.dimensions.ai/details/publication/pub.1022987921

# Access to draco

[Draco](https://wiki.uni-jena.de/pages/viewpage.action?pageId=22453002) is a high-performance cluster created and maintained by the Universitätsrechenzentrum. It is [available for members of Thuringian Universities](http://sternb.gitpages.tpi.uni-jena.de/draco-101-2023-01/#5). To log in, you can use [ssh](http://sternb.gitpages.tpi.uni-jena.de/draco-101-2023-01/#15):

```
ssh <fsuid>@login1.draco.uni-jena.de
```

26 changes: 26 additions & 0 deletions _episodes/1.1.1_introduction_viromics.md
@@ -0,0 +1,26 @@
# Introduction to Viromics

Watch the lecture below and write down at least 3 questions and/or discussion points about it:

- Click on the image to see the lecture "Viral metagenomics: predicting phage-microbe interactions in the gut", by Prof Bas E. Dutilh:

[![Lecture: Viral metagenomics, predicting phage-microbe interactions in the gut](https://img.youtube.com/vi/xm2iEK4Jj90/0.jpg)](https://www.youtube.com/watch?v=xm2iEK4Jj90)

Additional reading: https://www.sciencedirect.com/science/article/abs/pii/B9780128145159000345

Answer the following questions:

- What is metagenomics?
- What is viromics?
- What are bacteriophages? How do they fit into metagenomics?
- How do you assess the viromic community of a biological sample?
- Describe FASTA and FASTQ formats
- What do you expect the difference in diversity between viromes and metagenomes to be?

Objectives:

- Understand what a metagenomics study is
- Understand what bacteriophages are and how they relate to prokaryotic communities
- Understand specifics of sequencing applied to the study of viromics data
- Understand FASTA and FASTQ file formats
- Understand diversity in microbial and viral communities
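
To make the FASTQ format concrete, here is a minimal Python sketch that reads records from a toy file held in memory. The two reads are invented for illustration, the helper names `read_fastq` and `phred_scores` are hypothetical, and the parser assumes the common layout of exactly four lines per record with Phred+33 quality encoding.

```python
import io

# Toy FASTQ content (two invented reads), used only to illustrate the format.
TOY_FASTQ = """@read1
ACGT
+
IIII
@read2
GGCCA
+
!!5II
"""

def read_fastq(handle):
    """Yield (identifier, sequence, quality_string), assuming four lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # separator line starting with '+'
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, quality  # drop the leading '@'

def phred_scores(quality, offset=33):
    """Convert a quality string to Phred scores, assuming Phred+33 encoding."""
    return [ord(char) - offset for char in quality]

records = list(read_fastq(io.StringIO(TOY_FASTQ)))
for name, sequence, quality in records:
    print(name, sequence, phred_scores(quality))
```

Each record carries one quality character per base, which is the main difference from FASTA, where only the identifier and the sequence are stored.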
21 changes: 21 additions & 0 deletions _episodes/1.1.2_sequencing_quality.md
@@ -0,0 +1,21 @@
# Download sequences and evaluate sequencing quality

Watch the lecture below and write down at least 3 questions and/or discussion points about it:

https://www.youtube.com/watch?v=OMzshfgJ4PQ

- Click on the image to see the lecture "Basic file formats in bioinformatics", by Prof Bas E. Dutilh:
[![Lecture: Basic file formats in bioinformatics](https://img.youtube.com/vi/D4WDdAbZW1Y/0.jpg)](https://www.youtube.com/watch?v=D4WDdAbZW1Y)

Copy viromics sequences following the teacher's directions. These will be used in your project.

Answer the questions below using bash. Useful bash commands, by Varada Khot, can be found here: https://github.com/vmkhot/useful-scripts/blob/main/Linux%20Commands%20Cheat%20Sheet.md

- How many lines does your FASTQ file have?
- How many sequences are there in your FASTQ file?
- Print the first 10 lines of your file in the terminal
- Print the last 10 lines of your file in the terminal
- Describe what you see in these lines and what they mean
- What is the GC content of your files?
- Create a plot of frequency (y-axis) versus GC content (x-axis) and describe the distribution. Use Python (seaborn histplot, matplotlib hist) or R (hist or ggplot2 geom_histogram)
- Evaluate the sequence quality following: https://github.com/vmkhot/Metagenome-workflows/blob/main/Nanopore-Long-Reads/Quality%20Control.md
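
As a starting point for the GC content questions, the sketch below computes the GC fraction per read and bins the values into a simple histogram using only the Python standard library. The reads are invented, and the function names `gc_content` and `gc_histogram` as well as the bin width of 0.25 are assumptions chosen for the example.

```python
from collections import Counter

# Invented reads; in practice these would be the sequence lines of your FASTQ file.
toy_reads = ["ACGT", "GGCC", "ATAT", "GCGA", "AATT"]

def gc_content(sequence):
    """Return the fraction of G and C bases in a sequence (0.0 for an empty one)."""
    if not sequence:
        return 0.0
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)

def gc_histogram(reads, bin_width=0.25):
    """Count reads per GC bin; each key is the lower edge of its bin."""
    n_bins = round(1 / bin_width)
    counts = Counter()
    for read in reads:
        # min() keeps a GC content of exactly 1.0 inside the last bin.
        bin_index = min(int(gc_content(read) / bin_width), n_bins - 1)
        counts[bin_index * bin_width] += 1
    return dict(sorted(counts.items()))

gc_values = [gc_content(read) for read in toy_reads]
histogram = gc_histogram(toy_reads)
print(gc_values)
print(histogram)
```

The list `gc_values` is the kind of input a plotting function expects, with GC content on the x-axis and the number of reads per bin on the y-axis.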
32 changes: 32 additions & 0 deletions _episodes/1.2.1_assembly_I.md
@@ -0,0 +1,32 @@
# Assembly I

Watch the lecture below and write down at least 3 questions and/or discussion points about it:

- Click on the image to see the lecture "Assembly strategies for genomics and metagenomics", by Prof Bas E. Dutilh:

[![Lecture: Assembly strategies for genomics and metagenomics](https://img.youtube.com/vi/mHmMbPxKmn0/0.jpg)](https://www.youtube.com/watch?v=mHmMbPxKmn0)

- Click on the image to watch the video:

[![IMAGE ALT TEXT HERE](https://img.youtube.com/vi/MgdfZTA-J3o/0.jpg)](https://www.youtube.com/watch?v=MgdfZTA-J3o)

Answer the following questions:

- What is sequence assembly?
- How would you detect mutations in a model organism after an evolutionary experiment?
- How would you determine the genome sequence of an unknown organism?
- What are the strengths and weaknesses of DBG and OLC?

Objectives:

- Understand what sequence assembly is
- Understand the difference between assembly algorithms
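
To illustrate the two families of assembly algorithms named in the questions, here is a toy Python sketch. It is not how any production assembler is implemented: the reads are invented, the function names are hypothetical, and sequencing errors, reverse complements and graph traversal are all ignored. `de_bruijn_graph` shows the de Bruijn graph (DBG) idea of splitting reads into k-mers, and `overlap_length` shows the pairwise suffix-prefix comparison at the heart of overlap-layout-consensus (OLC).

```python
from collections import defaultdict

def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the list of (k-1)-mers that follow it in some read."""
    graph = defaultdict(list)
    for read in reads:
        for start in range(len(read) - k + 1):
            kmer = read[start:start + k]
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)

def overlap_length(left, right, min_overlap=1):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for length in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0

toy_reads = ["ATGGC", "GGCGT"]  # invented reads sharing the overlap "GGC"
graph = de_bruijn_graph(toy_reads, k=3)
print(graph)
print(overlap_length(toy_reads[0], toy_reads[1]))
```

Note that the DBG never compares reads with each other directly, whereas OLC needs an overlap computation for every pair of reads; this difference is a useful starting point when weighing their strengths and weaknesses.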

# Supplementary reference

The book "Computational Biology: Genomes, Networks, Evolution MIT course 6.047/6.878" by Prof. Manolis Kellis is part of a course on Computational Biology and contains several topics that are relevant for Bioinformatics: https://ocw.mit.edu/ans7870/6/6.047/f15/MIT6_047F15_Compiled.pdf

Read the following sections and summarize their key points in your lab book:

- "5.2 Genome Assembly I: Overlap-Layout-Consensus Approach" and "5.3 Genome Assembly II: String graph methods" (pages 93 to 102)

12 changes: 12 additions & 0 deletions _episodes/1.2.2_assembly_II.md
@@ -0,0 +1,12 @@
# Assembly II

In this experimental part, use the tool [Flye](https://github.com/fenderglass/Flye) to assemble your reads following Varada Khot's directions [for Metagenomes](https://github.com/vmkhot/Metagenome-workflows/blob/main/Nanopore-Long-Reads/Nanopore%20Assembly.md#for-metagenomes).

- Assembly: https://aldertzomer.github.io/Microbial-Genomics-2022/30_Assembly/index.html
- Quality of assembly: https://github.com/vmkhot/Metagenome-workflows/blob/main/Illumina-Short-Reads/Assembly.md#how-good-is-my-assembly

- Add questions to guide students
- Where do you expect the longest contigs? Take into account genome size and diversity
- Look in the files to check the format and make sense of the output
- What are the longest and shortest contigs?
- BLAST them online to see what they are
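
One possible way to inspect contig lengths is sketched below in Python. It parses a toy FASTA held in memory rather than real assembler output; the contigs are invented, the function names `contig_lengths` and `n50` are hypothetical, and a real assembly file would be opened with `open(path)` instead of `io.StringIO`.

```python
import io

# Invented contigs; contig_1 is wrapped over two lines, as FASTA allows.
TOY_FASTA = """>contig_1
ACGTACGTAC
GTAC
>contig_2
ACGT
>contig_3
ACGTACGT
"""

def contig_lengths(handle):
    """Return {contig_name: length}, summing wrapped sequence lines."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # keep the identifier, drop the description
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths

def n50(lengths):
    """Length of the contig at which the sorted cumulative sum reaches half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0

lengths = contig_lengths(io.StringIO(TOY_FASTA))
longest = max(lengths, key=lengths.get)
shortest = min(lengths, key=lengths.get)
print(lengths, longest, shortest, n50(lengths.values()))
```

The N50 is included because it is a widely used summary of assembly contiguity; it complements, but does not replace, looking at the longest and shortest contigs themselves.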
25 changes: 25 additions & 0 deletions _episodes/1.3.2_identify_viral_contigs.md
@@ -0,0 +1,25 @@
# Identify viral contigs

In this experimental part, you will identify viral contigs, assess genome completeness and filter only high-quality ones to use for subsequent analysis.

Before running Jaeger
- virome and metagenome (pre-assembled) should be in dir /path/to/dir
- virome (small size fraction) and metagenome (larger size fraction)

- Where do you expect the most hits? (Before running tool)

Run Jaeger
- Add command line

After running Jaeger
Questions to think about
- Take the longest contig from each dataset and BLAST it
- Plot of Jaeger score per genome length?
- How many viral contigs are there in each assembly?
- Why are not all contigs in the virome identified as viral contigs?
- Why are viral contigs also identified in the metagenome?
- Size differences between viral and non-viral contigs

- How would you rank Jaeger compared to other virus detection tools? (After running tool)

Check completeness with CheckV
7 changes: 7 additions & 0 deletions _episodes/1.4.1_gene_finding.md
@@ -0,0 +1,7 @@
# Gene Finding

YouTube links for gene annotation:

- Rob: https://www.youtube.com/watch?v=ecJ1DqVvuFE&pp=ygUJcGhhbm90YXRl
- Katelyn: https://www.youtube.com/watch?v=gvnPsA1S6GY&pp=ygUJcGhhbm90YXRl
- Evelien Adriaenssens: https://www.youtube.com/watch?v=wO1w1Z1Or1w&pp=ygUJcGhhbm90YXRl

20 changes: 20 additions & 0 deletions _episodes/1.4.2_gene_finding.md
@@ -0,0 +1,20 @@
# Gene finding

Use [Phanotate](https://github.com/deprekate/PHANOTATE) to annotate contigs and answer the following questions:

Before running the tool:

- What is an ORF?
- How and why do you predict ORFs?
- Would you expect ORFs to overlap on the genome?
- Would a tool for predicting ORFs in bacteria work for viruses?
- How do you know an ORF is really a protein coding gene?

After running the tool:

- What does the output file look like?
- How many ORFs did the tool predict?
- What is the longest ORF?
- Did you find any overlapping ORFs in a contig?
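
To make the notion of an ORF concrete, the sketch below scans a sequence for ATG-to-stop stretches in the three forward reading frames. This naive scan is an illustration only and is not the algorithm used by Phanotate; the toy contig is invented, the function names are hypothetical, and alternative start codons and scoring are ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(sequence):
    """Return the reverse complement, so the opposite strand can be scanned too."""
    return sequence.translate(COMPLEMENT)[::-1]

def find_orfs(sequence, min_codons=2):
    """Return (start, end, frame) for ATG...stop ORFs on the given strand.

    Coordinates are 0-based and end-exclusive, and include the stop codon.
    min_codons is the minimum number of codons before the stop codon.
    """
    sequence = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(sequence) - 2, 3):
            codon = sequence[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first start codon of a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None
    return orfs

toy_contig = "CCATGAAATGATAGGG"  # invented sequence
forward_orfs = find_orfs(toy_contig)
reverse_orfs = find_orfs(reverse_complement(toy_contig))
print(forward_orfs, reverse_orfs)
```

Comparing the start and end coordinates of ORFs found in different frames or on different strands is one simple way to check whether any of them overlap.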

Find similar contigs with a clustering method: AAI?
3 changes: 3 additions & 0 deletions _episodes/2.1.2_host_prediction.md
@@ -0,0 +1,3 @@
# Host prediction

RaFAH: https://www.cell.com/patterns/fulltext/S2666-3899(21)00100-8
38 changes: 38 additions & 0 deletions _episodes/2.2.2_taxonomy and Phylogeny.md
@@ -0,0 +1,38 @@
# Taxonomy and Phylogeny

## Morning

Check out the ICTV website: https://ictv.global/

- What is ICTV?
- What is a realm?
- How many viral genera are there?

## Afternoon

VContact2, Genomad.

You should use only high-quality genomes for this analysis.

- How does Genomad work?
- Strengths vs weaknesses
- Taxonomic annotation of genomes with Genomad
- How many genomes are taxonomically annotated
- Describe the output

- How does VContact2 work
- Strengths vs weaknesses
- Make a VContact2 plot
- Add predicted host to the plot
- How many genomes are taxonomically annotated
- Describe the output

# Phylogeny

- How can you make a phylogeny of viruses?
- Pros vs cons of making phylogeny with marker gene vs full genome
- Choose a marker gene to create your phylogeny: what was the most frequently annotated PHROG?
- Align your marker genes (MAFFT, ClustalW, etc.)
- Use IQ-TREE to build the tree
- Use iTOL or Iroki (https://www.iroki.net/) to visualize the tree

10 changes: 10 additions & 0 deletions _episodes/2.3.1_research_question.md
@@ -0,0 +1,10 @@
# Research Question

When thinking of a new project and proposing scientific questions, the first step is to go to the scientific literature. You will become acquainted with the body of knowledge of that specific field and notice its limitations. Based on this, you can think of your own questions. Use (one of) the articles below (or choose one yourself) as inspiration to create your own research question(s).

- [A highly abundant bacteriophage discovered in the unknown sequences of human faecal metagenomes](https://www.nature.com/articles/ncomms5498), by Dutilh *et al* (2014)
- [Bacteriophage evolution differs by host, lifestyle and genome](https://www.nature.com/articles/nmicrobiol2017112), by Mavrich and Hatfull (2017)
- [Evolution of BACON Domain Tandem Repeats in crAssphage and Novel Gut Bacteriophage Lineages](https://www.mdpi.com/1999-4915/11/12/1085), by de Jonge *et al* (2019)
- [Virus Bioinformatics](https://www.sciencedirect.com/science/article/abs/pii/B9780128145159000345), by Pappas *et al* (2021)
- [High viral abundance and low diversity are associated with increased CRISPR-Cas prevalence across microbial ecosystems](https://www.sciencedirect.com/science/article/pii/S0960982221014615), by Meaden *et al* (2022)
- [Viruses interact with hosts that span distantly related microbial domains in dense hydrothermal mats](https://www.nature.com/articles/s41564-023-01347-5), by Hwang *et al* (2023)
