metamage is a workflow for taxonomic classification, assembly, binning and annotation of short-read host-associated metagenomics datasets.
graph TD;
reads[(Short-read paired-end metagenomics data)]-->hostread(Trimming and host read removal)
hostread-->|Reads| tax(Taxonomic classification with Kaiju)
hostread-->|Reads| assem(Assembly with MEGAHIT)
assem-.->|Assembled contigs| metaq(MetaQuast evaluation)
assem-->|Assembled contigs| func(Functional annotation)
assem-->|Assembled contigs| binprep(Binning preparation)
hostread-->|Reads| binprep
assem-->|Assembled contigs| bin(Binning with MetaBAT2)
binprep-->|Depth file| bin
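The edges of the graph above can also be read as a dependency table. The following is a minimal sketch, not part of metamage itself: it copies the node identifiers from the graph into a Python dictionary and uses the standard-library `graphlib` module (Python 3.9+) to derive one valid execution order. The names `DEPENDENCIES` and `execution_order` are hypothetical and exist only for this illustration.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps whose output it consumes.
# Node names are the identifiers used in the workflow graph.
DEPENDENCIES = {
    "reads": set(),                    # short-read paired-end input data
    "hostread": {"reads"},             # trimming and host read removal
    "tax": {"hostread"},               # taxonomic classification with Kaiju
    "assem": {"hostread"},             # assembly with MEGAHIT
    "metaq": {"assem"},                # MetaQuast evaluation
    "func": {"assem"},                 # functional annotation
    "binprep": {"assem", "hostread"},  # binning preparation (depth file)
    "bin": {"assem", "binprep"},       # binning with MetaBAT2
}


def execution_order(dependencies):
    """Return one valid order in which the steps could run."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(execution_order(DEPENDENCIES))
```

Several orders are valid, since taxonomic classification, assembly evaluation, functional annotation and binning are independent of each other once their inputs exist.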
The workflow is composed of the following tools:
- fastp for read trimming and other general pre-processing [1]
- Bowtie2 for mapping to the host genome and extracting unaligned reads [2]
- Macrel for predicting antimicrobial peptide (AMP)-like sequences from contigs [4]
- fARGene for identifying antimicrobial resistance genes (ARGs) from contigs [5]
- GECCO for predicting biosynthetic gene clusters (BGCs) from contigs [6]
- Prodigal for protein-coding gene prediction from contigs [7]
- Kaiju for taxonomic classification [10]
- KronaTools for visualizing taxonomic classification results
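To make the first two steps more concrete, here is a hedged sketch of how the trimming and host read removal commands might be assembled. It is not the workflow's actual implementation, which this README does not show. The function names, file names and thread count are hypothetical; the command-line options are standard fastp and Bowtie2 options, but whether metamage uses exactly these is an assumption.

```python
import shlex


def fastp_command(r1, r2, out_r1, out_r2, report_prefix):
    """Build a fastp call that trims a read pair and writes JSON/HTML reports."""
    return [
        "fastp",
        "--in1", r1, "--in2", r2,
        "--out1", out_r1, "--out2", out_r2,
        "--json", f"{report_prefix}.json",
        "--html", f"{report_prefix}.html",
    ]


def bowtie2_build_command(host_fasta, index_prefix):
    """Build the call that indexes the host reference genome."""
    return ["bowtie2-build", host_fasta, index_prefix]


def host_removal_command(index_prefix, r1, r2, unaligned_path, threads=4):
    """Build a Bowtie2 call that keeps pairs not aligning concordantly to the host.

    Assumption: --un-conc-gz is one common way to extract non-host pairs;
    the workflow may use a different option.
    """
    return [
        "bowtie2",
        "-x", index_prefix,
        "-1", r1, "-2", r2,
        "--un-conc-gz", unaligned_path,
        "-S", "/dev/null",  # the alignments themselves are not needed here
        "-p", str(threads),
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for cmd in (
        fastp_command("s_1.fq.gz", "s_2.fq.gz", "trim_1.fq.gz", "trim_2.fq.gz", "fastp_results/s"),
        bowtie2_build_command("host.fa", "s_bt_idx/host"),
        host_removal_command("s_bt_idx/host", "trim_1.fq.gz", "trim_2.fq.gz", "s_bt_unaligned/s.fq.gz"),
    ):
        print(shlex.join(cmd))
```

Returning the commands as lists keeps them safe to pass to `subprocess.run` without shell quoting issues.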
- |metamage
- |{sample_name}
- |{sample_name}_bt_idx - Host genome Bowtie2 index
- |{sample_name}_bt_unaligned - Reads that didn't align to the host genome
- |fastp_results - Results from trimming with fastp
- |kaiju
- |MEGAHIT
- |MetaQuast - Assembly evaluation report
- |{sample_name}_assembly_idx - Bowtie2 index built from the assembled contigs
- |{sample_name}_assembly_sorted.bam - Reads aligned to assembly contigs
- |METABAT
- |fargene_results
- |gecco_results
- |macrel_results
- |prodigal_results
- |{sample_name}
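The listing above can be used to check whether a run produced everything it should. The sketch below is only an illustration under a loud assumption: it treats every entry as sitting directly under `metamage/{sample_name}`, which the listing does not confirm, so adjust the paths to the real nesting. The helper names `expected_outputs` and `missing_outputs` are hypothetical.

```python
from pathlib import Path


def expected_outputs(root, sample_name):
    """Map each listed output name to its assumed path for one sample.

    Assumption: a flat layout under <root>/metamage/<sample_name>.
    """
    sample_dir = Path(root) / "metamage" / sample_name
    names = [
        f"{sample_name}_bt_idx",
        f"{sample_name}_bt_unaligned",
        "fastp_results",
        "kaiju",
        "MEGAHIT",
        "MetaQuast",
        f"{sample_name}_assembly_idx",
        f"{sample_name}_assembly_sorted.bam",
        "METABAT",
        "fargene_results",
        "gecco_results",
        "macrel_results",
        "prodigal_results",
    ]
    return {name: sample_dir / name for name in names}


def missing_outputs(root, sample_name):
    """Return the sorted names of expected outputs that do not exist on disk."""
    outputs = expected_outputs(root, sample_name)
    return sorted(name for name, path in outputs.items() if not path.exists())


if __name__ == "__main__":
    # "sampleA" is a hypothetical sample name.
    print(missing_outputs(".", "sampleA"))
```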
- Kaiju indexes can be generated from a reference database, but you can also find some pre-built ones in the sidebar of the Kaiju website.
- Reference host genomes can be acquired from a variety of databases, for example Ensembl.
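As a rough illustration of where a Kaiju index is used, the sketch below builds the command lines for classifying a read pair and turning the result into a Krona chart. It is an assumption-laden example rather than the workflow's own code: the function names and file paths are hypothetical, while the options follow the usage documented by Kaiju and KronaTools (`kaiju`, `kaiju2krona`, `ktImportText`).

```python
def kaiju_command(nodes_dmp, kaiju_index, r1, r2, output, threads=4):
    """Build a Kaiju call classifying paired-end reads against an .fmi index."""
    return [
        "kaiju",
        "-t", nodes_dmp,    # NCBI taxonomy nodes.dmp shipped with the index
        "-f", kaiju_index,  # the Kaiju index file (.fmi)
        "-i", r1, "-j", r2,
        "-o", output,
        "-z", str(threads),
    ]


def krona_commands(nodes_dmp, names_dmp, kaiju_output, krona_text, krona_html):
    """Build the two calls that convert Kaiju output into a Krona HTML chart."""
    return [
        ["kaiju2krona", "-t", nodes_dmp, "-n", names_dmp,
         "-i", kaiju_output, "-o", krona_text],
        ["ktImportText", "-o", krona_html, krona_text],
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(kaiju_command("db/nodes.dmp", "db/kaiju_db.fmi",
                        "s_1.fq.gz", "s_2.fq.gz", "kaiju/s.out"))
    for cmd in krona_commands("db/nodes.dmp", "db/names.dmp",
                              "kaiju/s.out", "kaiju/s.krona", "kaiju/s.html"):
        print(cmd)
```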
Footnotes
1. Shifu Chen, Yanqing Zhou, Yaru Chen, Jia Gu; fastp: an ultra-fast all-in-one FASTQ preprocessor, Bioinformatics, Volume 34, Issue 17, 1 September 2018, Pages i884–i890, https://doi.org/10.1093/bioinformatics/bty560
2. Langmead B, Wilks C., Antonescu V., Charles R. Scaling read aligners to hundreds of threads on general-purpose processors. Bioinformatics. bty648.
3. Li, D., Luo, R., Liu, C.M., Leung, C.M., Ting, H.F., Sadakane, K., Yamashita, H. and Lam, T.W., 2016. MEGAHIT v1.0: A Fast and Scalable Metagenome Assembler driven by Advanced Methodologies and Community Practices. Methods.
4. Santos-Júnior CD, Pan S, Zhao X, Coelho LP. 2020. Macrel: antimicrobial peptide screening in genomes and metagenomes. PeerJ 8:e10555. DOI: 10.7717/peerj.10555
5. Berglund, F., Österlund, T., Boulund, F., Marathe, N. P., Larsson, D. J., & Kristiansson, E. (2019). Identification and reconstruction of novel antibiotic resistance genes from metagenomes. Microbiome, 7(1), 52.
6. Accurate de novo identification of biosynthetic gene clusters with GECCO. Laura M Carroll, Martin Larralde, Jonas Simon Fleck, Ruby Ponnudurai, Alessio Milanese, Elisa Cappio Barazzone, Georg Zeller. bioRxiv 2021.05.03.442509; doi:10.1101/2021.05.03.442509
7. Hyatt, D., Chen, GL., LoCascio, P.F. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010). https://doi.org/10.1186/1471-2105-11-119
8. Twelve years of SAMtools and BCFtools. Petr Danecek, James K Bonfield, Jennifer Liddle, John Marshall, Valeriu Ohan, Martin O Pollard, Andrew Whitwham, Thomas Keane, Shane A McCarthy, Robert M Davies, Heng Li. GigaScience, Volume 10, Issue 2, February 2021, giab008, https://doi.org/10.1093/gigascience/giab008
9. Alla Mikheenko, Vladislav Saveliev, Alexey Gurevich, MetaQUAST: evaluation of metagenome assemblies, Bioinformatics (2016) 32 (7): 1088-1090. doi: 10.1093/bioinformatics/btv697
10. Kang DD, Li F, Kirton E, Thomas A, Egan R, An H, Wang Z. 2019. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ 7:e7359 https://doi.org/10.7717/peerj.7359