- Provides an enrichment score for VLP viromes with respect to metagenomes
- Useful benchmark for the quality of enrichment of a virome
- Tested on Linux Ubuntu Server 16.04 LTS and on Linux Mint 19
Requires:
- Bowtie2 >= v. 2.3.4
- Samtools >= 1.3.1
- Biopython >= 1.69
- Pysam >= 0.14
- Diamond (tested on v.0.9.9 and 0.9.29)
- Python3 (tested on 3.6)
- pandas >= 0.20
Update: ViromeQC now works with newer versions of Diamond (e.g. v0.9.29). Thanks to Ryan Cook (@RyanCookAMR) for the new Diamond database.
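Before running ViromeQC, it can help to confirm that the tools listed above are reachable. The following is a minimal sketch, not part of ViromeQC itself: it assumes the executables are installed under their usual names (`bowtie2`, `samtools`, `diamond`) and only checks for presence, not for the minimum versions.

```python
import importlib.util
import shutil

# Executables and Python modules named in the requirements list.
# "Bio" is the import name of Biopython.
REQUIRED_TOOLS = ["bowtie2", "samtools", "diamond"]
REQUIRED_MODULES = ["Bio", "pysam", "pandas"]


def find_missing(tools=REQUIRED_TOOLS, modules=REQUIRED_MODULES):
    """Return (tools not found on PATH, modules that cannot be located)."""
    missing_tools = [t for t in tools if shutil.which(t) is None]
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    return missing_tools, missing_modules


missing_tools, missing_modules = find_missing()
print("Missing executables:", missing_tools or "none")
print("Missing Python modules:", missing_modules or "none")
```

If an executable lives outside the system path, the `--bowtie2_path` and `--diamond_path` options can point ViromeQC at it instead.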
git clone --recurse-submodules https://github.com/SegataLab/viromeqc.git
or download the repository from the releases page
This step downloads the database files. It only needs to be done the first time you run ViromeQC, and may take a few minutes depending on your internet connection.
viromeQC.py --install
Alternatively, you can download the database files from Zenodo. Once the files are downloaded, create a folder named index/ in the ViromeQC installation folder and unzip all the files into it.
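The manual route can also be scripted. This is a rough sketch that assumes the Zenodo downloads are ordinary zip archives; the function name and the paths in the usage comment are hypothetical and need to be adapted to your setup.

```python
import zipfile
from pathlib import Path


def unpack_database(archives, install_dir):
    """Extract each downloaded zip archive into <install_dir>/index/."""
    index_dir = Path(install_dir) / "index"
    # Create the index/ folder if it does not exist yet.
    index_dir.mkdir(parents=True, exist_ok=True)
    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(index_dir)
    return index_dir


# Hypothetical usage, with placeholder paths:
# unpack_database(Path("downloads").glob("*.zip"), "path/to/viromeqc")
```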
viromeQC.py -i <input_virome_file(s)> -o <report_file.txt>
Please note: you can pass more than one file as input (e.g. for multiple runs or paired-end reads). However, this command processes only one sample at a time. To parallelize the execution across samples, use GNU Parallel or an equivalent tool.
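As one alternative to a shell-level tool, several samples can be dispatched from Python. The sketch below assumes `viromeQC.py` is on the system path; the helper names, the sample names and the `<name>_report.txt` naming scheme are illustrative choices, not ViromeQC conventions. Only the documented `-i`, `-o` and `--sample_name` options are used.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(sample_name, fastq_files, report_file, script="viromeQC.py"):
    """Build the ViromeQC command line for one sample (one or more FASTQ files)."""
    return [script, "-i", *fastq_files, "-o", report_file, "--sample_name", sample_name]


def run_samples(samples, max_workers=2):
    """Run one ViromeQC process per sample, at most max_workers at a time.

    samples maps a sample name to its list of FASTQ files.
    Returns a dict of sample name -> process exit code.
    """
    def run_one(item):
        name, fastqs = item
        cmd = build_command(name, fastqs, f"{name}_report.txt")
        return name, subprocess.run(cmd).returncode

    # Threads are enough here: each worker just waits on an external process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, samples.items()))


# Hypothetical usage, with placeholder file names:
# run_samples({"sampleA": ["A_R1.fastq.gz", "A_R2.fastq.gz"],
#              "sampleB": ["B.fastq.gz"]})
```

Each run uses 4 Bowtie2 threads and 4 Diamond threads by default, so choose `max_workers` with the available cores in mind.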
You can try the test example (`test/test.sh`), which analyzes 10,000 reads from the sample SRR829034. This should take approximately 1 to 2 minutes.
Parameters:
usage: viromeQC.py -i <input_virome_file> -o <report_file.txt>
optional arguments:
-h, --help show this help message and exit
-i [INPUT [INPUT ...]], --input [INPUT [INPUT ...]]
Raw Reads in FASTQ format. Supports multiple inputs
(plain, gz o bz2) (default: None)
-o OUTPUT, --output OUTPUT
output file (default: None)
--minlen MINLEN Minimum Read Length allowed (default: 75)
--minqual MINQUAL Minimum Read Average Phred quality (default: 20)
--bowtie2_threads BOWTIE2_THREADS
Number of Threads to use with Bowtie2 (default: 4)
--diamond_threads DIAMOND_THREADS
Number of Threads to use with Diamond (default: 4)
-w {human,environmental}, --enrichment_preset {human,environmental}
Calculate the enrichment basing on human or
environmental metagenomes. Defualt: human-microbiome
(default: human)
--bowtie2_path BOWTIE2_PATH
Full path to the bowtie2 command to use, deafult
assumes that bowtie2 is present in the system path
(default: bowtie2)
--diamond_path DIAMOND_PATH
Full path to the diamond command to use, deafult
assumes that diamond is present in the system path
(default: diamond)
--version Prints version informations (default: False)
--install Downloads database files (default: False)
--sample_name SAMPLE_NAME
Optional label for the sample to be included in the
output file (default: None)
--tempdir TEMPDIR Temporary Directory override (default is the system
temp. directory) (default: None)
ViromeQC starts from FASTQ files (compressed files are supported), and will:
- Eliminate short and low-quality reads
  - adjust the `minqual` and `minlen` parameters if you want to change the thresholds
- Map the reads against a curated collection of rRNAs and single-copy bacterial markers
- Filter the reads to remove short and divergent alignments
- Compute the enrichment value of the sample, compared to the median observed in human metagenomes
  - use `-w environmental` for environmental reads
  - reference medians for un-enriched metagenomes are taken from `medians.csv`; you can provide your own data to ViromeQC by changing this file accordingly
- Produce a report file with the alignment rates and the final enrichment score (which is the minimum enrichment observed across SSU-rRNA, LSU-rRNA and single-copy markers)
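The scoring step can be illustrated with a small sketch. It is not ViromeQC's actual code: it assumes that the enrichment for each marker class is the reference median alignment rate divided by the sample's alignment rate, and then takes the minimum across classes as described above. The reference medians used here are placeholders for illustration; the real values are read from `medians.csv`.

```python
def enrichment_score(sample_rates, reference_medians):
    """Return (per-class enrichment, overall score).

    Assumption: enrichment = reference median rate / sample rate.
    The overall score is the minimum across marker classes.
    """
    per_class = {}
    for marker, rate in sample_rates.items():
        median = reference_medians[marker]
        # A class with no aligned reads is treated as infinitely enriched.
        per_class[marker] = float("inf") if rate == 0 else median / rate
    return per_class, min(per_class.values())


# Alignment rates (%) taken from the example report row in this README.
sample = {"SSU": 0.00759898, "LSU": 0.0227969, "markers": 0.01266496}
# Placeholder medians, NOT the values shipped in medians.csv.
placeholder_medians = {"SSU": 0.1, "LSU": 0.2, "markers": 0.1}

per_class, score = enrichment_score(sample, placeholder_medians)
print(per_class)
print("overall score:", score)
```

Taking the minimum makes the score conservative: a sample only scores high when all three marker classes are depleted.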
Output is given as a TSV file with the following structure:
Sample | Reads | Reads_HQ | SSU rRNA alignment (%) | LSU rRNA alignment (%) | Bacterial_Markers alignment (%) | total enrichmnet score |
---|---|---|---|---|---|---|
your_sample.fq | 40000 | 39479 | 0.00759898 | 0.0227969 | 0.01266496 | 5.795329 |
- An enrichment score of 5.8 means that the virome is 5.8 times more enriched than a comparable metagenome
- High scores (e.g. 10-50) reflect high VLP enrichment
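To collect scores from one or more reports, the TSV can be read with Python's standard `csv` module. This is a minimal sketch under the assumption that the report has a single header row and that the enrichment score is the last column; the score is read by position so the code does not depend on the exact header spelling. The function names are illustrative.

```python
import csv
import io


def read_report(handle):
    """Parse a ViromeQC report (tab-separated, one header row) into dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def enrichment_of(row):
    """Return the enrichment score, assumed to be the last column of the row."""
    return float(list(row.values())[-1])


# Example content mirroring the table above (header copied as shown there).
# A real report would be opened with open("report_file.txt", newline="").
example = (
    "Sample\tReads\tReads_HQ\tSSU rRNA alignment (%)\tLSU rRNA alignment (%)\t"
    "Bacterial_Markers alignment (%)\ttotal enrichmnet score\n"
    "your_sample.fq\t40000\t39479\t0.00759898\t0.0227969\t0.01266496\t5.795329\n"
)

for row in read_report(io.StringIO(example)):
    print(row["Sample"], enrichment_of(row))
```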
If you find this tool useful, please cite:
Zolfo, M., Pinto, F., Asnicar, F., Manghi, P., Tett, A., Segata, N. Detecting contamination in viromes using ViromeQC. Nature Biotechnology 37, 1408–1412 (2019).