vClean is a fully automated command-line pipeline for assessing the risk that environmental viral genomes are contaminated with sequences derived from multiple viruses.
Run the following commands to install vClean, download the database, and run the program.
Replace </path/to/database>, <input_fasta_dir>, and <output_dir> with the correct paths.
conda install -c conda-forge -c bioconda vclean
vclean download_database </path/to/database>
export VCLEANDB=</path/to/database>
vclean run <input_fasta_dir> <output_dir> [options]
You can install vClean as follows:
conda install -c conda-forge -c bioconda vclean python=3.9
The python=3.9 argument pins the Python version to 3.9.
You must download the database before running vClean. Replace </path/to/database> with the path where the database should be stored:
vclean download_database </path/to/database>
To tell vClean where the database is located, either use the -d flag or set the VCLEANDB environment variable:
export VCLEANDB=</path/to/database>
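Because a run depends on the database location being set, a small pre-flight check can catch a missing or mistyped path early. The sketch below is not part of vClean: check_vclean_db is a hypothetical helper name, and it only tests that VCLEANDB is set and points to something that exists, without inspecting the database contents.

```shell
# Hypothetical helper (not part of vClean): confirm that VCLEANDB is set
# and that the path it names exists before starting a run.
check_vclean_db() {
    if [ -z "${VCLEANDB:-}" ]; then
        echo "VCLEANDB is not set" >&2
        return 1
    fi
    if [ ! -e "$VCLEANDB" ]; then
        echo "VCLEANDB points to a path that does not exist: $VCLEANDB" >&2
        return 1
    fi
    echo "Using vClean database at $VCLEANDB"
}
```

One way to use it is to chain it in front of the run command, for example check_vclean_db && vclean run <input_fasta_dir> <output_dir>, so the run only starts when the check passes.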
You can simply run vClean as follows:
vclean run <input_fasta_dir> <output_dir> [OPTIONS]
In the output directory, two files are generated: Contamination_probability.tsv and feature_table.tsv. For sequences confirmed to be contaminated, new FASTA files containing only the longest contigs are stored in the purified_fasta directory within the output directory.
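After a run, the output layout described above can be checked with a few lines of shell. This is a minimal sketch, not something vClean provides: check_vclean_outputs is a hypothetical helper, and it treats purified_fasta as optional because the text does not say whether that directory is created when no sequence is confirmed as contaminated.

```shell
# Hypothetical helper (not part of vClean): report which of the documented
# outputs are present in a given output directory.
check_vclean_outputs() {
    outdir=$1
    rc=0
    # The two result tables are always expected.
    for f in Contamination_probability.tsv feature_table.tsv; do
        if [ -f "$outdir/$f" ]; then
            echo "found: $f"
        else
            echo "missing: $f"
            rc=1
        fi
    done
    # purified_fasta is reported when present but not required (assumption:
    # it may be absent when no contaminated sequences were found).
    if [ -d "$outdir/purified_fasta" ]; then
        echo "found: purified_fasta/"
    fi
    return $rc
}
```

Calling check_vclean_outputs <output_dir> prints one line per item and returns a non-zero status if either table is missing.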