Home
Tessa Alexanian edited this page Aug 23, 2024 · 9 revisions
commec is a tool for DNA sequence screening that is part of the Common Mechanism. commec provides three main subcommands:
- screen: Run Common Mechanism screening on an input FASTA file.
- flag: Parse all .screen files in a directory and create two CSV files of flags raised.
- split: Split a multi-record FASTA file into individual files, one for each record.
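To make the idea behind split concrete, here is a minimal Python sketch of splitting a multi-record FASTA into one record per file. It is not commec's implementation: the function names `split_fasta` and `write_records` are hypothetical, and naming each output file after the first word of its header line is an assumption made for illustration.

```python
from pathlib import Path


def split_fasta(text: str) -> dict[str, str]:
    """Map each record ID (first word of the header) to its full FASTA record.

    Assumes record IDs are unique; a repeated ID would overwrite the earlier record.
    """
    records: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            header = line[1:].split()
            # Fall back to a positional name if the header is empty
            current = header[0] if header else f"record_{len(records) + 1}"
            records[current] = [line]
        elif current is not None:
            records[current].append(line)  # sequence line of the current record
    return {name: "\n".join(lines) + "\n" for name, lines in records.items()}


def write_records(records: dict[str, str], out_dir: str) -> list[Path]:
    """Write each record to <out_dir>/<id>.fasta (assumes IDs are safe file names)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, record in records.items():
        path = out / f"{name}.fasta"
        path.write_text(record)
        paths.append(path)
    return paths
```

Keeping the parsing step separate from the file writing makes the split logic easy to check on a small string before touching the file system.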
The tool is designed to:
- Screen sequences down to 50 base pairs in length.
- Sensitively identify sequences of concern known to contribute to pathogenicity.
- Flag regulated pathogens, including those listed on the Australia Group Common Control List and a variety of national control lists, including those from India, China, and South Africa.
The screen command runs the input FASTA through four steps:
- Biorisk scan (uses a hmmer search against custom databases)
- Regulated protein homology search (uses a BLASTX or DIAMOND search against NCBI nr)
- Regulated nucleotide homology search (uses BLASTN against NCBI nt)
- Benign scan (uses hmmer, cmscan and BLASTN against custom databases)
The biorisk scan, which identifies sequences of concern, runs in seconds on a laptop and uses under 1 GB of curated databases. The complete protein homology search is designed for high-performance bioinformatics environments and requires 275-650 GB of reference databases and at least 20 GB of RAM. See the Install Guide for more details.
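As a rough illustration of the resource figures quoted above, the following Python sketch checks a machine against them before attempting the full protein homology search. It is not part of commec: the names `meets_requirements` and `probe` are hypothetical, the defaults take the upper end of the quoted database range, and the units are assumed to be decimal gigabytes.

```python
import os
import shutil

GB = 10**9  # assumption: decimal gigabytes; the page does not specify GB vs GiB


def meets_requirements(free_disk_bytes: int, ram_bytes: int,
                       min_disk_gb: int = 650, min_ram_gb: int = 20) -> bool:
    """Return True if both the free disk space and the RAM meet the minimums."""
    return free_disk_bytes >= min_disk_gb * GB and ram_bytes >= min_ram_gb * GB


def probe(path: str = ".") -> tuple[int, int]:
    """Return (free disk bytes at path, physical RAM bytes) for this machine."""
    free = shutil.disk_usage(path).free
    # These sysconf names are available on Linux; other platforms may not support them
    ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return free, ram
```

Calling `meets_requirements(*probe("/path/to/databases"))` would then give a quick yes or no, where the path is a placeholder for wherever the reference databases are meant to live.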