Command Line Usage
This guide provides instructions on how to use commec for DNA sequence screening. commec provides three main subcommands:
# Run screening on a FASTA file
commec screen -d /path/to/databases input.fasta
# Parse screen files and generate flag CSVs
commec flag /path/to/directory/with/screen/files
# Split a multi-record FASTA file into individual files, one for each record
commec split input.fasta
To screen a FASTA file, run:
commec screen -d ~/path/to/databases input.fasta
screen has two required arguments:
- input FASTA file: Path to the FASTA file to screen
- -d, --databases: Path to the directory containing the required databases
Optional arguments:
- -o, --output: Output prefix (default: input filename)
- -t, --threads: Number of threads to use (default: 1)
- -p, --protein-search-tool: Tool for homology search (choices: "blastx", "diamond"; default: "blastx")
Flags:
- -f, --fast: Run in fast mode (skip homology search)
- -n, --skip-nt: Skip nucleotide search if no protein hits are found
- -c, --cleanup: Delete intermediate files after screening
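When many files need screening, it can help to assemble the command line programmatically. The following is a minimal sketch, not part of commec itself: build_screen_command is a hypothetical helper that only uses the arguments and flags listed above, and it builds the argument list without running anything.

```python
import shlex


def build_screen_command(fasta, databases, output=None, threads=1,
                         protein_search_tool="blastx", fast=False,
                         skip_nt=False, cleanup=False):
    """Return the argv list for one `commec screen` run (hypothetical helper)."""
    if protein_search_tool not in ("blastx", "diamond"):
        raise ValueError("protein_search_tool must be 'blastx' or 'diamond'")
    # Required database directory first, as in the usage line above.
    cmd = ["commec", "screen", "-d", str(databases)]
    if output is not None:
        cmd += ["-o", str(output)]
    cmd += ["-t", str(threads), "-p", protein_search_tool]
    # Boolean flags are only added when requested.
    if fast:
        cmd.append("-f")
    if skip_nt:
        cmd.append("-n")
    if cleanup:
        cmd.append("-c")
    # The input FASTA file is the positional argument.
    cmd.append(str(fasta))
    return cmd


cmd = build_screen_command("input.fasta", "/path/to/databases",
                           threads=4, protein_search_tool="diamond",
                           cleanup=True)
print(shlex.join(cmd))
# prints: commec screen -d /path/to/databases -t 4 -p diamond -c input.fasta
```

The resulting list could then be passed to subprocess.run(cmd, check=True) on a machine where commec and its databases are installed.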
The .screen files produced by the commec screen pipeline can be passed to flag to produce two output CSVs.
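Before running flag on a directory, it may be worth checking that the directory actually holds .screen files. The sketch below is an illustration under assumptions: build_flag_command is a hypothetical helper, and it only looks at the top level of the directory, which may or may not match how commec flag itself searches.

```python
import tempfile
from pathlib import Path


def build_flag_command(screen_dir):
    """Return (argv, screen_files) for `commec flag` (hypothetical helper).

    Raises FileNotFoundError if the directory has no top-level .screen files.
    """
    screen_dir = Path(screen_dir)
    screen_files = sorted(screen_dir.glob("*.screen"))
    if not screen_files:
        raise FileNotFoundError(f"no .screen files found in {screen_dir}")
    return ["commec", "flag", str(screen_dir)], screen_files


# Demonstration with an empty placeholder file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sample_a.screen").touch()
    (Path(tmp) / "notes.txt").touch()
    flag_cmd, found = build_flag_command(tmp)
    found_names = [p.name for p in found]
    print(flag_cmd[:2], found_names)
```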
flags.csv will have the following columns:
filename: .screen file basename
biorisk: "F" if flagged, "P" if no flags
virulence_factor: "F" if flagged, "P" if no flags
regulated_virus: "F" if flagged, "P" if no flags, "Err" if error logged
regulated_bacteria: "F" if flagged, "P" if no flags, "Err" if error logged
regulated_eukaryote: "F" if flagged, "P" if no flags, "Err" if error logged
mixed_regulated_non_reg: "F" if flagged, "P" if no flags, "Err" if error logged
benign: "F" if not cleared, "P" if all cleared, "-" if not run
These flags are based on the biorisk scan (determining the "biorisk" and "virulence_factor" fields), the protein and nucleotide homology scans (determining the "regulated" fields) and the benign scan (determining the "benign" field).
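The columns above can be summarized per file with a short script. This is a sketch under assumptions: it supposes flags.csv is comma-separated with a header row using exactly the column names listed, summarize_flags is a hypothetical helper, and the sample rows are invented for illustration.

```python
import csv
import io

FLAG_COLUMNS = [
    "biorisk", "virulence_factor", "regulated_virus", "regulated_bacteria",
    "regulated_eukaryote", "mixed_regulated_non_reg", "benign",
]


def summarize_flags(handle):
    """Map each filename to the columns marked "F" and those marked "Err"."""
    summary = {}
    for row in csv.DictReader(handle):
        # Note: in the benign column, "F" means "not cleared".
        flagged = [c for c in FLAG_COLUMNS if row.get(c) == "F"]
        errors = [c for c in FLAG_COLUMNS if row.get(c) == "Err"]
        summary[row["filename"]] = {"flagged": flagged, "errors": errors}
    return summary


# Made-up rows for illustration only; not real screening output.
sample = io.StringIO(
    "filename,biorisk,virulence_factor,regulated_virus,regulated_bacteria,"
    "regulated_eukaryote,mixed_regulated_non_reg,benign\n"
    "seq1,P,P,P,P,P,P,-\n"
    "seq2,F,P,Err,P,P,P,F\n"
)
result = summarize_flags(sample)
for name, info in result.items():
    print(name, info)
```

For a real run, the same function could be called on an open file, for example with open("flags.csv", newline="") as handle.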
The flags_recommended CSV has just two columns, "filename" and "recommend_flag_or_pass". The recommendation is based on the following decision flow:
For any questions or issues, please contact [email protected] or open an issue on our GitHub repository.