add metagenome option to CLI #25
oschwengers committed Dec 7, 2021
1 parent fa47297 commit 78747e8
Showing 5 changed files with 31 additions and 30 deletions.
47 changes: 20 additions & 27 deletions README.md
@@ -6,7 +6,7 @@
![PyPI - Status](https://img.shields.io/pypi/status/cb-platon.svg)
[![Conda](https://img.shields.io/conda/v/bioconda/platon.svg)](https://bioconda.github.io/recipes/platon/README.html)

# Platon: identification and characterization of bacterial plasmid contigs from short-read draft assemblies.
# Platon: identification and characterization of bacterial plasmid contigs from short-read draft assemblies

## Contents

@@ -26,14 +26,11 @@
## Description

**TL;DR**
Platon detects plasmid contigs within bacterial draft genomes from WGS short-read assemblies.
Therefore, Platon analyzes the natural distribution biases of certain protein coding genes between
chromosomes and plasmids. This analysis is complemented by comprehensive contig characterizations
upon which several heuristics are applied.
Platon detects plasmid-borne contigs within bacterial draft (meta) genome assemblies. To this end, Platon analyzes the distribution bias of protein-coding gene families among chromosomes and plasmids. This analysis is complemented by comprehensive contig characterizations followed by heuristic filters.

Platon conducts three analysis steps:

1. It predicts and searches coding sequences against a custom and pre-computed database comprising marker protein sequences (**MPS**) and related replicon distribution scores (**RDS**). These scores express the empirically measured frequency biases of protein sequence distributions between plasmids and chromosomes pre-computed on complete NCBI RefSeq replicons. Platon calculates the mean RDS for each contig and either classifies them as chromosome if the RDS is below a sensitivity cutoff determined to 95% sensitivity or as plasmid if the RDS is above a specificity cutoff determined to 99.9% specificity. Exact values for these thresholds have been computed based on Monte Carlo simulations of artificial replicon fragments created from complete RefSeq chromosome and plasmid sequences.
1. It predicts and searches protein sequences against a custom and pre-computed database comprising marker protein sequences (**MPS**) and related replicon distribution scores (**RDS**). These scores express the empirically measured bias of protein sequence family distributions among plasmids and chromosomes pre-computed on complete NCBI RefSeq replicons. Platon calculates the mean RDS for each contig and either classifies them as chromosome if the RDS is below a sensitivity cutoff determined to 95% sensitivity or as plasmid if the RDS is above a specificity cutoff determined to 99.9% specificity. Exact values for these thresholds have been computed based on Monte Carlo simulations of artificial replicon fragments created from complete RefSeq chromosome and plasmid sequences.
2. Contigs passing the sensitivity filter get comprehensively characterized. Hereby, Platon tries to circularize the contig sequences, searches for rRNA, replication, mobilization and conjugation genes, oriT sequences, incompatibility group DNA probes and finally performs a BLAST+ search against the NCBI plasmid database.
3. Finally, to increase the overall sensitivity, Platon classifies all remaining contigs based on the gathered information by several heuristics.
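
The RDS-based decision in step 1 can be sketched as below. This is only an illustration under stated assumptions, not Platon's actual code: the function name `classify_contig`, the label `undetermined`, the handling of contigs without marker hits, and both cutoff values are placeholders, since the README describes how the thresholds were derived but does not give their values.

```python
from statistics import mean

# Hypothetical cutoffs for illustration only; the real values are not
# stated in the README text.
SENSITIVITY_CUTOFF = -1.0
SPECIFICITY_CUTOFF = 1.0


def classify_contig(rds_scores,
                    sensitivity_cutoff=SENSITIVITY_CUTOFF,
                    specificity_cutoff=SPECIFICITY_CUTOFF):
    """Classify one contig from the RDS values of its marker protein hits."""
    if not rds_scores:
        # Assumption: a contig without any MPS hit is left undecided.
        return 'undetermined'
    mean_rds = mean(rds_scores)
    if mean_rds < sensitivity_cutoff:
        return 'chromosome'
    if mean_rds > specificity_cutoff:
        return 'plasmid'
    # Between both cutoffs: left to the characterization and heuristics steps.
    return 'undetermined'
```

Contigs that end up as `undetermined` in this sketch correspond to those that steps 2 and 3 would go on to characterize and classify.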

@@ -45,7 +42,7 @@ Platon conducts three analysis steps:

### Input

Platon accepts draft assemblies in fasta format. If contigs have been assembled with SPAdes, Platon is able to extract the coverage information from the contig names.
Platon accepts draft (meta) genome assemblies in fasta format. If contigs have been assembled with SPAdes, Platon is able to extract the coverage information from the contig names.

### Output

@@ -130,34 +127,30 @@ $ cp -r db/ <platon-installation-dir>
Usage:

```bash
usage: platon [-h] [--db DB] [--mode {sensitivity,accuracy,specificity}]
[--characterize] [--output OUTPUT] [--prefix PREFIX]
[--threads THREADS] [--verbose] [--version]
<genome>
usage: platon [--db DB] [--prefix PREFIX] [--output OUTPUT] [--mode {sensitivity,accuracy,specificity}] [--characterize] [--meta] [--help] [--verbose] [--threads THREADS] [--version] <genome>

Identification and characterization of bacterial plasmid contigs from short-read draft assemblies.

positional arguments:
Input / Output:
<genome> draft genome in fasta format

optional arguments:
-h, --help show this help message and exit
--db DB, -d DB database path (default = <platon_path>/db)
--prefix PREFIX, -p PREFIX
Prefix for output files
--output OUTPUT, -o OUTPUT
Output directory (default = current working directory)

Workflow:
--mode {sensitivity,accuracy,specificity}, -m {sensitivity,accuracy,specificity}
applied filter mode: sensitivity: RDS only (>= 95%
sensitivity); specificity: RDS only (>=99.9%
specificity); accuracy: RDS & characterization
heuristics (highest accuracy) (default = accuracy)
applied filter mode: sensitivity: RDS only (>= 95% sensitivity); specificity: RDS only (>=99.9% specificity); accuracy: RDS & characterization heuristics (highest accuracy) (default = accuracy)
--characterize, -c deactivate filters; characterize all contigs
--output OUTPUT, -o OUTPUT
output directory (default = current working directory)
--prefix PREFIX, -p PREFIX
file prefix (default = input file name)
--meta use metagenome gene prediction mode

General:
--help, -h Show this help message and exit
--verbose, -v Print verbose information
--threads THREADS, -t THREADS
number of threads to use (default = number of
available CPUs)
--verbose, -v print verbose information
--version, -V show program's version number and exit
Number of threads to use (default = number of available CPUs)
--version show program's version number and exit
```
## Examples
4 changes: 4 additions & 0 deletions platon.cwl
@@ -39,6 +39,10 @@ inputs:
inputBinding: {prefix: --mode}
type: string
default: 'accuracy'
- doc: Run in metagenome mode
id: metagenome
inputBinding: {prefix: --meta}
type: boolean
- doc: Threads
id: threads
inputBinding: {prefix: --threads}
5 changes: 4 additions & 1 deletion platon/config.py
@@ -21,6 +21,7 @@
# workflow configuration
mode = None
characterize = None
metagenome = None


def setup(args):
@@ -89,8 +90,10 @@ def setup(args):
log.info('output-path=%s', output_path)

# workflow configurations
global mode, characterize
global mode, characterize, metagenome
mode = args.mode
log.info('mode=%s', mode)
characterize = args.characterize
log.info('characterize=%s', characterize)
metagenome = args.meta
log.info('metagenome=%s', metagenome)
4 changes: 2 additions & 2 deletions platon/functions.py
@@ -462,10 +462,10 @@ def predict_orfs(contigs, filteredDraftGenomePath):
]

genome_size = sum([v['length'] for k, v in contigs.items()])
if(cfg.characterize and genome_size < 50000):
if(cfg.metagenome or (cfg.characterize and genome_size < 50000)):
cmd.append('-p')
cmd.append('meta')
log.info('ORFs: execute prodigal in meta mode! characterize=%s, genome-size=%d', cfg.characterize, genome_size)
log.info('ORFs: execute prodigal in meta mode! characterize=%s, genome-size=%d, metagenome=%s', cfg.characterize, genome_size, cfg.metagenome)

proc = sp.run(
cmd,
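
The changed condition in this hunk can be isolated as a small pure function, which makes the new behaviour easy to check. This is a sketch rather than code from the commit: the helper name `prodigal_meta_args` and the `small_genome_limit` parameter are hypothetical, while the `-p meta` arguments and the 50000 bp limit are taken from the diff.

```python
def prodigal_meta_args(metagenome, characterize, genome_size,
                       small_genome_limit=50000):
    """Return the extra Prodigal arguments implied by the workflow flags.

    Mirrors the updated condition: meta mode is used when --meta is set,
    or when --characterize is set and the input is smaller than the limit.
    """
    if metagenome or (characterize and genome_size < small_genome_limit):
        return ['-p', 'meta']
    return []
```

With `metagenome=True` the genome size no longer matters, which is the behavioural change this commit introduces. The previous fallback for small inputs in characterization mode is kept.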
1 change: 1 addition & 0 deletions platon/utils.py
@@ -49,6 +49,7 @@ def parse_arguments():
arg_group_workflow = parser.add_argument_group('Workflow')
arg_group_workflow.add_argument('--mode', '-m', action='store', type=str, choices=['sensitivity', 'accuracy', 'specificity'], default='accuracy', help='applied filter mode: sensitivity: RDS only (>= 95%% sensitivity); specificity: RDS only (>=99.9%% specificity); accuracy: RDS & characterization heuristics (highest accuracy) (default = accuracy)')
arg_group_workflow.add_argument('--characterize', '-c', action='store_true', help='deactivate filters; characterize all contigs')
arg_group_workflow.add_argument('--meta', action='store_true', help='use metagenome gene prediction mode')

arg_group_general = parser.add_argument_group('General')
arg_group_general.add_argument('--help', '-h', action='help', help='Show this help message and exit')
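
To see how the new flag reaches `args.meta`, here is a reduced, self-contained parser in the same style. It is a sketch only: `build_parser` is a hypothetical helper, most options are omitted, and `add_help=False` is an assumption inferred from the fact that `--help` is registered by hand.

```python
import argparse


def build_parser():
    """Build a trimmed-down parser with only the flags relevant here."""
    # Assumption: automatic help is disabled because --help is added manually.
    parser = argparse.ArgumentParser(prog='platon', add_help=False)
    parser.add_argument('genome', metavar='<genome>', help='draft genome in fasta format')

    workflow = parser.add_argument_group('Workflow')
    workflow.add_argument('--characterize', '-c', action='store_true',
                          help='deactivate filters; characterize all contigs')
    workflow.add_argument('--meta', action='store_true',
                          help='use metagenome gene prediction mode')

    general = parser.add_argument_group('General')
    general.add_argument('--help', '-h', action='help',
                         help='Show this help message and exit')
    return parser


# 'assembly.fasta' is a placeholder file name; it is parsed, not opened.
args = build_parser().parse_args(['--meta', 'assembly.fasta'])
default_args = build_parser().parse_args(['assembly.fasta'])
```

Because the flag uses `store_true`, `args.meta` is `False` rather than `None` when `--meta` is not given, so `cfg.metagenome` is always a boolean after `setup(args)` has run.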