Commit

most update docs
gbouras13 committed Oct 25, 2023
1 parent 2bd7dfb commit ce54878
Showing 6 changed files with 75 additions and 66 deletions.
17 changes: 11 additions & 6 deletions README.md
@@ -63,7 +63,7 @@ The full documentation for `dnaapler` can be found [here](https://dnaapler.readt

## Commands

* `dnaapler all`: Reorients multiple contigs to begin with any of dnaA, terL, repA.
* `dnaapler all`: Reorients 1 or more contigs to begin with any of dnaA, terL, repA.
- Practically, this should be the most useful command for most users.

* `dnaapler chromosome`: Reorients your sequence to begin with the dnaA chromosomal replication initiator gene
@@ -116,7 +116,7 @@ Options:
-V, --version Show the version and exit.
Commands:
all Reorients multiple contigs to begin with any of dnaA, repA...
all Reorients contigs to begin with any of dnaA, repA...
bulk Reorients multiple genomes to begin with the same gene
chromosome Reorients your genome to begin with the dnaA chromosomal...
citation Print the citation(s) for this tool
@@ -131,7 +131,7 @@ Commands:
```
Usage: dnaapler all [OPTIONS]
Reorients multiple contigs to begin with any of dnaA, repA or terL
Reorients contigs to begin with any of dnaA, repA or terL
Options:
-h, --help Show this message and exit.
@@ -155,6 +155,10 @@ The reoriented output FASTA will be `{prefix}_reoriented.fasta` in the specified

## Example Usage

```
dnaapler all -i input.fasta -o output_directory_path -p my_genome_name --ignore list_of_contigs_to_ignore.txt
```

```
dnaapler chromosome -i input.fasta -o output_directory_path -p my_bacteria_name -t 8
```
@@ -180,14 +184,15 @@ dnaapler nearest -i input.fasta -o output_directory_path -p my_genome_name
```

```
# to reorient multiple bacterial chromosomes
dnaapler bulk -i input_file_with_multiple_chromosomes.fasta -m chromosome -o output_directory_path -p my_genome_name
dnaapler largest -i input.fasta -o output_directory_path -p my_genome_name
```

```
dnaapler all -i input_file_with_multiple_contigs.fasta -o output_directory_path -p my_genome_name --ignore list_of_contigs_to_ignore.txt
# to reorient multiple bacterial chromosomes
dnaapler bulk -i input_file_with_multiple_chromosomes.fasta -m chromosome -o output_directory_path -p my_genome_name
```


## Databases

`dnaapler chromosome` uses 584 proteins downloaded from Swissprot with the query "Chromosomal replication initiator protein DnaA" on 24 May 2023 as its database for dnaA. All hits from the query were also filtered to ensure "GN=dnaA" was included in the header of the FASTA entry.
6 changes: 3 additions & 3 deletions docs/index.md
@@ -1,10 +1,10 @@
`dnaapler` is a simple python program that takes a single nucleotide input sequence (in FASTA format), finds the desired start gene using `blastx` against an amino acid sequence database, checks that the start codon of this gene is found, and if so, then reorients the chromosome to begin with this gene on the forward strand.

It was originally designed to replicate the reorientation functionality of [Unicycler](https://github.com/rrwick/Unicycler/blob/main/unicycler/gene_data/repA.fasta) with dnaA, but for long-read first assembled chromosomes. I have extended it to work with plasmids (`dnaapler plasmid`) and phages (`dnaapler phage`), or for any input FASTA desired with `dnaapler custom`, `dnaapler mystery` or `dnaapler nearest`.
It was originally designed to replicate the reorientation functionality of [Unicycler](https://github.com/rrwick/Unicycler/blob/main/unicycler/gene_data/repA.fasta) with dnaA, but for long-read first assembled chromosomes. I have extended it to work with plasmids (`dnaapler plasmid`) and phages (`dnaapler phage`), or for any input FASTA desired with `dnaapler custom`, `dnaapler largest`, `dnaapler mystery` or `dnaapler nearest`.

Additionally, you can also reorient multiple bacterial chromosomes/plasmids/phages at once using the `dnaapler bulk` subcommand.
If your input FASTA has 1 or more contigs, possibly mixed (e.g. a chromosome and plasmids), you can also use `dnaapler all`, with the option to ignore some contigs with the `--ignore` parameter. This is probably the most useful command for most users.

If your input FASTA is mixed (e.g. has chromosome and plasmids), you can also use `dnaapler all`, with the option to ignore some contigs with the `--ignore` parameter.
Additionally, you can reorient multiple bacterial chromosomes/plasmids/phages at once using the `dnaapler bulk` subcommand. It reports more information about which contigs could not be rotated, which may be useful.

For bacterial chromosomes, `dnaapler chromosome` should ensure the chromosome breakpoint never interrupts genes or mobile genetic elements like prophages. It is intended to be used with good-quality completed bacterial genomes, generated with methods such as [Trycycler](https://github.com/rrwick/Trycycler/wiki), [Dragonflye](https://github.com/rpetit3/dragonflye) or [hybracter](https://github.com/gbouras13/hybracter).
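
The reorientation step described in the introduction above (rotate a circular sequence so that it begins with the start gene on the forward strand) can be illustrated with a minimal sketch. This is not `dnaapler`'s actual implementation: the function names, the 0-based `gene_start` coordinate and the `"forward"`/`"reverse"` strand labels are assumptions made for illustration only.

```python
# Illustrative sketch only; names and coordinate conventions are assumptions,
# not dnaapler's real API.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def reorient(seq: str, gene_start: int, strand: str) -> str:
    """Rotate a circular sequence so it begins at the gene's first base.

    gene_start is the 0-based index (in the input sequence) of the first
    nucleotide of the gene's start codon. For a reverse-strand gene this is
    the highest coordinate the gene occupies on the input sequence.
    """
    n = len(seq)
    if not 0 <= gene_start < n:
        raise ValueError("gene_start is outside the sequence")
    if strand == "forward":
        return seq[gene_start:] + seq[:gene_start]
    if strand == "reverse":
        # Flip the whole molecule so the gene reads left to right,
        # then rotate to the gene's new position.
        rc = reverse_complement(seq)
        new_start = n - 1 - gene_start
        return rc[new_start:] + rc[:new_start]
    raise ValueError("strand must be 'forward' or 'reverse'")
```

With a toy gene `ATGC`, `reorient("GGATGCTT", 2, "forward")` and `reorient("TTGCATGG", 5, "reverse")` both return a sequence that begins with `ATGC`.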

23 changes: 12 additions & 11 deletions docs/output.md
@@ -14,6 +14,18 @@ dnaapler creates a number of output files. For all subcommands that are not `dna

* If `dnaapler custom` is run, then a `custom_db` directory will also be present, containing the custom BLAST directory used by `dnaapler`.

### all

If you run `dnaapler all`, the output will be slightly different. There will still be log files and a `{prefix}_blast_output.txt` file.

* There will still be 1 output `.fasta` file. It will be `{prefix}_reoriented.fasta` containing all the contigs.

* In this FASTA file, all contigs that were reoriented will be indicated in the contig FASTA header with `rotated=True`.

* There will be a `{prefix}_all_reorientation_summary.tsv` summary file containing the reorientation information for each contig.

This summary file will be the same as for `bulk` as explained below, but with an extra column `Gene_Reoriented` that denotes which gene was detected in each contig (dnaA, repA or terL).
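
As a rough illustration of how the `rotated=True` marker could be used downstream, the sketch below lists the contigs flagged as reoriented in a FASTA file's text. The exact header layout is an assumption (contig ID first, followed by whitespace-separated annotations), and `rotated_contigs` is a hypothetical helper, not part of `dnaapler`.

```python
def rotated_contigs(fasta_text: str) -> list[str]:
    """Return IDs of contigs whose FASTA header contains 'rotated=True'.

    Assumes (for illustration) that the contig ID is the first
    whitespace-separated token after '>' in the header line.
    """
    ids = []
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue  # skip sequence lines
        fields = line[1:].split()
        if fields and "rotated=True" in line:
            ids.append(fields[0])
    return ids
```

In practice the text would be read from `{prefix}_reoriented.fasta` in the output directory.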

### bulk

If you run `dnaapler bulk`, the output will be different. There will still be log files and a `{prefix}_blast_output.txt` file. The differences are:
@@ -32,14 +44,3 @@ For example for an input file with 3 contigs, where the first had no BLAST hit (
| contig_2 | Contig_already_reoriented | Contig_already_reoriented | Contig_already_reoriented | Contig_already_reoriented | Contig_already_reoriented | Contig_already_reoriented | Contig_already_reoriented | Contig_already_reoriented |
| contig_3 | 466148 | reverse | sp\|Q6GD89\|DNAA_STAAS | 453 | 453 | 100 | 453 | 100 |

### all

If you run `dnaapler all`, the output will be different again. There will still be log files and a `{prefix}_blast_output.txt` file.

* There will be 1 output `.fasta` file. One will be `{prefix}_all_reoriented.fasta` containing all the contigs.

* In this FASTA file, all contigs that were reoriented will be indicated in the contig FASTA header with `rotated=True`.

* There will be a `{prefix}_all_reorientation_summary.tsv` summary file containing the reorientation information for each contig.

This summary file will be the same as for `bulk`, but with an extra column `Gene_Reoriented` that denotes which gene was detected in each contig (dnaA, repA or terL).
90 changes: 47 additions & 43 deletions docs/run.md
@@ -17,6 +17,53 @@ However, you can decide to autocomplete `dnaapler` using the `-a` or `--autocomp

Also, a seed value can be specified with `--seed_value` so that `dnaapler mystery` (or autocomplete with `-a mystery`) is reproducible in workflows.


### all

`dnaapler all` is designed to simultaneously orient multiple contigs that can be a mix of chromosomes, plasmids and phages. It will also work on just 1 contig.

If a contig has BLAST hits for both dnaA and terL or repA, dnaA will be chosen for reorientation.

If a contig has BLAST hits for both terL and repA (but not dnaA), repA will be chosen for reorientation.
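
The two rules above amount to a fixed priority order of dnaA, then repA, then terL. A minimal sketch of that selection logic follows; `GENE_PRIORITY` and `choose_gene` are hypothetical names used only for illustration and may not reflect how `dnaapler` implements it internally.

```python
# Priority order implied by the rules above: dnaA > repA > terL.
GENE_PRIORITY = ("dnaA", "repA", "terL")


def choose_gene(genes_with_hits):
    """Pick the gene to reorient by, given the genes that had BLAST hits.

    Returns None when none of the three genes had a hit.
    """
    for gene in GENE_PRIORITY:
        if gene in genes_with_hits:
            return gene
    return None
```

For example, a contig with hits for both terL and repA would be reoriented by repA, and any contig with a dnaA hit would be reoriented by dnaA.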

You can also specify a text file with `--ignore` that lists all contigs (based on their header) to be ignored during reorientation.

e.g. the file (`ignored_contigs.txt`) needs to be formatted as follows:

```
contig_1
contig_2
```
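
To make the ignore-file format concrete, here is a small sketch of how such a file could be parsed and applied. It assumes one contig name per line and matches on the exact name; whether `dnaapler` matches on the full header or only the contig ID is not stated here, so treat that as an assumption. Both function names are hypothetical.

```python
def read_ignore_list(text: str) -> set[str]:
    """Parse the contents of an ignore file: one contig name per line.

    Blank lines and surrounding whitespace are skipped.
    """
    return {line.strip() for line in text.splitlines() if line.strip()}


def contigs_to_reorient(contig_ids, ignored):
    """Return the contig IDs that are not in the ignore set, in input order."""
    return [cid for cid in contig_ids if cid not in ignored]
```

With the example file above, `contig_1` and `contig_2` would be left untouched and every other contig would be considered for reorientation.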

Example usage to reorient a number of contigs in `input.fasta`, ignoring all contigs with headers listed in `ignored_contigs.txt`:

```
dnaapler all -i input.fasta -o output_directory_path -t 8 --ignore ignored_contigs.txt
```

```
Usage: dnaapler all [OPTIONS]
Reorients contigs to begin with any of dnaA, repA or terL
Options:
-h, --help Show this message and exit.
-V, --version Show the version and exit.
-i, --input PATH Path to input file in FASTA format [required]
-o, --output PATH Output directory [default: output.dnaapler]
-t, --threads INTEGER Number of threads to use with BLAST [default: 1]
-p, --prefix TEXT Prefix for output files [default: dnaapler]
-f, --force Force overwrites the output directory
-e, --evalue TEXT e value for blastx [default: 1e-10]
--ignore PATH Text file listing contigs (one per row) that are to
be ignored
-a, --autocomplete TEXT Choose an option to autocomplete reorientation if
BLAST based approach fails. Must be one of: none,
mystery, largest, or nearest [default: none]
--seed_value INTEGER Rand
```


### chromosome

Example usage with `mystery` as the autocomplete command and a random seed of 245 for reproducibility and with 8 threads for BLAST:
@@ -228,46 +275,3 @@ Options:
specified.
```

### all


`dnaapler all` is designed to simultaneously orient multiple contigs that can be a mix of chromosomes, plasmids and phages.

If a contig has BLAST hits for both dnaA and terL or repA, dnaA will be chosen for reorientation.

If a contig has BLAST hits for both terL and repA (but not dnaA), repA will be chosen for reorientation.

You can also specify a text file with `--ignore` that lists all contigs (based on their header) to be ignored during reorientation.

e.g. the file (`ignored_contigs.txt`) needs to be formatted as follows:

```
contig_1
contig_2
```

Your input FASTA must also have at least 2 contigs.

Example usage to reorient a number of contigs in `input.fasta`, ignoring all contigs with headers denoted in `ignored_contigs.txt`

```
dnaapler all -i input.fasta -o output_directory_path -t 8 --ignore ignored_contigs.txt
```

```
Usage: dnaapler all [OPTIONS]
Reorients multiple contigs to begin with any of dnaA, repA or terL
Options:
-h, --help Show this message and exit.
-V, --version Show the version and exit.
-i, --input PATH Path to input file in FASTA format [required]
-o, --output PATH Output directory [default: output.dnaapler]
-t, --threads INTEGER Number of threads to use with BLAST [default: 1]
-p, --prefix TEXT Prefix for output files [default: dnaapler]
-f, --force Force overwrites the output directory
-e, --evalue TEXT e value for blastx [default: 1e-10]
--ignore PATH TSV file listing contigs (one per row) that are to be
ignored
```
2 changes: 1 addition & 1 deletion src/dnaapler/__init__.py
@@ -672,7 +672,7 @@ def all(
ignore,
**kwargs,
):
"""Reorients multiple contigs to begin with any of dnaA, repA or terL"""
"""Reorients contigs to begin with any of dnaA, repA or terL"""

# validates the directory (need to before I start dnaapler or else no log file is written)
instantiate_dirs(output, force)
3 changes: 1 addition & 2 deletions src/dnaapler/utils/CITATION
@@ -1,4 +1,3 @@
Please cite dnaapler in your paper using:

Bouras, G., Roach, M. J., Mallawaarachchi V., Grigson., S., Papudeshi., B. (2023) Dnaapler: A tool to reorient circular microbial genomes https://github.com/gbouras13/dnaapler.

Bouras, G., Grigson., S., Papudeshi., B., Mallawaarachchi V., Roach, M. J. (2023) Dnaapler: A tool to reorient circular microbial genomes https://github.com/gbouras13/dnaapler.
