FAQ and miscellaneous tips
- Does Autocycler include an assembly pipeline?
- Polishing after Autocycler
- How does Autocycler compare to Trycycler?
- How does Autocycler compare to Hybracter?
- Can Autocycler be used on eukaryote genomes?
- Can Autocycler be used on mitochondrial/chloroplast genomes?
- Is Autocycler deterministic?
- Does Autocycler rotate circular sequences to a consistent position?
- How does Autocycler colour sequences in its graphs?
- Suppressing terminal colours
Does Autocycler include an assembly pipeline?
No, Autocycler does not include an assembly pipeline. The documentation provides example commands to run in Bash, but further optimisation is left to the user. The ideal approach will depend on your computational environment and requirements. For example, some users may need to use job schedulers like SLURM, while others might use workflow managers such as Nextflow or Snakemake.
If you are optimising Autocycler assemblies or creating a pipeline, here are some things to keep in mind:
- The most time-consuming step in an Autocycler workflow is the creation of input assemblies, which can be carried out in parallel.
- The choice of input assemblers is up to you, and there can be a speed-vs-quality tradeoff: some tools (e.g. Canu) are slow but produce high-quality assemblies, while others (e.g. Raven) are faster but tend to introduce more errors.
- I recommend using a variety of input assemblers. For example, even if Flye is your favourite assembler, a Flye-only pipeline is more prone to errors than one that also includes other assemblers.
- If your pipeline handles multiple isolates, I suggest running Autocycler table at the end to generate a summary table indicating how well each isolate assembled.
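As a rough illustration of the parallelism point above, here is a minimal Bash sketch that starts a list of user-supplied commands at the same time and reports whether any of them failed. The function name `run_commands_in_parallel` and the commands file are hypothetical and not part of Autocycler; the file is assumed to hold one complete shell command per line (for example, one assembler invocation each). On a cluster, a scheduler such as SLURM or a workflow manager would normally take this role.

```bash
# Hypothetical helper: run each line of a commands file as a background job,
# wait for all of them, and return non-zero if any job failed.
run_commands_in_parallel() {
    commands_file=$1
    pids=""
    while IFS= read -r cmd || [ -n "$cmd" ]; do
        [ -z "$cmd" ] && continue            # skip blank lines
        sh -c "$cmd" < /dev/null &           # one background job per command
        pids="$pids $!"
    done < "$commands_file"

    failures=0
    for pid in $pids; do
        wait "$pid" || failures=$((failures + 1))
    done
    [ "$failures" -eq 0 ]
}
```

For example, `run_commands_in_parallel assembler_commands.txt`, where each line of the (hypothetical) `assembler_commands.txt` is a command you would otherwise run one after another. All jobs start at once, so the number of lines multiplied by the threads each assembler uses should fit your machine.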
Polishing after Autocycler
Since Autocycler assemblies are long-read-only, they may still contain small-scale errors produced by systematic errors in the reads. A common example would be long homopolymers: if a genome contains A×15 at a locus but most of the long reads erroneously have A×14 at that locus, the assembly is likely to contain the A×14 error.
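To see where such loci are in an assembly, one option is to list its long homopolymer runs. The following is a minimal sketch, not an Autocycler feature: `find_homopolymers` is a hypothetical helper, and it assumes an uppercase FASTA file with each sequence on a single line.

```bash
# Hypothetical helper: print every homopolymer run of at least MIN_LEN bases,
# one run per line. Assumes uppercase, unwrapped FASTA (each sequence on a
# single line); runs split across wrapped lines would be missed.
find_homopolymers() {
    fasta=$1
    min_len=$2
    grep -v '^>' "$fasta" \
        | grep -oE "A{$min_len,}|C{$min_len,}|G{$min_len,}|T{$min_len,}"
}
```

For example, `find_homopolymers assembly.fasta 10 | wc -l` counts the runs of ten or more identical bases (the file name is a placeholder). These are the loci most worth checking after polishing.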
If you are assembling Oxford Nanopore reads, performing long-read polishing with Medaka can help. I recommend using the --bacteria option for the latest methylation-aware model. If you also have short reads, then I recommend using Polypolish and Pypolca, both of which are conservative (unlikely to introduce new errors).
Further information about polishing is available in these papers:
- Wick RR, Judd LM, Holt KE. Assembling the perfect bacterial genome using Oxford Nanopore and Illumina sequencing. PLOS Computational Biology. 2023. doi:10.1371/journal.pcbi.1010905 (and its online tutorial).
- Bouras G, Judd LM, Edwards RA, Vreugde S, Stinear TP, Wick RR. How low can you go? Short-read polishing of Oxford Nanopore bacterial genome assemblies. Microbial Genomics. 2024. doi:10.1099/mgen.0.001254.
How does Autocycler compare to Trycycler?
Autocycler was designed as a faster and automated successor to Trycycler. Both tools perform the same task: combining multiple alternative assemblies of the same genome into a clean consensus assembly. Here are some key differences:
- Trycycler is human-guided and almost always requires user intervention to complete. Autocycler is automated but still allows for human intervention when needed. For many genomes, Autocycler can complete without intervention.
- Both tools produce similar results, but Autocycler is generally easier to use.
- Trycycler is slower because it is written in Python and includes steps that align all the reads, which can be time-intensive. Autocycler is faster, being written in Rust and avoiding read alignments.
For most users, Autocycler is likely the better choice!
How does Autocycler compare to Hybracter?
Hybracter and Autocycler occupy different niches. Some key points:
- Hybracter is a full assembly pipeline that includes pre-assembly and post-assembly steps such as read QC, short-read polishing, and sequence reorientation. Autocycler focuses solely on long-read assembly and does not perform these additional steps.
- Hybracter uses a single long-read assembler (Flye), whereas Autocycler combines multiple long-read assemblies. This allows Autocycler to avoid Flye-specific errors in its consensus assemblies.
- Hybracter assemblies (hybracter long) can be used as input for Autocycler! This is particularly useful because Hybracter includes specialised logic to recover small plasmids, which other long-read assemblers often struggle with.
- While Autocycler often produces complete assemblies without intervention, some genomes may fail or require manual intervention to achieve the best results. If an absolutely no-intervention assembly approach is required, Hybracter is a better choice.
Can Autocycler be used on eukaryote genomes?
For Autocycler to work, the input assemblies need to be mostly complete: one sequence per piece of DNA in the genome. So if telomere-to-telomere (T2T) assemblies are possible, then Autocycler should work!
However, phased diploid assemblies might create a problem in the clustering step. If there is a lot of heterozygosity, the two haplotypes for each chromosome might separate in the UPGMA tree, in which case clustering might work (potentially requiring manual specification of the clusters). More likely, though, the user will need to separate the haplotypes in the input assemblies: split each phased input assembly into a maternal assembly and a paternal assembly, then run Autocycler twice (once for maternal, once for paternal).
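How the haplotypes are labelled depends entirely on the assembler, so the following is only a sketch of the splitting step. It assumes each FASTA header contains a haplotype label such as `hap1` or `hap2`; those labels, the helper name `extract_by_header` and the file names in the usage note are all hypothetical.

```bash
# Hypothetical helper: print the FASTA records whose header line matches
# PATTERN (an awk regular expression). Works with wrapped or unwrapped FASTA.
extract_by_header() {
    fasta=$1
    pattern=$2
    awk -v pat="$pattern" '
        /^>/ { keep = ($0 ~ pat) }   # decide once per record, at its header
        keep                         # print header and sequence lines if kept
    ' "$fasta"
}
```

For example, `extract_by_header phased.fasta hap1 > maternal.fasta` and `extract_by_header phased.fasta hap2 > paternal.fasta` would produce the two per-haplotype input assemblies, provided the headers really follow that naming.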
Can Autocycler be used on mitochondrial/chloroplast genomes?
Yes! Since these genomes are circular and descended from bacterial genomes, they are well suited to an Autocycler assembly. I recommend first extracting just the mitochondrial/chloroplast reads from your long-read set so you can produce input assemblies without the nuclear genome.
Is Autocycler deterministic?
Yes, Autocycler itself is deterministic: for a given set of input assemblies and parameters, it will produce the same consensus assembly. However, not all assemblers are deterministic, so a full Autocycler assembly (including the generation of input assemblies) may differ from run to run.
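One way to check this on your own data is to run Autocycler twice on the same input assemblies and compare the two results byte for byte. The sketch below covers only the comparison; the function name and the file paths in the usage note are hypothetical.

```bash
# Hypothetical helper: report whether two assembly files are
# byte-for-byte identical.
compare_assemblies() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
    fi
}
```

For example, `compare_assemblies run1/consensus.fasta run2/consensus.fasta`. A result of "different" after regenerating the input assemblies would point to a non-deterministic assembler rather than to Autocycler itself.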
Does Autocycler rotate circular sequences to a consistent position?
No, unlike Trycycler, Autocycler does not rotate circular sequences to start at a particular gene (e.g. dnaA). I may add this feature in the future, but for now I recommend using Dnaapler.
How does Autocycler colour sequences in its graphs?
For some of the GFA files it creates, Autocycler adds colours to the segments using the CL:z: tag that Bandage can read:
- For the 3_bridged.gfa graph, anchors are coloured green and bridges are coloured pink (the same scheme used by Unicycler).
- For the 4_merged.gfa and 5_final.gfa graphs, consentigs (sequences created by merging unitigs together) are coloured blue.
- For the final consensus graph made by Autocycler combine, consentigs are again coloured blue, and anything else is a bright orange-red to indicate that the assembly is not complete.
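To inspect these tags without opening Bandage, the segment lines of a GFA file can be scanned for a CL:z: field. This is a minimal sketch using awk; `list_segment_colours` is a hypothetical helper, and it relies only on the GFA convention that segment lines start with `S` and are tab-separated, with optional tags from the fourth column onward.

```bash
# Hypothetical helper: print "segment_name<TAB>colour" for every segment in a
# GFA file, using the value of its CL:z: tag or "none" if the tag is absent.
list_segment_colours() {
    awk -F'\t' '
        $1 == "S" {
            colour = "none"
            for (i = 4; i <= NF; i++)        # optional tags start at column 4
                if ($i ~ /^CL:z:/)
                    colour = substr($i, 6)   # drop the 5-character "CL:z:" prefix
            print $2 "\t" colour
        }
    ' "$1"
}
```

For example, `list_segment_colours 3_bridged.gfa` prints one line per segment, which makes it easy to count how many segments carry each colour.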
Suppressing terminal colours
Autocycler uses some ANSI colours in its terminal output to stderr for aesthetic purposes. If you would rather not have colours (e.g. when redirecting stderr to a log file), you can set the NO_COLOR environment variable before running Autocycler:
export NO_COLOR=1
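If a log was already captured with colours, the escape codes can be removed afterwards. The following is a minimal sketch, assuming the log contains only standard ANSI colour/style sequences (ESC, `[`, digits and semicolons, then `m`); `strip_ansi` is a hypothetical helper, not an Autocycler command.

```bash
# Hypothetical helper: remove ANSI colour/style escape sequences from stdin.
strip_ansi() {
    esc=$(printf '\033')              # literal ESC character
    sed "s/${esc}\[[0-9;]*m//g"
}
```

For example, `strip_ansi < autocycler.log > autocycler_plain.log` (both file names are placeholders).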