FAQ and miscellaneous tips

Ryan Wick edited this page Nov 27, 2024 · 16 revisions

Does Autocycler include an assembly pipeline?

No, Autocycler does not include an assembly pipeline. The documentation provides example commands to run in Bash, but further optimisation is left to the user. The ideal approach will depend on your computational environment and requirements. For example, some users may need to use job schedulers like SLURM, while others might use workflow managers such as Nextflow or Snakemake.

If you are optimising Autocycler assemblies or creating a pipeline, here are some things to keep in mind:

  • The most time-consuming step in an Autocycler workflow is the creation of input assemblies, which can be carried out in parallel.
  • The choice of input assemblers is up to you, and there can be a speed-vs-quality tradeoff: some tools (e.g. Canu) are slow but produce high-quality assemblies, while others (e.g. Raven) are faster but tend to introduce more errors.
  • I recommend using a variety of input assemblers. For example, even if Flye is your favourite assembler, a Flye-only pipeline is more prone to errors than one that also includes other assemblers.
  • If your pipeline handles multiple isolates, I suggest running Autocycler table at the end to generate a summary table indicating how well each isolate assembled.
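
As a rough illustration of the first point above, independent assembly jobs can be launched as Bash background jobs and collected with wait. This is only a sketch under stated assumptions: run_assembly is a hypothetical stand-in that writes a placeholder file, and the assembler names, subset labels and output layout are made up for the example. Replace the function body with your real assembler commands, or hand the same loop to SLURM, Nextflow or Snakemake.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in: replace the body with a real assembler command that
# writes its assembly to "$out_dir/${assembler}_${subset}.fasta".
run_assembly() {
    local assembler="$1" subset="$2" out_dir="$3"
    echo ">placeholder_${assembler}_${subset}" > "$out_dir/${assembler}_${subset}.fasta"
}

out_dir="${OUT_DIR:-$(mktemp -d)}"
pids=()

# Launch one background job per assembler/subset combination.
for assembler in flye raven canu; do
    for subset in 01 02; do
        run_assembly "$assembler" "$subset" "$out_dir" &
        pids+=("$!")
    done
done

# Wait on each job individually so a failed assembly is counted, not ignored.
failed=0
for pid in "${pids[@]}"; do
    wait "$pid" || failed=$((failed + 1))
done
echo "finished with $failed failed job(s); outputs in $out_dir"
```

Launching every job at once can oversubscribe the CPUs on a single machine, so a real pipeline would probably cap the number of concurrent jobs or leave that to a scheduler.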

Polishing after Autocycler

Since Autocycler assemblies are long-read-only, they may still contain small-scale errors produced by systematic errors in the reads. A common example would be long homopolymers: if a genome contains A×15 at a locus but most of the long reads erroneously have A×14 at that locus, the assembly is likely to contain the A×14 error.
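
To find loci worth a closer look, long homopolymer runs can be listed with standard tools. The sketch below uses a made-up toy sequence and an arbitrary threshold of 10 bases (both are assumptions for illustration). It works one line at a time, so a run that spans a line break in a wrapped FASTA file would be split.

```bash
# Toy sequence (made up): contains one A×15 run and one T×11 run.
seq="GATTACG$(printf 'A%.0s' {1..15})CGC$(printf 'T%.0s' {1..11})GC"

# Arbitrary threshold for what counts as a "long" homopolymer.
min_len=10

# grep -o prints each matching run on its own line; awk reports base and length.
runs=$(printf '%s\n' "$seq" \
    | grep -oE "A{$min_len,}|C{$min_len,}|G{$min_len,}|T{$min_len,}" \
    | awk '{ print substr($0, 1, 1), length($0) }')
echo "$runs"
```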

If you are assembling Oxford Nanopore reads, performing long-read polishing with Medaka can help. I recommend using the --bacteria option for the latest methylation-aware model. If you also have short reads, then I recommend using Polypolish and Pypolca, both of which are conservative (unlikely to introduce new errors).
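
A Medaka polishing step with the --bacteria option might look like the sketch below. The file names, output directory and thread count are hypothetical placeholders, and the script only prints the command unless DRY_RUN=0 is set, so it can be inspected without Medaka installed. The short-read steps (Polypolish and Pypolca) are not shown; see each tool's own documentation for those.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical file names: substitute your own reads and Autocycler assembly.
reads="reads.fastq.gz"
draft="consensus_assembly.fasta"
out_dir="medaka_out"
threads=8

# Build the command as an array so paths containing spaces stay intact.
medaka_cmd=(medaka_consensus -i "$reads" -d "$draft" -o "$out_dir" -t "$threads" --bacteria)

# Print by default; set DRY_RUN=0 to actually run Medaka (it must be installed).
if [ "${DRY_RUN:-1}" = "1" ]; then
    printed=$(printf '%s ' "${medaka_cmd[@]}")
    echo "$printed"
else
    "${medaka_cmd[@]}"
fi
```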

Further information about polishing is available in these papers:

How does Autocycler compare to Trycycler?

Autocycler was designed as a faster and automated successor to Trycycler. Both tools perform the same task: combining multiple alternative assemblies of the same genome into a clean consensus assembly. Here are some key differences:

  • Trycycler is human-guided and almost always requires user intervention to complete. Autocycler is automated but still allows for human intervention when needed. For many genomes, Autocycler can complete without intervention.
  • Both tools produce similar results, but Autocycler is generally easier to use.
  • Trycycler is slower because it is written in Python and includes steps that align all the reads, which can be time-intensive. Autocycler is faster, being written in Rust and avoiding read alignments.

For most users, Autocycler is likely the better choice!

How does Autocycler compare to Hybracter?

Hybracter and Autocycler occupy different niches. Some key points:

  • Hybracter is a full assembly pipeline that includes pre-assembly and post-assembly steps such as read QC, short-read polishing, and sequence reorientation. Autocycler focuses solely on long-read assembly and does not perform these additional steps.
  • Hybracter uses a single long-read assembler (Flye), whereas Autocycler combines multiple long-read assemblies. This allows Autocycler to avoid Flye-specific errors in its consensus assemblies.
  • Hybracter assemblies (hybracter long) can be used as input for Autocycler! This is particularly useful because Hybracter includes specialised logic to recover small plasmids, which other long-read assemblers often struggle with.
  • While Autocycler often produces complete assemblies without intervention, some genomes may fail or require manual intervention to achieve the best results. If an absolutely no-intervention assembly approach is required, Hybracter is a better choice.

Can Autocycler be used on eukaryote genomes?

For Autocycler to work, the input assemblies need to mostly be complete: one sequence per piece of DNA in the genome. So if T2T assemblies are possible, then Autocycler should work!

However, phased diploid assemblies might create a problem in the clustering step. If there is a lot of heterozygosity, the two haplotypes for each chromosome might separate in the UPGMA tree, in which case clustering might work (potentially requiring manual specification of the clusters). More likely, though, the user will need to separate the haplotypes in the input assemblies: split each phased input assembly into a maternal assembly and a paternal assembly, then run Autocycler twice (once for maternal, once for paternal).
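
Splitting a phased assembly by haplotype can be sketched with awk. The example assumes a hypothetical naming scheme in which sequence names end in _mat or _pat; real phased assemblies label haplotypes in different ways (or already come as separate files), so the patterns would need adapting. The toy sequences are made up.

```bash
work_dir=$(mktemp -d)

# Toy phased assembly; the "_mat"/"_pat" suffixes are a hypothetical convention.
printf '%s\n' \
    '>chr1_mat' 'ACGTACGT' \
    '>chr1_pat' 'ACGTACGA' \
    '>chr2_mat' 'GGCCGGCC' \
    '>chr2_pat' 'GGCCGGCT' > "$work_dir/phased.fasta"

# On each header line choose the output file; every line (header or sequence)
# is then written to the file chosen by the most recent header.
awk -v dir="$work_dir" '
    /^>/ {
        if ($1 ~ /_mat$/)      out = dir "/maternal.fasta"
        else if ($1 ~ /_pat$/) out = dir "/paternal.fasta"
        else                   out = dir "/unassigned.fasta"
    }
    { print > out }
' "$work_dir/phased.fasta"

# Report how many sequences ended up in each haplotype file.
grep -c '>' "$work_dir/maternal.fasta" "$work_dir/paternal.fasta"
```

Each haplotype file would then be treated as its own input assembly for a separate Autocycler run.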

Can Autocycler be used on mitochondrial/chloroplast genomes?

Yes! Since these genomes are circular and descended from bacterial genomes, they are well suited to an Autocycler assembly. I recommend first extracting just the mitochondrial/chloroplast reads from your long read set so you can produce input assemblies without the nuclear genome.

Is Autocycler deterministic?

Yes, Autocycler itself is deterministic: for a given set of input assemblies and parameters, it will produce the same consensus assembly. However, not all assemblers are deterministic, so a full Autocycler assembly (including the generation of input assemblies) may differ from run to run.
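
One way to check this on your own data is a byte-for-byte comparison of the consensus assemblies from two runs that used the same input assemblies and parameters. The sketch below uses small stand-in files (made up for the example) so that it runs anywhere; point cmp at your real output files instead.

```bash
work_dir=$(mktemp -d)

# Stand-in files: in practice these would be the consensus assemblies from two
# Autocycler runs on identical input assemblies with identical parameters.
printf '>contig_1\nACGTACGTAC\n' > "$work_dir/run1.fasta"
printf '>contig_1\nACGTACGTAC\n' > "$work_dir/run2.fasta"

# cmp -s is silent and exits 0 only when the files are byte-for-byte identical.
if cmp -s "$work_dir/run1.fasta" "$work_dir/run2.fasta"; then
    result="identical"
else
    result="different"
fi
echo "The two runs are $result"
```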

Does Autocycler rotate circular sequences to a consistent position?

No, unlike Trycycler, Autocycler does not rotate circular sequences to start at a particular gene (e.g. dnaA). I may add this feature in the future, but for now I recommend using Dnaapler.

How does Autocycler colour sequences in its graphs?

For some of the GFA files it creates, Autocycler adds colours to the segments using the CL:z: tag that Bandage can read:

  • For the 3_bridged.gfa graph, anchors are coloured green and bridges are coloured pink (the same scheme used by Unicycler).
  • For the 4_merged.gfa and 5_final.gfa graphs, consentigs (sequences created by merging unitigs together) are coloured blue.
  • For the final consensus graph made by Autocycler combine, consentigs are again coloured blue, and anything else is a bright orange-red to indicate that the assembly is not complete.
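
Because GFA is a tab-separated text format, the colour tags can also be inspected outside Bandage. The sketch below builds a toy graph and lists each segment with its CL:z: value. The segment names, sequences and colour values in the toy file are made up for illustration and are not necessarily what Autocycler writes.

```bash
work_dir=$(mktemp -d)

# Toy GFA: tab-separated segment (S) lines, some with a CL:z: colour tag.
{
    printf 'H\tVN:Z:1.0\n'
    printf 'S\t1\tACGT\tCL:z:green\n'
    printf 'S\t2\tGGCC\tCL:z:pink\n'
    printf 'S\t3\tTTAA\n'
    printf 'L\t1\t+\t2\t+\t0M\n'
} > "$work_dir/toy.gfa"

# For each segment line, scan the optional fields (column 4 onward) for a
# CL:z: tag and print the segment name with its colour, or "(none)".
colours=$(awk -F'\t' '
    $1 == "S" {
        colour = "(none)"
        for (i = 4; i <= NF; i++)
            if ($i ~ /^CL:z:/) colour = substr($i, 6)
        print $2 "\t" colour
    }
' "$work_dir/toy.gfa")
echo "$colours"
```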

Suppressing terminal colours

Autocycler uses some ANSI colours in its terminal output to stderr for aesthetic purposes. If you would rather have no colours (e.g. when redirecting stderr to a log file), you can set the NO_COLOR environment variable before running Autocycler:

export NO_COLOR=1
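
If a log file was already captured with colours enabled, the ANSI colour sequences can be removed afterwards with sed. This is a generic sketch and not an Autocycler feature; the log line below is a made-up stand-in.

```bash
# Stand-in for a log that was captured with colours enabled.
log=$(mktemp)
printf '\033[1;32mexample\033[0m log line\n' > "$log"

# Remove ANSI colour sequences (ESC [ ... m). The $'...' quoting makes Bash
# expand \033 to a literal ESC character before sed sees the expression.
clean=$(sed $'s/\033\\[[0-9;]*m//g' "$log")
echo "$clean"
```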