Ryan Wick edited this page Nov 11, 2024 · 17 revisions


The problem

Long-read sequencing and assembly have come a long way in recent years. Since bacterial genomes are relatively simple (small, haploid and not too repetitive), a complete assembly (one contig per replicon) is often possible when assembling long reads.

But even the best assemblers are not perfect! Common problems include:

  • Failing to cleanly circularise sequences, e.g. duplicating or omitting sequence at the start/end of a contig.
  • Omitting pieces of the genome, e.g. failing to include a small plasmid.
  • Producing spurious contigs, e.g. assembling a repetitive part of the chromosome into a separate contig.
  • Fragmentation, e.g. assembling a single chromosome into two contigs.
  • Creating medium-scale indel errors, e.g. deleting 50 bp from the genome.
  • Creating large-scale misassemblies, e.g. a significant structural rearrangement.

So how can you be sure that your long-read bacterial assemblies are free from the problems listed above?

The solution

Autocycler addresses these issues by combining multiple alternative assemblies of the same genome (e.g. from different assemblers and/or different read subsets) into a high-confidence consensus assembly. It achieves this by compressing input assemblies into a compacted De Bruijn graph, clustering similar sequences, trimming overlaps and resolving ambiguities. The result is a long-read assembly you can trust!
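To give a feel for the graph-compression idea, here is a minimal toy sketch in Python. It is not Autocycler's implementation: the function names `build_de_bruijn` and `compact` are made up for illustration, and the sketch ignores reverse-complement strands, circular sequences and everything after graph construction (clustering, trimming, resolving). It only shows how sequence shared between input assemblies collapses into single graph segments, while differences remain as alternative paths.

```python
from collections import defaultdict


def build_de_bruijn(seqs, k):
    """Collect every k-mer in the inputs and the k-mer adjacencies between them.

    Hypothetical helper for illustration; forward strand only.
    """
    nodes = set()
    succ = defaultdict(set)  # k-mer -> k-mers that directly follow it
    pred = defaultdict(set)  # k-mer -> k-mers that directly precede it
    for seq in seqs:
        kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
        nodes.update(kmers)
        for a, b in zip(kmers, kmers[1:]):
            succ[a].add(b)
            pred[b].add(a)
    return nodes, succ, pred


def compact(nodes, succ, pred):
    """Merge non-branching chains of k-mers into longer segments (unitigs)."""

    def starts_chain(node):
        # A node starts a chain unless it has exactly one predecessor
        # and that predecessor has exactly one successor.
        if len(pred[node]) != 1:
            return True
        (p,) = pred[node]
        return len(succ[p]) != 1

    def extend(start, visited):
        # Walk forward while the path is unambiguous in both directions.
        path, cur = start, start
        visited.add(start)
        while len(succ[cur]) == 1:
            (nxt,) = succ[cur]
            if len(pred[nxt]) != 1 or nxt in visited:
                break
            path += nxt[-1]
            visited.add(nxt)
            cur = nxt
        return path

    visited = set()
    unitigs = [extend(n, visited) for n in sorted(nodes) if starts_chain(n)]
    # Any node still unvisited lies on an isolated, non-branching cycle.
    for n in sorted(nodes):
        if n not in visited:
            unitigs.append(extend(n, visited))
    return unitigs


if __name__ == "__main__":
    # Two toy "assemblies" of the same region that differ by a single base.
    assemblies = ["GATTACAGGC", "GATTGCAGGC"]
    nodes, succ, pred = build_de_bruijn(assemblies, k=4)
    for unitig in sorted(compact(nodes, succ, pred)):
        print(unitig)
```

In this toy example, the shared start (`GATT`) and shared end (`CAGGC`) each become one segment, and the single-base difference shows up as two alternative segments between them. Identical input sequences collapse into a single segment.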

Autocycler is designed to be both fully automated and human-guided. This means that unlike its predecessor Trycycler, Autocycler can be run without any human intervention, allowing it to be used on large numbers of genomes. But it still allows for manual examination and intervention when desired.

An important requirement

Autocycler aims to produce complete assemblies, where each piece of the genome is assembled into one sequence. It therefore requires that most (ideally all) of the input assemblies are complete. If complete input assemblies of a genome are not possible, then Autocycler is not appropriate. The most common reason complete assemblies are not possible is that the genome contains a repeat longer than the read length.

Where to begin?

If you're new to Autocycler, follow these steps to get started:

Happy assembling!
