Ryan Wick edited this page Oct 30, 2024 · 17 revisions

The problem

Long-read sequencing and assembly have come a long way in recent years. Since bacterial genomes are relatively simple (not too large and not too many repeats), a completed assembly (one contig per replicon) is often possible when assembling long reads.

But even the best assemblers are not perfect! They sometimes fail to circularise sequences, either duplicating or omitting sequence at the start/end of a contig. They sometimes produce spurious contigs, e.g. assembling a repetitive part of the chromosome into a separate contig. They sometimes omit entire replicons, e.g. failing to include a plasmid. They sometimes create medium-scale indel errors, e.g. deleting 50 bp from the genome. And they occasionally create large-scale misassemblies, e.g. a significant structural rearrangement.
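To make the first of these failure modes concrete, here is a minimal sketch of how duplicated sequence at the start/end of a circular contig could be spotted. It is a toy illustration only, not how Autocycler or any assembler checks circularisation: the function name `terminal_overlap` and the `min_len` parameter are hypothetical, and it only finds exact matches, whereas real overlaps usually contain read errors.

```python
def terminal_overlap(contig: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `contig` that exactly
    matches its prefix, or 0 if no match of at least `min_len` exists.

    A non-zero result hints that the contig end repeats the contig start,
    i.e. a circular sequence was over-extended instead of being closed.
    """
    max_len = len(contig) // 2
    # Try the longest candidate overlap first and shrink until one matches.
    for size in range(max_len, min_len - 1, -1):
        if contig[:size] == contig[-size:]:
            return size
    return 0


clean = "ACGTTGCAGGCTA"
duplicated = clean + clean[:4]  # the first 4 bases are repeated at the end

print(terminal_overlap(clean))       # 0
print(terminal_overlap(duplicated))  # 4
```

The opposite problem, omitted sequence at the start/end, cannot be detected from a single contig in this way, which is one reason comparing several independent assemblies is useful.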

So imagine that you've done long-read sequencing of a bacterial isolate and assembled the reads. The result looks like a nice completed assembly (e.g. a big circular contig for the chromosome and a couple of smaller circular contigs for plasmids), but how can you be sure that it's free from the kinds of problems listed above?

The solution

Autocycler is a successor to Trycycler. It takes as input multiple separate long-read assemblies of the same genome (e.g. from different assemblers or different read subsets) and produces a consensus long-read assembly.

In brief, Autocycler does the following:

  • Compresses the input assemblies into a compacted De Bruijn graph.
  • Clusters the input contig sequences by similarity and chooses which clusters to keep.
  • Trims any excess sequence from input contig sequences.
  • Produces a consensus sequence for each cluster by taking the majority variant where the sequences differ.
  • Combines all cluster consensus sequences into a final consensus assembly.
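
The "majority variant" idea in the consensus step can be illustrated with a small sketch. This is a deliberately simplified toy, assuming the sequences in a cluster have already been aligned to equal length with `-` marking gaps. Autocycler itself works on a compacted De Bruijn graph rather than on aligned strings, so the function `majority_consensus` below is hypothetical and only shows the voting principle.

```python
from collections import Counter


def majority_consensus(aligned: list[str]) -> str:
    """Column-wise majority vote over pre-aligned, equal-length sequences.

    Gap characters ('-') take part in the vote; if a gap wins a column,
    nothing is emitted for that column. Ties are broken in favour of the
    symbol seen first in the column (Counter keeps insertion order).
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("expected a non-empty list of equal-length sequences")
    consensus = []
    for column in zip(*aligned):
        symbol, _count = Counter(column).most_common(1)[0]
        if symbol != "-":
            consensus.append(symbol)
    return "".join(consensus)


# Four toy "assemblies" of the same region: one has a substitution and
# another has a 1 bp deletion, but each error is outvoted by the others.
assemblies = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGAACGTAC",
    "ACGTAC-TAC",
]
print(majority_consensus(assemblies))  # ACGTACGTAC
```

The sketch also shows why the approach depends on errors being independent: a mistake is only removed if most of the input assemblies do not share it.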

The result is a long-read assembly you can trust!

Autocycler is designed to be both fully automated and human-guided. This means that unlike Trycycler, Autocycler can be run without any human intervention, allowing it to be used on large numbers of genomes. But it still allows for manual examination and intervention when desired.

An important caveat

Autocycler does not ensure a perfect assembly of the underlying genome, because systematic basecalling errors can affect all of the input assemblies in the same way, leaving small-scale sequence errors that a consensus cannot remove. Incorrect homopolymer lengths are a common example of this problem, e.g. AAAAAAAA becoming AAAAAAA.
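
As a rough illustration of the homopolymer example, the sketch below run-length encodes two sequences and reports runs whose lengths differ. The helper names `run_lengths` and `homopolymer_differences` are hypothetical, and the comparison assumes the two sequences differ only in homopolymer length, which will not hold for real assemblies in general.

```python
from itertools import groupby


def run_lengths(seq: str) -> list[tuple[str, int]]:
    """Run-length encode a sequence, e.g. 'GAAAT' -> [('G', 1), ('A', 3), ('T', 1)]."""
    return [(base, len(list(group))) for base, group in groupby(seq)]


def homopolymer_differences(a: str, b: str) -> list[tuple[int, str, int, int]]:
    """List (run_index, base, length_in_a, length_in_b) for runs that differ.

    Assumes both sequences have the same order of bases once runs are
    collapsed; raises ValueError otherwise.
    """
    runs_a, runs_b = run_lengths(a), run_lengths(b)
    if [base for base, _ in runs_a] != [base for base, _ in runs_b]:
        raise ValueError("sequences differ by more than homopolymer lengths")
    return [
        (index, base, len_a, len_b)
        for index, ((base, len_a), (_, len_b)) in enumerate(zip(runs_a, runs_b))
        if len_a != len_b
    ]


truth = "GC" + "A" * 8 + "TG"
basecalled = "GC" + "A" * 7 + "TG"  # the AAAAAAAA -> AAAAAAA error from the text
print(homopolymer_differences(truth, basecalled))  # [(2, 'A', 8, 7)]
```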

But if all goes well when running Autocycler, small-scale errors will be the only type of error in its consensus long-read assembly. You can then polish your Autocycler assembly to repair these small-scale errors, e.g. long-read polishing with Medaka and/or short-read polishing with Polypolish and Pypolca. An Autocycler+polishing approach can therefore yield the best possible bacterial genome assembly: Autocycler fixes the medium-to-large-scale errors while polishing fixes the small-scale errors.

Where to begin?

Are you new to Autocycler and interested in trying it out? If so, you'll first need to get it installed, so check out the Software requirements and installation page.

After that, I'd recommend that you look at the Illustrated pipeline overview and read the quick start pages. Autocycler can be run in a fully automated manner or with manual intervention.

Finally, I'd suggest that you practise using Autocycler on the provided demo dataset. Happy assembling!
