How circularisation repair works

Ryan Wick edited this page Jan 10, 2022 · 9 revisions

Most bacterial replicons are circular, which is relevant for Trycycler in two ways: getting a clean circularisation (no gap or overlap) and getting a consistent starting point. This is done as part of the Trycycler reconcile command.

Clean circularisation

Trycycler attempts to circularise each contig sequence using each of the other sequences as a reference. Specifically, it aligns the start and end of the contig to the other sequences and uses those alignments to determine whether the contig is already circular, needs sequence added or needs sequence removed.

In the following examples, sequence A is the one we are trying to circularise and sequence B is the other reference sequence.

Already circular

Circularisation - perfect

Ideally, A's end is immediately followed by A's start in B. If this is the case, that means A is already circular and there's nothing more to do.

Gapped circularisation – needs sequence added

Circularisation - gapped

It may be that A's end and start are both found in B, but with a gap in between. This implies that A is missing some sequence in its circularisation. Trycycler will fill in this gap using the sequence between the hits in B.

Overlapping circularisation – needs sequence removed

Circularisation - overlapping

If A's end and start overlap in B, that implies that A has too much sequence – i.e. some sequence is duplicated at its start/end. In this case, Trycycler will trim A's end to give it a clean circularisation.
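The three repairable cases above (already circular, gapped, overlapping) can be illustrated with a minimal sketch. This is not Trycycler's implementation: it uses exact string matching where Trycycler uses alignments, it assumes each of A's ends matches B exactly once, and it assumes the junction does not span B's own start/end. The function name `repair_circularisation` and the parameter `k` (how many bases of A's start and end to search for) are hypothetical.

```python
def repair_circularisation(a: str, b: str, k: int) -> str:
    """Return A with a clean circularisation, using B as the reference.

    Illustrative only: exact matches stand in for alignments.
    """
    end_seq, start_seq = a[-k:], a[:k]
    end_pos = b.find(end_seq)      # where A's last k bases begin in B
    start_pos = b.find(start_seq)  # where A's first k bases begin in B
    if end_pos == -1 or start_pos == -1:
        raise ValueError("missing hits: A's start or end was not found in B")
    if b.find(end_seq, end_pos + 1) != -1 or b.find(start_seq, start_pos + 1) != -1:
        raise ValueError("multiple hits: A's start or end occurs more than once in B")

    after_end = end_pos + k        # first position in B after A's end
    gap = start_pos - after_end    # >0: gap, <0: overlap, 0: clean junction
    if gap == 0:
        return a                           # already circular
    if gap > 0:
        return a + b[after_end:start_pos]  # fill the gap with B's sequence
    return a[:gap]                         # trim the duplicated bases from A's end


# Toy data (invented for illustration): B is a rotation of the "true" sequence.
true_seq = "ATGCCGTAGGCTTACGATCA"
b = true_seq[10:] + true_seq[:10]
print(repair_circularisation(true_seq[:-3], b, 5) == true_seq)          # A missing 3 bp
print(repair_circularisation(true_seq + true_seq[:4], b, 5) == true_seq)  # A with 4 bp duplicated
```

The sign of `gap` is the whole decision: a positive value means B has sequence between A's end and A's start that A lacks, while a negative value means A's end repeats sequence already present at its start.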

Failed circularisation – too much gap

Circularisation - too much gap

If there is too much gap between A's end and start in B, that implies that A is missing a lot of sequence. Trycycler will fail to circularise A in this case. It probably makes sense to exclude A and try running Trycycler reconcile again.

Failed circularisation – too much overlap

Circularisation - too much overlap

Conversely, A's start might come well before A's end in B. This implies that A has quite a lot of overlap. Trycycler may be able to resolve this by trimming the start/end of A, but it might not. If this happens, you can try to manually trim A and then run Trycycler reconcile again. Alternatively, you can simply exclude A.

Failed circularisation – multiple hits

Circularisation - multiple hits

If A's start and end are found in multiple places in B, this will also cause Trycycler to fail circularisation. This suggests that A begins/ends in a repeat sequence, which is not necessarily a problem with the assembly but does make circularisation difficult. In such cases, simply excluding A is probably in order.

Failed circularisation – missing hits

Circularisation - missing hits

If A's start or end is not found in B, that will also cause a failure to circularise. This suggests that either A contains spurious sequence or B is missing sequence. When this causes a circularisation failure, it's best to exclude A.

Failed circularisation – same start/end

Circularisation - same start/end

If A and B have the same start/end, then there is no information for fixing A's circularisation. This sometimes happens with two input assemblies from the same assembler. It's usually not a problem, as A's circularisation can be repaired using one of the other sequences instead.
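The outcomes described in this section can be summarised as a small decision function. This is a sketch under stated assumptions, not Trycycler's actual logic: the function name, the hit representation (plain positions in B) and the `max_add`/`max_trim` limits are all hypothetical, and the real tool may still manage to trim in some "too much overlap" cases.

```python
from typing import List


def classify_circularisation(end_hits: List[int], start_hits: List[int],
                             b_length: int, max_add: int, max_trim: int) -> str:
    """Label the outcome of circularising A against B.

    end_hits:   positions in B immediately after each hit of A's end
    start_hits: positions in B where each hit of A's start begins
    max_add / max_trim: hypothetical limits on bases added or trimmed
    """
    if not end_hits or not start_hits:
        return "missing hits"
    if len(end_hits) > 1 or len(start_hits) > 1:
        return "multiple hits"

    after_end, start = end_hits[0], start_hits[0]
    if after_end == b_length and start == 0:
        # A and B break the circle at the same place, so B says nothing
        # about what lies across A's junction.
        return "same start/end"

    gap = start - after_end
    if gap == 0:
        return "already circular"
    if gap > 0:
        return "needs sequence added" if gap <= max_add else "too much gap"
    return "needs sequence removed" if -gap <= max_trim else "too much overlap"


print(classify_circularisation([5000], [5020], 100000, max_add=1000, max_trim=1000))
```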

Choosing the best circularisation

Trycycler will conduct all pairwise circularisations. For example, if you have four input assemblies (A, B, C and D), Trycycler will attempt to circularise sequence A using sequences B, C and D. It will attempt to circularise sequence B using sequences A, C and D, and so on.
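As a small illustration of what "all pairwise circularisations" means, the ordered (sequence, reference) pairs can be enumerated as below. The function name is hypothetical and this only lists the pairs; it does not perform any circularisation.

```python
from itertools import permutations
from typing import List, Tuple


def circularisation_pairs(names: List[str]) -> List[Tuple[str, str]]:
    """List every (sequence to circularise, reference sequence) pair."""
    # Order matters: circularising A against B differs from B against A.
    return list(permutations(names, 2))


for target, reference in circularisation_pairs(["A", "B", "C", "D"]):
    print(f"circularise {target} using {reference}")
```

With four input assemblies this gives 4 × 3 = 12 attempts, three per sequence.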

This means there can be multiple ways to circularise a sequence. For example, A might be circularised in three ways: 20 bp added from B, 21 bp added from C and 19 bp added from D. To choose which is the best option, Trycycler aligns the reads to the circularisation junction (this is why reads must be given as a command line parameter to Trycycler reconcile). Whichever circularisation option results in the highest total alignment score is chosen as the final one.
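The selection step can be sketched as a sum-and-maximise over per-read alignment scores. Everything here is illustrative: the function name is hypothetical, the scores are invented numbers rather than real alignment results, and how Trycycler actually computes the scores is not shown.

```python
from typing import Dict, List


def choose_best_circularisation(read_scores: Dict[str, List[int]]) -> str:
    """Return the option whose reads give the highest total alignment score."""
    totals = {option: sum(scores) for option, scores in read_scores.items()}
    # On a tie, max() keeps the first option encountered.
    return max(totals, key=totals.get)


# Invented per-read scores at the junction for each candidate circularisation.
example_scores = {
    "20 bp added from B": [98, 95, 97],
    "21 bp added from C": [90, 91, 89],
    "19 bp added from D": [93, 92, 94],
}
print(choose_best_circularisation(example_scores))
```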

Starting point

A circular sequence can potentially start at any point on either strand and still be a valid assembly. However, when reconciling multiple alternative contigs, it is necessary to make all sequences consistent with each other – i.e. start at the same point and on the same strand.
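To make the point concrete, the sketch below shows that rotating a circular sequence or switching to the opposite strand yields a different string for the same molecule. The helper names are hypothetical and the comparison uses exact matching on a toy sequence, so it is only an illustration of the idea.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def rotate(seq: str, start: int) -> str:
    """Return the circular sequence re-written to begin at index `start`."""
    return seq[start:] + seq[:start]


def same_circular_sequence(x: str, y: str) -> bool:
    """True if y is some rotation of x on either strand (exact match only)."""
    if len(x) != len(y):
        return False
    doubled = x + x  # every rotation of x is a substring of x + x
    return y in doubled or reverse_complement(y) in doubled


x = "ATGCCGTAGG"
print(same_circular_sequence(x, rotate(x, 4)))
print(same_circular_sequence(x, reverse_complement(rotate(x, 3))))
```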

Sequence rotation

By convention, Trycycler will try to start the contigs at a replication initiator protein gene sequence like dnaA. For more detail, see Starting sequences for circular replicons. To be a suitable starting point, the starting sequence must be in each of the contigs and only occur once in each contig.

If a replication initiator protein gene sequence can't be found, Trycycler will randomly select a subsequence which is present in each of the contigs only once and use that as the starting sequence.
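The fallback described above can be sketched as follows: pick a subsequence that occurs exactly once in every contig, then rotate (and flip strand if needed) each contig to begin with it. This is a simplified illustration rather than Trycycler's code. It uses exact matching, and the function names, the fixed length `k` and the choice to draw candidates from the first contig are all assumptions.

```python
import random
from typing import List, Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def count_circular(contig: str, query: str) -> int:
    """Count exact occurrences of query on both strands of a circular contig."""
    total = 0
    for strand in (contig, reverse_complement(contig)):
        wrapped = strand + strand[:len(query) - 1]  # allow hits spanning the junction
        total += sum(wrapped.startswith(query, i) for i in range(len(strand)))
    return total


def pick_start(contigs: List[str], k: int, rng: random.Random) -> Optional[str]:
    """Randomly pick a k-mer that occurs exactly once in every contig."""
    first = contigs[0]
    wrapped = first + first[:k - 1]
    candidates = [wrapped[i:i + k] for i in range(len(first))]
    usable = [c for c in candidates
              if all(count_circular(contig, c) == 1 for contig in contigs)]
    return rng.choice(usable) if usable else None


def rotate_to_start(contig: str, start_seq: str) -> str:
    """Rotate the contig, flipping strand if needed, to begin with start_seq."""
    for strand in (contig, reverse_complement(contig)):
        i = (strand + strand).find(start_seq)
        if i != -1:
            return strand[i:] + strand[:i]
    raise ValueError("starting sequence not found in contig")
```

For example, given three contigs that are rotations or reverse complements of the same toy sequence, calling `pick_start(contigs, 8, random.Random(0))` and then `rotate_to_start` on each contig should leave all three identical.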
