Linear sequences

Ryan Wick edited this page Nov 4, 2024 · 16 revisions

Linear sequences can be challenging for Autocycler to resolve automatically, so genomes containing linear sequences are more likely to require manual intervention to achieve an ideal assembly.

  • Sometimes assemblers just do very poorly with linear sequences! This is a problem for Autocycler, because Autocycler assumes that at least some (but ideally most) of the input assemblies are mostly correct. If most or all of the input assemblies have major problems (e.g. truncating the ends of a linear sequence), then Autocycler won't be able to home in on a correct consensus sequence.

Hairpin ends

Some linear sequences have hairpin ends, where one strand of DNA loops back to become its complementary strand. This means that long reads can continue past the hairpin onto the other strand, so the reads do not end at the sequence end. This can confuse long-read assemblers, and their contigs often extend past the hairpin. Autocycler trim looks for this type of overlap and will trim it when possible.
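
To illustrate what a contig that extends past a hairpin looks like, here is a minimal Python sketch. It is not how Autocycler trim is implemented: the function names are hypothetical, it only uses exact matching (real contigs contain errors), and it assumes the hairpin has no unpaired bases at its tip. The idea is that if a contig runs `k` bases past the hairpin, its last `2k` bases are equal to their own reverse complement.

```python
# Illustrative sketch only: exact-match detection of a hairpin overlap at a contig end.
# Function names and the min_overlap default are hypothetical, not part of Autocycler.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase ACGT sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def hairpin_overlap_at_end(contig: str, min_overlap: int = 4) -> int:
    """Return the largest k such that the last 2k bases are their own reverse
    complement, or 0 if no such k >= min_overlap exists."""
    for k in range(len(contig) // 2, min_overlap - 1, -1):
        suffix = contig[-2 * k:]
        if suffix == reverse_complement(suffix):
            return k
    return 0


def trim_hairpin_end(contig: str, min_overlap: int = 4) -> str:
    """Remove the part of the contig that extends past the hairpin (if any)."""
    k = hairpin_overlap_at_end(contig, min_overlap)
    return contig[:len(contig) - k] if k else contig


# A toy sequence, and a contig that continues 6 bases past its hairpin end.
true_seq = "GATTACAGGCTTAACCGT"
extended = true_seq + reverse_complement(true_seq[-6:])
trimmed = trim_hairpin_end(extended)
```

In this toy case, the contig wraps 6 bases over the hairpin, so the overlap can be found and the sequence trimmed back to its true end. A contig that stops short of the hairpin contains no such overlap, so there is nothing to indicate where the true end lies.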

  • If your linear sequence has hairpin ends, you should be careful with quality-based read filtering.
    • The part of the read which extends past the hairpin may be lower quality, dragging down the average read quality.
    • So if you aggressively filter with Filtlong (which prefers reads with a higher average quality), you may deplete reads which span the hairpin, i.e. reads which reach the end of the sequence.
    • Thanks to Nemanja Kuzmanovic for figuring this one out!
  • For hairpin ends, longer contigs (i.e. wrapping over the hairpin) are better, because they will allow Autocycler trim to trim the sequence at the right place.
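
The effect of quality-based filtering on hairpin-spanning reads can be shown with some simple arithmetic. The Python sketch below uses made-up numbers and a plain arithmetic mean of per-base Phred scores, which is a simplification and not necessarily how Filtlong scores reads. The function name, read lengths, quality values and cutoff are all hypothetical.

```python
# Illustrative arithmetic only: all numbers are invented, and a plain mean of
# Phred scores is a simplifying assumption (not necessarily Filtlong's scoring).

def mean_quality(segments):
    """Length-weighted mean quality for a read described as (length, quality) segments."""
    total_bases = sum(length for length, _ in segments)
    return sum(length * quality for length, quality in segments) / total_bases


# A read that stops before the hairpin: 10 kbp at Q20.
ordinary_read = [(10_000, 20)]

# A read that spans the hairpin: 8 kbp at Q20, then 2 kbp past the hairpin at Q8.
hairpin_read = [(8_000, 20), (2_000, 8)]

hypothetical_cutoff = 18  # made-up threshold for this example

ordinary_passes = mean_quality(ordinary_read) >= hypothetical_cutoff
hairpin_passes = mean_quality(hairpin_read) >= hypothetical_cutoff
```

With these numbers, the hairpin-spanning read has a mean quality of 17.6 and falls below the cutoff, even though most of its bases are just as good as those in the ordinary read.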

Blunt ends

Terminal inverted repeats

  • Some linear bacterial chromosomes have very long terminal inverted repeats (TIRs) which can make complete assembly difficult.