Autocycler trim

Ryan Wick edited this page Sep 11, 2024 · 28 revisions

Basics

Usage

Usage: autocycler trim [OPTIONS] --cluster_dir <CLUSTER_DIR>

Options:
  -c, --cluster_dir <CLUSTER_DIR>    Autocycler cluster directory containing 1_untrimmed.gfa file
                                     (required)
      --min_identity <MIN_IDENTITY>  Minimum alignment identity for trimming alignments [default: 0.75]
      --max_unitigs <MAX_UNITIGS>    Maximum unitigs used for overlap alignment, set to 0 to disable
                                     trimming [default: 5000]
      --mad <MAD>                    Allowed variability in cluster length, measured in median absolute
                                     deviations, set to 0 to disable exclusion of length outliers
                                     [default: 5.0]
  -t, --threads <THREADS>            Number of CPU threads [default: 8]
  -h, --help                         Print help
  -V, --version                      Print version
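
The --mad option describes excluding sequences whose length lies too many median absolute deviations from the cluster's median length. The following Python sketch illustrates that idea only; it is not Autocycler's code. The function name `length_outliers`, the exact comparison used and the example lengths are assumptions made for illustration.

```python
from statistics import median


def length_outliers(lengths, mad_threshold=5.0):
    """Return indices of lengths lying more than mad_threshold MADs from the median.

    Hypothetical helper: the name and the exact rule are assumptions,
    not taken from Autocycler's source.
    """
    if mad_threshold == 0:
        # Per the option text, 0 disables exclusion of length outliers.
        return []
    med = median(lengths)
    # Median absolute deviation: the median distance from the median length.
    mad = median(abs(x - med) for x in lengths)
    return [i for i, x in enumerate(lengths) if abs(x - med) > mad_threshold * mad]


# Made-up sequence lengths (bp) for one cluster; the last one looks doubled.
example_lengths = [100000, 100500, 99800, 100200, 205000]
print(length_outliers(example_lengths))       # indices flagged at the default threshold
print(length_outliers(example_lengths, 0))    # exclusion disabled
```

In this sketch, a cluster whose lengths are nearly identical has a MAD close to zero, so even a small deviation is flagged. How Autocycler handles that case is not stated here.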

Notes

  • Trimming uses a dynamic-programming alignment algorithm that operates on unitigs rather than individual bases, which makes the alignment much faster.
  • --max_unitigs 0 turns off trimming, which is useful if you have already trimmed the sequences manually.
  • Autocycler trim cannot distinguish between artefactual and genuine duplications. For example, if a plasmid really is doubled, Autocycler trim will still cut it down to a single copy.
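
To make the first note concrete, here is a minimal Python sketch of a dynamic-programming overlap alignment over unitigs rather than bases. It aligns the end of a sequence's unitig path against its own start and reports where the overlapping suffix begins. It is an illustration under stated assumptions, not Autocycler's implementation. The function name `find_start_end_overlap`, the scoring scheme (match scores the unitig length, mismatches and gaps are penalised by unitig length) and the example path are all hypothetical, and the --min_identity threshold is not applied.

```python
def find_start_end_overlap(path, unitig_len):
    """Return the index in `path` where a start-end overlap begins, or None.

    `path` is a list of unitig IDs and `unitig_len` maps each ID to its length.
    Hypothetical sketch: names and scoring are assumptions.
    """
    n = len(path)
    NEG = float("-inf")
    # score[r][c]: best score aligning a region that ends at path[r-1]
    # against the prefix path[:c]. start[r][c]: where that region began.
    score = [[NEG] * (n + 1) for _ in range(n + 1)]
    start = [[None] * (n + 1) for _ in range(n + 1)]
    for r in range(1, n + 1):
        # The overlapping region may begin at any position after the first, for free.
        score[r][0], start[r][0] = 0, r
    for r in range(2, n + 1):
        for c in range(1, r):  # c < r excludes aligning the path with itself
            a, b = path[r - 1], path[c - 1]
            la, lb = unitig_len[a], unitig_len[b]
            diag = score[r - 1][c - 1] + (la if a == b else -max(la, lb))
            up = score[r - 1][c] - la    # unitig at the end with no partner
            left = score[r][c - 1] - lb  # unitig at the start with no partner
            best, origin = max(
                (diag, (r - 1, c - 1)), (up, (r - 1, c)), (left, (r, c - 1)),
                key=lambda t: t[0],
            )
            score[r][c] = best
            start[r][c] = start[origin[0]][origin[1]]
    # The alignment must reach the end of the path (row n).
    best_c = max(range(1, n), key=lambda c: score[n][c], default=None)
    if best_c is None or score[n][best_c] <= 0:
        return None
    return start[n][best_c]


# Made-up example: unitigs 1 and 2 appear at both the start and the end.
lengths = {1: 500, 2: 300, 3: 1000, 4: 800, 5: 1200}
path = [1, 2, 3, 4, 5, 1, 2]
trim_from = find_start_end_overlap(path, lengths)
trimmed = path if trim_from is None else path[:trim_from]
print(trimmed)
```

Because the table has one cell per pair of unitigs rather than per pair of bases, its size depends on the number of unitigs in the path, not the sequence length. That is the time saving the note refers to.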

Toy example

[Figure: Autocycler trimming paths]

[Figure: trimmed cluster graphs]