Autocycler trim
Ryan Wick edited this page Sep 11, 2024 · 28 revisions
Usage: autocycler trim [OPTIONS] --cluster_dir <CLUSTER_DIR>

Options:
  -c, --cluster_dir <CLUSTER_DIR>    Autocycler cluster directory containing 1_untrimmed.gfa file (required)
      --min_identity <MIN_IDENTITY>  Minimum alignment identity for trimming alignments [default: 0.75]
      --max_unitigs <MAX_UNITIGS>    Maximum unitigs used for overlap alignment, set to 0 to disable trimming [default: 5000]
      --mad <MAD>                    Allowed variability in cluster length, measured in median absolute deviations, set to 0 to disable exclusion of length outliers [default: 5.0]
  -t, --threads <THREADS>            Number of CPU threads [default: 8]
  -h, --help                         Print help
  -V, --version                      Print version
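The `--mad` option can be pictured as a length filter. The following is a minimal sketch, not Autocycler's actual code: it assumes a sequence is excluded when its length differs from the cluster median by more than the threshold times the median absolute deviation. The function name `length_outliers` and the example lengths are made up for illustration.

```python
from statistics import median


def length_outliers(lengths, mad_threshold=5.0):
    """Return indices of lengths further than mad_threshold MADs from the median.

    Hypothetical helper: the exact rule Autocycler applies may differ.
    A threshold of 0 disables the filter, mirroring the documented --mad 0 behaviour.
    """
    if mad_threshold == 0 or not lengths:
        return []
    med = median(lengths)
    # Median absolute deviation: the median distance of each length from the median.
    mad = median(abs(x - med) for x in lengths)
    limit = mad_threshold * mad
    return [i for i, x in enumerate(lengths) if abs(x - med) > limit]


# Made-up contig lengths: four similar sequences and one that is nearly doubled.
example_lengths = [5000, 5010, 4990, 5005, 9800]
print(length_outliers(example_lengths))
```

In this sketch, a cluster whose sequences are all exactly the same length except one has a MAD of zero, so any deviation counts as an outlier. Whether Autocycler treats that case specially is not covered here.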
- Uses a dynamic-programming alignment algorithm, but based on unitigs, not bases. This saves a lot of time.
- --max_unitigs 0 will turn off trimming, e.g. if you've manually trimmed the sequences yourself.
- Cannot distinguish between artefactual and genuine duplications. E.g. if a plasmid really is doubled, Autocycler trim will still cut it down to a single copy.
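To make the unitig-level alignment idea concrete, here is a rough sketch of a start-end overlap search over a path of unitig IDs. It is an illustration under simplifying assumptions, not Autocycler's implementation: every unitig counts equally (no weighting by unitig length), strand is ignored, scoring is +1 for a match and -1 for a mismatch or gap, and the trailing copy of the overlap is the one dropped. The function name and the example path are hypothetical.

```python
def find_start_end_overlap(path, min_identity=0.75):
    """Look for a prefix of `path` that aligns to a suffix of `path`.

    Returns (suffix_start, prefix_len, identity) for the best-scoring overlap,
    or None if nothing scores above zero with identity >= min_identity.
    """
    n = len(path)
    # cells[i][j] = (score, suffix_start, matches, columns) for the best alignment
    # of path[:j] against path[suffix_start:i]. Only cells with i > j are filled,
    # which excludes the trivial alignment of the path against itself.
    cells = [[None] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cells[i][0] = (0, i, 0, 0)  # the suffix may begin at any position for free
        for j in range(1, i):
            score, start, matches, cols = cells[i - 1][j - 1]
            same = path[i - 1] == path[j - 1]
            candidates = [(score + (1 if same else -1), start, matches + same, cols + 1)]
            if cells[i - 1][j] is not None:  # gap: skip a unitig on the suffix side
                score, start, matches, cols = cells[i - 1][j]
                candidates.append((score - 1, start, matches, cols + 1))
            score, start, matches, cols = cells[i][j - 1]  # gap: skip a prefix unitig
            candidates.append((score - 1, start, matches, cols + 1))
            cells[i][j] = max(candidates, key=lambda c: c[0])

    best = None
    for j in range(1, n):  # the alignment must finish at the very end of the path
        score, start, matches, cols = cells[n][j]
        identity = matches / cols
        if score > 0 and identity >= min_identity:
            if best is None or score > best[0]:
                best = (score, start, j, identity)
    return None if best is None else best[1:]


# Made-up path: the first five unitigs recur at the end with one variant (3 vs 30).
example_path = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 30, 4, 5]
overlap = find_start_end_overlap(example_path)
if overlap is not None:
    suffix_start, prefix_len, identity = overlap
    print(example_path[:suffix_start], prefix_len, identity)
```

Because the table is indexed by unitigs and not bases, its size depends on the number of unitigs in the path, which is where the time saving comes from. The same sketch also shows the duplication caveat: a doubled path such as `[1, 2, 3, 1, 2, 3]` is reported as a full-length overlap and would be cut back to one copy.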
Of all the input contigs in the toy example, only b1 and b2 contain overlap. These are their paths through the unitig graph, with the overlap highlighted and the overlap-free trimmed paths below:
Note that the alignment does not need to be exact. Contig b2 has a variant in its overlap (unitig 38 vs 34), but the alignment identity is still high enough for trimming to occur.
After trimming is complete, the cluster graph is simplified and saved as 2_trimmed.gfa. The result is similar to the untrimmed graph but slightly simpler due to the removed pieces.
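One way to see how much simpler the trimmed graph is would be to compare record counts between the two GFA files. The sketch below assumes both files are standard GFA text, where each record's type is its first tab-separated field (for example `S` for segments and `L` for links). The helper name and the tiny inline graph are invented for illustration and are not taken from the toy example.

```python
from collections import Counter


def gfa_record_counts(lines):
    """Count GFA records by record type (the first tab-separated field)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        counts[line.split("\t", 1)[0]] += 1
    return counts


# Made-up miniature graph: two segments joined by one link, with one path.
example_gfa = [
    "H\tVN:Z:1.0",
    "S\t1\tACGT",
    "S\t2\tGGCC",
    "L\t1\t+\t2\t+\t0M",
    "P\tb1\t1+,2+\t*",
]
counts = gfa_record_counts(example_gfa)
print(counts["S"], counts["L"])
```

To compare real outputs, the same function could be given open file handles for 1_untrimmed.gfa and 2_trimmed.gfa from the cluster directory, then the `S` and `L` counts compared side by side.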
- Step 1: Autocycler subsample
- Step 2: Generating input assemblies
- Step 3: Autocycler compress
- Step 4: Autocycler cluster
- Step 5: Autocycler trim
- Step 6: Autocycler resolve
- Step 7: Autocycler combine