Autocycler decompress

Ryan Wick edited this page Jan 13, 2025 · 7 revisions

Basics

Autocycler decompress performs the opposite of Autocycler compress: it takes a unitig graph as input and produces a directory of FASTA files as output.

It can be used in two main ways:

  • By running Autocycler decompress on the input_assemblies.gfa file made by Autocycler compress, you can reconstruct the input assemblies used to build that graph. Since they can be recovered from the graph, you can delete your input assemblies to save disk space.
  • By running Autocycler decompress on the 2_trimmed.gfa file made by Autocycler trim, you can extract high-quality alternative sequences for a given cluster.

Example command

autocycler decompress -i autocycler_out/input_assemblies.gfa -o assemblies

Full usage

Usage: autocycler decompress [OPTIONS] --in_gfa <IN_GFA>

Options:
  -i, --in_gfa <IN_GFA>      Autocycler GFA file (required)
  -o, --out_dir <OUT_DIR>    Directory where decompressed sequences will be saved (either -o or -f
                             is required)
  -f, --out_file <OUT_FILE>  FASTA file where decompressed sequences will be saved (either -o or -f
                             is required)
  -h, --help                 Print help
  -V, --version              Print version
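
If you call Autocycler decompress from a script, a small wrapper can enforce the "either -o or -f is required" rule before the command runs. The following is a minimal sketch, not part of Autocycler: build_decompress_command is a hypothetical helper name, and only the long options shown in the usage above are used. Whether -o and -f may be given together is not stated here, so the sketch simply forwards whichever outputs are supplied.

```python
from typing import List, Optional


def build_decompress_command(
    in_gfa: str,
    out_dir: Optional[str] = None,
    out_file: Optional[str] = None,
) -> List[str]:
    """Build an argument list for `autocycler decompress` (hypothetical helper)."""
    # The usage text says either -o or -f is required, so fail early otherwise.
    if out_dir is None and out_file is None:
        raise ValueError("either out_dir (-o) or out_file (-f) is required")

    cmd = ["autocycler", "decompress", "--in_gfa", str(in_gfa)]
    if out_dir is not None:
        cmd += ["--out_dir", str(out_dir)]
    if out_file is not None:
        cmd += ["--out_file", str(out_file)]
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True), assuming the autocycler executable is on your PATH.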

Notes

  • Case is not preserved with Autocycler compress, so reconstructing sequences with Autocycler decompress will give all-uppercase sequences, even if the originals had lowercase bases.
  • Autocycler cannot make use of any sequences shorter than the k-mer size (51 by default). This means that any contigs shorter than this in your assemblies will be ignored, and you will not be able to recover them with Autocycler decompress.
  • All contig header whitespace is converted to a single space and will therefore be restored that way when using Autocycler decompress.
  • When decompressing to a single file, the filename and contig name are concatenated. Any spaces in the filename are replaced with underscores (to prevent the truncation of the sequence name).
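
Before deleting your original assemblies, it can help to predict what a compress and decompress round trip will give back. The sketch below applies the first three notes to in-memory (header, sequence) pairs. It is an illustration of the documented behaviour, not Autocycler's own code: expected_after_round_trip is a hypothetical name, and the whitespace handling assumes that each run of whitespace in a header becomes one space.

```python
import re
from typing import Iterable, List, Tuple

DEFAULT_KMER = 51  # default k-mer size given in the notes above


def expected_after_round_trip(
    records: Iterable[Tuple[str, str]], kmer: int = DEFAULT_KMER
) -> List[Tuple[str, str]]:
    """Predict (header, sequence) pairs after compress + decompress."""
    expected = []
    for header, seq in records:
        # Contigs shorter than the k-mer size are ignored and cannot be recovered.
        if len(seq) < kmer:
            continue
        # Assumption: every run of header whitespace collapses to a single space.
        clean_header = re.sub(r"\s+", " ", header)
        # Case is not preserved, so recovered sequences are all uppercase.
        expected.append((clean_header, seq.upper()))
    return expected
```

Comparing this prediction with the decompressed FASTA files is one way to confirm that nothing other than case, header whitespace and very short contigs has changed.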