Fully automated assembly

Ryan Wick edited this page Dec 2, 2024 · 23 revisions

The following commands can be run without any human intervention.

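In addition to Autocycler, these commands use several of the helper scripts (genome_size_raven.sh and the per-assembler wrappers such as canu.sh and flye.sh), which must be available on your PATH. See Generating input assemblies and Genome size estimation for more details.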

For more details on each step in the process, see the corresponding wiki pages.
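Because the commands below call Autocycler and the helper scripts by bare name, an unattended run will only fail partway through if one of them is not on your PATH. The following is a minimal pre-flight sketch, not part of Autocycler: the function name missing_tools is hypothetical, and the list of tools is simply the set of commands used on this page.

```shell
# Hypothetical pre-flight helper (not part of Autocycler): print the name of
# every command given as an argument that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Example: check Autocycler and the helper scripts used on this page.
missing=$(missing_tools autocycler genome_size_raven.sh canu.sh flye.sh \
    miniasm.sh necat.sh nextdenovo.sh raven.sh)
if [ -n "$missing" ]; then
    printf 'Not found on PATH:\n%s\n' "$missing" >&2
fi
```

If anything is reported, install it or adjust the assembler loops to use only the tools you have before starting the run.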

threads=16  # set as appropriate for your system
genome_size=$(genome_size_raven.sh ont.fastq.gz "$threads")  # can set this manually if you know the value

# Step 1: subsample the long-read set into multiple files
autocycler subsample --reads ont.fastq.gz --out_dir subsampled_reads --genome_size "$genome_size"

# Step 2: assemble each subsampled file
mkdir -p assemblies
for i in 01 07 13 19; do
    canu.sh subsampled_reads/sample_"$i".fastq assemblies/canu_"$i" "$threads" "$genome_size"
done
for i in 02 08 14 20; do
    flye.sh subsampled_reads/sample_"$i".fastq assemblies/flye_"$i" "$threads"
done
for i in 03 09 15 21; do
    miniasm.sh subsampled_reads/sample_"$i".fastq assemblies/miniasm_"$i" "$threads"
done
for i in 04 10 16 22; do
    necat.sh subsampled_reads/sample_"$i".fastq assemblies/necat_"$i" "$threads" "$genome_size"
done
for i in 05 11 17 23; do
    nextdenovo.sh subsampled_reads/sample_"$i".fastq assemblies/nextdenovo_"$i" "$threads" "$genome_size"
done
for i in 06 12 18 24; do
    raven.sh subsampled_reads/sample_"$i".fastq assemblies/raven_"$i" "$threads"
done

# Optional step: remove the subsampled reads to save space
rm -r subsampled_reads

# Step 3: compress the input assemblies into a unitig graph
autocycler compress -i assemblies -a autocycler

# Step 4: cluster the input contigs into putative genomic sequences
autocycler cluster -a autocycler

# Steps 5 and 6: trim and resolve each QC-pass cluster
for c in autocycler/clustering/qc_pass/cluster_*; do
    autocycler trim -c "$c"
    autocycler resolve -c "$c"
done

# Step 7: combine resolved clusters into a final assembly
autocycler combine -a autocycler -i autocycler/clustering/qc_pass/cluster_*/5_final.gfa

The final consensus assembly will be named: autocycler/consensus_assembly.fasta
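In an unattended run it can help to confirm that this file was actually produced before moving on. The sketch below is an assumption-laden convenience, not an Autocycler feature: count_contigs is a hypothetical helper that checks the FASTA file is non-empty and prints how many sequences (header lines) it contains.

```shell
# Hypothetical helper: succeed only if the FASTA file exists, is non-empty and
# has at least one header line; print the number of sequences it contains.
count_contigs() {
    fasta=$1
    if [ ! -s "$fasta" ]; then
        echo "missing or empty: $fasta" >&2
        return 1
    fi
    # grep -c exits non-zero when the count is 0, so guard it with || true
    n=$(grep -c '^>' "$fasta" || true)
    echo "$n"
    [ "$n" -gt 0 ]
}

# Example use after Step 7 (only a warning is printed if the file is absent):
count_contigs autocycler/consensus_assembly.fasta || echo "assembly check failed" >&2
```

This only tests that an assembly exists; it says nothing about whether the assembly is complete or correct.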

If you perform many automated assemblies with Autocycler, I recommend using Autocycler table to produce a TSV after they finish to check for problematic genomes.
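As a rough sketch of that workflow, the function below collects one row per assembly into a single TSV. It assumes that running autocycler table with no arguments prints the header row and that -a (Autocycler directory) and -n (sample name) print one row for an assembly; check autocycler table --help for your version before relying on this. The function name collect_metrics and the one-directory-per-sample layout are hypothetical.

```shell
# Sketch: build one TSV covering several Autocycler output directories.
# Assumption: `autocycler table` alone prints the header row, and
# `autocycler table -a <dir> -n <name>` prints one row for that assembly.
collect_metrics() {
    out_tsv=$1
    shift
    autocycler table > "$out_tsv"               # header row
    for dir in "$@"; do
        name=$(basename "$(dirname "$dir")")    # parent directory as sample name
        autocycler table -a "$dir" -n "$name" >> "$out_tsv"
    done
}

# Hypothetical layout: one directory per sample, each containing autocycler/
# collect_metrics metrics.tsv samples/*/autocycler
```

The resulting file can then be opened in a spreadsheet or filtered on the command line to spot genomes whose metrics stand out.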