Fully automated assembly
Ryan Wick edited this page Dec 2, 2024 · 23 revisions
The following commands can be run without any human intervention.
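In addition to Autocycler, these commands use some of the helper scripts: genome_size_raven.sh for the genome size estimate and one wrapper script per assembler (canu.sh, flye.sh, miniasm.sh, necat.sh, nextdenovo.sh and raven.sh). See Generating input assemblies and Genome size estimation for how these scripts work.
Each step in the process has its own wiki page, listed at the end of this page.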
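Because the script below calls several helper scripts in addition to Autocycler, an unattended run can fail partway through if one of them is not on the PATH. A minimal pre-flight sketch follows. The function name `missing_tools` is hypothetical and not part of Autocycler; the tool names are the ones used in the script below.

```bash
# List any required commands that are not on PATH, one per line.
# Hypothetical helper for illustration; not part of Autocycler.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" > /dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example: stop before starting if anything is missing.
#   missing=$(missing_tools autocycler genome_size_raven.sh canu.sh flye.sh \
#                           miniasm.sh necat.sh nextdenovo.sh raven.sh)
#   [ -z "$missing" ] || { echo "Missing tools: $missing" >&2; exit 1; }
```

This only confirms that the helper scripts themselves can be found. It does not check that the assemblers they wrap are installed.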
threads=16 # set as appropriate for your system
genome_size=$(genome_size_raven.sh ont.fastq.gz "$threads") # can set this manually if you know the value
# Step 1: subsample the long-read set into multiple files
autocycler subsample --reads ont.fastq.gz --out_dir subsampled_reads --genome_size "$genome_size"
# Step 2: assemble each subsampled file
mkdir assemblies
for i in 01 07 13 19; do
    canu.sh subsampled_reads/sample_"$i".fastq assemblies/canu_"$i" "$threads" "$genome_size"
done
for i in 02 08 14 20; do
    flye.sh subsampled_reads/sample_"$i".fastq assemblies/flye_"$i" "$threads"
done
for i in 03 09 15 21; do
    miniasm.sh subsampled_reads/sample_"$i".fastq assemblies/miniasm_"$i" "$threads"
done
for i in 04 10 16 22; do
    necat.sh subsampled_reads/sample_"$i".fastq assemblies/necat_"$i" "$threads" "$genome_size"
done
for i in 05 11 17 23; do
    nextdenovo.sh subsampled_reads/sample_"$i".fastq assemblies/nextdenovo_"$i" "$threads" "$genome_size"
done
for i in 06 12 18 24; do
    raven.sh subsampled_reads/sample_"$i".fastq assemblies/raven_"$i" "$threads"
done
# Optional step: remove the subsampled reads to save space
rm -r subsampled_reads
# Step 3: compress the input assemblies into a unitig graph
autocycler compress -i assemblies -a autocycler
# Step 4: cluster the input contigs into putative genomic sequences
autocycler cluster -a autocycler
# Steps 5 and 6: trim and resolve each QC-pass cluster
for c in autocycler/clustering/qc_pass/cluster_*; do
    autocycler trim -c "$c"
    autocycler resolve -c "$c"
done
# Step 7: combine resolved clusters into a final assembly
autocycler combine -a autocycler -i autocycler/clustering/qc_pass/cluster_*/5_final.gfa
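The six loops in Step 2 hand the subsampled read files to the assemblers in round-robin order: sample 01 goes to Canu, 02 to Flye, and so on, wrapping back to Canu after Raven. As an illustration only, that mapping can be written as a small function. `assembler_for_sample` is a hypothetical name, and the sketch assumes samples numbered 01 to 24, as in the loops above.

```bash
# Print the assembler that the Step 2 loops assign to a sample number (01-24).
# Hypothetical helper for illustration; not part of Autocycler.
assembler_for_sample() {
    local n
    n=${1#0}    # strip a leading zero so "08" and "09" are not parsed as octal
    case $(( (n - 1) % 6 )) in
        0) echo canu ;;
        1) echo flye ;;
        2) echo miniasm ;;
        3) echo necat ;;
        4) echo nextdenovo ;;
        5) echo raven ;;
    esac
}

# Example: print the full plan for samples 01 to 24.
#   for i in $(seq -w 1 24); do
#       echo "sample_$i -> $(assembler_for_sample "$i")"
#   done
```

In the script above, the canu.sh, necat.sh and nextdenovo.sh calls take the genome size as a fourth argument, while flye.sh, miniasm.sh and raven.sh do not, so a loop built on this mapping would still need to treat those two groups differently.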
The final consensus assembly will be named: autocycler/consensus_assembly.fasta
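In an unattended run it can be worth confirming that this file was actually produced before moving on. A minimal sketch, in which `count_contigs` is a hypothetical helper rather than an Autocycler command: it fails if the file is missing or empty and otherwise prints the number of FASTA records.

```bash
# Print the number of FASTA records in a file.
# Returns non-zero (with a message on stderr) if the file is missing or empty.
# Hypothetical helper for illustration; not part of Autocycler.
count_contigs() {
    local fasta=$1
    if [ ! -s "$fasta" ]; then
        echo "missing or empty: $fasta" >&2
        return 1
    fi
    grep -c '^>' "$fasta"
}

# Example:
#   count_contigs autocycler/consensus_assembly.fasta || exit 1
```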
If you perform many automated assemblies with Autocycler, I recommend using Autocycler table to produce a TSV after they finish to check for problematic genomes.
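A rough sketch of collecting such a table across many runs is shown below. It rests on assumptions that should be checked against `autocycler table --help` and the Autocycler table wiki page: that running `autocycler table` with no further arguments prints the header line, and that `-a` takes an Autocycler directory and `-n` a sample name. The function name `build_metrics_table` and the `<sample>/autocycler` directory layout are hypothetical.

```bash
# Write a TSV with one header line plus one row per Autocycler directory.
# ASSUMPTIONS (verify with `autocycler table --help`):
#   - `autocycler table` alone prints the header line
#   - `-a DIR` selects the Autocycler directory and `-n NAME` sets the sample name
# Each directory is assumed to look like <sample>/autocycler, so the sample
# name is taken from the parent directory.
build_metrics_table() {
    local out_tsv=$1 dir name
    shift
    autocycler table > "$out_tsv"
    for dir in "$@"; do
        name=$(basename "$(dirname "$dir")")
        autocycler table -a "$dir" -n "$name" >> "$out_tsv"
    done
}

# Example:
#   build_metrics_table metrics.tsv */autocycler
```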
- Step 1: Autocycler subsample
- Step 2: Generating input assemblies
- Step 3: Autocycler compress
- Step 4: Autocycler cluster
- Step 5: Autocycler trim
- Step 6: Autocycler resolve
- Step 7: Autocycler combine