Introduction to fairy
After metagenomic assembly, optimal binning workflows align the reads from every sample against every assembly to obtain coverages. Metagenome-assembled genomes (MAGs) are then generated from these coverages using a binner like metabat2.
Unfortunately, all-to-all alignment of samples to assemblies is very slow.
Fairy resolves this bottleneck by computing coverage with a fast, alignment-free k-mer method instead of aligning reads. Fairy's coverages are approximate but correlate with those produced by aligners, and fairy is 10-1000x faster than BWA for all-to-all coverage calculation.
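To make the idea of alignment-free coverage concrete, here is a minimal toy sketch in Python. It is not fairy's actual algorithm or code: it assumes exact k-mer matching, ignores reverse complements and sequencing errors, and uses hypothetical names (`count_read_kmers`, `estimate_coverage`). It only shows how a contig's coverage can be estimated from k-mer counts without aligning any read.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_read_kmers(reads, k):
    """Count how often each k-mer occurs across all reads of one sample."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def estimate_coverage(contig, read_kmer_counts, k):
    """Toy estimate: median read count over the contig's k-mers.

    Assumption for illustration only: exact matches, forward strand only.
    """
    hits = [read_kmer_counts.get(km, 0) for km in kmers(contig, k)]
    return median(hits) if hits else 0


# Toy data: three error-free reads that each span the whole contig.
contig = "ACGTACGGTCA"
reads = [contig] * 3
counts = count_read_kmers(reads, k=5)
coverage = estimate_coverage(contig, counts, k=5)
print(coverage)
```

In a multi-sample setting, the k-mer table for each sample would be built once and then queried against every assembly, which avoids running a full alignment for each sample/assembly pair.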
Important: fairy is designed for multi-sample usage and short reads or nanopore reads. Do not use fairy for single-sample binning.
Fairy seems to be comparable to BWA for multi-sample binning (a change in sensitivity of roughly +5% to -15%). Preliminary testing indicates that fairy may perform as well as (and sometimes better than) BWA on host-associated datasets and slightly worse (but still usable) on environmental datasets.
Non-HiFi: For simplex nanopore reads, fairy seems to be comparable to minimap2.
HiFi (strain-resolved assemblies): Fairy is worse than minimap2 for strain-resolved assemblies built from reads with >99.9% identity (e.g. with hifiasm or meta-mdbg).
(A) Number of bins, with contamination/completeness indicated, for different environments and numbers of samples. (B) (# fairy bins)/(# bwa bins) for bins that are > 50% complete and < 5% contaminated, for several binners.
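The ratio in panel (B) can be computed as in the following minimal sketch. The bin values are made-up placeholders, not results from fairy or BWA, and `count_quality_bins` is a hypothetical helper. Each bin is assumed to be a (completeness %, contamination %) pair, for example as reported by a bin quality-assessment tool.

```python
def count_quality_bins(bins, min_completeness=50.0, max_contamination=5.0):
    """Count bins that are > min_completeness complete and < max_contamination contaminated."""
    return sum(
        1
        for completeness, contamination in bins
        if completeness > min_completeness and contamination < max_contamination
    )


# Placeholder (completeness %, contamination %) values for illustration only.
fairy_bins = [(95.0, 1.0), (60.0, 4.0), (40.0, 2.0)]
bwa_bins = [(96.0, 1.5), (70.0, 3.0), (55.0, 4.5), (80.0, 9.0)]

n_fairy = count_quality_bins(fairy_bins)
n_bwa = count_quality_bins(bwa_bins)
ratio = n_fairy / n_bwa  # (# fairy bins) / (# bwa bins)
print(n_fairy, n_bwa, round(ratio, 2))
```

A ratio near 1 means both coverage methods yield a similar number of bins passing the thresholds; a ratio below 1 means fewer bins pass when fairy coverages are used.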