This is the official SeqWho repository.
SeqWho is a fast, reliable program that determines the identity of a FASTQ(A) sequencing file: both its source protocol and its species of origin. It does this with an alignment-free algorithm in which a Random Forest classifier learns from biases in k-mer frequencies and repeat sequence identity. SeqWho classifies files with greater than 96% accuracy.
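To make the idea of k-mer frequency biases concrete, here is a minimal sketch of how a normalized k-mer frequency vector could be computed from FASTQ reads without any alignment. It is an illustration only, not SeqWho's implementation: the function names `fastq_sequences` and `kmer_frequencies`, the `max_reads` parameter, and the choice of `k` are all assumptions made for this example.

```python
from collections import Counter
from itertools import islice, product


def fastq_sequences(lines, max_reads=None):
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record.

    `max_reads` is a hypothetical cap on how many reads are drawn.
    """
    seqs = (line.strip().upper() for i, line in enumerate(lines) if i % 4 == 1)
    return islice(seqs, max_reads)


def kmer_frequencies(sequences, k=3):
    """Return a dict mapping every possible ACGT k-mer to its relative frequency."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip k-mers containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    total = sum(counts.values())
    # Use a fixed ordering over all 4**k k-mers so vectors are comparable across files.
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}


# Two toy FASTQ records (header, sequence, separator, qualities).
records = [
    "@read1", "ACGTAC", "+", "IIIIII",
    "@read2", "NNACGT", "+", "IIIIII",
]
features = kmer_frequencies(fastq_sequences(records), k=2)
print(features["AC"])
```

A vector like `features`, computed per file, is the kind of input a classifier such as a Random Forest could be trained on; the features SeqWho actually uses are described in its manual.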
Documentation for SeqWho is available at: https://daehwankimlab.github.io/seqwho/
SeqWho is written in Python 3. For optimal performance, we recommend running it in a conda environment built from the environment.yml file included with SeqWho.
Please read https://daehwankimlab.github.io/seqwho/manual/ for more details.
Species | Libraries | Index |
---|---|---|
Human, Mouse | Amplicon, ChIP-Seq, WGS, WES, miRNA-Seq, RNA-Seq, Bisulfite-Seq, DNase-Seq, ATAC-Seq | Index (md5sum), Training File List, Testing File List |
Human, Mouse, Rattus norvegicus | ChIP-Seq, WGS, RNA-Seq | Index (md5sum), Training File List, Testing File List |
Human, Mouse, Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Saccharomyces cerevisiae | ChIP-Seq, WGS, RNA-Seq | Index (md5sum), Training File List, Testing File List |
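
Each index above is listed with an md5sum, which can be used to confirm that a download is intact. The following is a minimal sketch of such a check using only the Python standard library; the helper names `md5_of_file` and `matches_md5` are assumptions for this example and are not part of SeqWho, and the file path and expected checksum must be supplied by the user.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large index files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path, expected_hex):
    """Return True if the file's MD5 digest equals the published checksum."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

Calling `matches_md5` with the path of the downloaded index and the checksum published alongside it returns `False` if the file is corrupted or incomplete.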
v1.0.3 - Added option to select number of reads drawn from files during model building
v1.0.2 - Removed extra commas in some fields to facilitate CSV conversion
v1.0.1 - Addition of test files and scripts
v1.0.0 - Initial public release