
Pre‐trained data

raquellewei edited this page Nov 17, 2023 · 20 revisions

To improve the efficiency of the algorithm, we first preprocess the reference genome database, reducing its size according to two parameters: K-mer size and ANI threshold. Users can choose their own reference genome database and customize these parameter values to suit their research needs. Here, we provide pre-trained databases generated from some popular reference databases with commonly used parameter values.

Pre-training procedures

The exact algorithm for this process can be found in Algorithm 4 of the preprint. In short, given a collection of reference genomes, we first sketch each genome using the given K-mer size. If two genome sketches have an ANI above the given threshold, the two genomes are treated as identical: the smaller of the two is removed, and the larger one is kept as the representative of both.
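
As a rough illustration of the procedure described above, here is a minimal Python sketch. It is not the implementation from the preprint. It rests on several assumptions: each "sketch" is simply the full set of K-mers of a genome, ANI is approximated from the Jaccard index using the Mash distance formula, "smaller" means fewer distinct K-mers, and genomes are processed greedily from largest to smallest. The function names `kmer_set`, `estimate_ani`, and `deduplicate` are hypothetical and chosen only for this example.

```python
import math


def kmer_set(seq, k):
    """Return the set of all k-mers in seq (a simple stand-in for a real sketch)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def estimate_ani(a, b, k):
    """Approximate ANI between two k-mer sets (assumption: Mash-style estimate)."""
    if not a or not b:
        return 0.0
    j = len(a & b) / len(a | b)  # Jaccard index of the two k-mer sets
    if j == 0:
        return 0.0
    # Mash distance D = -(1/k) * ln(2j / (1 + j)); ANI is approximated as 1 - D.
    d = -math.log(2 * j / (1 + j)) / k
    return max(0.0, 1.0 - d)


def deduplicate(genomes, k, ani_threshold):
    """Return the names of representative genomes, largest first.

    genomes maps a genome name to its sequence string.
    """
    sketches = {name: kmer_set(seq, k) for name, seq in genomes.items()}
    # Visit larger sketches first, so that within any near-identical pair
    # the larger genome is the one that is kept.
    order = sorted(sketches, key=lambda n: (-len(sketches[n]), n))
    kept = []
    for name in order:
        # Drop this genome if its ANI to an already kept genome exceeds the threshold.
        if all(estimate_ani(sketches[name], sketches[r], k) <= ani_threshold
               for r in kept):
            kept.append(name)
    return kept
```

Raising the ANI threshold keeps more genomes, since fewer pairs are treated as identical. Lowering it shrinks the database more aggressively. A real pipeline would use compact sketches instead of full K-mer sets to keep memory use manageable on large collections.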

GTDB-RS214 databases

These databases were pre-trained using release 214 of the bacterial and archaeal data from GTDB, which spans 402,709 genomes organized into 85,205 species clusters. For more information on the raw data, refer to the statistics here.

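| K-mer size | ANI threshold | Zip file |
| --- | --- | --- |
| 21 | 0.8 | Download |
| 21 | 0.95 | Download |
| 21 | 0.995 | Download |
| 21 | 0.9995 | Download |
| 31 | 0.8 | Download |
| 31 | 0.95 | Download |
| 31 | 0.995 | Download |
| 31 | 0.9995 | Download |
| 51 | 0.8 | Download |
| 51 | 0.95 | Download |
| 51 | 0.995 | Download |
| 51 | 0.9995 | Download |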

Archaea

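| K-mer size | ANI threshold | Zip file |
| --- | --- | --- |
| 21 | 0.8 | Download |
| 21 | 0.95 | Download |
| 21 | 0.995 | Download |
| 21 | 0.9995 | Download |
| 31 | 0.8 | Download |
| 31 | 0.95 | Download |
| 31 | 0.995 | Download |
| 31 | 0.9995 | Download |
| 51 | 0.8 | Download |
| 51 | 0.95 | Download |
| 51 | 0.995 | Download |
| 51 | 0.9995 | Download |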

Fungi

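| K-mer size | ANI threshold | Zip file |
| --- | --- | --- |
| 21 | 0.8 | Download |
| 21 | 0.95 | Download |
| 21 | 0.995 | Download |
| 21 | 0.9995 | Download |
| 31 | 0.8 | Download |
| 31 | 0.95 | Download |
| 31 | 0.995 | Download |
| 31 | 0.9995 | Download |
| 51 | 0.8 | Download |
| 51 | 0.95 | Download |
| 51 | 0.995 | Download |
| 51 | 0.9995 | Download |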

Protozoa

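| K-mer size | ANI threshold | Zip file |
| --- | --- | --- |
| 21 | 0.8 | Download |
| 21 | 0.95 | Download |
| 21 | 0.995 | Download |
| 21 | 0.9995 | Download |
| 31 | 0.8 | Download |
| 31 | 0.95 | Download |
| 31 | 0.995 | Download |
| 31 | 0.9995 | Download |
| 51 | 0.8 | Download |
| 51 | 0.95 | Download |
| 51 | 0.995 | Download |
| 51 | 0.9995 | Download |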