v4.4.0

@ctb ctb released this 13 May 16:33
· 711 commits to latest since this release
851dc2b

This release contains many new features! Of particular note:

  • sourmash now estimates and outputs average nucleotide identity (ANI) based on k-mer measures;
  • sourmash sketch translate is no longer unusably slow;
  • we provide macOS 'arm64' wheels for the new M1 Macs;
  • we've added a number of support features for managing large collections of signatures and building very large databases;
  • and we've added support for SQLite databases that can be used for storing and searching signatures and doing Kraken-style LCA analysis of genomes and metagenomes.
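To give a rough sense of how an ANI estimate can be derived from k-mer measures, here is a minimal sketch. It assumes the common simplification that each base mutates independently, so a k-mer survives unchanged with probability ANI^k and containment is approximately ANI^k. The function names `containment` and `containment_to_ani` are hypothetical and are not part of the sourmash API; sourmash's own estimator may differ in its details (for example, in how it handles scaled sketches and confidence intervals).

```python
def containment(query_hashes, subject_hashes):
    """Fraction of the query's k-mer hashes that also occur in the subject."""
    query = set(query_hashes)
    if not query:
        return 0.0
    return len(query & set(subject_hashes)) / len(query)


def containment_to_ani(c, ksize):
    """Point estimate of ANI from containment, assuming containment ~= ANI ** ksize.

    This is an illustrative simplification, not sourmash's implementation.
    """
    if not 0.0 <= c <= 1.0:
        raise ValueError("containment must be between 0 and 1")
    if ksize <= 0:
        raise ValueError("ksize must be a positive integer")
    return c ** (1.0 / ksize)


if __name__ == "__main__":
    # Toy hash sets standing in for sketches (hypothetical values).
    query = {1, 2, 3, 4}
    subject = {3, 4, 5}
    c = containment(query, subject)
    print(f"containment = {c:.3f}")
    print(f"estimated ANI at k=31: {containment_to_ani(c, 31):.4f}")
```

Because the estimate takes the k-th root, even a modest containment corresponds to a high ANI at large k, which is why containment values should not be read directly as identity.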

In addition, we have built updated Genbank genome databases (with contents from March 2022) as well as GTDB R07-RS207 databases; see the prepared databases page. We've also made some benchmarks available for these databases, so you can get some idea of the necessary computational resources for your searches.

Last but by no means least, we have begun providing a number of examples and recipes for using sourmash; see the new sourmash examples website!


Major new features:

  • add ANI output to search, prefetch, and gather (#1934, #1952, #1955, #1966, #1967, #2011, #2031, #2032)
  • new GTDB and Genbank database releases (#2013, #2038)
  • provide macOS arm64 wheels (#1935)
  • support for SQLite databases (#1808)
  • implement sourmash sketch fromfile (#1884, #1885, #1886, #2009)
  • add sourmash sig check for comparing picklists and databases (#1907, #1915, #1917)
  • add sig collect command (#2036) for building standalone manifests from many databases
  • add direct loading of manifest CSVs as sourmash indices (#1891)
  • add -A/--abundance-from to sig subtract & add sig inflate (#1889)
  • advanced database format documentation (#2025)

Minor new features:

  • add -d/--debug to sourmash sig describe; upgrade output errors (#1782)
  • add sum_hashes to sourmash sig describe output (#1882)

Bug fixes:

  • catch TypeError at the command line when searching abundance-weighted signatures against flat ones (#1928)
  • speed up SeqToHashes translate (#1938, #1946)

Cleanup and documentation fixes:

  • better handle some pickfile errors (#1924)
  • remove unnecessary downsampling warnings (#1971)
  • use same wording for dayhoff/hp as for dna/protein (#1929)
  • rename covered_bp property to better reflect function (#2050)

Developer updates: