v4.4.0
This release contains many new features! Of particular note:
- sourmash now estimates and outputs average nucleotide identity (ANI) based on k-mer measures;
- `sourmash sketch translate` is no longer unusably slow;
- we provide Mac OS 'arm64' wheels for the new M1 Macs;
- we've added a number of support features for managing large collections of signatures and building very large databases;
- and we've added support for SQLite databases that can be used for storing and searching signatures and doing Kraken-style LCA analysis of genomes and metagenomes.
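As a rough illustration of the first item above, here is a minimal sketch of how an ANI point estimate can be derived from k-mer containment or Jaccard similarity. It assumes a simple model in which each base mutates independently, so a k-mer survives with probability ANI^k. The function names `containment_to_ani` and `jaccard_to_ani` are hypothetical and are not taken from the sourmash API; the actual sourmash implementation may apply additional corrections and report confidence intervals.

```python
def containment_to_ani(containment: float, ksize: int) -> float:
    """Point estimate of ANI from k-mer containment (hypothetical helper).

    Assumes independent point mutations, so that containment ~= ANI ** ksize
    and therefore ANI ~= containment ** (1 / ksize).
    """
    if not 0.0 <= containment <= 1.0:
        raise ValueError("containment must be between 0 and 1")
    if ksize < 1:
        raise ValueError("ksize must be a positive integer")
    return containment ** (1.0 / ksize)


def jaccard_to_ani(jaccard: float, ksize: int) -> float:
    """Point estimate of ANI from Jaccard similarity (hypothetical helper).

    Assumes both sequences have similar numbers of distinct k-mers, so that
    containment ~= 2J / (1 + J), then applies the containment relation.
    """
    if not 0.0 <= jaccard <= 1.0:
        raise ValueError("jaccard must be between 0 and 1")
    return containment_to_ani(2.0 * jaccard / (1.0 + jaccard), ksize)


if __name__ == "__main__":
    # Illustrative values only; these are not sourmash outputs.
    for c in (0.1, 0.5, 0.9):
        print(f"containment={c:.1f}, k=31 -> ANI ~ {containment_to_ani(c, 31):.4f}")
```

Under this model, larger k-mer sizes make containment fall off faster as sequences diverge, which is why the k-mer size has to be part of the conversion.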
In addition, we have built updated Genbank genome databases (with contents from March 2022) as well as GTDB R07-RS207 databases; see the prepared databases page. We've also made some benchmarks available for these databases, so you can get some idea of the necessary computational resources for your searches.
Last but by no means least, we have begun providing a number of examples and recipes for using sourmash - see the new sourmash examples Web site!
Major new features:
- add ANI output to search, prefetch, and gather (#1934, #1952, #1955, #1966, #1967, #2011, #2031, #2032)
- new GTDB and Genbank database releases (#2013, #2038)
- provide macos arm64 wheels (#1935)
- support for SQLite databases (#1808)
- implement `sourmash sketch fromfile` (#1884, #1885, #1886, #2009)
- add `sourmash sig check` for comparing picklists and databases (#1907, #1915, #1917)
- add `sig collect` command (#2036) for building standalone manifests from many databases
- Add direct loading of manifest CSVs as sourmash indices (#1891)
- add `-A/--abundance-from` to `sig subtract` & add `sig inflate` (#1889)
- advanced database format documentation (#2025)
Minor new features:
- add `-d/--debug` to `sourmash sig describe`; upgrade output errors. (#1782)
- add `sum_hashes` to `sourmash sig describe` output. (#1882)
Bug fixes:
- catch TypeError in search w/abund vs flat at the command line (#1928)
- speed up `SeqToHashes` `translate` (#1938, #1946)
Cleanup and documentation fixes:
- better handle some pickfile errors (#1924)
- remove unnecessary downsampling warnings (#1971)
- use same wording for dayhoff/hp as for dna/protein (#1929)
- rename `covered_bp` property to better reflect function (#2050)
Developer updates:
- provide "protocol" tests for `Index`, `CollectionManifest`, and `LCA_Database` classes (#1936)
- remove khmer CI tests (#1950)
- Benchmarks for seq_to_hashes in protein mode (#1944)
- add some tests for Jaccard output ordering (#1926)
- Oxidize ZipStorage (#1909)
- cleanup and commenting of `test_index.py` tests. (#1898, #1900)
- rationalize `_signatures_with_internal` (#1896)
- Convert nix to flakes (#1904)
- fix docs build (#1897)
- Fix build/CI and unused imports papercuts (#1974)
- fix hypothesis CI (#2028)
- dependabot version updates (#1977, #1978, #1979, #1980, #1981, #1982, #1983, #1984, #1985, #1986, #1987, #1988, #1989, #1991, #1993, #1994, #1995, #1996, #1997, #1998, #2017, #2019, #2020, #2021, #2022, #2023, #2042)