Return Index containing only signatures that match requirements.
Current arguments can be any or all of:
* ksize
* moltype
* scaled
* num
* containment
'select' will raise ValueError if the requirements are incompatible
with the Index subclass.
'select' may return an empty object or None if no matches can be
found.
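To make the docstring concrete, here is a minimal sketch of what a filter-only `select` could look like on a flat, in-memory index. `ToySketch` and `ToyLinearIndex` are hypothetical stand-ins, not sourmash classes, and the compatibility rules (a finer `scaled` counts as compatible; `containment` requires scaled sketches) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ToySketch:
    """Hypothetical stand-in for a signature; not a sourmash class."""
    name: str
    ksize: int
    moltype: str
    scaled: int = 0  # 0 means "not a scaled sketch"
    num: int = 0     # 0 means "not a num sketch"


class ToyLinearIndex:
    """Hypothetical in-memory index holding a flat list of sketches."""

    def __init__(self, sketches: List[ToySketch]):
        self.sketches = list(sketches)

    def __len__(self) -> int:
        return len(self.sketches)

    def select(self, ksize=None, moltype=None, scaled=None, num=None,
               containment=False):
        # Assumed rules: these combinations can never be satisfied,
        # so they are rejected up front with ValueError.
        if scaled and num:
            raise ValueError("cannot require both 'scaled' and 'num'")
        if containment and num:
            raise ValueError("'containment' is incompatible with 'num'")

        def matches(s: ToySketch) -> bool:
            if ksize is not None and s.ksize != ksize:
                return False
            if moltype is not None and s.moltype != moltype:
                return False
            # Containment and scaled requests both need a scaled sketch.
            if (scaled or containment) and not s.scaled:
                return False
            # Assumption: a finer sketch could be downsampled later,
            # so it counts as compatible with a coarser request.
            if scaled and s.scaled > scaled:
                return False
            if num and s.num != num:
                return False
            return True

        # Filter only: matching sketches are returned untouched.
        return ToyLinearIndex([s for s in self.sketches if matches(s)])


db = ToyLinearIndex([
    ToySketch("a", ksize=31, moltype="DNA", scaled=1000),
    ToySketch("b", ksize=21, moltype="DNA", scaled=1000),
    ToySketch("c", ksize=31, moltype="protein", scaled=100),
    ToySketch("d", ksize=31, moltype="DNA", num=500),
])
picked = db.select(ksize=31, moltype="DNA", scaled=2000)
print([s.name for s in picked.sketches])
```

Note that the selected sketch keeps its original `scaled` value; in this sketch, selection never modifies what it returns.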
#1433 discusses the distinction between finding compatible sketches and/or filtering out incompatible ones; and making the sketches compatible. Maybe we should think about an apply_select method, that returns a collection of compatible sketches?
We have a bunch of janky code that does this on the fly, but I'm pretty happy with the overall design philosophy. And downsampling doesn't seem to be that slow. So I'm inclined to not worry about doing downsampling on the fly, and instead I think we should stick with declarative approaches ("I want a sketch at this scaled/etc") and do clever caching or lazy evaluation underneath.
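As a rough illustration of the declarative approach with lazy evaluation and caching, the sketch below wraps a sketch in a view that records the wanted `scaled` value and only downsamples on first access. Everything here is hypothetical (`ToyScaledSketch`, `downsample`, `LazySketchView`), and the hash-space model is deliberately simplified; it is not how sourmash implements downsampling.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

HASH_SPACE = 2**64  # simplified model: hashes fall in [0, 2**64)


@dataclass(frozen=True)
class ToyScaledSketch:
    """Hypothetical scaled sketch: keeps hashes below HASH_SPACE // scaled."""
    scaled: int
    hashes: FrozenSet[int]


def downsample(sketch: ToyScaledSketch, scaled: int) -> ToyScaledSketch:
    """Return a coarser sketch; refuse to 'upsample' to a finer one."""
    if scaled < sketch.scaled:
        raise ValueError("cannot downsample to a finer scaled value")
    max_hash = HASH_SPACE // scaled
    kept = frozenset(h for h in sketch.hashes if h < max_hash)
    return ToyScaledSketch(scaled, kept)


class LazySketchView:
    """Declares the scaled value wanted; does the work on first access."""

    def __init__(self, sketch: ToyScaledSketch, scaled: int):
        self._sketch = sketch
        self.scaled = scaled
        self._cached: Optional[ToyScaledSketch] = None
        self.computations = 0  # counts how often downsampling really ran

    def get(self) -> ToyScaledSketch:
        if self._cached is None:
            self._cached = downsample(self._sketch, self.scaled)
            self.computations += 1
        return self._cached


original = ToyScaledSketch(scaled=2, hashes=frozenset({1, 2**62, 2**63 - 1}))
view = LazySketchView(original, scaled=4)  # nothing computed yet
first = view.get()                         # downsampling happens here
second = view.get()                        # served from the cache
print(sorted(first.hashes), view.computations)
```

The caller only states "I want this sketch at scaled=4"; whether and when the downsampling runs is an implementation detail hidden behind `get()`.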
This is an update of & replacement for #1072, which introduced the idea of a `database.select(...)` function.

This issue is being updated after the release of sourmash 4.1.
In #1406 and #1392, we significantly expanded selector functionality.
The key changes were actually in #1420, a PR into #1406. This introduced a `select` method on `Index` classes, with the docstring quoted at the top of this issue.
This was added into `LinearIndex`, `LazyLinearIndex`, `ZipFileLinearIndex`, and `MultiIndex`, as well as the SBT and LCA Database classes.

The ultimate idea is to cleanly support databases and collections with richer signature types, see #198.

A few design decisions were made as part of this - the most consequential one is that `select` just selects compatible signatures, and doesn't actually do any downsampling or anything. See #1072 (comment) for links.

Items from #1072 not tackled in #1420:
* `db = db.select(ksize=31).select(moltype='dna')`
* `db = db.select(other_db)` or `db = db.select(some_sig)`

(not all of these may be good ideas, but leaving them in here for discussion ;).
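For discussion, the two untackled ideas (chaining, and selecting "by example" from a signature) could look roughly like the sketch below. `ToySig` and `ToyIndex` are hypothetical, and the `like` parameter name is an assumption; nothing here reflects an existing sourmash signature.

```python
from dataclasses import dataclass
from typing import List, Optional

ALLOWED = {"ksize", "moltype"}  # requirements this toy index understands


@dataclass(frozen=True)
class ToySig:
    """Hypothetical signature with only the fields needed here."""
    name: str
    ksize: int
    moltype: str


class ToyIndex:
    """Hypothetical index whose select() returns a new, narrower index."""

    def __init__(self, sigs: List[ToySig]):
        self.sigs = list(sigs)

    def select(self, like: Optional[ToySig] = None, **requirements):
        unknown = set(requirements) - ALLOWED
        if unknown:
            raise ValueError(f"unsupported requirements: {sorted(unknown)}")
        # Selecting by example: copy requirements from a template
        # signature; explicit keyword arguments take precedence.
        if like is not None:
            requirements = {"ksize": like.ksize, "moltype": like.moltype,
                            **requirements}
        kept = [s for s in self.sigs
                if all(getattr(s, key) == value
                       for key, value in requirements.items())]
        # Returning the same type is what makes chaining possible.
        return ToyIndex(kept)


db = ToyIndex([
    ToySig("a", 31, "DNA"),
    ToySig("b", 21, "DNA"),
    ToySig("c", 31, "protein"),
])
chained = db.select(ksize=31).select(moltype="DNA")
by_example = db.select(ToySig("query", 31, "DNA"))
print([s.name for s in chained.sigs], [s.name for s in by_example.sigs])
```

Because each `select` returns the same kind of object, chained calls narrow the collection step by step, and selecting by example reduces to deriving keyword requirements from the query.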
Some other TODO items:
* `Index.select(...)` tests #1427 suggests we need more `select` tests

Notes and additional thoughts:
* `db.select(sig)` (yields all compatible signatures) could be a better, cleaner way to fix #809 ("sourmash gather doesn't automatically figure out ksize from database") and replace #934 ("[WIP] Refactor subject database and signature loading for search, gather, and multigather.").