-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] Add StandaloneManifestIndex
class for direct loading of manifest CSVs
#1891
Conversation
Codecov Report
@@ Coverage Diff @@
## latest #1891 +/- ##
==========================================
+ Coverage 82.74% 82.83% +0.08%
==========================================
Files 122 122
Lines 13203 13257 +54
Branches 1779 1789 +10
==========================================
+ Hits 10925 10981 +56
+ Misses 2014 2013 -1
+ Partials 264 263 -1
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
Hmm, I wonder if this is just another use case for |
StandaloneManifestIndex
class for direct loading of manifest CSVsStandaloneManifestIndex
class for direct loading of manifest CSVs
StandaloneManifestIndex
class for direct loading of manifest CSVsStandaloneManifestIndex
class for direct loading of manifest CSVs
This is ready for review & merge, @sourmash-bio/devs. |
Co-authored-by: Tessa Pierce Ward <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me
Some mechanism to change/reset paths in a manifest might be nice in the future, to help switch between abs paths and rel paths. Or not, at least you're very clear about the dir restriction for rel paths for dir collections! |
Yes! I kind of left it up in the air (beyond documenting it...) because as soon as I started to think about nailing it down further, it became complicated. I opted for defaulting to the "hey you can generate a manifest for a directory! just put it under the TLD!" functionality. The good news is it's all CSV format so it's easy to manipulate the paths if you really have to. |
🎉 |
This PR enables direct loading of manifest CSV files through the addition of a
StandaloneManifestIndex
class. This class supports manifest operations directly, and implements lazy loading of files within manifests upon demand.This is a simpler alternative to
DirectoryIndex
from #1619.A future effort is to use this to better support manifests-of-manifests functionality #1671, and/or implement a sqlite-based manifest storage, perhaps in conjunction with #1808.
This PR:
Index
class,StandaloneManifestIndex
, and enables its loading from the command line.Index
implementation.Closes #1096
Closes #1641
TODO
test_index
.test_sig_describe_3_manifest_fails_when_moved
Example
This PR lets you do:
that is,
tests/test-data/track-abund/mf.csv
becomes a loadable sourmash collection. 😎Importantly, that collection contains a manifest, because it is a manifest. This means that all the signature selection commands work on it without necessarily loading the files underneath.
If you give
sourmash sig manifest
a directory, it will produce the manifest relative to that directory - that is, the manifest internal locations will all be relative to that top-level directory.That's useful but... the proximal motivation is to help with #1671. And for that, the cool functionality provided here is for absolute paths. If you do:
now you have a manifest containing absolute paths. This is obviously much less portable, but it is extremely useful when you have very large collections of signatures sitting somewhere and you want to be able to operate on them without reloading the files (i.e., reparsing the JSON, or reloading all of the manifests from each file).
I don't currently think we should try to a one-step CLI mechanism to generate a manifest with absolute paths, because it's going to lead to ...challenges