MRG: read sig.gz_{n} sigs from zipfiles
#136
I mean, ok, but ... I worry we're now introducing different behavior between pyo3_branchwater and sourmash :)
Two questions:
- how does manysketch handle this situation? Are multiple sketches created and saved?
- does pyo3_branchwater currently pay any attention to manifests when reading?
Anyway, I think it's fine to go ahead and merge, but let's also create an issue to re-examine this in the future :)
relevant
no, it currently looks through the sig files themselves. I expect to switch to using manifests as part of #134 or follow ups. Hmm, the optimal solution might be to use the manifest to load the same file for duplicates (not storing the duplicates at all). What are the challenges associated with that solution?
ooh! I like it ;). Not ready to commit to it on sourmash yet, but hot take is it's a leading contender!
Note: some of our utilities may only keep
I confirmed this is not a reporting difference by listing files in
(just to be clear, I think you can/should merge this :)
As noted in sourmash-bio/sourmash#2749 (comment), sourmash zipfiles store repeated md5sums as `.sig.gz_{n}` instead of `_{n}.sig.gz`. This PR allows us to read those files as well, rather than skipping over them.
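
To illustrate the naming issue described above, here is a minimal Python sketch of selecting zipfile members whose names end in either `.sig.gz` or `.sig.gz_{n}`. It is not the code from this PR: the pattern `SIG_NAME`, the helper names, and the example member names are all hypothetical and chosen only for illustration.

```python
import io
import re
import zipfile

# Hypothetical pattern (an assumption, not taken from the PR):
# accept names ending in ".sig.gz" or ".sig.gz_{n}", where n is an integer.
SIG_NAME = re.compile(r"\.sig\.gz(_\d+)?$")


def list_signature_members(zf):
    """Return the member names in an open ZipFile that look like signature files."""
    return [name for name in zf.namelist() if SIG_NAME.search(name)]


def build_example_zip():
    """Build a small in-memory zip with made-up member names for demonstration."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("signatures/abc123.sig.gz", b"")    # first copy of an md5sum
        zf.writestr("signatures/abc123.sig.gz_0", b"")  # repeated md5sum, suffixed name
        zf.writestr("SOURMASH-MANIFEST.csv", b"")       # not a signature file
    buf.seek(0)
    return zipfile.ZipFile(buf)


if __name__ == "__main__":
    with build_example_zip() as zf:
        for name in list_signature_members(zf):
            print(name)
```

A filter that only checks for a `.sig.gz` ending would skip the suffixed member; allowing the optional `_{n}` suffix picks up both copies.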