Add feature so that sourmash gather ignores perfect matches #433
Comments
This would definitely require modifying the Thinking about this, I'm not sure it's possible with the current SBT gather implementation without abandoning some of the optimizations. LCA gather would require modifications, too, but could be easier.
ref #849 and motivation here, dib-lab/charcoal#121
underlying support for this added in #1477.
this is easily supported at the underlying algorithmic level by #1370.
@bluegenes added generic functionality to support this in #1623; thinking about adding command-line options now :)
I... think
🎉 @bluegenes does this meet your needs for workflows, or do you think we should put in a command-line switch '--exclude-self' that does this automagically? I imagine the simplest way to implement If we can't trust the identifiers (which is always a questionable proposition...) we'd need to use md5sum / sketch identity 🤔 . On the flip side, this is maybe niche enough that a dedicated command-line option isn't warranted: it's simpler for each workflow to do whatever exclusion fits its own needs, as long as we provide the command-line options to support that, which we now do with picklists and/or --exclude-db-pattern.
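A minimal sketch of the "md5sum / sketch identity" idea mentioned above, in plain Python with no sourmash calls. Everything here is an assumption for illustration: `sketch_md5` and `exclude_self` are hypothetical helpers, sketches are modeled as plain sets of integer hashes, and the digest is computed over the sorted hash values, which is not necessarily how sourmash computes a signature's md5sum.

```python
import hashlib


def sketch_md5(hashes):
    """Hypothetical content digest: md5 over the sorted hash values.

    This is an illustrative stand-in, not sourmash's md5sum calculation.
    """
    joined = ",".join(str(h) for h in sorted(hashes))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def exclude_self(query_hashes, database):
    """Drop database entries whose sketch content is identical to the query.

    Identity is decided by content digest, not by name, so it still works
    when identifiers cannot be trusted.
    """
    query_md5 = sketch_md5(query_hashes)
    return {
        name: hashes
        for name, hashes in database.items()
        if sketch_md5(hashes) != query_md5
    }


# Toy data: "renamed_copy" has a different name but the same content as the query.
query = {11, 42, 97}
database = {
    "renamed_copy": {97, 11, 42},
    "genomeB": {11, 42, 500},
}

remaining = exclude_self(query, database)
print(sorted(remaining))
```

Comparing by content digest rather than by name is what makes this robust to renamed or relabeled signatures; a name-based filter would be simpler but depends on trustworthy identifiers.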
I'm going to close this now that we have picklists and
See #432 for rationale; basically, if you run gather on a signature that is present in a database, it will always report itself.
sourmash search already ignores signatures that are identical to the query.