Potential modifications to signature format for major releases #268

Open
2 of 6 tasks
ctb opened this issue Jun 3, 2017 · 10 comments

Comments

@ctb
Contributor

ctb commented Jun 3, 2017

Meta-issue to group together all of the issues that propose to modify the signature format for a sourmash 2.0 release.

@luizirber
Member

luizirber commented Jun 3, 2017 via email

@ctb
Contributor Author

ctb commented Jun 3, 2017

Other ideas:

  • support tagging and simple Folksonomy (although this might be better done via hypothesis and/or some kind of IPLD magic - we probably don't want to allow/require modifications of the signature itself!)

@ctb
Contributor Author

ctb commented Jun 3, 2017

Also: provide some sort of verification/signature tester so that before making a bunch of signatures public (e.g. on IPFS) we can get suggestions on what kind of metadata we should add in.

@luizirber
Member

a great discussion on hierarchical vs tag-based file systems, which is very relevant for how to store metadata too: https://www.nayuki.io/page/designing-better-file-organization-around-tags-not-hierarchies

@ctb
Contributor Author

ctb commented Sep 27, 2017

also, remove email (done in #335).

@ctb
Contributor Author

ctb commented Sep 27, 2017

also add an md5sum for the input sample.
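
A minimal sketch of computing such an md5sum for an input file, streaming it in chunks so large sequence files need not fit in memory. The function name `file_md5` and the chunk size are assumptions for illustration, not part of sourmash.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex md5 digest of the file at `path` (hypothetical helper)."""
    md5 = hashlib.md5()
    with open(path, "rb") as fp:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()
```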

@ctb
Contributor Author

ctb commented Sep 28, 2017

@luizirber here's a proposal - any thoughts?

https://hackmd.io/KYdgjATARiCcwFoBmIBsAGBAWYssIENhUBWBEAZhOAgqiQA4pYGg

current text:

Sourmash signatures - metadata thoughts

tl;dr? Keep the core signature format lean and mean, with a few required fields; put other stuff in the metadata attribute, which will be a list of dictionaries. This metadata attribute would be passed along with the signature internally, and reading/writing routines would need to leave it unchanged, but it would not affect md5sums or equality of signatures.

Required fields in a signature:

  • class
  • license
  • hash function
  • signatures
  • version
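
A rough sketch of how the required fields and the pass-through metadata attribute could fit together, so that metadata changes neither equality nor the md5sum. The class name `Signature`, the method names, and all field values are placeholders assumed for illustration; this is not the sourmash implementation.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Signature:
    # Required core fields, named after the list above. "class" is a
    # Python keyword, so it is stored as class_ in this sketch.
    class_: str
    license: str
    hash_function: str
    signatures: list
    version: str
    # Free-form metadata: a list of dictionaries, excluded from equality.
    metadata: list = field(default_factory=list, compare=False)

    def core(self):
        """Return only the fields that define the signature's identity."""
        return {
            "class": self.class_,
            "license": self.license,
            "hash_function": self.hash_function,
            "signatures": self.signatures,
            "version": self.version,
        }

    def md5sum(self):
        """Hash a canonical JSON encoding of the core fields only."""
        blob = json.dumps(self.core(), sort_keys=True).encode("utf-8")
        return hashlib.md5(blob).hexdigest()


# Placeholder values; only the metadata differs between a and b.
a = Signature("sourmash_signature", "CC0", "0.murmur64",
              [{"mins": [1, 2, 3]}], "0.4")
b = Signature("sourmash_signature", "CC0", "0.murmur64",
              [{"mins": [1, 2, 3]}], "0.4",
              metadata=[{"sample": {"name": "example"}}])
```

Here `a == b` holds and both return the same `md5sum()`, even though only `b` carries metadata.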

Reserved block names.

We should identify some reserved block names that have special meaning. Obvious ones include:

  • sample (for information about the sample that was computed upon) - number of bases, name, filename, maybe a download URL, etc.
  • provenance for (maybe free form?) provenance info - what command was run on what system, etc.
  • ipfs for IPFS data file retrieval information.
  • ncbi or ncbi_taxonomy for accessions and/or taxonomy IDs and other such information.

Content of these should be more completely described and then encoded in software & a software validator.
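
One way such a validator might look, as a sketch only. It assumes each metadata dictionary maps a block name to its content, and the allowed keys per reserved block (`num_bases`, `taxid`, and so on) are invented here for illustration; the real content of each block still needs to be specified.

```python
# Hypothetical schema: reserved block name -> allowed keys (None = free form).
RESERVED_BLOCKS = {
    "sample": {"name", "filename", "num_bases", "download_url"},
    "provenance": None,
    "ipfs": {"hash"},
    "ncbi_taxonomy": {"accession", "taxid"},
}


def validate_metadata(metadata):
    """Return a list of problems; an empty list means the metadata passed."""
    if not isinstance(metadata, list):
        return ["metadata must be a list of dictionaries"]
    problems = []
    for i, block in enumerate(metadata):
        if not isinstance(block, dict):
            problems.append(f"block {i}: not a dictionary")
            continue
        for name, content in block.items():
            allowed = RESERVED_BLOCKS.get(name)
            if allowed is None:
                # Unreserved or free-form block: passed through unchecked.
                continue
            if not isinstance(content, dict):
                problems.append(f"block {i}: {name} must be a dictionary")
                continue
            unknown = sorted(set(content) - allowed)
            if unknown:
                problems.append(f"block {i}: {name} has unknown keys {unknown}")
    return problems
```

Returning a list of problems rather than raising on the first one suits the "get suggestions before publishing" use case mentioned earlier in the thread.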

Reserved metadata block proposal: tags

tl;dr? Can we build a useful tagging system in, to support a folksonomy?

See Better file organization around tags not hierarchies via @luizirber.

One thought: here it would be nice to have something that didn't change the actual signature file content, so that the hash didn't change (e.g. for IPFS distribution). Can this be done via some IPFS mechanism (IPNS?) or hypothesis, e.g. by specifying a unique URI for each signature that could then be annotated in hypothesis? (It looks like the term we want here is "external metadata".)

Conundrum: how do we do forwarding?

Another use for external metadata would be forwarding between signatures (e.g. "signature 5e665d is from a sample that has been updated; new signature is 48d23d".) Again, we want to avoid updating the signature content with this information because that would change the hash.
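
As a sketch of what forwarding through external metadata could look like: the forwarding records live in a separate table keyed by signature hash, and the signatures themselves are never modified. The `forwards` table and the `resolve` function are hypothetical names; where such a table would actually be stored (IPNS, hypothesis, something else) is the open question.

```python
def resolve(sig_id, forwards):
    """Follow forwarding records until reaching an id with no forward."""
    seen = {sig_id}
    while sig_id in forwards:
        sig_id = forwards[sig_id]
        if sig_id in seen:
            # Guard against records that point back at an earlier id.
            raise ValueError(f"forwarding loop at {sig_id}")
        seen.add(sig_id)
    return sig_id


# External record: signature 5e665d has been superseded by 48d23d.
forwards = {"5e665d": "48d23d"}
```

With this table, `resolve("5e665d", forwards)` follows the record to `"48d23d"`, while an id with no forwarding record resolves to itself.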

Luiz comments

  • IPLD allows traversing IPFS objects.
    I worked a bit on making the SBT JSON valid for IPLD too,
    they have examples using git https://github.com/ipfs/js-ipfs/tree/master/examples/traverse-ipld-graphs

  • The annoying thing with metadata in IPFS objects is that any change will generate another hash =/
    But not sure how to best represent it outside, either (if it is another IPFS object, we still need to update the signature to point to it, which defeats the purpose).

    • oh, we could save the minhash in one object, and let different signatures point to the same data.
      this way people can make their own metadata or extras on the signatures,
      but everyone has the same values for the minhash.
  • I really like the idea of making tags or extra metadata using hypothesis,
    how easy is it to access their data outside the browser?

  • Add a 'previous_version' field, pointing to the previous IPFS object.
    This way we can keep some simple versioning info
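
The two ideas in the list above (a shared minhash object that several signatures point to, and a 'previous_version' link) can be sketched with a toy content-addressed store. This is an illustration under assumptions: an in-memory dict and SHA-256 hex digests stand in for IPFS and its content identifiers, and the field names (`minhash`, `tags`, `previous_version`) are made up for the example.

```python
import hashlib
import json

store = {}  # toy stand-in for a content-addressed store such as IPFS


def put(obj):
    """Store obj under the hash of its canonical JSON and return that key."""
    blob = json.dumps(obj, sort_keys=True).encode("utf-8")
    key = hashlib.sha256(blob).hexdigest()
    store[key] = obj
    return key


# The minhash values live in one object...
minhash_key = put({"ksize": 31, "mins": [11, 42, 97]})

# ...and each signature wrapper points to it, adding its own metadata.
v1 = put({"minhash": minhash_key, "tags": ["soil"],
          "previous_version": None})
v2 = put({"minhash": minhash_key, "tags": ["soil", "metagenome"],
          "previous_version": v1})


def history(key):
    """Walk previous_version links from newest to oldest."""
    chain = []
    while key is not None:
        chain.append(key)
        key = store[key]["previous_version"]
    return chain
```

Changing the tags produces a new wrapper hash (`v2` differs from `v1`), but both wrappers reference the same `minhash_key`, so everyone shares the same minhash values.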

@ctb
Contributor Author

ctb commented Feb 18, 2018

I am thinking this should be punted to 3.0, given all the other stuff we have in 2.0 already.

@luizirber added the "4.0" label (issues to address for a 4.0 release) on Jun 8, 2020
@ctb changed the title from "Modifications to signature format for 2.0 release" to "Potential modifications to signature format for major releases" on Aug 3, 2020
@ctb
Contributor Author

ctb commented Aug 3, 2020

punted to... well, 4.0? 5.0? whatever :)

@ctb removed the "4.0" label (issues to address for a 4.0 release) on Feb 6, 2021
@ctb
Contributor Author

ctb commented Feb 6, 2021

I'm starting to think that maybe a good goal would be to add a flexible selector framework (ref #1072) that would let us do keyword searches to select subsets of signatures based on tags, taxonomy, etc. This could integrate well with a folksonomy-style tagging approach.
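
A very small sketch of what such a selector could look like, purely to make the idea concrete. The `select` function, the record layout, and the `taxid` field are assumptions for illustration; this is not the design discussed in #1072.

```python
def select(records, tags=(), **fields):
    """Yield records that carry all requested tags and match all fields."""
    wanted = set(tags)
    for rec in records:
        if not wanted <= set(rec.get("tags", ())):
            continue  # at least one requested tag is missing
        if all(rec.get(key) == value for key, value in fields.items()):
            yield rec


# Hypothetical signature records with folksonomy-style tags.
records = [
    {"name": "sig1", "tags": ["soil", "metagenome"], "taxid": 562},
    {"name": "sig2", "tags": ["gut"], "taxid": 562},
    {"name": "sig3", "tags": ["soil"], "taxid": 1280},
]

soil = [rec["name"] for rec in select(records, tags=["soil"])]
soil_562 = [rec["name"] for rec in select(records, tags=["soil"], taxid=562)]
```

Keeping the selector as a generator over plain dictionaries means it does not care where the tags come from, whether inside the signature or from external metadata.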
