Discussion: options for hypercore feed-level metadata #13
Been thinking a bit about the handshake message. Instead of making it the first message, it could be powerful if you could update that message, especially if we have a "relatedFeeds" scheme, since you most likely want to update that over time (hyperdb does this a bunch!). What about something like this:

```protobuf
message Header {
  message Feed {
    required bytes key = 1;
  }

  required string protocolType = 1;
  optional uint64 version = 2; // defaults to version 0
  repeated Feed relatedFeeds = 3; // use a Feed message so users can extend it with other metadata
}
```

And then a convention that every message points back to the last header message:

```protobuf
message Entry {
  optional uint64 headerSeq = 4;
}
```

Or some variation of this.
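A minimal sketch of how a reader could use the `headerSeq` convention, assuming entries are already decoded into Python objects and that header entries themselves carry no `headerSeq`. The `Header`/`Entry` classes and the `append`/`latest_header` helpers are hypothetical illustrations, not hypercore or hyperdb APIs.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Header:
    # Mirrors the proposed Header message; relatedFeeds holds raw feed keys.
    protocolType: str
    version: int = 0
    relatedFeeds: List[bytes] = field(default_factory=list)


@dataclass
class Entry:
    value: bytes
    headerSeq: Optional[int] = None  # seq of the most recent Header entry


Feed = List[Union[Header, Entry]]


def append(feed: Feed, item: Union[Header, Entry]) -> int:
    """Append an item; plain entries are stamped with the latest header's seq."""
    if isinstance(item, Entry):
        last = feed[-1] if feed else None
        if isinstance(last, Header):
            item.headerSeq = len(feed) - 1
        elif isinstance(last, Entry):
            item.headerSeq = last.headerSeq
    feed.append(item)
    return len(feed) - 1


def latest_header(feed: Feed) -> Optional[Header]:
    """Find the current header by looking only at the last entry in the feed."""
    if not feed:
        return None
    last = feed[-1]
    if isinstance(last, Header):
        return last  # the newest entry is itself a header update
    if last.headerSeq is None:
        return None
    header = feed[last.headerSeq]
    return header if isinstance(header, Header) else None
```

The appeal of the convention is that a reader never scans history: the last entry alone leads to the current header, and updating the header is just appending a new `Header` entry.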
@bnewbold expanding on the above... what if you could attach an immutable blob plus a mutable sequence pointer, or just a mutable sequence pointer? Then the header would be stored in the feed, but you'd keep the "latest header" pointer outside it.
Another idea: being able to attach a mutable blob that is history-less, meaning that in the handshake (or somehow) each peer exchange
Sounds like a good solution to me.
Been thinking about the security aspects of the mutable header. It gets tricky fast, IMO. I want to avoid situations where peers can withhold the latest

Going back to pragmatism: what is it we want to get out of this? Originally, my thought about the immutable header was that you'd include a string like this:

Or:

I.e., immutable descriptions of the data that let you pick the right strategy to parse it. The main thing gained from the mutable one would be the ability to specify which feeds to crawl (which makes archiving easier over time), assuming we spec out a required schema for the handshake on top. This of course could be a massive benefit as well. Unsure how to proceed; again, open for input.
Could an immutable blob identify the data structure, with that data structure then using custom headers to identify additional feeds? The custom header would then be part of the data structure and could be made mutable.
@pfrazee Yeah, that's what I've been thinking too: use the immutable string to pick the right strategy to crawl the feed (defaulting to contentfeed, which means no crawling).
Pros:
Cons:
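A rough sketch of that dispatch, assuming the immutable header decodes to a single type string and entries are already decoded into dicts. Only "contentfeed" as the no-crawl default comes from the comment above; the "example-drive" type string, its "content" key layout, and the `related_feeds` helper are hypothetical.

```python
from typing import Callable, Dict, List

# A crawl strategy takes decoded feed entries and returns the keys of
# related feeds that an archiver should also replicate.
CrawlStrategy = Callable[[List[dict]], List[bytes]]


def no_crawl(entries: List[dict]) -> List[bytes]:
    # "contentfeed": plain data, nothing else to follow.
    return []


def follow_content_key(entries: List[dict]) -> List[bytes]:
    # Hypothetical hyperdrive-like layout: the first entry names a content feed.
    if entries and "content" in entries[0]:
        return [entries[0]["content"]]
    return []


STRATEGIES: Dict[str, CrawlStrategy] = {
    "contentfeed": no_crawl,
    "example-drive": follow_content_key,  # hypothetical type string
}


def related_feeds(type_string: str, entries: List[dict]) -> List[bytes]:
    """Pick a strategy from the immutable type string; unknown types fall back to no crawling."""
    return STRATEGIES.get(type_string, no_crawl)(entries)
```

Falling back to `no_crawl` means an archiver that meets an unrecognized type still replicates the feed itself; it just doesn't discover companion feeds.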
As per the original message, I think there are sort of two things going on here. The first is for all clients/readers/infra/etc. to be able to quickly get from a bare

The second is the ability to associate generic metadata with a feed, sort of a key/value sidecar to the feed contents proper, which might include related feeds or anything else. We sort of do this with hyperdrive-like feeds via

I think the first is more urgently needed for the hyperdb+hyperdrive rollout. I propose we focus on a solution to the first part, but not include any "related feed" functionality in it, because that is more "mutable". I think a mutable solution to the second part would probably be good... but I also think it needs more thought.

In either/any case, off the top of my head, I think we should keep all such metadata "in band", in the sense that the same hashing/Merkle structure should cover the metadata as well as the feed content, so we don't need to add additional verification complexity.
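To illustrate the "in band" point: if the metadata is stored as entry 0, one Merkle root covers both metadata and content, so whatever signs that root also authenticates the metadata. This is a simplified binary tree using BLAKE2b with leaf/parent prefix bytes chosen for this sketch; it is not hypercore's actual tree layout, and `merkle_root`/`feed_root` are hypothetical helpers.

```python
import hashlib
from typing import List


def _h(prefix: bytes, *parts: bytes) -> bytes:
    # Domain-separate leaf and parent hashes so one can't be passed off as the other.
    d = hashlib.blake2b(digest_size=32)
    d.update(prefix)
    for p in parts:
        d.update(p)
    return d.digest()


def merkle_root(entries: List[bytes]) -> bytes:
    """Simplified binary Merkle root over feed entries."""
    if not entries:
        raise ValueError("empty feed")
    level = [_h(b"\x00", e) for e in entries]
    while len(level) > 1:
        nxt = [_h(b"\x01", level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # carry an unpaired node up unchanged
        level = nxt
    return level[0]


def feed_root(metadata: bytes, content: List[bytes]) -> bytes:
    # Metadata is stored "in band" as entry 0, so the same root (and any
    # signature over it) covers both metadata and content.
    return merkle_root([metadata] + content)
```

Changing the metadata changes the root, so no separate verification path is needed for it.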
Just a ping that I think we want to make progress on this in the next week or so. What would be the best next step? A specific implementation proposal?
This announcement about git wire protocol v2 has some details about how they shoehorned in a protocol version flag: https://opensource.googleblog.com/2018/05/introducing-git-protocol-version-2.html (this message is really a poke at @mafintosh to write up what we discussed last week).
In hyperdb v3.0.0, @mafintosh added a minimal protocol header as the first hypercore entry, with protobuf schema (mafintosh/hyperdb#121):
and hyperdb sets the protocol string to
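The schema from mafintosh/hyperdb#121 isn't reproduced here. As a rough sketch only, assuming a protobuf message whose sole field is a string at field number 1, this is what writing and sniffing such a header as entry 0 looks like on the wire; `encode_header`/`decode_header` and the "example" string are hypothetical.

```python
from typing import Tuple


def encode_varint(n: int) -> bytes:
    """Protobuf base-128 varint encoding of a non-negative integer."""
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a varint starting at pos; return (value, next_pos)."""
    shift = value = 0
    while True:
        b = buf[pos]
        pos += 1
        value |= (b & 0x7F) << shift
        if b < 0x80:
            return value, pos
        shift += 7


def encode_header(protocol: str) -> bytes:
    # Field 1, wire type 2 (length-delimited): tag byte = (1 << 3) | 2 = 0x0A.
    data = protocol.encode("utf-8")
    return bytes([0x0A]) + encode_varint(len(data)) + data


def decode_header(entry0: bytes) -> str:
    """Read the protocol string from entry 0 without knowing the rest of the schema."""
    tag, pos = decode_varint(entry0, 0)
    if tag != 0x0A:
        raise ValueError("entry 0 does not look like a header")
    length, pos = decode_varint(entry0, pos)
    return entry0[pos:pos + length].decode("utf-8")
```

For example, `encode_header("example")` yields `b"\n\x07example"`: a tag byte, a length byte, then the UTF-8 string.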
I actually think that we should adhere to the layered nature of the hyper* tools. That means it does not make sense to state that a hypercore is a hyperdrive, only that a hypercore is a hyperdb, and then to state at the hyperdb level that it's a hyperdrive. So at the hypercore level:

And then, at the hyperdb level, I propose that we have a single special reserved key that stores some JSON to set more properties. So, e.g.:
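A sketch of how tooling might resolve types under this layered proposal, assuming the hypercore-level header yields the string "hyperdb" and the hyperdb is exposed as a plain key/value dict. The reserved key name `__meta__` and the JSON `"type"` property are hypothetical; the proposal doesn't name them.

```python
import json
from typing import Dict, List, Optional

# Hypothetical reserved key; the proposal does not specify a name.
RESERVED_KEY = "__meta__"


def resolve_type_stack(core_type: str, db: Optional[Dict[str, str]] = None) -> List[str]:
    """Return the type at each layer, from the lowest layer up.

    core_type: the string from the hypercore-level header.
    db: key/value view of the feed, if the tool understands the hyperdb layer.
    """
    stack = [core_type]
    if core_type != "hyperdb" or db is None or RESERVED_KEY not in db:
        return stack  # stop at the deepest layer this tool can read
    meta = json.loads(db[RESERVED_KEY])
    if "type" in meta:
        stack.append(meta["type"])
    return stack
```

A tool without hyperdb support still learns `["hyperdb"]`, which is the partial-support fallback that makes the layered approach attractive.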
Hi @Frando! Thanks for the feedback. We've gone back and forth on this a few times; I'm not sure all the history is in this issue thread. There are advantages to what I'd call the "recursive" approach you mention (typing at each layer of the stack): tooling can fall back to partial support of higher-level protocols (e.g., inspect a hyperdb even if hyperdrive isn't supported), etc. Some of the trade-offs that pushed me over into the single-top-level-string camp are:
In the end, this ship has basically sailed, in that DEP-0007 got published. We can leave this thread open a little longer if you have more comments, and then close it.
Motivation: have a way to annotate the "type" of feed contents. For example, determine if you're looking at a hyperdb key/value feed, a hyperdrive, or some other thing. A requirement is that code/libraries be able to replicate the feed and discover the content type (and schema version) without necessarily understanding the schema itself. A related motivation is to discover related ("content") feeds in a protocol-agnostic manner, but this isn't a requirement.
Question: should this blob be strictly immutable? Being able to change some metadata might be nice (e.g., paired feeds), but keeping it immutable keeps things simple for, e.g., hosting platforms and archives.
Option 1: protobuf message as a special first entry in the feed. This is basically what hyperdrive currently does to point from the metadata feed to the content feed, only we would want to use (extensible) protobuf instead of bare bytes. We could potentially select a small, fixed set of fields for this protobuf schema (e.g., "repeated bytes relatedFeeds", "optional string protobufSchema", "optional string contentType"; the strings could be MIME-type-like), which applications could extend.
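A sketch of why protobuf suits Option 1: a generic reader can pull out the core fields and skip any fields an application adds, without knowing their schema. The field numbers (1, 2, 3) and the `encode`/`decode` helpers are assumptions for illustration; wire-type handling follows the standard protobuf encoding.

```python
from typing import Dict, List, Tuple

# Assumed field numbers for the core fields; applications would use higher ones.
RELATED_FEEDS, PROTOBUF_SCHEMA, CONTENT_TYPE = 1, 2, 3


def put_varint(n: int) -> bytes:
    """Protobuf base-128 varint encoding of a non-negative integer."""
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def get_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a varint starting at pos; return (value, next_pos)."""
    shift = value = 0
    while True:
        b = buf[pos]
        pos += 1
        value |= (b & 0x7F) << shift
        if b < 0x80:
            return value, pos
        shift += 7


def field(num: int, data: bytes) -> bytes:
    # Length-delimited field (wire type 2), used for both bytes and strings.
    return put_varint((num << 3) | 2) + put_varint(len(data)) + data


def encode(related: List[bytes], schema: str, ctype: str, extra: bytes = b"") -> bytes:
    body = b"".join(field(RELATED_FEEDS, k) for k in related)
    body += field(PROTOBUF_SCHEMA, schema.encode()) + field(CONTENT_TYPE, ctype.encode())
    return body + extra  # `extra` stands in for already-encoded application fields


def decode(buf: bytes) -> Dict[str, object]:
    """Read the core fields and skip unknown ones, as protobuf readers do."""
    related: List[bytes] = []
    out: Dict[str, object] = {"relatedFeeds": related}
    pos = 0
    while pos < len(buf):
        key, pos = get_varint(buf, pos)
        num, wire = key >> 3, key & 7
        if wire == 0:  # varint field: skip its value
            _, pos = get_varint(buf, pos)
            continue
        if wire == 1:  # fixed 64-bit field
            pos += 8
            continue
        if wire == 5:  # fixed 32-bit field
            pos += 4
            continue
        if wire != 2:
            raise ValueError(f"unsupported wire type {wire}")
        length, pos = get_varint(buf, pos)
        data, pos = buf[pos:pos + length], pos + length
        if num == RELATED_FEEDS:
            related.append(data)
        elif num == PROTOBUF_SCHEMA:
            out["protobufSchema"] = data.decode()
        elif num == CONTENT_TYPE:
            out["contentType"] = data.decode()
        # Any other length-delimited field is an application extension; ignored.
    return out
```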
Option 2: add a metadata/header blob out-of-band to hypercore feeds. @mafintosh mentioned a scheme where an immutable blob is transmitted during feed handshakes, and the hash of that blob is used as a key for internal hypercore hashing. It would be stored as a new stub file in SLEEP directories (like the feed key currently is).
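One way the "hash of the blob as a hashing key" idea could look, sketched with keyed BLAKE2b because Python's `hashlib.blake2b` accepts a `key` argument. The construction and the `header_key`/`leaf_hash` names are assumptions, not hypercore's actual scheme.

```python
import hashlib


def header_key(blob: bytes) -> bytes:
    # 32-byte digest of the immutable header blob exchanged in the handshake.
    return hashlib.blake2b(blob, digest_size=32).digest()


def leaf_hash(blob: bytes, entry: bytes) -> bytes:
    """Hash a feed entry keyed by the header digest (an assumed construction).

    Every leaf depends on the header, so a peer holding a different header
    blob computes different hashes and fails verification.
    """
    return hashlib.blake2b(entry, digest_size=32, key=header_key(blob)).digest()
```

The header never appears as a feed entry here, yet it is still bound to every hash, which is the property that makes keeping it out-of-band safe.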
There are probably more options if we get creative!