Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hypercore Header DEP #34

Merged
merged 7 commits into from
Jul 6, 2018

Conversation

pfrazee
Copy link
Contributor

@pfrazee pfrazee commented Jun 26, 2018

Standardizes a HypercoreHeader message which can be used as the first entry in hypercores to declare the datastructure along with custom fields.

Copy link
Contributor

@RangerMauve RangerMauve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the MyCustomHeader example! Didn't know you could do something like that. :D

Copy link
Contributor

@bnewbold bnewbold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still kind of wish we could deal with nested schemas like: hyperdb-v2/hyperdrive-v4, but I think the discussion consensus was to just keep it simple.

I'm really excited that this came out backwards compatible with existing
hyperdrive!

I don't remember that much about the wire representation of protobuf, but we
will implicitly still be trying to parse a lot of random bytestrings to see if
they "match" this schema. Should we mention that folks should specifically
avoid having a protobuf message that has a string field as the first entry,
because that could accidentally get interpreted as a "type" string? Do we need
to set a sane max version length (1024 bytes?) to prevent gnarly/weird parsing?

# Rationale and alternatives
[alternatives]: #alternatives

This standard is backwards compatible with v1 of hyperdrive, which used an `Index` message as the first entry in the metadata hypercore.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, :party:!

}
```

Data structures should not add additional fields to the `HypercoreHeader`. They should put any additional data into the `extension`. This is easy to do with protobuf's nested messages. For example:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should implementations fail fast on additional fields, or should they ignore additional fields? The later is future-compatible, the former is not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignore, probably. I'll add a note.

}
```

The `type` string should be unique to the data structure (refer to existing DEPs to avoid conflicts). When a breaking change is made to the data-structure, it should be given a new `type` string. For example, the `"hyperdrive"` type string will be updated to `"hyperdrive-v2"` for its second version.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this imply that the - character is special or reserved? Should successor versions use the -vNNN syntax?
To clarify my bikeshedding, I think encoding the version as an unstructured string is a reasonable choice, but I think a default delimiter between "schema" and "version/variant" should be chosen.

Some common aesthetically pleasing separators would be -, /, ., _. How do you read one-two_three-four? As two identifiers (one-two followed by three-four), or three (one, two_three, four)?

To propose something concrete, lets say dash (-) is the separator, and identifiers should have dashes in them. Let's also say that the version is optional but recommended ("hyperdrive-v0"), and that the x- or X- prefixes are interpreted as "experimental" (aka, no DEP exists yet).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with some conventions but I'm not sure I want to standardize a syntax. Just not sure it's worth it. Reason being:

Let's also say that the version is optional but recommended ("hyperdrive-v0"), and that the x- or X- prefixes are interpreted as "experimental" (aka, no DEP exists yet)

This is totally reasonable, but you then need to interpret "hyperdrive-v0" == "hyperdrive", and I just don't know if we need that level of detail for this.


A program that is reading a hypercore will examine the `type` to determine how to process the hypercore.

```js
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a soft preference for using pseudo code rather than literal javascript in the DEPs. In additional to being more implementation-agnostic, an argument against using actual code is that it comes with an implication that this exact API is stable, or that this exact code is the canonical code that people should always copy/paste into their applications even years down the line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I'll rewrite with that in mind.

@pfrazee
Copy link
Contributor Author

pfrazee commented Jun 27, 2018

I still kind of wish we could deal with nested schemas

I'm not against revisiting this. It should pretty much always be layered (one structure on top of another) so all we need is a standard separator. The downside is that it adds some complexity (multiple types emitted instead of just 1).

I don't remember that much about the wire representation of protobuf, but we
will implicitly still be trying to parse a lot of random bytestrings to see if
they "match" this schema.

Hmm, yeah that's true.

Should we mention that folks should specifically avoid having a protobuf message that has a string field as the first entry, because that could accidentally get interpreted as a "type" string?

Tricky. If it ends up possible that non-header messages have strings at tag 1, then we have to interpret an unknown type string in the same way we interpret no header at all. We could add some magic numbers to the schema, except that wouldn't be backwards compatible. Might not be a huge deal.

Do we need to set a sane max version length (1024 bytes?) to prevent gnarly/weird parsing?

Is there a way to put a max string length on protobuf fields? @mafintosh


```protobuf
message HypercoreHeader {
required string type = 1;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm starting to wonder if we should call this dataStructure instead of type, because "type" is a really generic word that could be used elsewhere in the stack while "data structure" is very specific to what this is.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noting that this can be updated without any backwards compat issues because of protobuf

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type is very generic. I'd bikeshed to contentType or contentSchema, but loosely held opinion (dataStructure works for me).


A program that is reading a hypercore will examine the `type` to determine how to process the hypercore.

```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

file type here might be helpful to read the rendered version

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We intentionally are trying to write pseudo code


At time of writing, Dat's standard data structures are being expanded upon (hyperdb) while existing data structures are being updated with breaking changes (hyperdrive). In order for these changes to be deployed smoothly to the network, it's important that the following needs be met:

1. Programs should correctly identify the data structure type and version within a hypercore. This will enable the program to use the correct handling code (eg hyperdb, hyperdrive-v1, hyperdrive-v2).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the example of hyperdrive-v1 is somewhat at odds with the example of hyperdrive throughout the rest of the article. Might be best to pick one and stick with it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that we should change it to hyperdrive

@yoshuawuyts
Copy link
Contributor

yoshuawuyts commented Jul 2, 2018

Are the proposed version names for hyperdrive / hyperdb also part of the DEP, or do they only serve as an example? If they only serve as an example, it might be good to make it extra clear that the names aren't being specified in this DEP.

edit: typo

@pfrazee
Copy link
Contributor Author

pfrazee commented Jul 2, 2018

Are the proposed version names for hyperdrive / hyperdb also part of the DEP, or do they only serve as an example? If they only serve as an example, it might be good to make it extra clear that the names aren't being specified in this DEP.

Agree. They're just examples

@bnewbold
Copy link
Contributor

bnewbold commented Jul 6, 2018

To summarize the current state of this in my eyes:

  • needs a couple tweaks to naming and examples in the DEP text, but I would approve to merge trusting that Paul will make the changes first
  • I'd like to move forward, but with the expectation that we should code this up ("working code") and make sure that it allows beaker/dat to disambiguate and handle both old and new hyperdrive before 100% commiting to this scheme (aka, move to Draft ASAP, also prove this out in code to get to Active status ASAP, as getting this correct for the long-term is crucial for the ecosystem)

@pfrazee
Copy link
Contributor Author

pfrazee commented Jul 6, 2018

Just fixed that "hypercore-v1" thing and added a note that the strings will be in the data structures' DEPs.

I renamed type to dataStructureType. I'm open to discussion on this. I have some code that's trying to handle dats generally in preparation for more types. In it, I referred to each general item as a "dat" and the instance of hypercore/hyperdrive/hyperdb as that dat's "data structure". So:

var dat = {
  key: '...',
  dataStructureType: 'hyperdrive',
  dataStructure: hyperdrive(...)
}

The word dataStructure is a little clunky but it does seem to work alright.

@pfrazee pfrazee merged commit 61681ac into dat-ecosystem-archive:master Jul 6, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants