DEP: Session data extension #27

pfrazee · 2018-06-01T14:46:05Z

New DEP proposal. This hasn't been discussed in the WG yet, but came from discussions in #dat on freenode about discovery. The summary:

This DEP defines the non-standard session-data extension message used in the Dat replication protocol. This message provides a way to attach application data to a connection, commonly used for identifying the users and broadcasting personal keys.

The use-case is similar to what @cabal-club has done with the handshake's peerId and userData. I explain in the DEP why I diverged from that approach, and I hope that cabal will be able to adopt this as well so that cabal can work in Beaker.

cc Cabal team @Karissa @cblgh @noffle as well as the dat WG as a whole

mafintosh · 2018-06-01T15:12:22Z

Looks very clear to me. LGTM

mafintosh · 2018-06-01T15:12:42Z

Or ... LGTV (looks good to vote)

okdistribute · 2018-06-01T15:47:27Z

proposals/0000-session-data-extension.md

+
+Some applications have used the `peerId` and/or `userData` fields of the replication handshake message in order to broadcast this information. Those mechanisms are unsuitable for Web applications (as in the "Beaker browser") because the sites' applications are not executed reliably prior to the replication handshake. By using an extension message, we provide the same presence & discovery without relying on the timing of the application-code execution.
+
+An alternative approach would be to establish an ephemeral messaging channel, perhaps using a different extension message. This ephemeral channel would broadcast the payload to the client's application code as an event when it is received, but would not retain the most recent payload as session-data. This ephemeral channel would be less effective in Web applications (as in the "Beaker Browser") because it would rely on the application-code being active (loaded in a tab) at time of receipt, whereas the builtin session-data semantic makes it possible for the browser to retain the last payload on the applications' behalf.


I'm not sure I understand why this would be different -- could you say more about the 'ephemeral messaging channel' and how this is different than the replication protocol? And why couldn't Beaker Browser retain the most recent payload as session-data?

By "ephemeral messaging channel" I'm thinking like the semantics of a UDP socket, which means:

The applications are not alerted about whether delivery is successful.

The payload is not kept for any reason.

Beaker wouldn't retain the most recent payload because that's not the use-case of an ephemeral channel. An ephemeral channel is good for things like sending chat messages; if we had Beaker retain the last message for this use-case too, you'd have to have the app send the "session payload" after other traffic to pin it as the most recent data.

I'm thinking about making a DEP for an ephemeral channel too (for those other use-cases) but I wanted to send this one first and think about the ephemeral channel more.
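
To make the distinction above concrete, here is a minimal sketch of the two semantics. It is not taken from the DEP or from any Dat module: `EphemeralChannel`, `SessionDataSlot`, and their methods are hypothetical names chosen for illustration. The only difference that matters is whether the receiver retains the most recent payload.

```typescript
type Listener = (payload: Uint8Array) => void;

// Ephemeral semantics (hypothetical): deliver to whoever is listening
// right now and keep nothing afterwards.
class EphemeralChannel {
  private listeners: Listener[] = [];

  onMessage(fn: Listener): void {
    this.listeners.push(fn);
  }

  receive(payload: Uint8Array): void {
    for (const fn of this.listeners) fn(payload);
  }
}

// Session-data semantics (hypothetical): deliver as an event, but also
// retain the most recent payload so it can be read later.
class SessionDataSlot {
  private listeners: Listener[] = [];
  private latest: Uint8Array | null = null;

  onSessionData(fn: Listener): void {
    this.listeners.push(fn);
  }

  receive(payload: Uint8Array): void {
    this.latest = payload; // each new payload replaces the previous one
    for (const fn of this.listeners) fn(payload);
  }

  getSessionData(): Uint8Array | null {
    return this.latest;
  }
}
```

With the ephemeral channel, an app that registers its listener after `receive()` has run never sees that payload. With the session-data slot, the same app can still call `getSessionData()`, which is the behaviour a browser would rely on when the app's tab was not open at the time of receipt.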

RangerMauve · 2018-06-01T16:51:19Z

proposals/0000-session-data-extension.md

+
+The client may respond to the message by emitting an event, so that it may be handled by the client's application logic. The client should also make the most recent `sessionData` buffer available to the application logic after message is received.
+
+After publishing this DEP, the "Beaker Browser" will implement a Web API for exposing the `'session-data'` protocol to applications. It will restrict access so that the application code of a `dat://` site will only be able to set the session data for connections related to its own content.


What are "connections related to its own content"? Is it the DatArchive for the URL the content is being served from?

I think that any DatArchives that are created should support session data to support use cases like chat where people might have different apps (in different dats) all talking via the same protocols over one URL.

> Is it the DatArchive for the URL the content is being served from?

Correct

> I think that any DatArchives that are created should support session data to support use cases like chat where people might have different apps (in different dats) all talking via the same protocols over one URL.

That makes sense; you're right that this is an issue. I need to think about the security implications. It's also not clear to me how that would work: how would an app know which other dat archives it needs to be looking at to get the session data?

I think solving that would require some meta-identifier which the sessions are being attached to, where you say "this is fritter session data, anybody else interested in fritter should receive this." And then we'd need to know which connections are interested in fritter session data so that we know which connections should receive the message. You also create the possibility for multiple apps to register session data. It gets much more complex.

Hmm, in fact if you have 2 versions of the fritter app, the only common connections would be for the profile archives.

That's a good point. I could see it getting hairy pretty quick. The ephemeral messages spec wouldn't have the problem of specifying where the data is coming from since that can be done at the application level.

> how would an app know which other dat archives it needs to be looking at to get the session data

I think the UX would be copy-pasting the link for the channel you want to use for communication. Then it would open the dat archive, which would probably have info about the channel name or whatever, and start looking for peers to talk to (or do the multiwriter thing from cabal).

Alternatively when people make a chat channel, they can fork the application Dat and send the link to that instead of having the application dat. Though that would make updates to the content harder, and would mean having multiple windows for chatting instead of a single one. (Might be fixable with iframes?)

I guess in the worst case, the main application logic for getting the data can live in an iframe and talk to the parent responsible for the UI.

Overall it's not a dealbreaker, but it'll require being a little more "clever" in the implementation.

Some discussion from IRC. TL;DR: I'm starting to think this may not work and ephemeral messaging may need to be the solution. The problem is, how do you handle multiple apps?

pfrazee> in the ephemeral messaging case, you're going to have apps cross-talking with each other. Imagine two apps sending messages on the same channel and receiving responses intended for one or the other
pfrazee> that'd be very confusing
RangerMauve> I think multiple apps, or even multiple pages of the same app, writing to the same persistent variable will cause problems
RangerMauve> pfrazee: Maybe have a "stream ID" to have multiplexing be a first-class concern?
pfrazee> yeah exact same issue, though multiple pages of the same app *should* be able to coordinate enough not to contest with each other (because it OUGHT to be setting the same thing)
pfrazee> perhaps
pfrazee> the complexity increase of all the possible solutions concerns me
RangerMauve> I guess the main issue here is that beaker has a single replication stream for all pages.
RangerMauve> For lower-level one-app-to-one-stream use cases what you sketched out works just fine
pfrazee> yeah
pfrazee> so on fritter, for instance, I'd want to attach my profile key to the dat channel for each profile dat I sync
RangerMauve> pfrazee: To let them know you're following them?
pfrazee> yeah I figure that's how I'd do that
pfrazee> and then the app would probably read the keys attached to their personal dat archive
RangerMauve> Yeah, that's pretty elegant
pfrazee> yeah then you just gotta figure out how you deal with multiple apps
pfrazee> I almost wonder if the solution is to let multiple sessions be attached
RangerMauve> I think ephemeral messaging with application-level RPC would be a good approach. If you get a message, and it doesn't make sense to you, ignore it.
pfrazee> well the downside there is, ephemeral messages can easily be missed because you may not have the app open when it's sent
RangerMauve> Yeah. But that can be accounted for at the application level, too.
RangerMauve> For example, ping all contact dats when you open the page, and react to pings while you're active
pfrazee> yeah that's probably how you'd have to do it
RangerMauve> If it's already considerd an "unreliable" channel, then applications will already need some sort of mechanisms in place to account for "missed" messages
RangerMauve> And "duplicated" messages
pfrazee> well that's what session-data semantics helps fix
pfrazee> persists the data and it's atomic so you can resend
RangerMauve> I think that for reliable transports, they can create a throwaway dat and post messages in there, then forget about it when it's no longer relevant
RangerMauve> Yeah, but that would only work for cases where one application is writing to the session.
pfrazee> I guess you can "multiplex" the sessions
pfrazee> allow multiple sessionData's to be attached and then beaker would just let each origin write only one
pfrazee> but that does start to feel...weird
RangerMauve> Yeah, I'm not sure how clean that would be in the end
pfrazee> yeah. I'll talk to mafintosh about this tonight. You are starting to make me think we'll have to do ephemeral messages instead and let apps solve it
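
The "multiplex the sessions" idea floated at the end of the log could look roughly like the sketch below. This is only an illustration of the idea as described in the chat (one slot per origin, each origin able to write only its own); `MultiplexedSessionData` and the origin strings are hypothetical, and nothing like this is specified in the DEP.

```typescript
// Hypothetical sketch: one session-data slot per origin on a shared connection.
class MultiplexedSessionData {
  // Keyed by origin; a plain object is used as a simple dictionary.
  private slots: { [origin: string]: Uint8Array } = {};

  // `origin` is assumed to be supplied by the browser, never by page script,
  // so an app can only ever overwrite its own slot.
  set(origin: string, payload: Uint8Array): void {
    this.slots[origin] = payload;
  }

  // Returns the latest payload written by `origin`, if any.
  get(origin: string): Uint8Array | undefined {
    return this.slots[origin];
  }

  // Lists every origin that currently has session data attached.
  origins(): string[] {
    return Object.keys(this.slots).sort();
  }
}
```

Two apps writing through this structure never contend for the same value, but the remote peer now has to know which origins to look for, which is the added complexity the log worries about.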

Regarding security:

If an application already knows the dat URL it wants to talk on, then it's probably allowed to talk on that URL.

You could require user action for allowing an application to listen / publish on a type of message.

I'm wondering if we should stick with the Same-Origin Policy in the DEP (can only attach session-data to connections for your own dat), write another DEP for ephemeral messaging too with the same S.O.P. approach, and then just expect that we'll need a more sophisticated solution to support apps talking to each other.

My goal is really to come up with something that's simple and works well enough for simple apps to "self communicate" among peers, so to speak. I'm not yet sure whether I think this will be a long-term solution for discovery, so I'm not sure we should spend time agonizing over other use-cases. We can always supersede this DEP.

Yeah, working around SOP won't be too hard with iframes.

@RangerMauve (I believe) that's only possible if the target Dat has JS which helps you do it.

Yeah, which is a good thing because then whoever made that dat will need to explicitly opt-into the functionality. I'm making use of frame-rpc for this use case in dat-polyfill, actually.

hackergrrl · 2018-06-01T18:57:56Z

Hey y'all. 👋

The reason we use this on-connection hack in cabal is so that we can auto-authorize and get around these hyperdb issues (131) (132). The proposed change here would help cabal work in Beaker, though removing the hyperdb requirement of having a pre-established shared key and explicit authorization would obviate the need for such a DEP (for cabal or any other app that doesn't want a mandatory authorization model, at least).

RangerMauve · 2018-06-01T19:13:56Z

Yeah, an auto-authorized hyperdb could act as a communication channel if people listen to the changefeed.

okdistribute · 2018-06-01T20:48:06Z

Yeah, that's correct @noffle

pfrazee · 2018-06-02T02:39:16Z

Hm, I'm concerned about auto-authorizing hyperdbs, for scaling and (perhaps more importantly) for spam attacks. Like: What happens if somebody shows up and starts dumping large datasets into it? Is there a way to stop that and remove the data they add?

I like how elegant the idea is though. It'd be great if every app could have a public hyperdb to coordinate and discover. I'm just not sure we can make it work.

What I was trying to enable with this DEP is two things:

1. For small apps to be able to discover peers when shared. Things like, a dat containing an event invite, or a collaborative document. The idea being that scale would be kept small by the fact that the dat is only shared with friends. If too many people start showing up, the app could stop authorizing or downloading their data.
2. For apps like cabal and fritter to experiment with more risky policies, like auto-adding.

It's also important to note -- this DEP was motivated by the need for something that works in the near-term. I expect it to be a stepping stone to a more sophisticated solution.

hackergrrl · 2018-06-02T20:05:13Z

I like the idea of the dat/DEP core pieces not worrying too much about policy matters, like authorization, which many different userland apps will have different opinions on and needs from. I wonder if it'd make sense to hold off on this DEP until hyperdb makes user-implemented authorization models possible, and see if this DEP is still necessary?

imho specs are so much heavier than a module doing a semver bump to add/remove an API; my preference is to explore the latter option first.

pfrazee · 2018-06-03T01:19:52Z

I’m 👍 to keep discussing multiwriter’s auth policies but I have utility for this spec outside of multiwriter authorization so I’d prefer we didn’t hold up this DEP for that discussion.

RangerMauve · 2018-06-03T19:53:17Z

@pfrazee What sort of API were you aiming for from a beaker perspective? It doesn't look like this would be used as a PeerSocket.

RangerMauve · 2018-06-04T16:38:37Z

If it'll be using the same-origin policy, what will that mean for DNS?

If I publish session data on dat://fritter.hashbase.io, will that conflict with dat://9900f9aad4d6e79e0beb1c46333852b99829e4dfcdfa9b690eeeab3c367c1b9a?

pfrazee · 2018-06-05T22:05:58Z

@RangerMauve I'm still thinking about the API, but I'll write something up soon. I'm also going to write up a DEP for ephemeral messages that's similar to this, and the overall API design will depend on that too.

The domain names will not have an effect in this case. Beaker will resolve the current site to its raw URL, and only allow that site to access the session-data for itself.
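
As a rough illustration of that rule, the sketch below resolves both the site URL and the target archive URL to raw keys before comparing them, so a DNS name and its key count as the same origin. The function names and the in-memory `dns` table are assumptions made for this example; they are not Beaker APIs, and real resolution would go through Dat's DNS lookup.

```typescript
// A raw dat key is 64 hex characters.
const KEY_PATTERN = /^[0-9a-f]{64}$/;

// Extract the host part of a dat:// URL (either a DNS name or a raw key).
function hostOf(url: string): string {
  const m = /^dat:\/\/([^\/?#]+)/.exec(url);
  if (!m) throw new Error("not a dat:// URL: " + url);
  return m[1].toLowerCase();
}

// Resolve a dat:// URL to its raw key. `dns` is a hypothetical lookup table
// standing in for real DNS resolution.
function resolveToKey(url: string, dns: { [name: string]: string }): string {
  const host = hostOf(url);
  if (KEY_PATTERN.test(host)) return host;
  const key = dns[host];
  if (!key) throw new Error("cannot resolve " + host);
  return key;
}

// A site may only touch session data for the archive it is served from.
function mayAccessSessionData(
  siteUrl: string,
  archiveUrl: string,
  dns: { [name: string]: string }
): boolean {
  return resolveToKey(siteUrl, dns) === resolveToKey(archiveUrl, dns);
}
```

Under these assumptions, if `fritter.hashbase.io` resolved to the key in the question above, a page loaded from either URL would be treated as the same site and would see the same session data.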

bnewbold · 2018-06-09T23:48:49Z

I missed the boat on reviewing this earlier, but a few notes anyways:

"Any additional bytes should be truncated by the receiving client": I wouldn't ever truncate messages; this leads to debugging nightmares (see also packet fragmentation and UDP MTU truncation). I'd recommend dropping the whole connection ("fail fast"), or disregarding the entire message (though even the latter could also be hard to debug, especially if session messages have variable size). None of these are actually great solutions though... what is the upgrade mechanism to expanding the size or complexity of this field in the future, while being backwards compatible with older clients/agents?

I guess overall this feels under-motivated to me, so it's hard to judge whether it delivers on the goal it sets out to achieve. This is probably because I am focused on Dat as a protocol for replicating published content (where I don't think this DEP would have much to provide) and haven't thought as much about real-time and ephemeral use cases, so I can't be as helpful.

pfrazee · 2018-06-10T00:09:09Z

> None of these are actually great solutions though

It's definitely a concern that a truncated message could be misinterpreted. It might be a better idea to drop the message and suggest the receiving client emit an error event, to potentially react to.
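
A minimal sketch of the "drop and report" behaviour suggested here, as opposed to truncating. The 256-byte limit, the class name, and the error format are assumptions for illustration only; this thread does not state the actual limit.

```typescript
// Assumed size limit for illustration; not taken from this thread.
const MAX_SESSION_DATA_BYTES = 256;

// Hypothetical receiver: oversized payloads are dropped whole and reported,
// never truncated, so the stored value is always exactly what a peer sent.
class SessionDataReceiver {
  latest: Uint8Array | null = null;
  errors: string[] = [];

  // Returns true if the payload was accepted, false if it was dropped.
  receive(payload: Uint8Array): boolean {
    if (payload.length > MAX_SESSION_DATA_BYTES) {
      this.errors.push(
        "session-data payload of " + payload.length +
        " bytes exceeds limit of " + MAX_SESSION_DATA_BYTES
      );
      return false; // keep the previous value untouched
    }
    this.latest = payload;
    return true;
  }
}
```

Dropping keeps the previously accepted payload intact, and the recorded error gives the application something to react to or log.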

> what is the upgrade mechanism to expanding the size or complexity of this field in the future, while being backwards compatible with older clients/agents?

Another extension message, like session-data-v2

> I guess overall this feels under-motivated to me, so it's hard to judge whether it delivers on the goal it sets out to achieve.

Can you be more specific about what you want? This is kind of a frustratingly broad criticism.

bnewbold · 2018-06-10T03:47:50Z

> > I guess overall this feels under-motivated to me, so it's hard to judge whether it delivers on the goal it sets out to achieve.

> Can you be more specific about what you want? This is kind of a frustratingly broad criticism.

That's totally fair! I think this is a case of not communicating context well over the internet. I think I'm looking for more context around "where does this fit in the big picture". Is this DEP a way to document and formalize what a couple apps are already doing? Seems reasonable. Is it setting out The default way to authenticate hypercore peers as identified users? That seems ambitious and I'd want to do more reading.

Edit: I also commented after only reading the DEP itself. Your comment #27 (comment) above and #27 (comment) provide a lot more context.

RangerMauve · 2018-06-10T04:16:52Z

Personally, I'm going to be using this for discovering peers in an application.

For example, it can be used in a social media setting to discover peers' dat URLs to automatically index them.

pfrazee · 2018-06-10T21:45:46Z

@bnewbold okay, that makes sense. Does #30 improve that, do you think?

pfrazee added 4 commits May 30, 2018 14:50

Add proposals/0000-session-data-extension.md

83bcec9

Expand on drawbacks in proposals/0000-session-data-extension.md

833f0b8

Expand on rationale in proposals/0000-session-data-extension.md

e339c9b

Set PR link on proposals/0000-session-data-extension.md

d19820d

okdistribute reviewed Jun 1, 2018


pfrazee mentioned this pull request Jun 1, 2018

Upcoming Meeting Agenda - 6th June 2018 dat-ecosystem/consortium#21

Closed

6 tasks

RangerMauve reviewed Jun 1, 2018


pfrazee mentioned this pull request Jun 2, 2018

Optional authorization model / not require explicit authorization of feeds mafintosh/hyperdb#131

Open

This was referenced Jun 5, 2018

Mentions beakerbrowser/fritter#3

Open

DEP: Ephemeral message extension #28

Closed

Publish 0006-session-data-extension as a draft

029b506

pfrazee merged commit 4d83beb into dat-ecosystem-archive:master Jun 6, 2018

pfrazee deleted the session-data-extension branch June 6, 2018 18:06

pfrazee mentioned this pull request Jun 10, 2018

Add more context & background to 0006-session-data-extension #30

Merged
