DEP: Session data extension #27

Merged

pfrazee
Contributor

@pfrazee pfrazee commented Jun 1, 2018

New DEP proposal. This hasn't been discussed in the WG yet, but came from discussions in #dat on freenode about discovery. The summary:

This DEP defines the non-standard session-data extension message used in the Dat replication protocol. This message provides a way to attach application data to a connection, commonly used for identifying the users and broadcasting personal keys.

The use-case is similar to what @cabal-club has done with the handshake's peerId and userData. I explain in the DEP why I diverged from that approach, and I hope that cabal will be able to adopt this as well so that cabal can work in Beaker.

cc Cabal team @Karissa @cblgh @noffle as well as the dat WG as a whole

@mafintosh

Looks very clear to me. LGTM

@mafintosh

Or ... LGTV (looks good to vote)


Some applications have used the `peerId` and/or `userData` fields of the replication handshake message in order to broadcast this information. Those mechanisms are unsuitable for Web applications (as in the "Beaker browser") because the sites' applications are not executed reliably prior to the replication handshake. By using an extension message, we provide the same presence & discovery without relying on the timing of the application-code execution.

An alternative approach would be to establish an ephemeral messaging channel, perhaps using a different extension message. This ephemeral channel would broadcast the payload to the client's application code as an event when it is received, but would not retain the most recent payload as session-data. This ephemeral channel would be less effective in Web applications (as in the "Beaker Browser") because it would rely on the application-code being active (loaded in a tab) at time of receipt, whereas the builtin session-data semantic makes it possible for the browser to retain the last payload on the applications' behalf.

I'm not sure I understand why this would be different -- could you say more about the 'ephemeral messaging channel' and how this is different than the replication protocol? And why couldn't Beaker Browser retain the most recent payload as session-data?

Contributor Author

By "ephemeral messaging channel" I'm thinking like the semantics of a UDP socket, which means:

  1. The applications are not alerted about whether delivery is successful.
  2. The payload is not kept for any reason.

Beaker wouldn't retain the most recent payload because that's not the use-case of an ephemeral channel. An ephemeral channel is good for things like sending chat messages; if we had Beaker retain the last message for this use-case too, you'd have to have the app send the "session payload" after other traffic to pin it as the most recent data.
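To make the distinction concrete, here is a minimal sketch of the two semantics side by side. Every name in it (`EphemeralChannel`, `SessionDataStore`, `receive`, `get`) is hypothetical and chosen for illustration; none of it is the API of Beaker or of any Dat module.

```typescript
// Hypothetical names throughout; this is not the API of any Dat module.
type Listener = (peerId: string, payload: Uint8Array) => void;

// Ephemeral channel: hand the payload to whoever listens right now, keep nothing.
class EphemeralChannel {
  private listeners: Listener[] = [];

  onMessage(listener: Listener): void {
    this.listeners.push(listener);
  }

  receive(peerId: string, payload: Uint8Array): void {
    // With no listener registered, the payload is simply lost.
    this.listeners.forEach((listener) => listener(peerId, payload));
  }
}

// Session data: emit the same event, but also retain the latest payload per peer.
class SessionDataStore {
  private listeners: Listener[] = [];
  private latest = new Map<string, Uint8Array>();

  onSessionData(listener: Listener): void {
    this.listeners.push(listener);
  }

  receive(peerId: string, payload: Uint8Array): void {
    this.latest.set(peerId, payload); // overwrite: only the newest payload is kept
    this.listeners.forEach((listener) => listener(peerId, payload));
  }

  // Readable later, even by application code that loaded after the message arrived.
  get(peerId: string): Uint8Array | undefined {
    return this.latest.get(peerId);
  }
}
```

Under these assumptions, a page opened after a peer's message arrived can still call `get` on the store, whereas the ephemeral channel has nothing to offer it.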

I'm thinking about making a DEP for an ephemeral channel too (for those other use-cases) but I wanted to send this one first and think about the ephemeral channel more.


The client may respond to the message by emitting an event, so that it may be handled by the client's application logic. The client should also make the most recent `sessionData` buffer available to the application logic after the message is received.

After publishing this DEP, the "Beaker Browser" will implement a Web API for exposing the `'session-data'` protocol to applications. It will restrict access so that the application code of a `dat://` site will only be able to set the session data for connections related to its own content.

Contributor

What are "connections related to its own content"? Is it the DatArchive for the URL the content is being served from?

I think that any DatArchives that are created should support session data to support use cases like chat where people might have different apps (in different dats) all talking via the same protocols over one URL.

Contributor Author

> Is it the DatArchive for the URL the content is being served from?

Correct

> I think that any DatArchives that are created should support session data to support use cases like chat where people might have different apps (in different dats) all talking via the same protocols over one URL.

That makes sense; you're right that this is an issue. I need to think about the security implications. It's also not clear to me how that would work: how would an app know which other dat archives it needs to be looking at to get the session data?

I think solving that would require some meta-identifier which the sessions are being attached to, where you say "this is fritter session data, anybody else interested in fritter should receive this." And then we'd need to know which connections are interested in fritter session data so that we know which connections should receive the message. You also create the possibility for multiple apps to register session data. It gets much more complex.

Contributor Author

Hmm, in fact if you have 2 versions of the fritter app, the only common connections would be for the profile archives.

Contributor

That's a good point. I could see it getting hairy pretty quick. The ephemeral messages spec wouldn't have the problem of specifying where the data is coming from since that can be done at the application level.

> how would an app know which other dat archives it needs to be looking at to get the session data

I think the UX would be copy-pasting the link for the channel you want to use for communication. Then it would open the dat archive, which would probably have info about the channel name or whatever, and start looking for peers to talk to (or do the multiwriter thing from cabal).

Alternatively when people make a chat channel, they can fork the application Dat and send the link to that instead of having the application dat. Though that would make updates to the content harder, and would mean having multiple windows for chatting instead of a single one. (Might be fixable with iframes?)

I guess in the worst case, the main application logic for getting the data can live in an iframe and talk to the parent responsible for the UI.

Overall it's not a dealbreaker, but it'll require being a little more "clever" in the implementation.

Contributor Author

Some discussion from IRC. TL;DR: I'm starting to think this may not work and ephemeral messaging may need to be the solution. The problem is, how do you handle multiple apps?

pfrazee> in the ephemeral messaging case, you're going to have
apps cross-talking with each other. Imagine two apps sending messages
on the same channel and receiving responses intended for one or the
other

pfrazee> that'd be very confusing

RangerMauve> I think multiple apps, or even multiple pages of the same
app, writing to the same persistent variable will cause problems

RangerMauve> pfrazee: Maybe have a "stream ID" to have multiplexing be
a first-class concern?

pfrazee> yeah exact same issue, though multiple pages of the same
app *should* be able to coordinate enough not to contest with each
other (because it OUGHT to be setting the same thing)

pfrazee> perhaps

pfrazee> the complexity increase of all the possible solutions
concerns me

RangerMauve> I guess the main issue here is that beaker has a single
replication stream for all pages.

RangerMauve> For lower-level one-app-to-one-stream use cases what you
sketched out works just fine

pfrazee> yeah

pfrazee> so on fritter, for instance, I'd want to attach my
profile key to the dat channel for each profile dat I sync

RangerMauve> pfrazee: To let them know you're following them?

pfrazee> yeah I figure that's how I'd do that

pfrazee> and then the app would probably read the keys attached to
their personal dat archive

RangerMauve> Yeah, that's pretty elegant

pfrazee> yeah then you just gotta figure out how you deal with
multiple apps

pfrazee> I almost wonder if the solution is to let multiple
sessions be attached

RangerMauve> I think ephemeral messaging with application-level RPC
would be a good approach. If you get a message, and it doesn't make
sense to you, ignore it.

pfrazee> well the downside there is, ephemeral messages can easily
be missed because you may not have the app open when it's sent

RangerMauve> Yeah. But that can be accounted for at the application
level, too.

RangerMauve> For example, ping all contact dats when you open the
page, and react to pings while you're active

pfrazee> yeah that's probably how you'd have to do it

RangerMauve> If it's already considered an "unreliable" channel, then
applications will already need some sort of mechanisms in place to
account for "missed" messages

RangerMauve> And "duplicated" messages

pfrazee> well that's what session-data semantics helps fix

pfrazee> persists the data and it's atomic so you can resend

RangerMauve> I think that for reliable transports, they can create a
throwaway dat and post messages in there, then forget about it when
it's no longer relevant

RangerMauve> Yeah, but that would only work for cases where one
application is writing to the session.

pfrazee> I guess you can "multiplex" the sessions

pfrazee> allow multiple sessionData's to be attached and then
beaker would just let each origin write only one

pfrazee> but that does start to feel...weird

RangerMauve> Yeah, I'm not sure how clean that would be in the end

pfrazee> yeah. I'll talk to mafintosh about this tonight. You are
starting to make me think we'll have to do ephemeral messages instead
and let apps solve it
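
The "multiplexed" idea floated near the end of the log (several session-data payloads on one connection, with each origin allowed to write only its own) could be sketched roughly as follows. This is an illustration of the idea only: the class and method names are hypothetical, and origins are represented as plain strings standing in for raw archive keys.

```typescript
// Hypothetical sketch; none of these names come from Beaker or the DEP.
class MultiplexedSessionData {
  // One slot per origin (raw archive key), all attached to the same connection.
  private slots = new Map<string, Uint8Array>();

  // `callerOrigin` is the site making the call; it may write only its own slot.
  set(callerOrigin: string, slotOrigin: string, payload: Uint8Array): boolean {
    if (callerOrigin !== slotOrigin) return false;
    this.slots.set(slotOrigin, payload); // replaces that origin's previous payload
    return true;
  }

  get(slotOrigin: string): Uint8Array | undefined {
    return this.slots.get(slotOrigin);
  }

  size(): number {
    return this.slots.size;
  }
}
```

The sketch leaves open the harder questions raised in the log, such as how a remote peer learns which slots it should care about.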

Contributor

@RangerMauve RangerMauve Jun 1, 2018

Regarding security:

  • If an application already knows the dat URL it wants to talk on, then it's probably allowed to talk on that URL.
  • You could require user action for allowing an application to listen / publish on a type of message.

Contributor Author

I'm wondering if we should stick with the same-origin policy in the DEP (can only attach session-data to connections for your own dat), write another DEP for ephemeral messaging too with the same S.O.P. approach, and then just expect that we'll need a more sophisticated solution to support apps talking to each other.

My goal is really to come up with something that's simple and works well enough for simple apps to "self communicate" among peers, so to speak. I'm not yet sure whether I think this will be a long-term solution for discovery, so I'm not sure we should spend time agonizing over other use-cases. We can always supersede this DEP.

Contributor

Yeah, working around SOP won't be too hard with iframes.

Contributor Author

@RangerMauve (I believe) that's only possible if the target Dat has JS which helps you do it.

Contributor

Yeah, which is a good thing because then whoever made that dat will need to explicitly opt-into the functionality. I'm making use of frame-rpc for this use case in dat-polyfill, actually.

@hackergrrl

hackergrrl commented Jun 1, 2018

Hey y'all. 👋

The reason we use this on-connection hack in cabal is so that we can auto-authorize and get around these hyperdb issues (131) (132). The proposed change here would help cabal work in Beaker, though removing the hyperdb requirement of having a pre-established shared key and explicit authorization would obviate the need for such a DEP (for cabal or any other app that doesn't want a mandatory authorization model, at least).

@RangerMauve
Contributor

Yeah, an auto-authorized hyperdb could act as a communication channel if people listen to the changefeed.

@okdistribute

Yeah, that's correct @noffle

@pfrazee
Contributor Author

pfrazee commented Jun 2, 2018

Hm, I'm concerned about auto-authorizing hyperdbs, for scaling and (perhaps more importantly) for spam attacks. Like: What happens if somebody shows up and starts dumping large datasets into it? Is there a way to stop that and remove the data they add?

I like how elegant the idea is though. It'd be great if every app could have a public hyperdb to coordinate and discover. I'm just not sure we can make it work.

What I was trying to enable with this DEP is two things:

  1. For small apps to be able to discover peers when shared. Things like, a dat containing an event invite, or a collaborative document. The idea being that scale would be kept small by the fact that the dat is only shared with friends. If too many people start showing up, the app could stop authorizing or downloading their data.
  2. For apps like cabal and fritter to experiment with more risky policies, like auto-adding.

It's also important to note -- this DEP was motivated by the need for something that works in the near-term. I expect it to be a stepping stone to a more sophisticated solution.

@hackergrrl

hackergrrl commented Jun 2, 2018

I like the idea of the dat/DEP core pieces not worrying too much about policy matters, like authorization, which different userland apps will have different opinions on and needs from. I wonder if it'd make sense to hold off on this DEP until hyperdb makes user-implemented authorization models possible, and see if this DEP is still necessary?

imho specs are so much heavier than a module doing a semver bump to add/remove an API; my preference is to explore the latter option first.

@pfrazee
Contributor Author

pfrazee commented Jun 3, 2018 via email

@RangerMauve
Contributor

@pfrazee What sort of API were you aiming for from a beaker perspective? It doesn't look like this would be used as a PeerSocket.

@RangerMauve
Contributor

If it'll be using the same-origin policy, what will that mean for DNS?

If I publish session data on dat://fritter.hashbase.io, will that conflict with dat://9900f9aad4d6e79e0beb1c46333852b99829e4dfcdfa9b690eeeab3c367c1b9a?

@pfrazee
Contributor Author

pfrazee commented Jun 5, 2018

@RangerMauve I'm still thinking about the API, but I'll write something up soon. I'm also going to write up a DEP for ephemeral messages that's similar to this, and the overall API design will depend on that too.

The domain names will not have an effect in this case. Beaker will resolve the current site to its raw URL, and only allow that site to access the session-data for itself.
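
As a rough illustration of that rule, the check could reduce both URLs to a raw archive key before comparing them. The sketch below is an assumption-laden stand-in: `dnsTable` is a hypothetical substitute for real Dat DNS resolution, its single entry assumes the hostname and key from the question above refer to the same archive, and the function names are invented for this example.

```typescript
// Hypothetical stand-in for Dat DNS resolution: hostname -> 64-character hex key.
// The single entry assumes the two URLs in the question name the same archive.
const dnsTable = new Map<string, string>([
  ["fritter.hashbase.io", "9900f9aad4d6e79e0beb1c46333852b99829e4dfcdfa9b690eeeab3c367c1b9a"],
]);

// Reduce a dat:// URL to the raw archive key, resolving a domain name if needed.
function resolveToKey(url: string): string {
  const host = url.replace(/^dat:\/\//, "").split("/")[0] ?? "";
  if (/^[0-9a-f]{64}$/i.test(host)) return host.toLowerCase();
  const key = dnsTable.get(host);
  if (key === undefined) throw new Error(`cannot resolve ${host}`);
  return key;
}

// A site may touch session data only for the archive it is served from.
function mayAccessSessionData(siteUrl: string, archiveUrl: string): boolean {
  return resolveToKey(siteUrl) === resolveToKey(archiveUrl);
}
```

Because the comparison happens on resolved keys, the domain-name form and the raw-key form of the same site share one session-data slot instead of conflicting.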

@pfrazee pfrazee merged commit 4d83beb into dat-ecosystem-archive:master Jun 6, 2018
@pfrazee pfrazee deleted the session-data-extension branch June 6, 2018 18:06
@bnewbold
Contributor

bnewbold commented Jun 9, 2018

I missed the boat on reviewing this earlier, but a few notes anyways:

"Any additional bytes should be truncated by the receiving client": I wouldn't ever truncate messages; this leads to debugging nightmares (see also packet fragmentation and UDP MTU truncation). I'd recommend dropping the whole connection ("fail fast"), or disregarding the entire message (though even the later could also be hard to debug, especially if session messages have variable size). None of these are actually great solutions though... what is the upgrade mechanism to expanding the size or complexity of this field in the future, while being backwards compatible with older clients/agents?

I guess overall this feels under-motivated to me, so it's hard to judge whether it delivers on the goal it sets out to achieve. This is probably because I am focused on Dat as a protocol for replicating published content (where I don't think this DEP would have much to provide) and haven't thought as much about real-time and ephemeral use cases, so I can't be as helpful.

@pfrazee
Contributor Author

pfrazee commented Jun 10, 2018

> None of these are actually great solutions though

It's definitely a concern that a truncated message could be misinterpreted. It might be a better idea to drop the message and suggest the receiving client emit an error event, to potentially react to.
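
A minimal sketch of the "drop and report" alternative, as opposed to truncating: the oversize payload is rejected whole and the caller receives an error it can surface as an event. The size limit used here is an assumed placeholder (take the real value from the DEP text), and `receiveSessionData` and `ReceiveResult` are hypothetical names.

```typescript
// Assumed limit for illustration only; take the real value from the DEP text.
const MAX_SESSION_DATA_BYTES = 256;

type ReceiveResult =
  | { ok: true; sessionData: Uint8Array }
  | { ok: false; error: string };

// Reject an oversize payload outright rather than silently truncating it.
function receiveSessionData(payload: Uint8Array): ReceiveResult {
  if (payload.length > MAX_SESSION_DATA_BYTES) {
    const error =
      `session-data payload of ${payload.length} bytes ` +
      `exceeds the ${MAX_SESSION_DATA_BYTES}-byte limit; message dropped`;
    return { ok: false, error };
  }
  return { ok: true, sessionData: payload };
}
```

A receiving client could keep its previously stored session data untouched on failure and pass the `error` string to whatever error event it exposes.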

> what is the upgrade mechanism to expanding the size or complexity of this field in the future, while being backwards compatible with older clients/agents?

Another extension message, like session-data-v2

> I guess overall this feels under-motivated to me, so it's hard to judge whether it delivers on the goal it sets out to achieve.

Can you be more specific about what you want? This is kind of a frustratingly broad criticism.

@bnewbold
Contributor

bnewbold commented Jun 10, 2018

> > I guess overall this feels under-motivated to me, so it's hard to judge whether it delivers on the goal it sets out to achieve.
>
> Can you be more specific about what you want? This is kind of a frustratingly broad criticism.

That's totally fair! I think this is a case of not communicating context well over the internet. I think I'm looking for more context around "where does this fit in the big picture". Is this DEP a way to document and formalize what a couple apps are already doing? Seems reasonable. Is it setting out the default way to authenticate hypercore peers as identified users? That seems ambitious and I'd want to do more reading.

Edit: I also commented after only reading the DEP itself. Your comment #27 (comment) above and #27 (comment) provide a lot more context.

@RangerMauve
Contributor

Personally, I'm going to be using this for discovering peers in an application.

For example, it can be used in a social media setting to discover peers' dat URLs to automatically index them.

@pfrazee
Contributor Author

pfrazee commented Jun 10, 2018

@bnewbold okay, that makes sense. Does #30 improve that, do you think?
