DEP: Session data extension #27
Looks very clear to me. LGTM
Or ... LGTV (looks good to vote)
Some applications have used the `peerId` and/or `userData` fields of the replication handshake message in order to broadcast this information. Those mechanisms are unsuitable for Web applications (as in the "Beaker browser") because the sites' applications are not executed reliably prior to the replication handshake. By using an extension message, we provide the same presence & discovery without relying on the timing of the application-code execution.
An alternative approach would be to establish an ephemeral messaging channel, perhaps using a different extension message. This ephemeral channel would broadcast the payload to the client's application code as an event when it is received, but would not retain the most recent payload as session-data. This ephemeral channel would be less effective in Web applications (as in the "Beaker Browser") because it would rely on the application-code being active (loaded in a tab) at time of receipt, whereas the builtin session-data semantic makes it possible for the browser to retain the last payload on the applications' behalf.
I'm not sure I understand why this would be different -- could you say more about the 'ephemeral messaging channel' and how this is different than the replication protocol? And why couldn't Beaker Browser retain the most recent payload as session-data?
By "ephemeral messaging channel" I'm thinking like the semantics of a UDP socket, which means:
- The applications are not alerted about whether delivery is successful.
- The payload is not kept for any reason.
Beaker wouldn't retain the most recent payload because that's not the use-case of an ephemeral channel. An ephemeral channel is good for things like sending chat messages; if we had Beaker retain the last message for this use-case too, you'd have to have the app send the "session payload" after other traffic to pin it as the most recent data.
I'm thinking about making a DEP for an ephemeral channel too (for those other use-cases) but I wanted to send this one first and think about the ephemeral channel more.
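To make the UDP-like semantics described above concrete, here is a minimal sketch. The `EphemeralChannel` class and its method names are hypothetical and not taken from any DEP or from Beaker; the sketch only illustrates the two properties listed: the sender learns nothing about delivery, and the payload is not kept.

```typescript
// Hypothetical sketch of an ephemeral channel; none of these names come from a DEP.
type MessageListener = (payload: Uint8Array) => void;

class EphemeralChannel {
  private listeners = new Set<MessageListener>();

  // Register application code; returns a function that unsubscribes it.
  subscribe(listener: MessageListener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  // Returns void on purpose: the sender is not told whether anyone received it.
  deliver(payload: Uint8Array): void {
    for (const listener of this.listeners) {
      listener(payload);
    }
    // Nothing is stored here, so a listener that subscribes later never sees this payload.
  }
}
```

A message delivered while no listener is subscribed is simply lost, which is the behavior that makes this kind of channel a poor fit for "pinning" the most recent session payload.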
The client may respond to the message by emitting an event, so that it may be handled by the client's application logic. The client should also make the most recent `sessionData` buffer available to the application logic after the message is received.
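One way to read this paragraph is sketched below. The `SessionDataTracker` class, its method names, and the string peer identifier are assumptions made for illustration; the DEP text quoted here does not define a client API. The sketch shows both behaviors: an event is emitted on receipt, and the most recent buffer stays available afterwards.

```typescript
// Hypothetical client-side sketch; the DEP text does not prescribe this API.
type SessionListener = (peerId: string, sessionData: Uint8Array) => void;

class SessionDataTracker {
  private latest = new Map<string, Uint8Array>();
  private listeners: SessionListener[] = [];

  // Application logic may register to be told when a peer's session data changes.
  onSessionData(listener: SessionListener): void {
    this.listeners.push(listener);
  }

  // Called when a 'session-data' extension message arrives from a peer.
  receive(peerId: string, payload: Uint8Array): void {
    const copy = payload.slice(); // keep a private copy of the buffer
    this.latest.set(peerId, copy); // each new message replaces the previous one
    for (const listener of this.listeners) {
      listener(peerId, copy);
    }
  }

  // The most recent payload remains readable even if no listener was active at receipt.
  get(peerId: string): Uint8Array | undefined {
    return this.latest.get(peerId);
  }
}
```

Because `get` works independently of the listeners, application code that loads after the message arrived can still read the last payload.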
After publishing this DEP, the "Beaker Browser" will implement a Web API for exposing the `'session-data'` protocol to applications. It will restrict access so that the application code of a `dat://` site will only be able to set the session data for connections related to its own content.
What are "connections related to its own content"? Is it the DatArchive for the URL the content is being served from?
I think that any DatArchives that are created should support session data to support use cases like chat where people might have different apps (in different dats) all talking via the same protocols over one URL.
> Is it the DatArchive for the URL the content is being served from?
Correct
> I think that any DatArchives that are created should support session data to support use cases like chat where people might have different apps (in different dats) all talking via the same protocols over one URL.
That makes sense; you're right that this is an issue. I need to think about the security implications. It's also not clear to me how that would work: how would an app know which other dat archives it needs to be looking at to get the session data?
I think solving that would require some meta-identifier which the sessions are being attached to, where you say "this is `fritter` session data, anybody else interested in `fritter` should receive this." And then we'd need to know which connections are interested in `fritter` session data so that we know which connections should receive the message. You also create the possibility for multiple apps to register session data. It gets much more complex.
Hmm, in fact if you have 2 versions of the fritter app, the only common connections would be for the profile archives.
That's a good point. I could see it getting hairy pretty quick. The ephemeral messages spec wouldn't have the problem of specifying where the data is coming from since that can be done at the application level.
> how would an app know which other dat archives it needs to be looking at to get the session data
I think the UX would be copy-pasting the link for the channel you want to use for communication. Then it would open the dat archive, which would probably have info about the channel name or whatever, and start looking for peers to talk to (or do the multiwriter thing from cabal).
Alternatively when people make a chat channel, they can fork the application Dat and send the link to that instead of having the application dat. Though that would make updates to the content harder, and would mean having multiple windows for chatting instead of a single one. (Might be fixable with iframes?)
I guess in the worst case, the main application logic for getting the data can live in an iframe and talk to the parent responsible for the UI.
Overall it's not a dealbreaker, but it'll require being a little more "clever" in the implementation.
Some discussion from IRC. TL;DR: I'm starting to think this may not work and ephemeral messaging may need to be the solution. The problem is, how do you handle multiple apps?
pfrazee> in the ephemeral messaging case, you're going to have
apps cross-talking with each other. Imagine two apps sending messages
on the same channel and receiving responses intended for one or the
other
pfrazee> that'd be very confusing
RangerMauve> I think multiple apps, or even multiple pages of the same
app, writing to the same persistent variable will cause problems
RangerMauve> pfrazee: Maybe have a "stream ID" to have multiplexing be
a first-class concern?
pfrazee> yeah exact same issue, though multiple pages of the same
app *should* be able to coordinate enough not to contest with each
other (because it OUGHT to be setting the same thing)
pfrazee> perhaps
pfrazee> the complexity increase of all the possible solutions
concerns me
RangerMauve> I guess the main issue here is that beaker has a single
replication stream for all pages.
RangerMauve> For lower-level one-app-to-one-stream use cases what you
sketched out works just fine
pfrazee> yeah
pfrazee> so on fritter, for instance, I'd want to attach my
profile key to the dat channel for each profile dat I sync
RangerMauve> pfrazee: To let them know you're following them?
pfrazee> yeah I figure that's how I'd do that
pfrazee> and then the app would probably read the keys attached to
their personal dat archive
RangerMauve> Yeah, that's pretty elegant
pfrazee> yeah then you just gotta figure out how you deal with
multiple apps
pfrazee> I almost wonder if the solution is to let multiple
sessions be attached
RangerMauve> I think ephemeral messaging with application-level RPC
would be a good approach. If you get a message, and it doesn't make
sense to you, ignore it.
pfrazee> well the downside there is, ephemeral messages can easily
be missed because you may not have the app open when it's sent
RangerMauve> Yeah. But that can be accounted for at the application
level, too.
RangerMauve> For example, ping all contact dats when you open the
page, and react to pings while you're active
pfrazee> yeah that's probably how you'd have to do it
RangerMauve> If it's already considered an "unreliable" channel, then
applications will already need some sort of mechanisms in place to
account for "missed" messages
RangerMauve> And "duplicated" messages
pfrazee> well that's what session-data semantics helps fix
pfrazee> persists the data and it's atomic so you can resend
RangerMauve> I think that for reliable transports, they can create a
throwaway dat and post messages in there, then forget about it when
it's no longer relevant
RangerMauve> Yeah, but that would only work for cases where one
application is writing to the session.
pfrazee> I guess you can "multiplex" the sessions
pfrazee> allow multiple sessionData's to be attached and then
beaker would just let each origin write only one
pfrazee> but that does start to feel...weird
RangerMauve> Yeah, I'm not sure how clean that would be in the end
pfrazee> yeah. I'll talk to mafintosh about this tonight. You are
starting to make me think we'll have to do ephemeral messages instead
and let apps solve it
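The "if it doesn't make sense to you, ignore it" idea from the log, together with the point about duplicated messages, could look roughly like the sketch below at the application level. The JSON envelope with `app`, `id`, and `body` fields and the `AppInbox` class are hypothetical conventions invented for this illustration; nothing in the discussion fixes a message format.

```typescript
// Hypothetical application-level envelope; the thread does not define a wire format.
interface Envelope {
  app: string; // which application the message is meant for
  id: string; // sender-chosen identifier, used to detect duplicates
  body: unknown;
}

function parseEnvelope(raw: string): Envelope | null {
  try {
    const value: unknown = JSON.parse(raw);
    if (typeof value !== "object" || value === null) return null;
    const fields = value as Record<string, unknown>;
    if (typeof fields.app !== "string" || typeof fields.id !== "string") return null;
    return { app: fields.app, id: fields.id, body: fields.body };
  } catch {
    return null; // not valid JSON: treat as "doesn't make sense to us"
  }
}

class AppInbox {
  private readonly app: string;
  private seen = new Set<string>();

  constructor(app: string) {
    this.app = app;
  }

  // Returns the envelope if it is for this app and new; otherwise null (ignored).
  accept(raw: string): Envelope | null {
    const envelope = parseEnvelope(raw);
    if (envelope === null || envelope.app !== this.app) return null;
    if (this.seen.has(envelope.id)) return null; // duplicate delivery
    this.seen.add(envelope.id);
    return envelope;
  }
}
```

This only filters cross-talk and duplicates on the receiving side. Missed messages would still need something like the ping-on-open approach mentioned in the log.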
Regarding security:
- If an application already knows the dat URL it wants to talk on, then it's probably allowed to talk on that URL.
- You could require user action for allowing an application to listen / publish on a type of message.
I'm wondering if we should stick with the Single Origin Policy in the DEP (can only attach session-data to connections for your own dat), write another DEP for ephemeral messaging too with the same S.O.P. approach, and then just expect that we'll need a more sophisticated solution to support apps talking to each other.
My goal is really to come up with something that's simple and works well enough for simple apps to "self communicate" among peers, so to speak. I'm not yet sure whether I think this will be a long-term solution for discovery, so I'm not sure we should spend time agonizing over other use-cases. We can always supersede this DEP.
Yeah, working around SOP won't be too hard with iframes.
@RangerMauve (I believe) that's only possible if the target Dat has JS which helps you do it.
Yeah, which is a good thing because then whoever made that dat will need to explicitly opt-into the functionality. I'm making use of frame-rpc for this use case in dat-polyfill, actually.
Hey y'all. 👋 The reason we use this on-connection hack in cabal is so that we can auto-authorize and get around these hyperdb issues (131) (132). The proposed change here would help cabal work in Beaker, though removing the hyperdb requirement of having a pre-established shared key and explicit authorization would obviate the need for such a DEP (for cabal or any other app that doesn't want a mandatory authorization model, at least).
Yeah, an auto-authorized hyperdb could act as a communication channel if people listen to the changefeed.
Yeah, that's correct @noffle
Hm, I'm concerned about auto-authorizing hyperdbs, for scaling and (perhaps more importantly) for spam attacks. Like: What happens if somebody shows up and starts dumping large datasets into it? Is there a way to stop that and remove the data they add? I like how elegant the idea is though. It'd be great if every app could have a public hyperdb to coordinate and discover. I'm just not sure we can make it work. What I was trying to enable with this DEP is two things:
It's also important to note -- this DEP was motivated by the need for something that works in the near-term. I expect it to be a stepping stone to a more sophisticated solution.
I like the idea of the dat/DEP core pieces not worrying too much about policy matters, like authorization, on which many different userland apps will have different opinions and needs. I wonder if it'd make sense to hold off on this DEP until hyperdb makes user-implemented authorization models possible, and see if this DEP is still necessary? imho specs are so much heavier than a module doing a semver bump to add/remove an API; my preference is to explore the latter option first.
I’m 👍 to keep discussing multiwriter’s auth policies but I have utility for this spec outside of multiwriter authorization so I’d prefer we didn’t hold up this DEP for that discussion.
@pfrazee What sort of API were you aiming for from a beaker perspective? It doesn't look like this would be used as a PeerSocket.
If it'll be using the same-origin policy, what will that mean for DNS? If I publish session data on |
@RangerMauve I'm still thinking about the API, but I'll write something up soon. I'm also going to write up a DEP for ephemeral messages that's similar to this, and the overall API design will depend on that too. The domain names will not have an effect in this case. Beaker will resolve the current site to its raw URL, and only allow that site to access the session-data for itself.
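A rough sketch of the rule described in this comment follows. The function names, the `Resolver` callback standing in for a DNS lookup, and the comparison against a connection's archive key are all assumptions for illustration; Beaker's actual API had not been written at this point in the thread. The sketch resolves a site to its raw key first, so a domain name and the raw `dat://` URL for the same archive are treated identically.

```typescript
// Hypothetical permission check; not Beaker's actual implementation.
// The resolver stands in for a DNS-style lookup from hostname to raw dat key.
type Resolver = (hostname: string) => string | undefined;

// Reduce a dat:// site URL to its raw 64-character hex key, resolving names if needed.
function rawKeyFor(siteUrl: string, resolve: Resolver): string | undefined {
  const match = /^dat:\/\/([^\/]+)/.exec(siteUrl);
  if (match === null) return undefined;
  const host = (match[1] ?? "").toLowerCase();
  return /^[0-9a-f]{64}$/.test(host) ? host : resolve(host);
}

// A site may only touch session data on connections for its own archive.
function maySetSessionData(siteUrl: string, connectionKey: string, resolve: Resolver): boolean {
  const siteKey = rawKeyFor(siteUrl, resolve);
  return siteKey !== undefined && siteKey === connectionKey.toLowerCase();
}
```

With this shape, a page served from a domain name gets exactly the same access as the same page served from its raw key, and no access to connections for any other archive.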
I missed the boat on reviewing this earlier, but a few notes anyways: "Any additional bytes should be truncated by the receiving client": I wouldn't ever truncate messages; this leads to debugging nightmares (see also packet fragmentation and UDP MTU truncation). I'd recommend dropping the whole connection ("fail fast"), or disregarding the entire message (though even the latter could also be hard to debug, especially if session messages have variable size). None of these are actually great solutions though... what is the upgrade mechanism for expanding the size or complexity of this field in the future, while being backwards compatible with older clients/agents? I guess overall this feels under-motivated to me, so it's hard to judge whether it delivers on the goal it sets out to achieve. This is probably because I am focused on Dat as a protocol for replicating published content (where I don't think this DEP would have much to provide) and haven't thought as much about real-time and ephemeral use cases, so I can't be as helpful.
It's definitely a concern that a truncated message could be misinterpreted. It might be a better idea to drop the message and suggest that the receiving client emit an error event, which the application could then react to.
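The drop-instead-of-truncate behavior suggested here might be sketched as follows. The `acceptSessionData` function, its result shape, and the 256-byte limit are assumptions chosen for illustration; this thread does not state the size limit, so the constant is a placeholder.

```typescript
// Hypothetical size check; the limit below is an assumed placeholder value.
const MAX_SESSION_DATA_BYTES = 256;

type AcceptResult =
  | { ok: true; sessionData: Uint8Array }
  | { ok: false; error: string };

// Reject oversized payloads whole instead of silently truncating them.
function acceptSessionData(
  payload: Uint8Array,
  maxBytes: number = MAX_SESSION_DATA_BYTES
): AcceptResult {
  if (payload.length > maxBytes) {
    // The caller can surface this as an error event to the application.
    return { ok: false, error: `session data is ${payload.length} bytes, limit is ${maxBytes}` };
  }
  return { ok: true, sessionData: payload };
}
```

Returning an explicit error keeps a partially received payload from ever being handed to the application as if it were complete.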
Another extension message, like
Can you be more specific about what you want? This is kind of a frustratingly broad criticism.
That's totally fair! I think this is a case of not communicating context well over the internet. I think I'm looking for more context around "where does this fit in the big picture". Is this DEP a way to document and formalize what a couple apps are already doing? Seems reasonable. Is it setting out the default way to authenticate hypercore peers as identified users? That seems ambitious and I'd want to do more reading. Edit: I also commented after only reading the DEP itself. Your comment #27 (comment) above and #27 (comment) provide a lot more context.
Personally, I'm going to be using this for discovering peers in an application. For example, it can be used in a social media setting to discover peers' dat URLs to automatically index them.
New DEP proposal. This hasn't been discussed in the WG yet, but came from discussions in #dat on freenode about discovery. The summary:
The use-case is similar to what @cabal-club has done with the handshake's `peerId` and `userData`. I explain in the DEP why I diverged from that approach, and I hope that cabal will be able to adopt this as well so that cabal can work in Beaker.
cc Cabal team @Karissa @cblgh @noffle as well as the dat WG as a whole