MSC2354: Device to device streaming file transfers #2354

Open: wants to merge 7 commits into base old_master

@mvgorcum mvgorcum commented Nov 14, 2019

Related to matrix-org/matrix-spec#432 and should fix matrix-org/matrix-spec#189

Rendered

Signed-off-by: Mathijs van Gorcum [email protected]

@turt2live turt2live changed the title Create 2354-streaming-file-transfer.md MSC2354: Device to device streaming file transfers Nov 14, 2019
@turt2live turt2live added the proposal A matrix spec change proposal label Nov 14, 2019
@turt2live turt2live self-requested a review November 14, 2019 22:10
@ara4n
Member

ara4n commented Nov 14, 2019

oh, cool!

Member

@uhoreg uhoreg left a comment

I only did a quick skim, and have some suggestions/comments.

Thanks for getting the ball rolling on this!

Member

@turt2live turt2live left a comment

This certainly seems more reasonable than abusing to_device messages with base64 encoded blobs :D

@eras

eras commented Jan 3, 2020

Couple ideas:

  • The proposal does not seem to allow transferring files of indeterminate size. Supporting that would allow, e.g., piping the output of tar directly to another device (use case: "backup restore bot" for a company), or (ab)using it for video streaming not related to video conferencing per se. This could be easily(?) addressed by making the "size" field optional; the actual data would then perhaps need to be framed so that it signals when the transfer has finished.
  • This functionality could be extended to provide random access to the file being shared, enabling e.g. video streaming of existing files that were not created for streaming purposes (e.g. MPEG-4 files can have their metadata at the end, making the whole file unusable until it has been received in full, whereas random access would enable playing such files). This may be out of scope, though.
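
A minimal sketch of the framing idea in the first point, assuming a hypothetical wire format that the proposal does not define: each frame is a 4-byte big-endian length followed by the payload, and a zero-length frame marks the end of the stream. The names `encode_frames` and `decode_frames` are illustrative only.

```python
import struct
from typing import Iterable, Iterator, List

# Hypothetical frame layout (an assumption, not part of the proposal):
# 4-byte big-endian payload length, then the payload itself.
# A frame with length 0 marks the end of the stream.
HEADER = struct.Struct(">I")


def encode_frames(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Wrap each non-empty chunk in a length-prefixed frame, then emit an end marker."""
    for chunk in chunks:
        if chunk:  # skip empty chunks so they are not mistaken for the end marker
            yield HEADER.pack(len(chunk)) + chunk
    yield HEADER.pack(0)


def decode_frames(stream: bytes) -> List[bytes]:
    """Split a received byte stream into payloads, stopping at the end marker."""
    payloads: List[bytes] = []
    offset = 0
    while True:
        if offset + HEADER.size > len(stream):
            raise ValueError("stream ended before the end-of-stream frame")
        (length,) = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        if length == 0:
            return payloads
        if offset + length > len(stream):
            raise ValueError("truncated frame")
        payloads.append(stream[offset:offset + length])
        offset += length
```

With this kind of framing the sender never needs to know the total size up front; the receiver treats a missing end marker as an interrupted transfer.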

@eras

eras commented Jan 3, 2020

Actually, random access or at least resume-from-offset functionality could be essential for transferring large files in spotty network conditions or while roaming (though I don't know if WebRTC already has some built-in mechanism to overcome those issues).
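
As a rough illustration of resume-from-offset, here is a sketch of the sender side only. It assumes the receiver reports how many bytes it already holds (how that request would be signalled is not defined anywhere in this thread), and `read_from_offset` and `CHUNK_SIZE` are hypothetical names.

```python
import io
from typing import BinaryIO, Iterator

CHUNK_SIZE = 16 * 1024  # assumed chunk size, chosen only for illustration


def read_from_offset(source: BinaryIO, offset: int,
                     chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield the remainder of a seekable file, starting at the given byte offset."""
    source.seek(offset)
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            return
        yield chunk


# Usage: the receiver already holds the first 10000 bytes and asks for the rest.
data = bytes(range(256)) * 100
already_received = data[:10000]
remainder = b"".join(read_from_offset(io.BytesIO(data), len(already_received)))
```

This only works for seekable sources; a stream generated on the fly could not be resumed this way.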

@danielniccoli

Wouldn't this also fix matrix-org/matrix-spec#37?

"content": {
"transfer_id": "12345",
"lifetime": 60000,
"filename": "example.doc",

Would be nice to have a sha256sum field here, to nail down the exact content to be transferred.

I like the idea but think that it should be optional. I can imagine cases where the file is generated on the fly, for example by bots. Alternatively in embedded cases the resource usage may be undesirable. Even end-user-cases such as streaming a large video from network storage should not need to be blocked by downloading the first time to compute a hash and maybe requiring re-downloading some or all of the file to send it.

I also think it doesn't add much either since you are getting the offer on the same channel and the offer + WebRTC provide integrity. But having an additional checksum doesn't hurt.

Checksum checking should definitely be part of the transfer process, however it makes more sense to send any hashes after the transfer is complete.
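
One way to read the "hash after the transfer" suggestion is to hash incrementally while the chunks go out, so the sender never needs a separate pass over the file. The sketch below assumes SHA-256 (as in the earlier sha256sum comment); the `HashingStream` class is a hypothetical helper, and how the final digest would be delivered to the receiver is left open.

```python
import hashlib
from typing import Iterable, Iterator


class HashingStream:
    """Pass chunks through unchanged while accumulating a SHA-256 digest."""

    def __init__(self, chunks: Iterable[bytes]) -> None:
        self._chunks = chunks
        self._hash = hashlib.sha256()
        self.bytes_sent = 0

    def __iter__(self) -> Iterator[bytes]:
        for chunk in self._chunks:
            self._hash.update(chunk)  # update the running digest before sending
            self.bytes_sent += len(chunk)
            yield chunk

    def hexdigest(self) -> str:
        """Digest of everything yielded so far; meaningful once iteration is done."""
        return self._hash.hexdigest()


# Usage: send every chunk, then report the digest and total size afterwards.
stream = HashingStream([b"ab", b"cd"])
sent = b"".join(stream)
final_digest = stream.hexdigest()
```

The receiver can run the same accumulation over incoming chunks and compare digests at the end, which also covers files generated on the fly.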

@turt2live turt2live added the kind:feature MSC for not-core and not-maintenance stuff label Apr 20, 2020
@ara4n ara4n mentioned this pull request Mar 10, 2021
the URL. By necessity this has a maximum filesize and also results in a more or less permanent availability
of said file. For larger files, and/or files that should only be sent from point-to-point, it may be desirable
to be able to send from device to device. As matrix already implements
[WebRTC signalling for voip](https://matrix.org/docs/spec/client_server/r0.6.0#voice-over-ip), this

Suggested change
[WebRTC signalling for voip](https://matrix.org/docs/spec/client_server/r0.6.0#voice-over-ip), this
[WebRTC signalling for VoIP](https://matrix.org/docs/spec/client_server/r0.6.0#voice-over-ip), this

[WebRTC signalling for voip](https://matrix.org/docs/spec/client_server/r0.6.0#voice-over-ip), this
functionality can be replicated for streaming file transfers from device to device. It can even be possible
to send files not between devices of two matrix users, but two matrix devices owned by the same user. For
this we can use the webrtc datachannel.

Suggested change
this we can use the webrtc datachannel.
this we can use the WebRTC datachannel.

"filename": "example.doc",
"info": {
"mimetype": "application/msword",
"size": 46144

On a similar note, is size required? What if the transfer should be dynamically generated?
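
To make the question concrete, here is a sketch of building the invite content with `size` treated as optional. The field names (`transfer_id`, `lifetime`, `filename`, `info.mimetype`, `info.size`) come from the quoted proposal text; making `size` optional is the reviewers' suggestion rather than what the proposal specifies, and `build_invite_content` is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def build_invite_content(transfer_id: str, lifetime: int, filename: str,
                         mimetype: str, size: Optional[int] = None) -> Dict[str, Any]:
    """Build invite content, omitting info.size when the size is not known."""
    info: Dict[str, Any] = {"mimetype": mimetype}
    if size is not None:
        info["size"] = size  # only advertise a size when it is known up front
    return {
        "transfer_id": transfer_id,
        "lifetime": lifetime,
        "filename": filename,
        "info": info,
    }


# Usage: a fixed file with a known size, and a generated stream without one.
fixed = build_invite_content("12345", 60000, "example.doc", "application/msword", 46144)
dynamic = build_invite_content("12346", 60000, "backup.tar", "application/x-tar")
```

A receiver handling such an invite would have to cope with a missing `size`, for example by not showing a progress percentage.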

It's tempting to want to use this for device to device transfer in public rooms,
but an evil homeserver could hijack the webrtc session by pretending to be a device
for the intended recipient (if that user has an account on the evil server). As such this
should be limited to private rooms.

Should public/private be replaced with unencrypted/e2ee rooms? IIUC the main concern is that without e2ee the homeservers can send/accept transfers on behalf of their users.
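
If the restriction were phrased in terms of encryption, a client-side check might look like the sketch below. It assumes the client has the room's current state events as a list of dicts; a room counts as encrypted when that state contains an `m.room.encryption` event with an empty state key. The function name is hypothetical, and this is one possible policy, not something the proposal specifies.

```python
from typing import Any, Dict, Iterable


def room_is_encrypted(state_events: Iterable[Dict[str, Any]]) -> bool:
    """Return True if the room state contains an m.room.encryption event."""
    return any(
        event.get("type") == "m.room.encryption" and event.get("state_key") == ""
        for event in state_events
    )


# Usage: only offer a device-to-device transfer when the room is encrypted.
encrypted_state = [
    {"type": "m.room.create", "state_key": "", "content": {}},
    {"type": "m.room.encryption", "state_key": "",
     "content": {"algorithm": "m.megolm.v1.aes-sha2"}},
]
plain_state = [{"type": "m.room.create", "state_key": "", "content": {}}]
```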

i'm unclear on the risks associated here, but would the perceived risks be mitigated by offloading the actual file transfer to a webtorrent tracker endpoint, similarly to how TURN is required to negotiate calling?

This file transfer will only send the file once, from one device to another.

While this proposal focusses on streaming file transfer,
a webrtc datachannel could be used for any generic data transfer,

Suggested change
a webrtc datachannel could be used for any generic data transfer,
a WebRTC datachannel could be used for any generic data transfer,

"origin_server_ts": 1432735824653,
"room_id": "!jEsUZKDJdhlrceRyVU:example.org",
"sender": "@example:example.org",
"type": "m.d2dfile.invite",

Nit: I'm not a huge fan of d2dfile. d2d isn't self-documenting, you would need to look up to find out what it means. How about file_transfer. I think "transfer" is a good name to emphasize that this is ephemeral and live. Other options are live_file, direct_file, live_file_transfer or direct_file_transfer.

@turt2live turt2live added the needs-implementation This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP. label Jun 8, 2021
which may be unwanted for file transfers.

Both sender and receiver need to be online simultaneously for this mode of file transfer
to work, which is unexpected in the context of the existing matrix file transfer.

Would it be possible to use the homeserver as intermediate storage? This would require less resources on the HS side than traditional file sending, while drastically relaxing any time constraints.

@piegamesde

Generally I like the proposal, and I would prefer to keep it specific to file transfer only (as opposed to other potential usages of such a connection).

One thing that I am missing is more details about how communication over the WebRTC channel works exactly.

@eras

eras commented Oct 3, 2021

Isn't https://spec.matrix.org/unstable/client-server-api/#send-to-device-messaging a perfect match for this application? The document doesn't seem to touch that at all. I don't think the full handshake should be within the room; why should others even be aware if one requests a file from an invite?

Even if the initial invite needs to be in the room (and even parts of that could be reserved for direct to_device messages; parts of the communication could be in the WebRTC session itself), I don't think the actual requests should be.
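
For reference, a sketch of what addressing a single device via send-to-device could look like. The request shape (a `PUT` to `/_matrix/client/v3/sendToDevice/{eventType}/{txnId}` with a `messages` map of user ID to device ID to content) follows the client-server API; reusing `m.d2dfile.invite` as a to-device event type, the device ID, and the helper name are assumptions made for illustration.

```python
import uuid
from typing import Any, Dict, Optional, Tuple
from urllib.parse import quote


def build_send_to_device_request(
    event_type: str, user_id: str, device_id: str,
    content: Dict[str, Any], txn_id: Optional[str] = None,
) -> Tuple[str, str, Dict[str, Any]]:
    """Return (method, path, body) for a send-to-device request to one device."""
    txn_id = txn_id or uuid.uuid4().hex  # transaction ID makes retries idempotent
    path = "/_matrix/client/v3/sendToDevice/{}/{}".format(
        quote(event_type, safe=""), quote(txn_id, safe="")
    )
    # Body shape: messages -> user ID -> device ID -> event content.
    body = {"messages": {user_id: {device_id: content}}}
    return "PUT", path, body


# Usage: address the invite to one (hypothetical) device of the recipient.
method, path, body = build_send_to_device_request(
    "m.d2dfile.invite", "@example:example.org", "DEVICEID",
    {"transfer_id": "12345", "lifetime": 60000, "filename": "example.doc"},
    txn_id="txn1",
)
```

Nothing here is visible to other room members, which is the point raised above; the actual HTTP call and authentication are left out.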

@williamkray

I left an inline comment related to this, but would it make more sense to offload this as an external feature (i.e. matrix itself supports external storage resources like SharePoint, Box, Dropbox, GDrive, WebDAV, and most importantly in this case webtorrent, and returns links to those endpoints as appropriate), rather than rolling this feature into the matrix protocol?

In this way it could be configurable on a per-server basis what storage endpoints to use as fallback file share mechanisms for files over a given size, and for p2p that fallback would be webtorrent. Fallbacks would require valid endpoints to communicate with and link to (in the case of p2p it would depend on a webtorrent tracker service endpoint), so a corporate server could ensure files were all shared via governed systems that might be used outside of matrix.

This is not dissimilar to needing to depend on external TURN/STUN for connecting calls. The matrix spec does not try to reimplement TURN, and p2p file transfer over WebRTC is likewise a "solved problem" with webtorrent.

@foresto

foresto commented Aug 16, 2022

would it make more sense to offload this as an external feature

Speaking as a non-maintainer, for what it's worth:

There is a widespread need for easy peer-to-peer file transfer. Existing tools (e.g. Magic Wormhole) can sometimes approximate this, but they tend to be too difficult for average people to use, or too fragile for large files, or too slow even when the network path between the peers is fast.

Having direct transfers standardized in Matrix clients, without requiring a server admin to opt in or carry the data (except for cases requiring TURN), could elegantly solve this problem for a great many people. From what I understand of WebRTC data channels, they seem perfect for this job.

external storage resources like SharePoint, Box, Dropbox, GDrive, WebDAV

Relying on external services like that would:

  • Lock the functionality behind those services' (often onerous) terms and conditions.
  • Require making an additional party aware of the transfer.
  • Create an additional point from which to mount a crypto attack on the transferred data.
  • Create an additional point of failure.
  • Not provide device-to-device transfers at all. (Those services are store-and-forward systems).
  • Fail to meet the need for something easy.

webtorrent

Wouldn't that present an obstacle to non-web-browser Matrix clients, since WebTorrent has few implementations compared to WebRTC? Also, does it work for large files? In my attempt to transfer a multi-GiB file using FilePizza, it seemed to be trying to pull the entire file into RAM, and failed miserably, though I don't know if that was the protocol's fault or the web app's fault.

People need easy, direct, private file transfers, not another service or admin acting as gatekeeper. Matrix has a rare opportunity here to orchestrate that, delivering exactly what people need. I daresay it would be a compelling reason to adopt Matrix. Perhaps even a "killer feature".

@williamkray

williamkray commented Aug 17, 2022

@foresto there's a lot to unpack in your comment, and i think a better place to address those issues is in a subsequent MSC where i can express my thoughts more thoroughly. i'll be sure to reference this MSC and your comments in my submission 👍

EDIT: on second thought, as i process these ideas more, they end up being unrelated to the matrix spec, and are purely media handling implementation details. i don't think i'll be generating that msc after all.

@piegamesde

In my attempt to transfer a multi-GiB file using FilePizza, it seemed to be trying to pull the entire file into RAM, and failed miserably, though I don't know if that was the protocol's fault or the web app's fault.

@foresto This is very likely the browser's fault, and not going to improve very soon. As far as I understand it, only chrom* currently provides the API to stream a file to disk, and Firefox does not implement this feature.

Labels
kind:feature MSC for not-core and not-maintenance stuff needs-implementation This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP. proposal A matrix spec change proposal
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Spec p2p file transfer via webrtc datachannel (SPEC-430)