webrtc: add WebRTC transport #1655

Closed
wants to merge 65 commits into from

ckousik (Contributor) commented Jul 12, 2022

This PR implements the WebRTC transport spec according to libp2p/specs#412.
The webrtc multiaddr protocol is implemented in this PR, but it needs to be implemented in go-multiaddr before this PR is merged. This PR also uses a multibase-encoded multihash for DTLS fingerprint verification after the Noise handshake.

ckousik (Contributor Author) commented Jul 12, 2022

@marten-seemann @mxinden I've implemented the multibase multihash noise handshake here. For verification, we use the remote certificate of the underlying DTLS transport.

ckousik marked this pull request as ready for review July 13, 2022 13:39

ckousik (Contributor Author) commented Jul 15, 2022

@marten-seemann @mxinden this is ready for review

marten-seemann self-requested a review July 15, 2022 15:17

marten-seemann (Contributor) left a comment

Lots of good stuff here. Thank you!
I added a large number of comments, please don't be shocked by that ;)

One thing I don't really understand yet is how the opening / accepting of data channels works, and what it means for them to be detached. Maybe you could explain that to me here, that would make the next round of reviews easier.

p2p/transport/webrtc/transport.go (outdated, resolved)
p2p/transport/webrtc/transport.go (resolved)
p2p/transport/webrtc/transport.go (outdated, resolved)
p2p/transport/webrtc/transport.go (outdated, resolved)
p2p/transport/webrtc/transport.go (outdated, resolved)
p2p/transport/webrtc/listener.go (outdated, resolved)
p2p/transport/webrtc/listener.go (outdated, resolved)
p2p/transport/webrtc/listener.go (outdated, resolved)
p2p/transport/webrtc/listener.go (outdated, resolved)
p2p/transport/webrtc/listener.go (outdated, resolved)

BigLep (Contributor) commented Aug 5, 2022

2022-08-05 triage conversation: with the latest spec changes, #1663 is a prereq for this.

ckousik (Contributor Author) commented Aug 5, 2022

@BigLep That PR is approved and awaiting merging.

mxinden (Member) commented Sep 30, 2022

@marten-seemann or @MarcoPolo do you have capacity to give this another review? If not, I will take a deeper look myself, though I think your review is way more valuable than mine.

marten-seemann (Contributor) left a comment

Partial review, but there’s already a lot to do here.

The biggest problem is that the datachannel doesn’t implement any backpressure: You’re reading every message that’s sent, and appending it to a buffer. If the application is reading less quickly than the sender is sending us messages, all those messages will accumulate in the buffer, eventually leading to an OOM panic.
Instead, you need to keep the messages at the WebRTC / SCTP layer, until we’re ready to process them (i.e. until Read is called). Only then can we dequeue the message, so that backpressure can build up.
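
A minimal sketch of the backpressure idea described above, not the PR's code. `boundedReader`, `deliver`, and `tryDeliver` are hypothetical names, and the fixed-capacity channel merely stands in for the WebRTC / SCTP layer holding messages: a message is dequeued only when `Read` is called, so a slow reader stalls the producer instead of growing a buffer.

```go
package main

import "io"

// boundedReader is a hypothetical stand-in for a data channel wrapper.
// It is not safe for concurrent Read calls.
type boundedReader struct {
	queue   chan []byte // fixed capacity: a full queue stalls the producer
	pending []byte      // unread remainder of the message being consumed
}

func newBoundedReader(capacity int) *boundedReader {
	return &boundedReader{queue: make(chan []byte, capacity)}
}

// deliver blocks while the queue is full, which is how backpressure
// reaches the producer.
func (r *boundedReader) deliver(msg []byte) { r.queue <- msg }

// tryDeliver is a non-blocking variant; false means the queue is full.
func (r *boundedReader) tryDeliver(msg []byte) bool {
	select {
	case r.queue <- msg:
		return true
	default:
		return false
	}
}

// closeWrite signals that no further messages will be delivered.
func (r *boundedReader) closeWrite() { close(r.queue) }

// Read dequeues a message only when the application asks for data.
func (r *boundedReader) Read(p []byte) (int, error) {
	if len(p) == 0 {
		return 0, nil
	}
	for len(r.pending) == 0 {
		msg, ok := <-r.queue
		if !ok {
			return 0, io.EOF
		}
		r.pending = msg
	}
	n := copy(p, r.pending)
	r.pending = r.pending[n:]
	return n, nil
}
```

With a capacity of 1, a second `tryDeliver` fails until a `Read` has taken the first message out of the queue.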

p2p/transport/webrtc/util.go (outdated, resolved)

if err != nil {
	return "", err
}
return multibase.Encode(multibase.Base58BTC, encoded)

Contributor

Any reason for using Base 58 here?

Contributor Author

It is URL-friendly.

Contributor

I think the spec defines an encoding that we should use, doesn’t it?

@ckousik what is to be done about this?

Resolved. The code now returns multibase.Encode(multibase.Base64url, encoded).
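
For illustration, a standard-library sketch of what a multibase base64url encoding of a SHA-256 multihash looks like on the wire. `encodeCertHash` is a hypothetical helper, not the PR's function (which calls go-multibase). It assumes the multihash code 0x12 for sha2-256 and the multibase prefix `u` for unpadded base64url.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
)

const (
	multihashSHA256    = 0x12 // multihash code for sha2-256
	multibaseBase64URL = "u"  // multibase prefix for unpadded base64url
)

// encodeCertHash hashes the certificate bytes, frames the digest as a
// multihash (<code><length><digest>), and multibase-encodes the result.
func encodeCertHash(certDER []byte) string {
	digest := sha256.Sum256(certDER)
	mh := make([]byte, 0, 2+len(digest))
	mh = append(mh, multihashSHA256, byte(len(digest)))
	mh = append(mh, digest[:]...)
	return multibaseBase64URL + base64.RawURLEncoding.EncodeToString(mh)
}
```

Because the first two bytes are always 0x12 0x20, every string produced this way starts with `uEi`.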

p2p/transport/webrtc/util_test.go (resolved)
p2p/transport/webrtc/test_constants.go (outdated, resolved)
p2p/transport/webrtc/sdp.go (outdated, resolved)
p2p/transport/webrtc/datachannel.go (outdated, resolved)
p2p/transport/webrtc/datachannel.go (outdated, resolved)

	atomic.StoreUint32(&d.remoteReadClosed, 1)
case pb.Message_RESET:
	log.Errorf("remote reset")
	d.Close()

Contributor

I don’t think that’s correct. Even if the remote reset the stream, we can still write on it.

Contributor

@ckousik Why did you resolve this comment? You haven’t addressed it at all!

Either my point is invalid, in which case please comment here, or my point is valid, in which case it needs to be fixed.

@ckousik what is to be done about this?

The code has changed completely since then, so a future review will need to indicate whether this is fixed or not. For now, I'm afraid this can be ignored.

As such, this comment can be resolved.

p2p/transport/webrtc/datachannel.go (outdated, resolved)

	"time"
)

type deadline struct {

Contributor

This seems overly complicated, and wasteful. We should only start a timer when there’s actually a Read / Write call running (and stop the timer as soon as that call returns), not just because the deadline was set.

It also seems like we’re not using the deadline correctly (see comments in datachannel.go).
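
One possible shape of the approach suggested above, as a sketch rather than the PR's implementation. `timedReader`, `setReadTimeout`, and `readOnce` are hypothetical names. Setting the deadline only records it and wakes a blocked reader; a timer exists only while a `Read` call is running and is stopped when that call returns.

```go
package main

import (
	"os"
	"sync"
	"time"
)

// timedReader is a hypothetical stand-in for the data channel's read side.
type timedReader struct {
	mu       sync.Mutex
	deadline time.Time
	changed  chan struct{} // closed and replaced whenever the deadline changes
	data     chan []byte
}

func newTimedReader() *timedReader {
	return &timedReader{changed: make(chan struct{}), data: make(chan []byte, 1)}
}

// SetReadDeadline records the deadline and wakes a blocked Read.
// No timer is started here.
func (r *timedReader) SetReadDeadline(t time.Time) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.deadline = t
	close(r.changed)
	r.changed = make(chan struct{})
}

// setReadTimeout is a convenience wrapper for a relative deadline.
func (r *timedReader) setReadTimeout(d time.Duration) {
	r.SetReadDeadline(time.Now().Add(d))
}

// Read blocks until data arrives or the current deadline passes.
func (r *timedReader) Read() ([]byte, error) {
	for {
		r.mu.Lock()
		deadline, changed := r.deadline, r.changed
		r.mu.Unlock()
		if msg, done, err := r.readOnce(deadline, changed); done {
			return msg, err
		}
		// The deadline changed mid-call: loop and re-arm with the new value.
	}
}

func (r *timedReader) readOnce(deadline time.Time, changed <-chan struct{}) (msg []byte, done bool, err error) {
	var expired <-chan time.Time // nil channel: never fires without a deadline
	if !deadline.IsZero() {
		timer := time.NewTimer(time.Until(deadline))
		defer timer.Stop() // the timer never outlives this call
		expired = timer.C
	}
	select {
	case msg = <-r.data:
		return msg, true, nil
	case <-expired:
		return nil, true, os.ErrDeadlineExceeded
	case <-changed:
		return nil, false, nil
	}
}
```

Updating the deadline while a `Read` is blocked still works, because the blocked call observes the `changed` channel and restarts with the new deadline.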

Contributor Author

This allows updating the deadline while the Read/Write call is running.

Contributor

… as does the other solution, which probably won’t add many lines of code and avoids long-running timers.

@ckousik what is to be done about this?

The code has changed completely since then, so if something similar is still an issue, I'm afraid it will have to be rechecked in a new review.

As such, this comment can be resolved

if addr.IP.To4() == nil {
	ipVersion = "IP6"
}
fp := fingerprintToSDP(fingerprint)

fp is an empty string if fingerprint is nil. Is that ever a valid state?
If not, we probably want to return an error here.

Otherwise you can get subtle, hard-to-debug issues later.

fixed in f76dfc4 (first commit of new PR #1999)

case multihash.SHA2_512:
	return crypto.SHA512, true
default:
	return crypto.Hash(0), false

Suggested change
- return crypto.Hash(0), false
+ return 0, false

The untyped constant 0 is implicitly converted to crypto.Hash here.

fixed in f76dfc4 (first commit of new PR #1999)

remaining := len(d.readBuf)
d.m.Unlock()

if state := d.getState(); remaining == 0 && (state == stateReadClosed || state == stateClosed) {

Suggested change
- if state := d.getState(); remaining == 0 && (state == stateReadClosed || state == stateClosed) {
+ if remaining == 0 && !d.getState().allowRead() {
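
A sketch of how such an `allowRead` helper could be defined. Only `stateReadClosed`, `stateClosed`, and `allowRead` appear in the thread; `channelState`, `stateOpen`, `stateWriteClosed`, and `allowWrite` are assumptions made for illustration.

```go
package main

// channelState models the half-close state of a stream.
type channelState uint8

const (
	stateOpen        channelState = iota // assumed name
	stateReadClosed                      // reads are closed, writes still allowed
	stateWriteClosed                     // assumed name: writes closed, reads allowed
	stateClosed                          // both directions closed
)

// allowRead reports whether the read side is still open.
func (s channelState) allowRead() bool {
	return s == stateOpen || s == stateWriteClosed
}

// allowWrite reports whether the write side is still open.
func (s channelState) allowWrite() bool {
	return s == stateOpen || s == stateReadClosed
}
```

Keeping the predicate on the state type means call sites do not have to enumerate states, so adding a state later only touches these two methods.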

Contributor Author

Thanks, I seem to have missed this.

fixed in f76dfc4 (first commit of new PR #1999)

	return string(buf[:n])
}

func replaceAll(s string, b byte) string {

Better to name this removeAll or similar, as it doesn't replace the bytes but rather removes them from the string.

fixed in f76dfc4 (first commit of new PR #1999)

	return nil, fmt.Errorf("could not get local peer ID: %w", err)
}
// We use elliptic P-256 since it is widely supported by browsers.
// See: https://github.com/libp2p/specs/pull/412#discussion_r968294244

The link above does not really tell an outsider anything. Perhaps link to an actual spec paragraph?

fixed in f76dfc4 (first commit of new PR #1999)


listenerMultiaddr = listenerMultiaddr.Encapsulate(certMultiaddress)

return newListener(

TODO: socket is not closed in case newListener returns an error
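
A sketch of one common way to close the socket on the error path, using a deferred check on a named error result. `listen`, the simplified `listener` type, and the `failSetup` switch are hypothetical; the real `newListener` takes different arguments.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// listener is a simplified stand-in for the transport's listener.
type listener struct{ socket net.PacketConn }

var errSetup = errors.New("listener setup failed")

// newListener is a hypothetical constructor; failSetup simulates an
// error that occurs after the socket is already open.
func newListener(socket net.PacketConn, failSetup bool) (*listener, error) {
	if failSetup {
		return nil, errSetup
	}
	return &listener{socket: socket}, nil
}

// listen opens the UDP socket and hands it to newListener. The deferred
// function closes the socket whenever listen returns a non-nil error.
func listen(addr string, failSetup bool) (l *listener, err error) {
	socket, err := net.ListenPacket("udp", addr)
	if err != nil {
		return nil, fmt.Errorf("listen on %s: %w", addr, err)
	}
	defer func() {
		if err != nil {
			socket.Close()
		}
	}()
	return newListener(socket, failSetup)
}
```

If the socket leaked, a later attempt to bind the same address would fail; with the deferred close, the address is free again after a failed setup.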

fixed in f76dfc4 (first commit of new PR #1999)

// The only requirement here is that the ufrag and password
// must be equal, which will allow the server to determine
// the password using the STUN message.
ufrag := "libp2p+webrtc+v1/" + genUfrag(32)

TODO: given we always use this prefix, we might as well pre-allocate it as part of the 32-byte string and write it at the front.
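
A sketch of the single-allocation idea, under two assumptions that are not from the PR: the random part stays 32 characters long after the prefix (matching `genUfrag(32)` in the hunk), and `crypto/rand` is swapped in as the randomness source. The constant names other than `letterBytes` are hypothetical.

```go
package main

import "crypto/rand"

const (
	ufragPrefix    = "libp2p+webrtc+v1/"
	ufragRandomLen = 32
	letterBytes    = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890"
)

// genUfrag allocates one buffer for prefix plus random part, writes the
// prefix at the front, and fills the rest with characters from letterBytes.
func genUfrag() (string, error) {
	buf := make([]byte, len(ufragPrefix)+ufragRandomLen)
	n := copy(buf, ufragPrefix)
	random := buf[n:]
	if _, err := rand.Read(random); err != nil {
		return "", err
	}
	for i, b := range random {
		// Modulo mapping is slightly biased because 62 does not divide 256;
		// acceptable for a sketch.
		random[i] = letterBytes[int(b)%len(letterBytes)]
	}
	return string(buf), nil
}
```

Compared with `"libp2p+webrtc+v1/" + genUfrag(32)`, this avoids the intermediate string and the concatenation.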

fixed in f76dfc4 (first commit of new PR #1999)

settingEngine := webrtc.SettingEngine{}
// suppress pion logs
loggerFactory := pionlogger.NewDefaultLoggerFactory()
loggerFactory.DefaultLogLevel = pionlogger.LogLevelDisabled

TODO: could it be useful to have these logs available in verbose mode (opt-in)?

fixed in f76dfc4 (first commit of new PR #1999)


remoteMultihash, err := decodeRemoteFingerprint(remoteMultiaddr)
if err != nil {
	return pc, nil, fmt.Errorf("could not decode fingerprint: %w", err)

NOTE: more of a remark, but phrases like "could not" add little to an error message, since an error always signals some kind of failure. So you might as well shorten it to: "instantiate peerconnection: %w"

fixed in f76dfc4 (first commit of new PR #1999)

// set the local address from the candidate pair
cp, err := rawHandshakeChannel.Transport().Transport().ICETransport().GetSelectedCandidatePair()
if cp == nil || err != nil {
	return pc, nil, fmt.Errorf("ice connection did not have selected candidate pair")

You might want to split up these cases, since the error is currently suppressed, meaning it will never be logged.
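
A sketch of splitting the two cases so the underlying error is wrapped instead of dropped. `candidatePair`, `selectedLocalAddr`, and both error variables are hypothetical stand-ins; the pion call chain from the hunk is replaced by a function argument.

```go
package main

import (
	"errors"
	"fmt"
)

// candidatePair is a hypothetical stand-in for the ICE candidate pair type.
type candidatePair struct{ local string }

var (
	errNoCandidatePair = errors.New("ice connection has no selected candidate pair")
	errAgentClosed     = errors.New("ice agent closed") // example failure for demonstration
)

// selectedLocalAddr handles the lookup error and the nil pair separately,
// so the original error stays visible to callers and logs.
func selectedLocalAddr(getPair func() (*candidatePair, error)) (string, error) {
	cp, err := getPair()
	if err != nil {
		return "", fmt.Errorf("get selected candidate pair: %w", err)
	}
	if cp == nil {
		return "", errNoCandidatePair
	}
	return cp.local, nil
}
```

The wrapped form keeps the cause in the message and lets callers match it with `errors.Is`.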

fixed in f76dfc4 (first commit of new PR #1999)

if err != nil {
	return nil, err
}
remoteFp = replaceAll(strings.ToLower(remoteFp), byte(':'))

TODO[Question]: is the : in the string really needed in the first place? It seems like a human-readable convention that just makes machine processing less efficient and less straightforward.

I get that this is perhaps a spec thing, but I still find it odd.

Contributor Author

It is required by the fingerprint spec. From https://www.rfc-editor.org/rfc/rfc4572#section-5:

A fingerprint is represented in SDP as an attribute (an 'a' line).
It consists of the name of the hash function used, followed by the
hash value itself. The hash value is represented as a sequence of
uppercase hexadecimal bytes, separated by colons. The number of
bytes is defined by the hash function. (This is the syntax used by
openssl and by the browsers' certificate managers. It is different
from the syntax used to represent hash values in, e.g., HTTP digest
authentication [18], which uses unseparated lowercase hexadecimal
bytes. It was felt that consistency with other applications of
fingerprints was more important.)
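
To make the quoted format concrete, a small sketch that parses a colon-separated hex fingerprint value back into raw bytes. `parseSDPFingerprint` is a hypothetical helper, not the PR's code, which strips the colons with `replaceAll` instead.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// parseSDPFingerprint decodes the hash value of an SDP fingerprint
// attribute, i.e. hex bytes separated by colons such as "DE:AD:BE:EF".
// Both uppercase and lowercase hex digits are accepted.
func parseSDPFingerprint(value string) ([]byte, error) {
	parts := strings.Split(value, ":")
	digest := make([]byte, 0, len(parts))
	for _, p := range parts {
		if len(p) != 2 {
			return nil, fmt.Errorf("malformed fingerprint byte %q", p)
		}
		b, err := hex.DecodeString(p)
		if err != nil {
			return nil, fmt.Errorf("malformed fingerprint byte %q: %w", p, err)
		}
		digest = append(digest, b...)
	}
	return digest, nil
}
```

Validating each two-character group rejects inputs such as "DEAD" that a plain remove-the-colons approach would silently accept.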

fixed in f76dfc4 (first commit of new PR #1999)

	close(start)
	wg.Wait()
	require.Equal(t, count, atomic.LoadUint32(&success))
}

+func TestWebsocketTransport(t *testing.T) {
+       ta, _ := getTransport(t)
+       tb, _ := getTransport(t)
+       ttransport.SubtestTransport(t, ta, tb, fmt.Sprintf("/ip4/%s/udp/0/webrtc", listenerIp), "peerA")
+}

The test above, along with the import ttransport "github.com/libp2p/go-libp2p/p2p/transport/testsuite", is missing.

This seems like a standard transport test. Granted, not all transports seem to implement it, but it might be good to support it so as to adhere to the generic transport logic. Or what is the intent here?

fixed in f76dfc4 (first commit of new PR #1999)

socket net.PacketConn
unknownUfragCallback func(string, net.Addr)

m sync.Mutex

TODO: the UDP mux is complex enough that you might want to isolate this thread-unsafe code in a separate data structure, so you can use it as a black-box, thread-safe data structure. This ensures the mutexes will be used safely.
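
A sketch of what such an isolated structure might look like. `connRegistry` and its methods are hypothetical, and `muxedConn` is reduced to a stub; the point is only that the map and its mutex live in one type and are never touched directly by the mux.

```go
package main

import "sync"

// muxedConn is a stub standing in for the mux's per-ufrag connection.
type muxedConn struct{ ufrag string }

// connRegistry owns the ufrag map and its lock. Callers never see either,
// so every access is guaranteed to hold the mutex.
type connRegistry struct {
	mu    sync.Mutex
	conns map[string]*muxedConn
}

func newConnRegistry() *connRegistry {
	return &connRegistry{conns: make(map[string]*muxedConn)}
}

// getOrCreate returns the connection for ufrag, creating it if needed.
// The boolean reports whether a new connection was created.
func (r *connRegistry) getOrCreate(ufrag string) (*muxedConn, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if c, ok := r.conns[ufrag]; ok {
		return c, false
	}
	c := &muxedConn{ufrag: ufrag}
	r.conns[ufrag] = c
	return c, true
}

// remove deletes and returns the connection for ufrag, if present.
func (r *connRegistry) remove(ufrag string) (*muxedConn, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	c, ok := r.conns[ufrag]
	delete(r.conns, ufrag)
	return c, ok
}
```

Combining lookup and insertion in `getOrCreate` also removes the check-then-insert window that separate locked calls would leave open.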

fixed in f76dfc4 (first commit of new PR #1999)

// UDP addresses of the same IP address family (eg. Server-reflexive addresses
// and peer-reflexive addresses).
func (mux *udpMux) GetConn(ufrag string, addr net.Addr) (net.PacketConn, error) {
	a, ok := addr.(*net.UDPAddr)

Is it ever really valid for this addr not to be a UDP address? For now that error is silently ignored.

fixed in f76dfc4 (first commit of new PR #1999)

_ = conn.closeConnection()
delete(mux.ufragMap, key)
for _, addr := range conn.addresses {
	// log.Errorf("deleting address : %v %v", ufrag, addr)

Can the commented-out line be deleted?

fixed in f76dfc4 (first commit of new PR #1999)

}

func newMuxedConnection(mux *udpMux, ufrag string) *muxedConnection {
	ctx, cancel := context.WithCancel(context.Background())

t is owned by a parent?!?!?!?

Contributor Author

No, we just have a reference to the parent mux because we write to the socket. That can be removed. Also, not sure which t you are referring to.

fixed in f76dfc4 (first commit of new PR #1999)


// SetDeadline implements net.PacketConn
func (*muxedConnection) SetDeadline(t time.Time) error {
	return nil

TODO: document why doing nothing is OK

fixed in f76dfc4 (first commit of new PR #1999)

}

// SetReadDeadline implements net.PacketConn
func (*muxedConnection) SetReadDeadline(t time.Time) error {

TODO: document why doing nothing is OK

fixed in f76dfc4 (first commit of new PR #1999)


// SetWriteDeadline implements net.PacketConn
func (*muxedConnection) SetWriteDeadline(t time.Time) error {
	return nil

TODO: document why doing nothing is OK

fixed in f76dfc4 (first commit of new PR #1999)

func (conn *muxedConnection) closeConnection() error {
	select {
	case <-conn.ctx.Done():
		return fmt.Errorf("already closed")

TODO: turn this into a package-level sentinel error (var alreadyClosedErr = errors.New("already closed")) so that you can replace this line with "return alreadyClosedErr"
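
A sketch of the sentinel-error pattern the comment asks for. The struct is reduced to the two fields the hunk implies, and the variable is named `errAlreadyClosed` here (the conventional Go prefix) rather than the `alreadyClosedErr` proposed above; `isAlreadyClosed` is a hypothetical helper.

```go
package main

import (
	"context"
	"errors"
)

// errAlreadyClosed is allocated once and can be compared by identity.
var errAlreadyClosed = errors.New("already closed")

// muxedConnection is reduced to the fields needed for closing.
type muxedConnection struct {
	ctx    context.Context
	cancel context.CancelFunc
}

func newMuxedConnection() *muxedConnection {
	ctx, cancel := context.WithCancel(context.Background())
	return &muxedConnection{ctx: ctx, cancel: cancel}
}

// closeConnection cancels the context once and reports the sentinel
// error on any later call.
func (c *muxedConnection) closeConnection() error {
	select {
	case <-c.ctx.Done():
		return errAlreadyClosed
	default:
	}
	c.cancel()
	return nil
}

// isAlreadyClosed shows how callers can match the sentinel, even when
// it has been wrapped with %w further up the stack.
func isAlreadyClosed(err error) bool {
	return errors.Is(err, errAlreadyClosed)
}
```

Unlike a fresh `fmt.Errorf` value per call, the sentinel lets callers distinguish a double close from other failures without comparing strings.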

fixed in f76dfc4 (first commit of new PR #1999)

}

var (
	errTooManyPackets = fmt.Errorf("too many packets in queue; dropping")

TODO: use errors.New instead of fmt.Errorf

fixed in f76dfc4 (first commit of new PR #1999)

// pop reads a packet from the packetQueue or blocks until
// either a packet becomes available or the buffer is closed.
func (pq *packetQueue) pop(ctx context.Context, buf []byte) (int, net.Addr, error) {
	select {

TODO: I cannot put my finger on it yet, but this code smells. It seems like you are mixing logic boundaries here, which makes the code harder to reason about.

Contributor Author

This does not return as soon as the internal context's Done() channel fires because there could still be packets in the channel that can be read even after it is closed. Ideally it would be a priority select that first tries to read from the channel and only then checks whether the context is done.
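
Go's `select` has no built-in priorities, so a priority select is usually written as two selects. The sketch below is a simplification, not the PR's `pop`: it takes a `done` channel (for example `ctx.Done()`) instead of a context, uses a separate `closed` signal channel, and all names apart from `packetQueue` and `pop` are hypothetical.

```go
package main

import "errors"

var (
	errQueueClosed = errors.New("packet queue closed")
	errCanceled    = errors.New("pop canceled")
)

// packetQueue keeps queued packets readable after close is signalled.
type packetQueue struct {
	packets chan []byte
	closed  chan struct{} // closed once no more packets will be pushed
}

func newPacketQueue(capacity int) *packetQueue {
	return &packetQueue{
		packets: make(chan []byte, capacity),
		closed:  make(chan struct{}),
	}
}

// pop prefers queued packets over the closed and done signals.
func (q *packetQueue) pop(done <-chan struct{}) ([]byte, error) {
	// Priority pass: a queued packet always wins, even if the queue was
	// closed or the caller has already canceled.
	select {
	case p := <-q.packets:
		return p, nil
	default:
	}
	// Nothing queued: block until a packet arrives or a signal fires.
	select {
	case p := <-q.packets:
		return p, nil
	case <-q.closed:
		return nil, errQueueClosed
	case <-done:
		return nil, errCanceled
	}
}
```

Without the first non-blocking select, a single select with several ready cases picks one at random, so queued packets could be dropped in favour of the close signal.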

fixed in f76dfc4 (first commit of new PR #1999)


// push adds a packet to the packetQueue
func (pq *packetQueue) push(buf []byte, addr net.Addr) error {
	// we acquire a lock when sending on the channel to prevent

TODO: is this really needed?!

Contributor Author

Yes, I've seen it panic without this. We cannot guarantee that sending on the channel and closing it occur in the same goroutine.
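
The panic in question is Go's "send on closed channel". A sketch of guarding both operations with one mutex follows; the field layout and `errQueueClosed` are assumptions, while `errTooManyPackets` mirrors the name used in the PR.

```go
package main

import (
	"errors"
	"sync"
)

var (
	errTooManyPackets = errors.New("too many packets in queue; dropping")
	errQueueClosed    = errors.New("packet queue closed")
)

// packetQueue serialises push and close so a send can never race with
// closing the channel.
type packetQueue struct {
	mu      sync.Mutex
	closed  bool
	packets chan []byte
}

func newPacketQueue(capacity int) *packetQueue {
	return &packetQueue{packets: make(chan []byte, capacity)}
}

// push enqueues without blocking; it drops the packet when the queue is full.
func (q *packetQueue) push(p []byte) error {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.closed {
		return errQueueClosed
	}
	select {
	case q.packets <- p:
		return nil
	default:
		return errTooManyPackets
	}
}

// close is idempotent and safe to call from any goroutine.
func (q *packetQueue) close() {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.closed {
		return
	}
	q.closed = true
	close(q.packets)
}
```

Because the send is non-blocking, holding the mutex across it cannot deadlock against a slow reader.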

fixed in f76dfc4 (first commit of new PR #1999)

	return ""
}
fpDigest := intersperse2(hex.EncodeToString(fp.Digest), ':', 2)
return getSupportedSDPString(fp.Code) + " " + fpDigest

TODO: this can be a subtle bug: getSupportedSDPString can return an empty string on error, but because we append to it here, the result is no longer an empty string, which will probably lead to an obscure error much later. Would it not be better to just return an error to begin with?
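
A sketch of an error-returning variant. The `fingerprint` struct and `sdpHashName` are hypothetical stand-ins for the decoded multihash and `getSupportedSDPString`. It assumes the multihash codes 0x12 (sha2-256) and 0x13 (sha2-512), and it emits uppercase hex as in the RFC passage quoted earlier in this thread, whereas the hunk above uses lowercase `hex.EncodeToString`.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// fingerprint is a hypothetical stand-in for a decoded multihash.
type fingerprint struct {
	code   uint64
	digest []byte
}

// sdpHashName maps a multihash code to the SDP hash function name.
func sdpHashName(code uint64) (string, error) {
	switch code {
	case 0x12:
		return "sha-256", nil
	case 0x13:
		return "sha-512", nil
	default:
		return "", fmt.Errorf("unsupported hash code 0x%x", code)
	}
}

// fingerprintToSDP fails loudly instead of returning a partial string.
func fingerprintToSDP(fp *fingerprint) (string, error) {
	if fp == nil {
		return "", errors.New("fingerprint is nil")
	}
	name, err := sdpHashName(fp.code)
	if err != nil {
		return "", err
	}
	parts := make([]string, len(fp.digest))
	for i, b := range fp.digest {
		parts[i] = fmt.Sprintf("%02X", b)
	}
	return name + " " + strings.Join(parts, ":"), nil
}
```

With this shape, an unsupported hash surfaces at the call site rather than as a malformed SDP line later in the handshake.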

fixed in f76dfc4 (first commit of new PR #1999)


const letterBytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890"

func genUfrag(n int) string {

TODO: I'm not sure you need to make this as generic as you do here. You might as well hardcode 32 and reuse the objects in a pool. You would need to benchmark to be sure, though.

Contributor Author

Sure, I can hardcode it to 32, but given that I return a string to make this usage convenient, I would somehow have to track the lifetime of the string to ensure it is returned to the pool.

fixed in f76dfc4 (first commit of new PR #1999)

GlenDC mentioned this pull request Jan 17, 2023 (17 tasks)

GlenDC commented Jan 17, 2023

This PR can be closed in favour of #1999.

ckousik (Contributor Author) commented Jan 18, 2023

Closing in favor of #1999

ckousik closed this Jan 18, 2023