Fix ICE reconnect flashings #3930

keianhzo · 2021-02-18T19:10:38Z

This PR improves recovery from ICE failures. Now, instead of rejoining in case of an ICE failure, we:

- Restart ICE for browsers that support it
- Recreate the transports and recreate the producers/consumers for browsers that don't support updateIceServers
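
The two recovery paths above could be sketched roughly as follows. This is an illustration only, not the code in this PR: `recoverFromIceFailure` and the injected helpers `fetchIceServers`, `fetchIceParameters`, and `recreateTransport` are hypothetical names, and the sketch assumes that mediasoup-client's `transport.updateIceServers()` rejects in browsers that cannot update ICE servers.

```javascript
// Sketch only: helper names are hypothetical, not taken from naf-dialog-adapter.
async function recoverFromIceFailure(transport, helpers) {
  const { fetchIceServers, fetchIceParameters, recreateTransport } = helpers;

  try {
    // Refresh the STUN/TURN servers (e.g. because TURN credentials expired).
    // Assumption: this rejects where the browser cannot update them.
    await transport.updateIceServers({ iceServers: await fetchIceServers() });
  } catch (e) {
    // Fallback: build a new transport plus its producers/consumers.
    // A real implementation would likely inspect the error before falling back.
    await recreateTransport();
    return "recreated";
  }

  // Ask the server for fresh ICE parameters and restart ICE on the client.
  await transport.restartIce({ iceParameters: await fetchIceParameters() });
  return "restarted";
}
```

The return value only exists so a caller (or a test) can tell which path was taken.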

This PR adds a couple of methods on the Dialog side for closing a transport and getting the consumers.

This has to be merged at the same time as: Hubs-Foundation/dialog#12

netpro2k

I am still reviewing the actual ICE related changes in naf-dialog-adapter but wanted to submit the review comments I already have. Mostly just some minor styling feedback on the events system being used.

Starting to look at naf-dialog-adapter more, at first glance it feels like (even prior to this PR) we are doing a lot of work polling the connection state and trying to reconnect things. I wonder if we are actually hurting ourselves, since I was under the impression that the ICE agent in the browser should already be handling that for us... I want to read through more of the mediasoup-client library to see if it's already doing any sort of reconnect logic that we might be getting in the way of with our own reconnect logic.

I also want to read more about TURN and our coturn implementation in particular to understand why our TURN sessions would be expiring. Surely there is some way this is supposed to be getting refreshed... If we fix that then I don't think we have to worry about special handling for Firefox since updateIceServers should never need to be called, meaning we never have to recreate the transports, which would greatly simplify this PR.

src/components/media-views.js

src/components/audio-feedback.js

src/naf-dialog-adapter.js

src/components/avatar-audio-source.js

src/naf-dialog-adapter.js

netpro2k · 2021-02-23T01:32:46Z

src/naf-dialog-adapter.js

@@ -202,36 +224,91 @@ export default class DialogAdapter {
   * https://developer.mozilla.org/en-US/docs/Web/API/RTCPeerConnection/setConfiguration


I know it's not from this PR, but it's kind of surprising that setConfiguration needs to be called here. The ICE servers (assuming this means STUN and TURN servers) would not need to be changed in most cases (except, as noted in the above comments, when our TURN credentials have expired)... In the case where we are using a non-proxy candidate, does this mean we should be able to restart with Firefox without having to create new Transports?

That's what I understand and what I saw during testing. When using TURN we need to update the ICE servers in case the credentials have expired; otherwise we are good with just restarting ICE on both sides. As you say, if we are not using proxy candidates we can just restart ICE without updating the servers.

We can also check here if we are using a non-proxy candidate to avoid an ICE server update, as even when forcing TURN we still add a STUN server. I was assuming here that if TURN is forced it is because the client is having ICE issues using STUN.

Even when a candidate selected via STUN fails, the STUN server itself should not need to change, so I think if we can fix the credentials issue we should not ever need to change the ICE servers (except in the case of a dialog server transfer).
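
A rough sketch of the check discussed in this thread, reading "proxy candidate" as a relayed (TURN) candidate, which is an interpretation rather than something stated here. Both function names are hypothetical. The first assumes the stats report exposes a `transport` entry with `selectedCandidatePairId`, which not every browser provides; the second only encodes the policy described above.

```javascript
// Returns the candidateType ("host", "srflx", "prflx" or "relay") of the
// selected local candidate, or undefined if it cannot be determined.
// `stats` is a Map-like RTCStatsReport, e.g. from RTCPeerConnection.getStats().
function selectedLocalCandidateType(stats) {
  let pairId;
  stats.forEach(entry => {
    if (entry.type === "transport" && entry.selectedCandidatePairId) {
      pairId = entry.selectedCandidatePairId;
    }
  });
  const pair = pairId ? stats.get(pairId) : undefined;
  const local = pair ? stats.get(pair.localCandidateId) : undefined;
  return local ? local.candidateType : undefined;
}

// Hypothetical policy: only touch the ICE servers when we moved to another
// dialog server, or when we rely on TURN and its credentials have expired.
function needsIceServerUpdate({ candidateType, turnCredentialsExpired, dialogServerChanged }) {
  if (dialogServerChanged) return true;
  return candidateType === "relay" && turnCredentialsExpired;
}
```

With a check like this, a plain ICE restart would be enough whenever `needsIceServerUpdate` returns false.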

src/components/avatar-audio-source.js

keianhzo · 2021-02-23T14:22:46Z

@netpro2k I'm happy to pair anytime and go over this and see if we can simplify it.

keianhzo · 2021-02-25T13:30:15Z

@netpro2k I've updated this PR to stop handling the disconnected state.

netpro2k

Pretty happy with where we landed on this, nice work! I think the refactoring of naf-dialog-adapter is a nice step forward in starting to make it easier to reason about, and I think removing the watchdogs and handling stream recreation should hopefully lead to more reliability. I made some suggestions on future cleanup work we can do, but I think most of it can be done in future PRs and we don't need to hold this back. I am still a bit confused about some of the stuff in restartSendICE and would like to discuss that, but other than that lgtm!

src/components/avatar-audio-source.js

src/components/media-views.js

src/naf-dialog-adapter.js

netpro2k · 2021-03-03T01:32:53Z

src/naf-dialog-adapter.js

+    }
+
+    // Resolve initial audio resolver since this person left.
+    const initialAudioResolver = this._initialAudioConsumerResolvers.get(peerId);


Not part of this PR, but the whole _initialAudioConsumerResolvers thing is something I want to look into soon. This stuff still feels a bit fishy to me and I would like to avoid it if we can. I would bet there are cases where we end up stalling on this for whatever reason. It's unclear why we need to wait for this.

I guess that covers the case where that peer disconnected before its audio consumer resolved, which is something that we do upon connect to wait until we receive all the other peers' consumers. I agree that can cause some issues. I'm not sure why we do that though; is it really that important that we wait until we can hear all the other peers to connect?
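
To make the pattern under discussion concrete, here is a minimal sketch of a per-peer resolver map that is released when the audio consumer arrives or when the peer leaves. `InitialAudioGate` and its method names are hypothetical and not the adapter's actual code; the timeout is an added assumption, included only to illustrate one way of avoiding the stall mentioned above.

```javascript
// Sketch only: class and method names are hypothetical.
class InitialAudioGate {
  constructor() {
    this._resolvers = new Map(); // peerId -> function that settles the wait
  }

  // Resolves with "ready", "left" or "timeout", whichever happens first.
  waitFor(peerId, timeoutMs) {
    return new Promise(resolve => {
      let timer;
      const finish = reason => {
        clearTimeout(timer);
        this._resolvers.delete(peerId);
        resolve(reason);
      };
      timer = setTimeout(() => finish("timeout"), timeoutMs);
      this._resolvers.set(peerId, finish);
    });
  }

  // Call when the peer's audio consumer has been created.
  consumerReady(peerId) {
    const finish = this._resolvers.get(peerId);
    if (finish) finish("ready");
  }

  // Call when the peer leaves, so nobody keeps waiting on it.
  peerLeft(peerId) {
    const finish = this._resolvers.get(peerId);
    if (finish) finish("left");
  }
}
```

A caller that joins a room could wait on `Promise.all` of these per-peer promises without depending on every peer staying connected.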

netpro2k · 2021-03-03T01:36:03Z

src/naf-dialog-adapter.js

-
-    try {
-      this._mediasoupDevice = new mediasoupClient.Device({});
+  async createSendTransport(iceServers) {


Didn't comb through the changes here very closely since I think this is just moving code around, let me know if that is not the case.

netpro2k · 2021-03-03T01:39:18Z

src/naf-dialog-adapter.js

-        iceServers,
-        iceTransportPolicy: this._iceTransportPolicy
-      });
+  async createRecvTransport(iceServers) {


Don't think we should do it in this PR but just noting for future cleanup. Seems like we could DRY up creating the send/receive transports. A lot of the code in here is the same for both.
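
One possible shape for that cleanup, as a sketch under assumptions rather than a proposal for this PR: a single helper that builds the shared options once and picks the mediasoup-client `Device` method by direction. The name `createTransport` and the `transportInfo` parameter (the id and ICE/DTLS parameters returned by the server) are hypothetical.

```javascript
// Sketch only: `createTransport` and `transportInfo` are hypothetical names.
function createTransport(device, direction, transportInfo, iceServers, iceTransportPolicy) {
  const { id, iceParameters, iceCandidates, dtlsParameters } = transportInfo;

  // Options shared by send and receive transports.
  const options = {
    id,
    iceParameters,
    iceCandidates,
    dtlsParameters,
    iceServers,
    iceTransportPolicy
  };

  if (direction === "send") return device.createSendTransport(options);
  if (direction === "recv") return device.createRecvTransport(options);
  throw new Error(`Unknown transport direction: ${direction}`);
}
```

Direction-specific event handlers (for example the `produce` handler on the send transport) would still need to be attached by the caller.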

keianhzo · 2021-03-03T13:24:17Z

@netpro2k Updated with some of the change requests above. If you are ok, I'll move forward with landing this.

gfodor · 2021-04-09T06:06:44Z

src/components/media-views.js

+              console.error(`Error getting video stream for ${peerId}`, e);
+            });
+            if (stream) {
+              videoEl.srcObject = new MediaStream(stream);


Hi! I discovered this doesn't work unless you also call videoEl.play(), because if the track dies the video is stopped.
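
A minimal sketch of that workaround, assuming the element only needs to be restarted after a new stream is assigned. `attachStreamAndPlay` is a hypothetical helper name, not something in media-views.js.

```javascript
// Sketch only: `attachStreamAndPlay` is a hypothetical helper.
function attachStreamAndPlay(videoEl, mediaStream) {
  videoEl.srcObject = mediaStream;

  // play() returns a Promise in current browsers; it can reject, for
  // example when autoplay is blocked, so log instead of leaving it unhandled.
  const result = videoEl.play();
  if (result && typeof result.catch === "function") {
    result.catch(e => console.error("Error resuming video playback", e));
  }
}
```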

Does this happen in all browsers? Can you provide STRs? I can't really reproduce it with current master in FF/Chrome.

The stream_update event in media-views.js and avatar-audio-source.js should only fire as a consequence of a receive transport recreation; otherwise both components are not yet created, so the stream_update event is not triggered on them. Maybe we are missing some case where it's being triggered? I haven't found any, but having STRs would really help.

A receive transport recreation should only happen in the case where the transport has been closed but signaling is still open. The transport disconnected/failed states are handled by the WebRTC stack and we should recover without any consumer recreation or transport updates whatsoever (so no stream_update events).

I also couldn't reproduce the issue mentioned above. How are you triggering it?

keianhzo requested a review from netpro2k February 18, 2021 19:12

This was referenced Feb 18, 2021

Support for getting peers and closing transports Hubs-Foundation/dialog#12

Merged

RTC retry improvements #3692

Merged

keianhzo self-assigned this Feb 19, 2021

netpro2k reviewed Feb 23, 2021

keianhzo added 3 commits February 23, 2021 11:56

Fix ICE reconnect flashings

590d32a

Move NAF adapter events to EventEmitter

ab1dd01

Replace track when updating audio stream in audio source

53a0462

keianhzo force-pushed the ice-restart-flashing branch from 353260d to 53a0462 Compare February 23, 2021 14:21

keianhzo added 2 commits February 24, 2021 18:25

getPeers -> refreshConsumers

5edbbab

add/remove track doesn't work, using a node graph better

da873fc

keianhzo force-pushed the ice-restart-flashing branch from 6f81c98 to c39789a Compare February 25, 2021 13:27

Do not handle disconnected state

6d45372

keianhzo force-pushed the ice-restart-flashing branch from c39789a to 6d45372 Compare February 25, 2021 13:54

keianhzo requested a review from netpro2k February 25, 2021 22:48

netpro2k reviewed Mar 3, 2021

PR review fixes

827f9e2

keianhzo merged commit 70d7630 into master Mar 9, 2021

keianhzo deleted the ice-restart-flashing branch March 9, 2021 12:46

gfodor reviewed Apr 9, 2021

keianhzo Apr 12, 2021
