-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix ICE reconnect flashings #3930
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still reviewing the actual ICE related changes in naf-dialog-adapter but wanted to submit the review comments I already have. Mostly just some minor styling feedback on the events system being used.
Starting to look at naf-dialog-adapter more, at first glance it feels like (even prior to this PR) we are doing a lot of work polling the connection state and trying to reconnect things.. I wonder if we are actually hurting ourselves, since I was under the impression that the ICE agent in the browser should already be handling that for us... I want to read through more of the mediasoup-client library to see if its already doing any sort of reconnect logic that we might be getting in the way of with our own reconnect logic.
I also want to read more about TURN and our coturn implementation in particular to understand why our TURN sessions would be expiring. Surely there is some way this is supposed to be getting refreshed... If we fix that then I don't think we have to worry about special handling for Firefox since updateIceServers should never need to be called, meaning we never have to recreate the transports, which would greatly simplify this PR.
src/naf-dialog-adapter.js
Outdated
@@ -202,36 +224,91 @@ export default class DialogAdapter { | |||
* https://developer.mozilla.org/en-US/docs/Web/API/RTCPeerConnection/setConfiguration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know its not from this PR, but kind of surprising that setConfiguration needs to be called here. The ICE servers (assuming this means STUN and TURN servers would not need to be changed in most cases (except as noted in the above comments when our TURN credentials have expired)... In the case we are using a non proxy candidate does this mean we should be able to restart with firefox without having to create new Transports?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I understand and what I saw during testing. When using TURN we need to update the ICE servers in case the credentials have expired otherwise we are good with just restarting ICE in both sides. As yous y say, if we are not using proxy candidates we can just restart ICE without updating the servers.
We can also check here if we are we are using a non proxy candidate to avoid an ICE server update as even when forcing TURN we still add a STUN server. I was assuming here that if TURN is forced is because the client is having ICE issues using STUN.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even when a candidate selected via STUN fails the STUN server itself should not need to change so I think if we can fix the credentials issue we should not ever need to change the ICE servers (except in the case of a dialog server transfer
353260d
to
53a0462
Compare
@netpro2k I'm happy to pair anytime and go over this and see if we can simplify it. |
6f81c98
to
c39789a
Compare
@netpro2k I've updated this PR to stop handling the disconnected state. |
c39789a
to
6d45372
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty happy with where we landed on this, nice work! I think the refactoring of naf-dialog-adapter is a nice step forward in starting to make it easier to reason about, and I think removing the watchdogs and handing stream recreation should hopefully lead to more reliability. I made some suggestions on future cleanup work we can do but I think most of it can be done in future PRs and we don't need to hold this back. I am still a bit confused on some of the stuff in restartSendICE
and would like to discuss that, but other than that lgtm!
} | ||
|
||
// Resolve initial audio resolver since this person left. | ||
const initialAudioResolver = this._initialAudioConsumerResolvers.get(peerId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not part of this PR but the whole _initialAudioConsumerResolvers
thing is something I want to look into soon. This stuff still feels a bit fishy to be and would like to avoid it if we can. I would bet there are cases where we end up stalling on this for whatever reason. Its unclear why we need to wait for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that covers the case where that peer disconnected before it's audio consumer resolved which is something that we do upon connect to wait until we receive all the other peer's consumers. I agree that can cause some issues. I'm not sure why we do that though, is it really that important that we wait until we can hear all the other peers to connect?
|
||
try { | ||
this._mediasoupDevice = new mediasoupClient.Device({}); | ||
async createSendTransport(iceServers) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't comb through the changes here very closely since I think this is just moving code around, let me know if that is not the case.
iceServers, | ||
iceTransportPolicy: this._iceTransportPolicy | ||
}); | ||
async createRecvTransport(iceServers) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't think we should do it in this PR but just noting for future cleanup. Seems like we could DRY up creating the send/receive transports. A lot of the code in here is the same for both.
@netpro2k Updated with some of the change requests above. If you are ok, I'll move forward with landing this. |
console.error(`Error getting video stream for ${peerId}`, e); | ||
}); | ||
if (stream) { | ||
videoEl.srcObject = new MediaStream(stream); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi! I discovered this doesn't work unless you also call videoEl.play()
, because if the track dies the video is stopped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this happen in all browsers? Can you provide STRs? I can can't really reproduce it with current master in FF/Chrome.
The stream_update
event in media-view.js and avatar-audio-source.js should only be as a consequence of a receive transport recreation, otherwise both components are not yet created so the stream_update
event is not triggered on them. Maybe we are are missing some case where it's being triggered? I haven't found any but having STRs would really help.
A receive transport recreation should only happen in case where the transport has been closed but signaling is still opened. The transport disconnected/failed states are handled by the WebRTC stack and we should recover without any consumer recreation or transport updates whatsoever (so no stream_update
events).
I couldn't also reproduce the issue mentioned above. How are you triggering it?
This PR adds support for better ICE failures recovery. Now instead of rejoining in case of an ICE failure we:
updateIceServers
This PR adds a couple of methods in the Dialog side for closing a transport and getting the consumers.
This has to be merged at the same time as: Hubs-Foundation/dialog#12