Improved Hubs connectivity diagnostics #4133

Closed
rawnsley wants to merge 27 commits

@rawnsley (Contributor) commented Apr 9, 2021

This is a speculative proposal for improving the support process when users have connectivity issues. The goals are:

  1. A test page where users can self-diagnose connection problems.
  2. If self-diagnosis is not possible, a way to gather information for support staff without being on site.

I'm aware of the WebRTC debugging mode. It is very comprehensive, but it is aimed at skilled users sitting at the computer being tested. The average user will not be able to gather actionable information or reliably communicate results to the Hubs support staff.

This implementation tries to balance two competing goals: minimal impact on the Hubs codebase, so it doesn't cause any trouble, and maximum reuse of the code that actually runs when connecting to Hubs. To this end, only one high-risk change was made (adding a parameter to the room-opening code). The test itself is implemented as a rather monolithic sequence of JavaScript calls within a simple React component.
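A sequence of calls like this can be sketched roughly as follows. This is a minimal illustration, not the PR's actual component: the `{ name, run }` step shape, the step names and `runDiagnostics` itself are hypothetical. Steps run in order; the first failure is recorded with its error message, and later steps are marked as skipped because they usually depend on the earlier ones.

```javascript
/**
 * Run diagnostic steps in order and record one result per step.
 * Hypothetical step shape: { name: string, run: () => Promise<string | void> }.
 * A step passes if run() resolves and fails if it throws or rejects; once a
 * step fails, the remaining steps are reported as "skipped".
 */
async function runDiagnostics(steps) {
  const results = [];
  let failed = false;
  for (const step of steps) {
    if (failed) {
      results.push({ name: step.name, status: "skipped", note: "" });
      continue;
    }
    const started = Date.now();
    try {
      const note = await step.run();
      results.push({ name: step.name, status: "passed", note: note ?? "", ms: Date.now() - started });
    } catch (err) {
      failed = true;
      // Keep the raw message so it can be shown in the Notes column and sent to support.
      const note = err instanceof Error ? err.message : String(err);
      results.push({ name: step.name, status: "failed", note, ms: Date.now() - started });
    }
  }
  return results;
}
```

In a React component, the returned array could be kept in state once the promise resolves and rendered as a table of test names, statuses and notes.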

The hope is, first, that any failure to connect to Hubs would also show up as a failed test, and second, that the cause of the failure would be clear either from which test fails or from the reported error information. This is likely to be an iterative process: initially it may simply serve as a first-line triage step, but with some investment it could evolve into a more autonomous support tool.
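For the second goal (getting information to support staff), one option is to turn the results into plain text the user can paste into an email or ticket. The sketch below assumes a hypothetical result shape `{ name, status, note }`; `formatSupportReport` and the report layout are illustrative only.

```javascript
/**
 * Turn diagnostic results into plain text a user can paste into a support request.
 * Hypothetical result shape: { name, status: "passed" | "failed" | "skipped", note }.
 */
function formatSupportReport(results, { userAgent = "unknown", timestamp = new Date().toISOString() } = {}) {
  const lines = [
    `Connectivity report generated ${timestamp}`,
    `User agent: ${userAgent}`,
    "",
  ];
  for (const { name, status, note } of results) {
    // Fixed-width status column so the report stays readable as plain text.
    const label = status.toUpperCase().padEnd(7);
    lines.push(note ? `${label} ${name}: ${note}` : `${label} ${name}`);
  }
  return lines.join("\n");
}
```

In a browser, `userAgent` would typically come from `navigator.userAgent`.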

Here is the expected output when there are no connection problems:

[Screenshot, 2021-04-09 14:39:40: test results with no connection problems]

I don't really expect this PR to be accepted as is, but hopefully it's enough to illustrate what I think is a very important feature gap and perhaps offer the seed of a solution. Some caveats on the current implementation:

  • A full solution would need to include suggestions (or links to support pages) in the Notes field on how to fix connectivity problems (domain whitelists, port unblocking, etc.).
  • The tests don't completely open the audio channels and have no coverage for video. This may not be important for the types of connectivity problems people get, but I'm not certain.
  • It hasn't actually diagnosed anything yet. I'm waiting on initial results from customers with connectivity issues.
  • This is the first React I've ever written so apologies if it's not idiomatic.
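The Notes suggestions from the first caveat could be driven by a small rule table that maps a failed test and its error text to advice. In this sketch, the rule format, `suggestRemedy` and all advice strings are hypothetical placeholders; only the "Join Room" test name comes from this PR.

```javascript
// Hypothetical rules: test names and advice strings are placeholders, not
// official Hubs support content. Rules are checked in order; first match wins.
const REMEDY_RULES = [
  {
    test: "Join Room",
    pattern: /ERR_CONNECTION|closed before opening/i,
    note: "A firewall or proxy may be blocking secure WebSocket traffic. Ask your network administrator to allow the Hubs server domain and its ports.",
  },
  {
    test: "Join Room",
    pattern: /timed out/i,
    note: "The connection timed out. Try another network to rule out a local block.",
  },
  // Fallback for any other failure: a rule without a pattern always matches.
  { test: "*", note: "Please copy this report and send it to support." },
];

/** Return advice for a failed result, or "" for passed/skipped results. */
function suggestRemedy(result, rules = REMEDY_RULES) {
  if (result.status !== "failed") return "";
  const rule = rules.find(
    (r) => (r.test === "*" || r.test === result.name) && (!r.pattern || r.pattern.test(result.note ?? ""))
  );
  return rule ? rule.note : "";
}
```

Keeping the advice in data rather than in the test code means support staff could extend it as new failure patterns are diagnosed.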


gfodor and others added 23 commits May 26, 2020 14:08
  • …public-api-internal: Mark the public_api_access flag as internal
  • …banner: [QA Stage] Admin UI update banner
  • April Hubs Cloud Update (followup w/ custom UI themes)
  • April Hubs Cloud Release [Themes Update]
@rawnsley (Contributor, Author) commented:

On reflection it might be worth considering an alternative approach: a similar dialog, but on the actual Hubs room page that would track the loading and error events during the live room connection process. This could be displayed conditionally on the debug parameter, which could be included in any error page reload instructions in a similar way to the "Try using TCP" suggestion.
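Gating such a dialog on a query parameter could look like the sketch below. The parameter name `debug` and the helpers `isDiagnosticsEnabled` and `withDiagnostics` are assumptions for illustration; the second helper builds the kind of reload link an error page could offer alongside its other suggestions.

```javascript
// Assumption: the query parameter is called "debug"; the name is illustrative.
const DIAGNOSTICS_PARAM = "debug";

/** True when the query string asks for the diagnostics panel, e.g. "?debug" or "?debug=1". */
function isDiagnosticsEnabled(search) {
  const params = new URLSearchParams(search); // a leading "?" is ignored
  if (!params.has(DIAGNOSTICS_PARAM)) return false;
  const value = params.get(DIAGNOSTICS_PARAM);
  return value !== "0" && value !== "false";
}

/** Build a reload link with the diagnostics parameter added, keeping other parameters. */
function withDiagnostics(href) {
  const url = new URL(href);
  url.searchParams.set(DIAGNOSTICS_PARAM, "1");
  return url.toString();
}
```

On the room page, the panel would render only when `isDiagnosticsEnabled(window.location.search)` returns true.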

One advantage would be using the actual load process rather than an analogous test harness. Another advantage would be that other load events (and errors) could be tracked such as scene model loading, which is also a common point of failure and currently completely opaque to the user.

The number of events fired might need to be increased and some mechanism (possibly event-based) would be required to provide rich error messages, but all of this would be nicely abstracted away from the diagnostic reporting implementation itself.
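An event-based mechanism of this kind could use a plain `EventTarget`, so that the loading code only announces milestones and errors while the diagnostic panel decides how to display them. The event name, `DiagnosticEvent`, its fields and the stage names are hypothetical; this is a sketch, not Hubs code.

```javascript
// Hypothetical event type and fields. An EventTarget decouples the code that
// loads the room from the panel that displays progress and errors.
class DiagnosticEvent extends Event {
  constructor(stage, status, message = "") {
    super("diagnostic");
    this.stage = stage;     // e.g. "scene" or "room-connection" (illustrative names)
    this.status = status;   // "started" | "succeeded" | "failed"
    this.message = message; // human-readable detail, ideally with a suggested fix
  }
}

const diagnostics = new EventTarget();

/** Called from the loading code at each milestone. */
function reportDiagnostic(stage, status, message) {
  diagnostics.dispatchEvent(new DiagnosticEvent(stage, status, message));
}

/** Called by the diagnostics panel; returns an array that keeps growing as events arrive. */
function collectDiagnostics(target = diagnostics) {
  const entries = [];
  target.addEventListener("diagnostic", (event) => {
    entries.push({ stage: event.stage, status: event.status, message: event.message, time: Date.now() });
  });
  return entries;
}
```

Because `dispatchEvent` is synchronous, entries appear in the log in the same order the loading code reports them.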

I guess it's more like a diagnostic logging window, similar to the browser console logs, but more accessible to end users and customizable with remedial suggestions.

@keianhzo self-requested a review on April 12, 2021 21:09
@rawnsley (Contributor, Author) commented:

A customer with connection problems reported the following result:

[Screenshot, 2021-04-13 10:37:28: the customer's test results]

The "Join Room" test failed, and I had to request the console log, which looked like this:

browser.js:21 WebSocket connection to 'wss://vigilant-balrog.dev-hub.link:80/?roomId=xnX49Dp&peerId=undefined' failed: Error in connection establishment: net::ERR_CONNECTION_CLOSED
s @ browser.js:21

I've added some exception handling to hopefully capture this type of error more cleanly in future.
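One way to capture a failure like this more cleanly is to wrap the socket in a promise that rejects with a descriptive error. This is a hedged sketch rather than the exception handling actually added to the PR: `connectWithDiagnostics` and its options are hypothetical, and `createSocket` is injectable only so the function can be exercised without a network. Browsers intentionally expose no details in a WebSocket `error` event, so the close code (1006 for an abnormal closure) is the main extra clue available to script.

```javascript
/**
 * Open a WebSocket and turn a failed connection into an Error with a useful message.
 * Resolves with the open socket, or rejects on a constructor error, an early close,
 * or a timeout. createSocket is injectable for testing (hypothetical option).
 */
function connectWithDiagnostics(url, { createSocket = (u) => new WebSocket(u), timeoutMs = 10000 } = {}) {
  return new Promise((resolve, reject) => {
    let socket;
    try {
      socket = createSocket(url); // the constructor throws synchronously for a malformed URL
    } catch (err) {
      reject(new Error(`Could not create WebSocket for ${url}: ${err.message}`));
      return;
    }
    let sawError = false;
    const timer = setTimeout(() => {
      reject(new Error(`Timed out after ${timeoutMs} ms connecting to ${url}`));
      socket.close();
    }, timeoutMs);
    socket.onopen = () => {
      clearTimeout(timer);
      resolve(socket);
    };
    socket.onerror = () => {
      sawError = true; // no details available here by design
    };
    socket.onclose = (event) => {
      clearTimeout(timer);
      // Has no effect if the promise already settled (opened earlier or timed out).
      const detail = sawError ? ", after an error event" : "";
      reject(new Error(`WebSocket to ${url} closed before opening (code ${event.code}${detail})`));
    };
  });
}
```

A diagnostic step can then simply `await connectWithDiagnostics(url)` and let the rejection message flow into the Notes field.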

It appears that the second WebSocket connection failed. The first one succeeded, and the most obvious difference is that the second uses port 80. This may be the same as this issue, which was caused by transferring secure traffic (WSS in this case) over a non-standard port.
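A diagnostic step could flag this kind of mismatch automatically by parsing the socket URL with the standard `URL` class. The sketch below (`checkWebSocketPort` is a hypothetical helper) treats 443 as the usual port for `wss:` and 80 for `ws:`, and reports secure traffic on port 80, as in the URL from the console log, as a likely problem.

```javascript
/**
 * Return human-readable warnings for a WebSocket URL whose port does not match
 * the usual port for its scheme (443 for wss:, 80 for ws:). Empty array = no warning.
 */
function checkWebSocketPort(rawUrl) {
  const url = new URL(rawUrl);
  if (url.protocol !== "ws:" && url.protocol !== "wss:") {
    return [`Not a WebSocket URL: ${url.protocol}`];
  }
  const defaultPort = url.protocol === "wss:" ? 443 : 80;
  // URL drops the port when it equals the scheme default, leaving "".
  const port = url.port === "" ? defaultPort : Number(url.port);
  if (port === defaultPort) return [];
  if (url.protocol === "wss:" && port === 80) {
    return ["Secure WebSocket (TLS) on port 80, which networks often reserve for plain HTTP"];
  }
  if (url.protocol === "ws:" && port === 443) {
    return ["Unencrypted WebSocket on port 443, which networks often expect to carry TLS"];
  }
  return [`Non-default port ${port} for ${url.protocol}; worth checking whether restrictive networks block it`];
}
```

The first two cases cover a standard port carrying the "wrong" kind of traffic, which proxies and firewalls are most likely to interfere with; other non-default ports get only a softer note.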

@Utopiah commented Apr 13, 2021

"Transferring secure traffic (WSS in this case) over a non-standard port" is indeed problematic. More precisely, I think using a non-standard port is fine; the problem is using a standard port for non-standard traffic.

@keianhzo (Contributor) commented:

Thanks a lot @rawnsley, this is great work. I'm going to look into it and work out the best way to bring this into the main branch. I agree that both an in-room and an out-of-room diagnostic can be useful, and we should also incorporate ways to flag possible issues while "in-call", but this is a great starting point.

@rawnsley (Contributor, Author) commented Nov 8, 2021

I'm withdrawing this PR as the code-base has drifted a bit and I'm using this approach more in production now anyway.

@rawnsley closed this on Nov 8, 2021

10 participants