Improved Hubs connectivity diagnostics #4133
Conversation
On reflection it might be worth considering an alternative approach: a similar dialog, but on the actual Hubs room page, that would track the loading and error events during the live room connection process. This could be displayed conditionally on the debug parameter, which could be included in any error page reload instructions in a similar way to the "Try using TCP" suggestion. One advantage would be using the actual load process rather than an analogous test harness. Another advantage would be that other load events (and errors) could be tracked, such as scene model loading, which is also a common point of failure and currently completely opaque to the user. The number of events fired might need to be increased, and some mechanism (possibly event-based) would be required to provide rich error messages, but all of this would be nicely abstracted away from the diagnostic reporting implementation itself.

I guess it's more like a diagnostic logging window, similar to the browser console logs, but more accessible to end users and customizable with remedial suggestions.
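To make the idea concrete, here is a minimal sketch of the event-based mechanism described above. This is not Hubs code; `DiagnosticLog`, the event names, and the remedy map are all hypothetical names, chosen to show how load/error events could be paired with remedial suggestions for a debug-mode dialog to display.

```javascript
// Hypothetical sketch: a diagnostic event log that a debug-mode dialog
// on the room page could subscribe to. Each recorded event is paired
// with an optional remedial suggestion for end users.
class DiagnosticLog {
  constructor(remedies = {}) {
    this.entries = [];
    this.remedies = remedies; // maps event name -> remedial suggestion
    this.listeners = [];
  }
  record(event, detail) {
    const entry = {
      event,
      detail,
      remedy: this.remedies[event] || null, // null when no suggestion exists
      time: Date.now(),
    };
    this.entries.push(entry);
    this.listeners.forEach((fn) => fn(entry)); // notify the dialog/UI
  }
  onEntry(fn) {
    this.listeners.push(fn);
  }
}

// The room-loading code would record events as they happen, e.g.:
const log = new DiagnosticLog({
  "scene-load-error": "Check that the scene URL is reachable from your network.",
});
log.record("join-started", { room: "example" });
log.record("scene-load-error", { status: 404 });
```

The point of the indirection is that the loading code only fires events; the remedy text and the presentation live entirely in the diagnostic layer, so the reporting UI can evolve without touching the connection code.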
Added try-catch based on user testing
A customer with connection problems reported the following result: The "Join Room" test had failed and I had to request the console log, which looked like this:
I've added some exception handling to hopefully capture this type of error more cleanly in the future. It appears that the second WebSocket connection failed. The first one went through fine, and the most obvious difference is that the second uses port 80. It is possibly the same as this issue, which was caused by transferring secure traffic (WSS in this case) over a non-standard port.
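For illustration, the exception handling could take roughly this shape. This is a hedged sketch rather than the actual PR diff: `connectWithDiagnostics` is a hypothetical name, and the socket-factory parameter is an assumption made here so the error-handling logic can be exercised without a live network (in the browser it would just be `() => new WebSocket(url)`).

```javascript
// Sketch: wrap a WebSocket connection so that synchronous construction
// errors, asynchronous error events, and timeouts all surface as a
// single rejection with a readable message for the diagnostic report.
function connectWithDiagnostics(createSocket, timeoutMs = 5000) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Connection timed out after ${timeoutMs} ms`)),
      timeoutMs
    );
    let socket;
    try {
      socket = createSocket(); // e.g. () => new WebSocket(url)
    } catch (err) {
      clearTimeout(timer);
      reject(new Error(`WebSocket construction failed: ${err.message}`));
      return;
    }
    socket.onopen = () => {
      clearTimeout(timer);
      resolve(socket);
    };
    socket.onerror = () => {
      clearTimeout(timer);
      reject(new Error("WebSocket error (check port and TLS configuration)"));
    };
  });
}
```

A rejection here would show up in the test results directly, instead of requiring the customer to open the console and copy logs out by hand.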
"Transferring secure traffic (WSS in this case) over a non-standard port" is indeed problematic. More precisely, I think using a non-standard port is fine; the real problem is using a standard port for non-standard traffic.
Thanks a lot @rawnsley, this is great work. I'm going to look into it and see what's the best way of bringing this into the main branch. I agree that both an inside-room and an outside-room diagnostic can be useful, and we should also incorporate some ways to flag possible issues while "in-call", but this is a great starting point.
I'm withdrawing this PR as the code-base has drifted a bit and I'm using this approach more in production now anyway.
This is a speculative suggestion for improving the support process when users have connectivity issues. The goal is as follows:
I'm aware of the WebRTC debugging mode. It is very comprehensive, but it is really targeted at skilled users who are sitting at the computer being tested. The average user will not be able to gather actionable information or communicate results reliably to the Hubs support staff.
This implementation tries to balance two competing factors: minimal impact on the Hubs code-base, so it doesn't cause any trouble, against maximum reuse of the actual code used when connecting to Hubs. To this end, only one high-risk change was made (the addition of a parameter to the room-opening code). The test itself is implemented as a rather monolithic sequence of JavaScript calls within a simple React component.
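As a rough illustration of that shape (hypothetical names, not the actual component code), the monolithic sequence amounts to running a list of named async steps in order, stopping at the first failure, and reporting pass/fail plus any captured error per step:

```javascript
// Sketch of a sequential diagnostic runner. Each step is a named async
// function; later steps usually depend on earlier ones, so the runner
// stops at the first failure and records the error message for display.
async function runDiagnostics(steps) {
  const results = [];
  for (const { name, run } of steps) {
    try {
      await run();
      results.push({ name, ok: true });
    } catch (err) {
      results.push({ name, ok: false, error: err.message });
      break; // no point attempting dependent steps after a failure
    }
  }
  return results;
}
```

The React component would then just render `results` as a checklist, which is what makes the failing step (and its error text) directly communicable by a non-technical user.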
The hope is, first, that any failure to connect to Hubs would be reflected in a failing test result and, second, that the cause of the failure would be clear either from which test fails or from the reported error information. This is likely to be an iterative process: initially it may simply be used as a first-line triage step, but with some investment it could evolve into a more autonomous support tool.
Here is the expected output when there are no connection problems:
I don't really expect this PR to be accepted as is, but hopefully it's enough to illustrate what I think is a very important feature gap and perhaps offer the seed of a solution. Some caveats on the current implementation:
Issue is synchronized with this Jira Task