Exception writing to websocket #516
Comments
@amangarg96 - thanks for the issue - another interesting issue from you. Use of a reverse proxy is something we recommend, so this shouldn't be a problem. In addition, your use of the client machine as the affinity key seems fine as well.

I'm curious what your EG log is indicating during this period. Perhaps there's something indicated there or, for that matter, in the kernel-specific logs maintained in YARN. Please check those for any clues. Also, have you tried a forced

Regarding the separation of duties relative to HTTP and WS: the HTTP requests essentially invoke the various manager classes to either get, start, interrupt, etc. data relative to a specific (or all) instance(s). These requests do not (necessarily) go directly to the kernel process itself. (Of course, things like interrupt or restart will implicitly trigger interaction with the kernel process.) The WS request handler communicates directly with the kernel. The

One area that will make things a little difficult, should you find you need to make changes, is that EG doesn't define any handlers - all are inherited from the Kernel Gateway and Notebook projects, so this may open a can of worms for you. That said, it would be fine to define a subclass in EG that derives from the class you need to change, assuming that a change of that magnitude is warranted and can be done in a relatively clean way. |
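To make the HTTP/WS split concrete, here is a minimal sketch of the two kinds of traffic a client produces for one kernel. The endpoint paths follow the Jupyter kernels REST API and the message layout follows the Jupyter messaging protocol; the base URL, kernel ID, and helper function names are hypothetical placeholders, and this is not taken from the EG or Notebook source.

```python
import json
import uuid
from datetime import datetime, timezone


def kernel_rest_url(base_url, kernel_id):
    """REST endpoint: answered by the server's kernel manager (e.g. GET for state)."""
    return f"{base_url}/api/kernels/{kernel_id}"


def kernel_ws_url(base_url, kernel_id):
    """WebSocket endpoint: carries kernel messages for all channels."""
    ws_base = base_url.replace("http", "ws", 1)  # http -> ws, https -> wss
    return f"{ws_base}/api/kernels/{kernel_id}/channels"


def execute_request(code, session_id):
    """Build an execute_request message as it would be sent over the WebSocket."""
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,
            "username": "",
            "session": session_id,
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": "execute_request",
            "version": "5.3",
        },
        "parent_header": {},
        "metadata": {},
        "content": {
            "code": code,
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": False,
            "stop_on_error": True,
        },
        "channel": "shell",
    }


if __name__ == "__main__":
    base = "http://gateway.example:8888"  # hypothetical address
    print(kernel_rest_url(base, "KERNEL_ID"))
    print(kernel_ws_url(base, "KERNEL_ID"))
    print(json.dumps(execute_request("1 + 1", "SESSION_ID"), indent=2))
```

Cell execution, its output, and kernel status updates travel as messages like this over the WebSocket, which is consistent with the symptom described in this issue: REST polling can succeed while cells never run.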
I reproduced the above error and checked the kernel-specific logs in YARN. The stdout and stderr look fine to me; I'm including them here for your reference.
stdout:
stderr:
|
I hadn't tried the I got the "connecting to websocket" log in Notebook server too
It's surprising why the This seems to have solved my use case. Is there anything else we should troubleshoot? Update: The 'Reconnect to kernel' option is available in the 'Command' palette of JupyterLab. |
Thanks for that update. I was just about to post a question on the jupyterlab gitter forum. They sure don't make that easy to find! Are you satisfied with this behavior? We'll likely revisit this area when we go to implement a robust HA solution. |
When the Notebook server tries to poll the state of the kernel (through REST calls), it should ideally attempt the 'reconnect' to the kernel (websockets) too. |
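A rough sketch of that suggestion, assuming a client that already has some way to poll kernel state over REST and to reopen its WebSocket. The three callables and the `FakeSocket` class are hypothetical stand-ins for illustration, not Notebook server APIs.

```python
def poll_and_reconnect(get_kernel_state, ws_is_open, reconnect_ws):
    """Run one polling step and return a short description of what happened.

    get_kernel_state: hypothetical callable wrapping the REST state poll;
                      returns None if the kernel no longer exists.
    ws_is_open:       hypothetical callable reporting whether the WebSocket is up.
    reconnect_ws:     hypothetical callable that reopens the WebSocket.
    """
    state = get_kernel_state()
    if state is None:
        return "kernel-gone"      # nothing to reconnect to
    if ws_is_open():
        return "ok"               # REST and WebSocket are both healthy
    reconnect_ws()                # REST works but the socket is down
    return "reconnected"


class FakeSocket:
    """Stand-in for a WebSocket connection; it only tracks open/closed."""

    def __init__(self):
        self.open = False
        self.reconnects = 0

    def reconnect(self):
        self.open = True
        self.reconnects += 1


if __name__ == "__main__":
    sock = FakeSocket()
    idle = lambda: {"execution_state": "idle"}
    print(poll_and_reconnect(idle, lambda: sock.open, sock.reconnect))
    print(poll_and_reconnect(idle, lambda: sock.open, sock.reconnect))
```

The point of the sketch is only the ordering: a successful REST poll combined with a closed socket is treated as a trigger to reconnect rather than as a healthy state.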
Sounds like a good suggestion/contribution to the Notebook server. 😃 Since this is sounding more like a client-side issue, I'm inclined to close this issue for now. Should any activity occur in Notebook/Lab, we can post a reference here. Are you okay with closure? |
Yes! I'm happy with the resolution. Thanks for the help :) |
I am using Jupyter Enterprise Gateway in YARN Cluster mode, with a slight modification.
An NGINX proxy is being used over multiple Jupyter Enterprise Gateway servers, which sends users to different gateway servers by hashing over the hostname of the machine. (This was done as a quick-fix for #86)
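The affinity idea can be illustrated with a small sketch. It does not reproduce NGINX's own hashing; the hash function, host names, and gateway list are assumptions chosen only to show that the same client host always maps to the same gateway.

```python
import hashlib


def pick_gateway(client_host, gateways):
    """Map a client host name to one gateway in a stable, repeatable way."""
    digest = hashlib.sha256(client_host.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(gateways)
    return gateways[index]


if __name__ == "__main__":
    # Hypothetical gateway addresses
    gateways = ["eg-1.example:8888", "eg-2.example:8888", "eg-3.example:8888"]
    for host in ["laptop-a", "laptop-b", "laptop-a"]:
        print(host, "->", pick_gateway(host, gateways))
```

Because the mapping depends only on the client host and the gateway list, both the REST calls and the WebSocket upgrade from one machine should land on the same gateway as long as the list does not change.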
When the kernel loses connection with the Gateway server, it tries to reconnect with the previous kernel. The Notebook server logs are as follows:
The Notebook server seems to be able to contact the kernel through REST calls, but it's not able to connect to the websockets.
The notebook shows the kernel as active (in idle state), but it doesn't execute the cell.
The following step is what is missing from the reconnection attempt, right?
Does it have something to do with the NGINX proxy?
Also, which messages are sent through WebSockets? Is it possible to switch Notebook and Enterprise Gateway to just HTTP (since the REST calls are working just fine)? If so, which Notebook functionality would be affected?
Pointers to the documentation would also help.