Jupyter Lab Stuck on executing code with JEG--NB2KG setup. #39
Comments
As a data point, have you tried reproducing this same scenario with a local kernel using only jupyterlab (sans NB2KG/EG)? Please ensure debug logging is enabled. I'm wondering if this is related to the 'buffered message' stuff and the log should reflect that is happening, especially if the kernel is doing work during the laptop's 45 minute closure. Thanks. |
@kevin-bates I tried reproducing this with a local kernel, without the NB2KG/EG setup, and things seem to run fine even after the laptop sleeps for 40-50 minutes (I am able to execute cells). |
Sorry, I have no experience of this issue. @IMAM9AIS Can I ask you full error log of |
@esevan this is the only message I receive, and it comes from this part of the code. Line 247 in ddf6b7c
But it only appears after I try executing the cell multiple times, even though the cell is not producing any results. |
@IMAM9AIS - sorry for the lack of help here. Just for grins (and another datapoint), can you try using the embedded nb2kg in Notebook 6.0 (now that it's released)? Instead of installing the extension, enabling it, and configuring the class overrides, you simply start Notebook with I suspect the results will be the same, but there are some changes with respect to handlers (but at the HTTP level, not relative to websockets). |
I've checked, and neither nb2kg nor notebook/gateway reconnects the websocket to the gateway when the connection between nb2kg and EG is closed.
Lines 215 to 216 in ddf6b7c
@IMAM9AIS If you see @kevin-bates Could you give me a comment about this case? Do you agree with the idea of recovering connection between nb2kg and EG when it's closed? |
@esevan - yeah, I agree. Your PR looks promising. Actually glad both embedded and nb2kg behave the same. Let's focus on nb2kg for now since that's what is being used, and we'll port accordingly. Thanks! @IMAM9AIS - as I mentioned in the PR, it would be great if you could take #42 for a spin prior to merge. |
@kevin-bates @esevan have some comments here:- |
@esevan @kevin-bates We tested the PR changes out and everything looks good. We haven't had a connection loss since. |
@IMAM9AIS I'm so glad to hear that! Thank you for issuing this and testing in your environment ;D |
Fantastic news! @esevan - I've merged the PR. Could you please apply the applicable changes to the gateway subsystem in Notebook? |
Closing via #42. |
@kevin-bates Sure. I'll upload this patch for jupyter/notebook :D |
@kevin-bates jupyter/notebook#4777 Requested :) |
Hi,
Brief background:-
We are using a JupyterLab setup with Jupyter Enterprise Gateway, with NB2KG overriding the required classes. We noticed that the spawned kernels were closing their websockets after a timeout, so we created a pull request, jupyter-server/enterprise_gateway#698, to keep kernels from closing the websocket connection, and it seems to work fine. (At least the logs from the server's end reflect this.)
Problem:-
We launch a kernel remotely, and if there are intermittent client disconnections the websocket connection still remains alive, which is good. But in cases where we close the laptop for, let's say, 45 minutes and then come back and execute any cell, the execution gets stuck with the * symbol.
I tried debugging under a lot of conditions. It seems that the websocket message from the client is conveyed properly to nb2kg, which tries to send it to KernelGateway using this
nb2kg/nb2kg/handlers.py
Line 230 in ddf6b7c
But somewhere the final call in the websocket library gets stuck indefinitely (probably an internal stream has been closed, or something similar), so the write-message process for this websocket client never actually completes.
Solution :-
Any ideas why this could be happening? From my understanding, no websocket client close event is being called, because I have monitored the logs multiple times. So the ws object is always alive, but whenever we try to send a message there is an obvious failure.