This repository has been archived by the owner on Dec 8, 2022. It is now read-only.

Jupyter Lab Stuck on executing code with JEG--NB2KG setup. #39

Closed
IMAM9AIS opened this issue Jul 3, 2019 · 15 comments

Comments

@IMAM9AIS

IMAM9AIS commented Jul 3, 2019

Hi,

Brief background:-

We are using a JupyterLab setup with Jupyter Enterprise Gateway, with NB2KG overriding the required classes. We noticed that the spawned kernels were closing their websockets after a timeout, so we created a pull request, jupyter-server/enterprise_gateway#698, to prevent kernels from closing the websocket connection. It seems to work fine (at least the server-side logs reflect this).

Problem:-

We launch a kernel remotely, and if there are intermittent client disconnections the websocket connection still remains alive, which is good. But in cases where we close the laptop for, let's say, 45 minutes and then come back and execute any cell, the execution gets stuck with the * symbol.

I tried debugging under a lot of conditions. It seems that the websocket message from the client is conveyed properly to nb2kg, which tries to send it to KernelGateway using this:

def on_message(self, message):

But somewhere the final call in the websocket library gets stuck indefinitely (probably an internal stream has been closed, or something similar), so the write for this websocket client never actually completes.

Solution :-
Any ideas why this could be happening? From my understanding, no websocket client close event is being called: I have monitored the logs multiple times, and the ws object always stays alive, but whenever we try to send a message the write fails.

@kevin-bates
Member

As a data point, have you tried reproducing this same scenario with a local kernel using only jupyterlab (sans NB2KG/EG)? Please ensure debug logging is enabled. I'm wondering if this is related to the 'buffered message' stuff and the log should reflect that is happening, especially if the kernel is doing work during the laptop's 45 minute closure. Thanks.

@IMAM9AIS
Author

IMAM9AIS commented Jul 3, 2019

@kevin-bates I tried reproducing this with a local kernel, without the NB2KG/EG setup, and things seem to run fine even after the laptop sleeps for 40-50 minutes (I am able to execute cells).
I have been in debug mode since the start of the issue but could not see anything useful, except that if a cell goes into busy mode and I try to re-run it multiple times, after some tries I see
Exception writing message to websocket: error.

@kevin-bates
Member

Thanks for the update. This topic is beyond my knowledge level. Hopefully others can help here.

cc: @rolweber, @esevan - any ideas?

@esevan
Contributor

esevan commented Jul 4, 2019

Sorry, I have no experience of this issue.

@IMAM9AIS Could you share the full error log for Exception writing message to websocket: error.?

@IMAM9AIS
Author

IMAM9AIS commented Jul 4, 2019

@esevan this is the only message I receive; it is logged from this part of the code.

self.log.error("Exception writing message to websocket: {}".format(e)) # , exc_info=True)

But this is only logged after I try executing the cell multiple times, even when it is not returning results.
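For reference, a minimal sketch of what enabling the commented-out exc_info=True would add to that log line. It uses only the standard logging module; write_with_logging and broken_send are hypothetical names made up for illustration and are not part of nb2kg.

```python
import io
import logging


def write_with_logging(log, send, message):
    """Call send(message); on failure, log the error together with its traceback."""
    try:
        send(message)
    except Exception as e:
        # exc_info=True appends the full traceback to the log record
        log.error("Exception writing message to websocket: {}".format(e), exc_info=True)


def broken_send(message):
    # Hypothetical stand-in for a write on a stream that is already closed
    raise RuntimeError("stream is closed")


# Capture log output in memory so it can be inspected
buffer = io.StringIO()
log = logging.getLogger("nb2kg-sketch")
log.setLevel(logging.DEBUG)
log.propagate = False
log.addHandler(logging.StreamHandler(buffer))

write_with_logging(log, broken_send, "execute_request")
output = buffer.getvalue()
print(output)
```

With the traceback in the log, the exception type and the failing call become visible rather than only the short message.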

@kevin-bates
Member

@IMAM9AIS - sorry for the lack of help here. Just for grins (and another data point), can you try using the embedded nb2kg in Notebook 6.0 (now that it's released)? Instead of installing the extension, enabling it, and configuring the class overrides, you simply start Notebook with --gateway-url <gateway url>.

I suspect the results will be the same, but there are some changes with respect to handlers (at the http level, though, not relative to websockets).

@esevan
Contributor

esevan commented Jul 18, 2019

I've checked, and neither nb2kg nor notebook/gateway reconnects the websocket to the gateway when the connection between nb2kg and EG is closed.
As a result, the browser-notebook connection is alive but the notebook-EG connection is closed.
-> Communication between the browser and the kernel is broken.

Tornado.websocket guide: a message of None indicates that the connection has been closed.

nb2kg/nb2kg/handlers.py

Lines 215 to 216 in ddf6b7c

if message is None:
    break

@IMAM9AIS If you see 'connections': 0 in your logs, this is what is happening.
Something like Kernel retrieved: {'id': 'd61a9037-1211-422f-86a0-ef5ed4b01789', 'name': 'python3', 'last_activity': '2019-07-18T07:08:55.641346Z', 'execution_state': 'starting', 'connections': 0}
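A small sketch for spotting this in a log file, assuming the kernel model is printed as a Python dict literal after the text "Kernel retrieved: " as in the sample above. The helper name connections_from_log_line is hypothetical, and real log lines may carry extra prefixes or suffixes that this does not handle.

```python
import ast

PREFIX = "Kernel retrieved: "


def connections_from_log_line(line):
    """Return the 'connections' value from a 'Kernel retrieved' log line, else None."""
    idx = line.find(PREFIX)
    if idx == -1:
        return None
    try:
        # Assumes everything after the prefix is a dict literal
        model = ast.literal_eval(line[idx + len(PREFIX):].strip())
    except (ValueError, SyntaxError):
        return None
    if not isinstance(model, dict):
        return None
    return model.get("connections")


sample = (
    "Kernel retrieved: {'id': 'd61a9037-1211-422f-86a0-ef5ed4b01789', "
    "'name': 'python3', 'last_activity': '2019-07-18T07:08:55.641346Z', "
    "'execution_state': 'starting', 'connections': 0}"
)
connections = connections_from_log_line(sample)
print(connections)
```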

@kevin-bates Could you comment on this case? Do you agree with the idea of re-establishing the connection between nb2kg and EG when it's closed?
In my case, this also keeps JupyterLab from reconnecting to the session when the session is recovered in EG.
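To illustrate the idea, here is a minimal sketch of a read loop that reconnects when a None message signals closure. It is not the actual patch: FakeGatewaySocket, relay_with_reconnect, max_retries, and delay are all hypothetical names, and the fake socket only imitates the "None means closed" convention so the example runs with the standard library alone.

```python
import asyncio


class FakeGatewaySocket:
    """Hypothetical stand-in for a websocket client connection."""

    def __init__(self, messages):
        self._messages = list(messages)

    async def read_message(self):
        # Same convention as quoted above: None means the connection has closed
        return self._messages.pop(0) if self._messages else None


async def relay_with_reconnect(connect, on_message, max_retries=3, delay=0.0):
    """Read until the socket closes, then reconnect, up to max_retries times."""
    retries = 0
    while True:
        ws = await connect()
        while True:
            message = await ws.read_message()
            if message is None:
                break  # connection closed; leave the read loop
            on_message(message)
        if retries >= max_retries:
            return retries  # give up after the configured number of reconnects
        retries += 1
        await asyncio.sleep(delay)


# Demo: two connections deliver messages, the third is already closed
sockets = [FakeGatewaySocket(["a", "b"]), FakeGatewaySocket(["c"]), FakeGatewaySocket([])]
connect_calls = []


async def fake_connect():
    connect_calls.append(1)
    return sockets.pop(0)


received = []
attempts = asyncio.run(relay_with_reconnect(fake_connect, received.append, max_retries=2))
print(received, attempts)
```

A real implementation would likely also need a backoff between attempts and a way to stop retrying once the kernel itself is gone; both are left out of this sketch.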

@kevin-bates
Member

@esevan - yeah, I agree. Your PR looks promising. Actually glad both embedded and nb2kg behave the same. Let's focus on nb2kg for now since that's what is being used, and we'll port accordingly. Thanks!

@IMAM9AIS - as I mentioned in the PR, it would be great if you could take #42 for a spin prior to merge.

@IMAM9AIS
Author

@kevin-bates @esevan have some comments here:-
#42

@IMAM9AIS
Author

@esevan @kevin-bates We tested the PR changes and everything looks good. We haven't had a connection loss since.

@esevan
Contributor

esevan commented Jul 22, 2019

@IMAM9AIS I'm so glad to hear that! Thank you for raising this issue and testing the fix in your environment ;D

@kevin-bates
Member

Fantastic news! @esevan - I've merged the PR. Could you please apply the applicable changes to the gateway subsystem in Notebook?

@kevin-bates
Member

Closing via #42.

@esevan
Contributor

esevan commented Jul 22, 2019

@kevin-bates Sure. I'll upload this patch for jupyter/notebook :D

@esevan
Contributor

esevan commented Jul 23, 2019

@kevin-bates jupyter/notebook#4777 Requested :)
