This repository has been archived by the owner on Sep 3, 2022. It is now read-only.

Fix a bug in the port selection for the nested Jupyter servers. #886

Merged

Conversation

@ojarjur (Contributor) commented Jun 28, 2016

Previously, Datalab was picking the next port that it wanted Jupyter
to run on, passing that to the launched Jupyter process using the
'--port' flag, and then assuming that the server did in fact start
listening on that port.

However, if the selected port was already in use, Jupyter would
automatically try the next port in sequence. That could lead to a
situation where one user's Jupyter server was listening on a port
that we thought belonged to the Jupyter server for a different
user.

This change fixes that bug by doing the following (see the sketch after this list):

  1. Verifying that the next port is free before we try to start a
     Jupyter server listening on it.
  2. Telling the Jupyter server not to try any additional ports if
     the one specified was not available (so it will instead die in
     that situation).
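
A minimal sketch of those two steps, assuming the wrapper uses the tcp-port-used package (consistent with the `tcp.waitUntilUsed` call in the diff below) and Jupyter's `NotebookApp.port_retries` option; the helper name and exact flags are illustrative, not necessarily the code in this change:

```typescript
// Sketch only: the helper name, the tcp-port-used calls, and the Jupyter
// flags are assumptions for illustration, not the exact Datalab code.
import * as childProcess from 'child_process';
import * as tcp from 'tcp-port-used';

function startJupyterOnPort(port: number): Promise<childProcess.ChildProcess> {
  // Step 1: verify that nothing is already listening on the candidate port.
  return tcp.check(port, 'localhost').then((inUse: boolean) => {
    if (inUse) {
      throw new Error('Port ' + port + ' is already in use');
    }
    // Step 2: pin Jupyter to exactly this port; port_retries=0 makes it exit
    // instead of silently falling back to the next free port.
    return childProcess.spawn('jupyter', [
      'notebook',
      '--port=' + port,
      '--NotebookApp.port_retries=0',
    ]);
  });
}
```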

The combination of these two steps effectively moves the retry
logic for picking a port out of the Jupyter process and into the
wrapping Node server.
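
A corresponding sketch of the relocated retry loop, building on the hypothetical `startJupyterOnPort` above; `nextJupyterPort` stands in for however Datalab picks its next candidate port:

```typescript
// Sketch of the retry logic now living in the Node wrapper rather than in
// Jupyter itself. nextJupyterPort is hypothetical.
declare function nextJupyterPort(): number;

const MAX_START_ATTEMPTS = 10;

function startJupyterWithRetries(attempt: number = 0): Promise<childProcess.ChildProcess> {
  if (attempt >= MAX_START_ATTEMPTS) {
    // First remaining failure mode (see below): give up after a fixed number
    // of attempts and surface an internal error to the user.
    return Promise.reject(new Error('Failed to start a Jupyter server'));
  }
  return startJupyterOnPort(nextJupyterPort())
      .catch(() => startJupyterWithRetries(attempt + 1));
}
```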

There are still two failure cases that can manifest even with this
change.

The first is that if the attempt to start a Jupyter server
fails more than 10 times, then the retry logic will give up and
an internal error will be reported to the user. This failure mode
always existed, but now we will be properly tracking it.

The second is a race condition where another process grabs the
requested port between the time when we verify that it is free
and the time that the launched Jupyter server starts up. If that
happens, then the Jupyter server will kill itself, and it will
be removed from the users->Jupyter servers map. However, the
request that caused us to spin up a Jupyter server to begin
with will be forwarded to the process that took hold of the
assigned port.

This failure case should be difficult enough to trigger that it
will not occur in practice.

We believe this will fix the issue reported in #884.

@@ -137,7 +137,16 @@ function createJupyterServer(userId: string, resolved: (server: JupyterServer)=>
server.proxy.on('proxyRes', responseHandler);
server.proxy.on('error', errorHandler);

resolved(server);
tcp.waitUntilUsed(server.port).then(
Contributor

Maybe set a larger timeout value? The default is only 300ms, and I'm not sure Jupyter is quick enough to start up, especially on a potentially heavily loaded VM.

Contributor Author

Done.
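
Presumably the fix is along these lines, since tcp-port-used's `waitUntilUsed` accepts explicit retry and timeout arguments; the specific values here are illustrative only, not necessarily what the change uses:

```typescript
// Sketch only: poll every 500ms and allow up to 60s for Jupyter to start
// listening; the values actually chosen in this change may differ.
tcp.waitUntilUsed(server.port, 500 /* retryTimeMs */, 60000 /* timeOutMs */).then(
    () => resolved(server),
    (err: Error) => {
      // Hypothetical failure path: retry on another port or report the error.
      console.error('Jupyter did not start listening on port ' + server.port, err);
    });
```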

@ojarjur force-pushed the ojarjur/fix-jupyter-ports-for-datalab-managed branch from 73167c5 to 2e00445 on June 30, 2016 at 01:49
@qimingj (Contributor) commented Jun 30, 2016

LGTM.

@ojarjur merged commit 1701663 into datalab-managed on Jun 30, 2016