This repository has been archived by the owner on Sep 3, 2022. It is now read-only.

Fix a bug in the port selection for the nested Jupyter servers. #886

Merged

Conversation

@ojarjur (Contributor) commented Jun 28, 2016

Previously, Datalab was picking the next port that it wanted Jupyter
to run on, passing that to the launched Jupyter process using the
'--port' flag, and then assuming that the server did in fact start
listening on that port.

However, if the selected port was already in use, Jupyter would
automatically try the next port in sequence. That could lead to a
situation where one user's Jupyter server was listening on a port
that we thought belonged to the Jupyter server for a different
user.

This change fixes that bug by doing the following (see the sketch after this list):

  1. Verifying that the next port is free before we try to start a
     Jupyter server listening on it.
  2. Telling the Jupyter server not to try any additional ports if
     the one specified was not available (so it will instead die in
     that situation).
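
A minimal sketch of those two steps, assuming the wrapper uses the tcp-port-used package (consistent with the `tcp.waitUntilUsed` call in the diff below) and Jupyter's `NotebookApp.port_retries` option; the helper name and exact flags are illustrative, not necessarily the code in this change:

```typescript
// Sketch only: the helper name, the tcp-port-used calls, and the Jupyter
// flags are assumptions for illustration, not the exact Datalab code.
import * as childProcess from 'child_process';
import * as tcp from 'tcp-port-used';

function startJupyterOnPort(port: number): Promise<childProcess.ChildProcess> {
  // Step 1: verify that nothing is already listening on the candidate port.
  return tcp.check(port, 'localhost').then((inUse: boolean) => {
    if (inUse) {
      throw new Error('Port ' + port + ' is already in use');
    }
    // Step 2: pin Jupyter to exactly this port; port_retries=0 makes it exit
    // instead of silently falling back to the next free port.
    return childProcess.spawn('jupyter', [
      'notebook',
      '--port=' + port,
      '--NotebookApp.port_retries=0',
    ]);
  });
}
```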

The combination of these two steps effectively moves the retry
logic for picking a port out of the Jupyter process and into the
wrapping Node server.
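
A corresponding sketch of the relocated retry loop, building on the hypothetical `startJupyterOnPort` above; `nextJupyterPort` stands in for however Datalab picks its next candidate port:

```typescript
// Sketch of the retry logic now living in the Node wrapper rather than in
// Jupyter itself. nextJupyterPort is hypothetical.
declare function nextJupyterPort(): number;

const MAX_START_ATTEMPTS = 10;

function startJupyterWithRetries(attempt: number = 0): Promise<childProcess.ChildProcess> {
  if (attempt >= MAX_START_ATTEMPTS) {
    // First remaining failure mode (see below): give up after a fixed number
    // of attempts and surface an internal error to the user.
    return Promise.reject(new Error('Failed to start a Jupyter server'));
  }
  return startJupyterOnPort(nextJupyterPort())
      .catch(() => startJupyterWithRetries(attempt + 1));
}
```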

There are still two failure cases that can manifest even with this
change.

The first is that if the attempt to start a Jupyter server
fails more than 10 times, then the retry logic will give up and
an internal error will be reported to the user. This failure mode
always existed, but now we will be properly tracking it.

The second is a race condition where another process grabs the
requested port between the time when we verify that it is free
and the time that the launched Jupyter server starts up. If that
happens, then the Jupyter server will kill itself, and it will
be removed from the users->Jupyter servers map. However, the
request that caused us to spin up a Jupyter server to begin
with will be forwarded to the process that took hold of the
assigned port.

This failure case should be difficult enough to trigger that it
will not occur in practice.

We believe this will fix the issue reported in #884.

@@ -137,7 +137,16 @@ function createJupyterServer(userId: string, resolved: (server: JupyterServer)=>
server.proxy.on('proxyRes', responseHandler);
server.proxy.on('error', errorHandler);

resolved(server);
tcp.waitUntilUsed(server.port).then(
Contributor

Maybe set a larger timeout value? The default is only 300ms, and I'm not sure Jupyter is quick enough to start up, especially on a potentially heavily loaded VM.

Contributor Author

Done.
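
Presumably the fix is along these lines, since tcp-port-used's `waitUntilUsed` accepts explicit retry and timeout arguments; the specific values here are illustrative only, not necessarily what the change uses:

```typescript
// Sketch only: poll every 500ms and allow up to 60s for Jupyter to start
// listening; the values actually chosen in this change may differ.
tcp.waitUntilUsed(server.port, 500 /* retryTimeMs */, 60000 /* timeOutMs */).then(
    () => resolved(server),
    (err: Error) => {
      // Hypothetical failure path: retry on another port or report the error.
      console.error('Jupyter did not start listening on port ' + server.port, err);
    });
```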

@ojarjur force-pushed the ojarjur/fix-jupyter-ports-for-datalab-managed branch from 73167c5 to 2e00445 on June 30, 2016 at 01:49
@qimingj (Contributor) commented Jun 30, 2016

LGTM.

@ojarjur merged commit 1701663 into datalab-managed on Jun 30, 2016