This repository has been archived by the owner on Sep 3, 2022. It is now read-only.
Fix a bug in the port selection for the nested Jupyter servers. #886
Previously, Datalab was picking the next port that it wanted Jupyter
to run on, passing that to the launched Jupyter process using the
'--port' flag, and then assuming that the server did in fact start
listening on that port.
However, if the selected port was already in use, then Jupyter would
automatically try the next port in sequence. That could cause an issue
where the Jupyter server for one user was listening on a port that we
thought was being used by the Jupyter server for a different user.
This change fixes that bug by doing the following:
1. Verifying that a port is free before launching a
   Jupyter server listening on it.
2. Telling Jupyter not to try another port if
   the one specified was not available (so it will instead die in
   that situation).
The combination of these two steps effectively moves the retry
logic for picking a port out of the Jupyter process and into the
wrapping Node server.
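
As a rough illustration of those two steps (not the code in this change), the sketch below probes a port from Node and builds Jupyter arguments that disable Jupyter's own port fallback. The helper names `isPortFree` and `jupyterArgs` are hypothetical, and using `--port-retries=0` alongside `--port` is an assumption about how the fallback could be turned off.

```typescript
import * as net from "net";

// Hypothetical helper: resolves to true if this process can bind `port`,
// i.e. nothing else appears to be listening on it right now.
function isPortFree(port: number): Promise<boolean> {
  return new Promise((resolve) => {
    const probe = net.createServer();
    // Any bind error (for example EADDRINUSE) is treated as "not free".
    probe.once("error", () => resolve(false));
    // If the bind succeeds, release the port again before reporting it free.
    probe.once("listening", () => probe.close(() => resolve(true)));
    probe.listen(port);
  });
}

// Hypothetical helper: arguments for the launched Jupyter process.
// Assumption: "--port-retries=0" makes Jupyter exit instead of moving on
// to the next port when the requested one is taken.
function jupyterArgs(port: number): string[] {
  return ["notebook", `--port=${port}`, "--port-retries=0"];
}
```

The probe releases the port before Jupyter starts, so another process can still claim it in between; that window is the race condition described further down.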
There are still two failure cases that can manifest even with this
change.
The first is that if the attempt to start a Jupyter server
fails more than 10 times, then the retry logic will give up and
an internal error will be reported to the user. This failure mode
always existed, but now we will be properly tracking it.
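
A minimal sketch of such a bounded retry loop follows. It is illustrative only: `startWithRetries` and the `tryStart` callback are hypothetical names, and treating the limit as exactly 10 attempts on consecutive ports is an assumption about the intended behavior.

```typescript
const MAX_ATTEMPTS = 10; // the retry limit mentioned in the text

// Hypothetical: `tryStart` resolves to true if a Jupyter server was
// successfully started on `port`, and to false otherwise.
async function startWithRetries(
  firstPort: number,
  tryStart: (port: number) => Promise<boolean>
): Promise<number> {
  for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
    // Assumption: each attempt moves on to the next port in sequence.
    const port = firstPort + attempt;
    if (await tryStart(port)) {
      return port;
    }
  }
  // Giving up surfaces as an error the caller can report to the user.
  throw new Error(
    `Failed to start a Jupyter server after ${MAX_ATTEMPTS} attempts`
  );
}
```

Passing the start routine in as a callback keeps the loop independent of how the port is checked and how the process is spawned.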
The second is a race condition where another process grabs the
requested port between the time when we verify that it is free
and the time that the launched Jupyter server starts up. If that
happens, then the Jupyter server will kill itself, and it will
be removed from the users->Jupyter servers map. However, the
request that caused us to spin up a Jupyter server to begin
with will be forwarded to the process that took hold of the
assigned port.
This failure case should be difficult enough to trigger that it
will not occur in practice.
We believe this will fix the issue reported in #884.