Fix race condition with async kernel management #5875
Merged
With `AsyncMappingKernelManager` enabled, a race condition can occur between the shutdown of a kernel and a concurrent fetch of the active kernels, or even a fetch of that specific kernel. This is because the kernel shutdown method removes the `kernel_id` key from the `_kernel_connections` dictionary before awaiting the call to the superclass shutdown method. Since some shutdowns can take a while (especially with remote kernels), there is a window in which the front end (especially JupyterLab), which polls the list of active kernels and then each active kernel (roughly every 10 seconds), can hit a `KeyError` when accessing the `_kernel_connections` entry for the kernel whose shutdown is still being awaited.

This change moves the removal of the `kernel_id` key from `_kernel_connections` to after the superclass shutdown method has been awaited, eliminating this race condition. It also lets the collection of active kernels continue when an exception for a (now) non-existent kernel is encountered, rather than aborting the whole listing.

This isn't an issue with the synchronous shutdown method, but I moved its dictionary `pop` statement as well for future maintainability.
Here are examples of these occurrences:
This should be ported to jupyter_server once merged.