I ran into a strange issue this morning: I upgraded jupyter_client in my jupyter-enterprise-gateway (JEG) deployment to v6.2.0 and then became unable to switch kernels. I could still start as many kernels as I liked, but the moment I tried to restart, stop, or change a kernel the gateway instance would crash. This was on a work machine, so unfortunately I can't copy/paste the error message, but essentially the os.killpg line in jupyter_client/jupyter_client/manager.py (lines 659 to 664 in c16c0c9) was somehow sending SIGINT to JEG whenever a kernel change was requested, and JEG then "gracefully" shut down due to "keyboard interrupt". Apparently the process group id of JEG itself is somehow getting passed into os.killpg.
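To illustrate how a group-wide signal can reach the parent, here is a minimal sketch (POSIX only). It is not taken from jupyter_client or JEG; the helper name shares_parent_group is made up for illustration. It only shows that a child started without its own session shares the parent's process group, so os.killpg on that group id would also signal the parent. Whether that is exactly what happened in my deployment is an assumption.

```python
import os
import subprocess
import sys


def shares_parent_group(start_new_session: bool) -> bool:
    """Spawn a sleeping child and report whether it is in our process group.

    Hypothetical helper for illustration only; no signal is ever sent.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"],
        start_new_session=start_new_session,
    )
    try:
        # os.getpgid(0) returns the process group id of the calling process.
        return os.getpgid(child.pid) == os.getpgid(0)
    finally:
        child.kill()
        child.wait()


# Without a new session the child inherits our process group, so
# os.killpg(os.getpgid(child.pid), signal.SIGINT) would hit us as well.
shared = shares_parent_group(start_new_session=False)

# With a new session the child leads its own group and we are not in it.
isolated = shares_parent_group(start_new_session=True)

print(f"shared group: {shared}, isolated group: {isolated}")
```

If the group id handed to os.killpg matches the gateway's own group, the SIGINT is delivered to the gateway too, which would be consistent with the "keyboard interrupt" shutdown.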
For me the fix was to downgrade to jupyter_client v6.1.12, so I'm fairly certain that this is a jupyter_client issue and not a JEG problem. Since jupyter_client v6.2.0 has already been yanked from PyPI for an unrelated version conflict issue, the question now is whether the problem is still present in jupyter_client v7.0.0. Though it moved around a bit, the relevant os.killpg call is still present in the latest HEAD, in jupyter_client/jupyter_client/provisioning/local_provisioner.py (lines 82 to 86 in 109e7ac), so it may still cause problems for JEG.
For right now, this can't be tested directly, as the latest version of JEG is pinned to jupyter_client<7, since JEG is currently incompatible with the kernel provisioner stuff. So this issue is mostly just something to keep an eye out for in the near future. Pinging @kevin-bates.
Thanks for the heads-up @telamonian - I think I see the issue.
Enterprise Gateway's RemoteKernelManager fully overrides AsyncKernelManager.signal_kernel(), which is called from other methods within AsyncKernelManager. This override does not issue killpg. PR #623 did some fairly radical refactoring that nicely eliminated a lot of duplication. These changes introduced an aliasing approach that essentially maps each public method name on the class to an internal implementation. Initially, the internal methods were called directly from other methods. I raised this as problematic for subclass implementations, and many of those calls were addressed; I even added some tests to verify subclass call sequences because of this. Unfortunately, those tests didn't go far enough.
It looks like there are a few _async internal methods still being called from other methods that need to be replaced with their "alias", one of which is signal_kernel(). The primary culprit you're running into is interrupt_kernel, which shutdown_kernel now (post 6.1.12) calls. As you can see, interrupt_kernel is calling _async_signal_kernel() rather than signal_kernel(), which then skips EG's override.
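The bypass described above can be sketched with a stripped-down, synchronous stand-in. The classes BaseManager and RemoteManager and the two interrupt_* methods are hypothetical and are not the real jupyter_client or EG code; only the names signal_kernel and _async_signal_kernel come from the discussion. The sketch just shows why calling the internal name skips a subclass override while calling the alias honors it.

```python
import signal


class BaseManager:
    """Hypothetical stand-in for a manager that aliases public names."""

    def _async_signal_kernel(self, signum):
        # Stand-in for the local implementation (the one that uses killpg).
        return ("local", signum)

    # Public alias bound to the internal implementation at class creation.
    signal_kernel = _async_signal_kernel

    def interrupt_via_internal(self):
        # Problematic pattern: names the internal method directly, so a
        # subclass that overrides signal_kernel() is never consulted.
        return self._async_signal_kernel(signal.SIGINT)

    def interrupt_via_alias(self):
        # Preferred pattern: goes through the public name, so normal
        # attribute lookup finds the subclass override first.
        return self.signal_kernel(signal.SIGINT)


class RemoteManager(BaseManager):
    """Hypothetical subclass that overrides only the public method."""

    def signal_kernel(self, signum):
        # Stand-in for forwarding the signal to a remote kernel.
        return ("remote", signum)


manager = RemoteManager()
bypassed = manager.interrupt_via_internal()
honored = manager.interrupt_via_alias()
print(bypassed[0], honored[0])
```

Under these assumptions the first call still lands in the base implementation even though the subclass overrides signal_kernel(), which mirrors the behavior seen with EG's RemoteKernelManager.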
This will be an issue in 7.0 for subclasses of KernelManager which happen to override these remaining, lower-level, methods. Not so much from the killpg standpoint, but more from the fact that overrides are not getting called. That said, I don't think this will be an issue for Enterprise Gateway once it is compatible with 7.0 because EG will no longer need to override methods within its RemoteKernelManager to that degree (and perhaps at all).
I will spend time first reproducing this by extending the test I added, then patch the remaining unaliased method calls so that the test passes.