ExecutePreprocessor using jupyter_kernel_mgmt APIs #809
nbconvert/preprocessors/execute.py
except ioloop.TimeoutError:
    raise TimeoutError("Cell execution timed out")
except ErrorInKernel as e:
    reply = e.reply_msg
API question: should the client object raise an exception if the reply has status: 'error'? At the moment it does, but based on this and similar code in jupyter_kernel_test, I'm inclined to change it to return the reply, whatever the status of that reply.
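The two styles can be contrasted with a small sketch. This is illustrative only: the class and function names are stand-ins, not the actual jupyter_kernel_mgmt API.

```python
class ErrorInKernel(Exception):
    """Raised when an execute_reply comes back with status: 'error'."""
    def __init__(self, reply_msg):
        super().__init__(reply_msg['content'].get('ename', 'error'))
        self.reply_msg = reply_msg

def execute_raising(reply):
    # Style 1: the client raises on error replies, and callers have to
    # unwrap the reply message from the exception.
    if reply['content']['status'] == 'error':
        raise ErrorInKernel(reply)
    return reply

def execute_returning(reply):
    # Style 2: the client always returns the reply, whatever its status;
    # callers inspect reply['content']['status'] themselves.
    return reply
```

Under style 2, the `except ErrorInKernel as e: reply = e.reply_msg` dance in the diff above becomes unnecessary.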
This PR now drops testing for Python < 3.5. I had planned that using the new kernel management code would be the time to require Python 3. Everything actually works with Python 3.4, but it's an annoyance for the tests, because there's a minor change in IPython 7 which affects it, and IPython 7 only installs on Python 3.5 and above.
I think I'm now happy with this PR itself, if anyone wants to take a look. The larger question that I haven't tried to tackle yet is how to integrate the new APIs with the notebook server itself.
This is really awesome, thanks @takluyver! I'm really excited about the kernel-mgmt packages.
self.log.debug("Executing cell:\n%s", cell.source)
exec_reply = self._wait_for_reply(msg_id, cell)
try:
    reply = self.kc.execute_interactive(
Great to see execute_interactive working out here
        idle_timeout=self.iopub_timeout,
        raise_on_no_idle=self.raise_on_iopub_timeout,
    )
except ioloop.TimeoutError:
Not a problem for now, but could this ioloop.TimeoutError -> TimeoutError happen in jupyter_protocol? Does that seem appropriate to you?
Yup, I think it would be appropriate for the BlockingKernelClient to do that translation.
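A minimal sketch of what that translation inside the blocking client could look like. The stand-in exception class below takes the place of tornado's timeout error, and `wait_for_reply` is a hypothetical name, not the real client method:

```python
class IOLoopTimeoutError(Exception):
    """Stand-in for the event-loop-level timeout error raised by tornado."""

def wait_for_reply(get_reply, timeout):
    # Translate the event-loop timeout into the builtin TimeoutError,
    # so callers like nbconvert never need to know about tornado at all.
    try:
        return get_reply(timeout=timeout)
    except IOLoopTimeoutError:
        raise TimeoutError("Cell execution timed out") from None
```

With this in place, the `except ioloop.TimeoutError` branch in the diff above would catch the builtin `TimeoutError` instead.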
This looks like it could potentially fix #878 and therefore make #886 obsolete, which is great! However, when I'm trying to create the
... and it takes much longer than it normally does. However, in the end everything seems to be executed correctly. Steps to reproduce: https://github.com/spatialaudio/nbsphinx/blob/master/CONTRIBUTING.rst UPDATE: I'm getting the same error messages when simply running
It may fix #878, but it will quite possibly break something else you rely on. ;-) Thanks for the report; I've been doing a bunch of changes recently in connection with making the notebook server use the same new APIs, and I've probably missed something.
At some point, in connection with this, I'm thinking of making it so that
This commit should fix the problem you described: takluyver/jupyter_kernel_mgmt@8a74331
I'm not really worried about that. The only thing I'm relying on is that I can use an empty string (which is the default value for the traitlet):

pp = nbconvert.preprocessors.ExecutePreprocessor(kernel_name='')

... and if I specify a non-empty string, it should use that string to start an appropriate kernel.
This sounds quite strange to me, TBH. Why should a Python notebook have a different behavior than, say, a Julia notebook?
That's unfortunate, but I don't think it's reason enough to simply ignore the information stored in the notebook. Probably some fallback mechanism should be implemented for the case that the specified kernel is not found?
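One possible shape for such a fallback, sketched with hypothetical names; the real resolution would go through the kernel finder rather than a plain set:

```python
def resolve_kernel_name(requested, installed, default='pyimport/kernel'):
    # Use the kernel named in the notebook metadata if it is actually
    # installed; otherwise fall back to a default kernel instead of failing.
    if requested and requested in installed:
        return requested
    return default
```

This keeps the notebook's stored kernel name meaningful when it resolves, while still letting execution proceed on machines where that kernel isn't available.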
Thanks, this indeed fixes the problem!
Yes and no. I see quite a few people installing Jupyter into an environment, running it from that environment, and then getting confused that it's not seeing the things they installed in that environment. This is inconsistent and based on hidden state: neither the notebook metadata nor your installed kernelspecs are clearly visible. It's a recurring source of confusion.

The fact that it's Python matters to a degree, because it gives us an obvious default kernel to use: the one on the same Python interpreter as the parent process. There's no obvious default for other languages.

My plan for command-line tools like nbconvert is something like this:

This will undoubtedly be less convenient in some cases, but it's hopefully easier to understand and more predictable.
But that's no reason to unconditionally ignore them, right? If this "hidden" information is useless, it should be removed. I'm not a I have the feeling that you should at least add one step to your 5-step-program mentioned above:
I still think that taking the fact that
I do want to remove the precise kernel name from notebook metadata - if I run a notebook with kernel

The reason this hasn't happened already is that it's not clear how to select a kernel instead when opening a notebook interactively. Should notebooks be associated with a particular environment, or just with a language? If they're associated with an environment, how does that link work? Or does associating a file with an environment break people's expectations about how they run the same code in different environments?

The kernelspecs are useful for telling Jupyter what kernels exist and how to start them. But they're a source of confusion because a notebook can behave in different ways depending on what kernelspecs you have installed. It's a particular issue with the default

We designed the kernelspec mechanism with the idea that one kernelspec would typically represent one language, or one major version of a language. So you might have your Python 2 and 3 kernels and an R kernel. We explicitly put off dealing with environments, because it had already been a long and tiring discussion, and we wanted to move forwards. So kernelspecs have ended up being used for environments, and they're not a good fit. That's what I'm now trying to address.
(Writing that has helped me further my thinking a bit. I'm going to add a post to my relevant notebook PR). |
👍 That sounds good! Removing the kernel name is definitely better than ignoring it! I don't really grok all that notebook metadata, but it always seemed to me that there is a lot of redundant information in there. It sounds good to me to remove some of it.
It turns out that you were absolutely right! The
@takluyver - I used the PR review to update the status relative to the async jupyter_kernel_mgmt PR takluyver/jupyter_kernel_mgmt#23 - in particular since a change was necessary to handle notebooks that already contain provider-prefixed kernel names.
We should probably rebase this PR on master before completing the exercise, but this appears to be a good first step.
finally:
    for attr in ['nb', 'km', 'kc']:
        delattr(self, attr)
kernel_name = 'spec/' + nb.metadata.get('kernelspec', {})[
This will produce a duplicated 'spec/' prefix when the existing notebook file has been persisted using the new kernel providers. I modified this to the following when investigating how async support in jupyter_kernel_mgmt affects existing clients (per this comment).
# Check for unset kernel_name first and attempt to pull from metadata. Only then
# should we check for provider id prefixes, otherwise we could end up with a double prefix.
kernel_name = self.kernel_name
if not kernel_name:
    try:
        kernel_name = nb.metadata.get('kernelspec', {})['name']
    except KeyError:
        kernel_name = 'pyimport/kernel'
# Ensure kernel_name is of the new form in case of older metadata
if '/' not in kernel_name:
    kernel_name = 'spec/' + kernel_name
kernel_name = 'pyimport/kernel'

self.log.info("Launching kernel %s to execute notebook" % kernel_name)
conn_info, self.km = kf.launch(kernel_name, cwd=path) |
To address the async jupyter_kernel_mgmt changes, the only change required to get things working is replacing this line with the following (along with an accompanying import asyncio statement):

conn_info, self.km = asyncio.get_event_loop().run_until_complete(kf.launch(kernel_name, cwd=path))
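As a self-contained illustration of that bridging pattern, with a stub coroutine standing in for the now-async kf.launch() (the stub's return value is made up for the example):

```python
import asyncio

async def launch(kernel_name, cwd=None):
    # Stub standing in for jupyter_kernel_mgmt's async kf.launch();
    # the real call returns (connection_info, kernel_manager).
    return {'transport': 'tcp'}, object()

# Drive the coroutine to completion from synchronous code, as the
# suggested one-line change does with get_event_loop().
loop = asyncio.new_event_loop()
try:
    conn_info, km = loop.run_until_complete(launch('spec/python3'))
finally:
    loop.close()
```

Using an explicitly created loop as above avoids relying on asyncio.get_event_loop()'s implicit current-loop behavior, which has been deprecated in newer Python versions.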
However, what is strange (and independent of this change) is that when I run nbconvert via a debugger (PyCharm), I get the following exception:
File "/Users/kbates/repos/oss/gateway-experiments/nbconvert/nbconvert/exporters/exporter.py", line 315, in _preprocess
nbc, resc = preprocessor(nbc, resc)
File "/Users/kbates/repos/oss/gateway-experiments/nbconvert/nbconvert/preprocessors/base.py", line 47, in __call__
return self.preprocess(nb, resources)
File "/Users/kbates/repos/oss/gateway-experiments/nbconvert/nbconvert/preprocessors/execute.py", line 318, in preprocess
nb.metadata['language_info'] = info_dict['language_info']
File "/opt/anaconda3/envs/kernel-mgmt-dev/lib/python3.6/contextlib.py", line 92, in __exit__
raise RuntimeError("generator didn't stop")
Since this occurs in either case, I figured it's a side-effect of having a yield in a contextmanager - but I'm also a relative novice in Python - so that may not be what's going on. When run from the command line, also in both cases, everything works.
These changes also ignore the case where the kernel manager instance could be passed as an argument. We should remove that parameter if that's no longer applicable.
Since this occurs in either case, I figured its a side-effect of having a yield in a contextmanager - but I'm also a relative novice in python - so that may not be what's going on. When run from the command line, also in both cases, everything works.
Well, obviously a yield in a context manager is correct (yes, I've learned a little), but I suspect something is getting affected by the async changes in conjunction with the context manager. If I break things down to not use a context manager, nbconvert --execute no longer encounters the "generator didn't stop" issue.
I've gone ahead and created a PR against your branch in case you want to take these changes. If not, just close the PR. Thanks
As an FYI, the execute preprocessor code has been lifted to https://github.com/jupyter/nbclient and shortly I'll be releasing the first version there and changing the files being touched here to reference that library instead. I know this disrupts this PR severely in needing to move it to that repo, but that change for the 6.0 release finally got some tracking and this PR has been open for a very long time. On a side note, any extra input on the code sitting in nbclient before we release would be helpful. For now it's a faithful clone of the functionality with a few minor interface changes from what's in master right now.
Thanks for the FYI @MSeal - that's good to know and makes sense. When/if nbclient moves to the new framework, it will need these changes.
What is the status of this PR? Is somebody working on it?
Not at the moment, but there is an enhancement proposal under discussion (jupyter/enhancement-proposals#45) which would include adopting jupyter_kernel_mgmt as an official Jupyter package. That's probably necessary before any serious Jupyter infrastructure relies on it.
Thanks, looks like the JEP is moving forward. |
This ports the execute preprocessor to my experimental jupyter_kernel_mgmt and jupyter_protocol APIs. I did this to start giving those APIs some 'real world' use.