ExecutePreprocessor using jupyter_kernel_mgmt APIs #809

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

@takluyver (Member) commented May 7, 2018

This ports the execute preprocessor to my experimental jupyter_kernel_mgmt and jupyter_protocol APIs. I did this to start giving those APIs some 'real world' use.

    except ioloop.TimeoutError:
        raise TimeoutError("Cell execution timed out")
    except ErrorInKernel as e:
        reply = e.reply_msg
Review comment (Member Author):

API question: should the client object raise an exception if the reply has status: 'error'? At the moment it does, but based on this and similar code in jupyter_kernel_test, I'm inclined to change it to return the reply, whatever the status of that reply.
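
For illustration, the two behaviours side by side (a sketch only: `client` is a stand-in for the kernel client, and the reply's content layout is an assumption based on the message structure, not confirmed API):

    # Style A (current): the client raises when the reply has status 'error';
    # the caller recovers the reply from the exception (as in the diff above).
    try:
        reply = client.execute_interactive(cell.source)
    except ErrorInKernel as e:
        reply = e.reply_msg

    # Style B (proposed): the client returns the reply whatever its status,
    # and the caller inspects it directly.
    reply = client.execute_interactive(cell.source)
    if reply.content['status'] == 'error':
        ...  # handle the error reply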

@takluyver (Member Author) commented:

This PR now drops testing for Python < 3.5. I had planned that using the new kernel management code would be the time to require Python 3.

Everything actually works with Python 3.4, but it's an annoyance for the tests: a minor change in IPython 7 affects them, and IPython 7 only installs on Python 3.5 and above.

takluyver added this to the 6.0 milestone Oct 25, 2018
takluyver changed the title from "WIP: ExecutePreprocessor using jupyter_kernel_mgmt APIs" to "ExecutePreprocessor using jupyter_kernel_mgmt APIs" Oct 25, 2018
@takluyver (Member Author) commented:

I think I'm now happy with this PR itself, if anyone wants to take a look.

The larger question that I haven't tried to tackle yet is how to integrate the new APIs with the notebook server itself.

@minrk (Member) commented Oct 25, 2018

This is really awesome, thanks @takluyver! I'm really excited about the kernel-mgmt packages.

    self.log.debug("Executing cell:\n%s", cell.source)
    exec_reply = self._wait_for_reply(msg_id, cell)
    try:
        reply = self.kc.execute_interactive(
Review comment (Member):

Great to see execute_interactive working out here.

            idle_timeout=self.iopub_timeout,
            raise_on_no_idle=self.raise_on_iopub_timeout,
        )
    except ioloop.TimeoutError:
Review comment (Member):

Not a problem for now, but could this ioloop.TimeoutError -> TimeoutError happen in jupyter_protocol? Does that seem appropriate to you?

Review comment (Member Author):

Yup, I think it would be appropriate for the BlockingKernelClient to do that translation.
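
For illustration, a minimal sketch of that translation as a standalone helper — written here from scratch, not taken from jupyter_kernel_mgmt; it only assumes tornado is installed:

    from tornado import ioloop

    def call_blocking(fn, *args, **kwargs):
        """Invoke a client method, converting tornado's TimeoutError into
        the builtin TimeoutError so callers need not import tornado."""
        try:
            return fn(*args, **kwargs)
        except ioloop.TimeoutError as e:
            raise TimeoutError("Cell execution timed out") from e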

@mgeier (Contributor) commented Nov 7, 2018

This looks like it could potentially fix #878 and therefore make #886 obsolete, which is great!

However, when I try to build the nbsphinx docs (which use nbconvert's ExecutePreprocessor), I get a myriad of messages like this:

ERROR:traitlets:Exception from message handler <function IOLoopKernelClient._execution_future.<locals>.watch_for_idle at 0x7f1d094a88c8>
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/jupyter_kernel_mgmt/client.py", line 115, in _call_handlers
    handler(msg, channel)
TypeError: watch_for_idle() takes 1 positional argument but 2 were given
ERROR:traitlets:Exception from message handler <function IOLoopKernelClient._execution_future.<locals>.watch_for_idle at 0x7f1d094a88c8>
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/jupyter_kernel_mgmt/client.py", line 115, in _call_handlers
    handler(msg, channel)
TypeError: watch_for_idle() takes 1 positional argument but 2 were given
etc., etc.

... and it takes much longer than normal. However, in the end everything seems to be executed correctly.

Steps to reproduce: https://github.com/spatialaudio/nbsphinx/blob/master/CONTRIBUTING.rst

UPDATE: I'm getting the same error messages when simply running python3 -m nbconvert --execute on my notebooks, so it doesn't seem to have anything to do with nbsphinx.
Am I missing some up-to-date dependency?

@takluyver (Member Author) commented:

It may fix #878, but it will quite possibly break something else you rely on. ;-)

Thanks for the report; I've been doing a bunch of changes recently in connection with making the notebook server use the same new APIs, and I've probably missed something.

@takluyver (Member Author) commented:

At some point, in connection with this, I'm thinking of making it so that nbconvert --execute on a Python notebook defaults to the Python nbconvert is running on; I think not doing that is a common source of confusion. As kernel names proliferate, it's also less useful to store in notebook metadata the name of the last kernel it was run with, because that name may only be meaningful on one system.

@takluyver (Member Author) commented:

This commit should fix the problem you described: takluyver/jupyter_kernel_mgmt@8a74331

@mgeier (Contributor) commented Nov 9, 2018

It may fix #878, but it will quite possibly break something else you rely on. ;-)

I'm not really worried about that. The only thing I'm relying on is that I can pass an empty string (the traitlet's default value) for kernel_name, and it should use the kernel stored in the notebook:

    pp = nbconvert.preprocessors.ExecutePreprocessor(kernel_name='')

... and if I specify a non-empty string, it should use that string to start an appropriate kernel.
I didn't think anything could go wrong with those simple assumptions ... until #878 happened!
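
For concreteness, both cases as short usage lines — ExecutePreprocessor and its kernel_name traitlet are the real nbconvert API; the kernel name 'python3' is only an example:

    import nbconvert.preprocessors

    # Empty string (the default): use the kernel stored in the notebook metadata.
    pp = nbconvert.preprocessors.ExecutePreprocessor(kernel_name='')

    # Non-empty string: launch that kernel, regardless of the metadata.
    pp = nbconvert.preprocessors.ExecutePreprocessor(kernel_name='python3')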

I'm thinking of making it so that nbconvert --execute on a Python notebook defaults to the Python nbconvert is running on

This sounds quite strange to me, TBH.

Why should a Python notebook have a different behavior than, say, a Julia notebook?
Shouldn't the fact that nbconvert is implemented in Python be an implementation detail?
Obviously Python 2 vs. Python 3 would be a problem, but Python 2 is dying, anyway.
But what happens if the notebook uses PyPy? Or some other Python implementation?

As kernel names proliferate, it's also less useful to store in notebook metadata the name of the last kernel it was run with, because that name may only be meaningful on one system.

That's unfortunate, but I don't think it's reason enough to simply ignore the information stored in the notebook.

Probably some fallback mechanism should be implemented for the case that the specified kernel is not found?
But then there should be at least a warning message. And probably there should even be an additional configuration parameter that enables and disables this behavior? And the solution shouldn't be Python-specific, because the same problem might happen with other languages as well, right?
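
A minimal sketch of such a fallback, with a warning and an opt-out flag; every name here is hypothetical, not an existing nbconvert parameter:

    import warnings

    def resolve_kernel(requested, available, fallback, allow_fallback=True):
        """Return the kernel to launch, warning when the requested one is missing."""
        if requested in available:
            return requested
        if not allow_fallback:
            raise KeyError("No such kernel: %r" % requested)
        warnings.warn("Kernel %r not found; falling back to %r"
                      % (requested, fallback))
        return fallback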

This commit should fix the problem you described: takluyver/jupyter_kernel_mgmt@8a74331

Thanks, this indeed fixes the problem!

@takluyver (Member Author) commented:

Shouldn't the fact that nbconvert is implemented in Python be an implementation detail?

Yes and no. I see quite a few people installing Jupyter into an environment, running it from that environment, and then getting confused that it's not seeing the things they installed in that environment. This is inconsistent and based on hidden state: neither the notebook metadata nor your installed kernelspecs are clearly visible. It's a recurring source of confusion.

The fact that it's Python matters to a degree, because it gives us an obvious default kernel to use: the one on the same Python interpreter as the parent process. There's no obvious default for other languages.

My plan for command-line tools like nbconvert is something like this:

  1. If the user explicitly specifies a kernel to run with (e.g. with a command-line parameter), use that.
  2. Otherwise, get the language from the notebook metadata.
  3. If there is exactly one kernel available for that language, run with that.
  4. If the language is Python, use the Python kernel in nbconvert's environment.
  5. If we have multiple kernels available for the language, tell the user what they are and how to explicitly specify one to use.

This will undoubtedly be less convenient in some cases, but it's hopefully easier to understand and more predictable.
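
A sketch of that resolution order; the function and argument names are hypothetical, while 'pyimport/kernel' is the provider-prefixed name this PR uses for the kernel in nbconvert's own environment:

    def pick_kernel(cli_kernel, language, kernels_by_language):
        # 1. An explicit choice (e.g. a command-line parameter) always wins.
        if cli_kernel:
            return cli_kernel
        candidates = kernels_by_language.get(language, [])
        # 2./3. Exactly one kernel available for the notebook's language: use it.
        if len(candidates) == 1:
            return candidates[0]
        # 4. For Python, fall back to the kernel in nbconvert's environment.
        if language == 'python':
            return 'pyimport/kernel'
        # 5. Otherwise, list the options and ask the user to pick explicitly.
        raise ValueError("Kernels available for %r: %s; please specify one"
                         % (language, ', '.join(candidates) or 'none'))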

@mgeier (Contributor) commented Nov 13, 2018

neither the notebook metadata nor your installed kernelspecs are clearly visible.

But that's no reason to unconditionally ignore them, right?

If this "hidden" information is useless, it should be removed.
If not, it should be taken into account.

I'm not a conda user, so I don't know about those problems and I have no idea how to solve them, but blatantly ignoring stuff doesn't seem to be the right solution ...

I have the feeling that you should at least add one step to your 5-step-program mentioned above:

  1. [same as above]
  2. If a kernel is specified in the notebook metadata, check if it exists and if yes, use it
  3. [continue with 2. from above]

I still think that taking the fact that nbconvert is implemented in Python into account is an ugly hack, but if it really is the only way to provide a good experience to a large part of the user base, then I guess it could be justified.

@takluyver (Member Author) commented:

If this "hidden" information is useless, it should be removed.

I do want to remove the precise kernel name from notebook metadata - if I run a notebook with kernel foo on my computer and send it to you, there's no guarantee that you have a kernel foo or that it is remotely compatible if you do. Kernel names are only meaningful in the context of a particular system/user/environment, whereas notebook metadata should be globally meaningful.

The reason this hasn't happened already is that it's not clear how to select a kernel instead when opening a notebook interactively. Should notebooks be associated with a particular environment, or just with a language? If they're associated with an environment, how does that link work? Or does associating a file with an environment break people's expectations about how they run the same code in different environments?

The kernelspecs are useful for telling Jupyter what kernels exist and how to start them. But they're a source of confusion because a notebook can behave in different ways depending on what kernelspecs you have installed. It's a particular issue with the default python3 (/python2) kernel - we special case it so Jupyter will work without explicitly installing it, but then if you do install it, you've effectively pinned Jupyter to whichever environment you installed it from.

We designed the kernelspec mechanism with the idea that one kernelspec would typically represent one language, or one major version of a language. So you might have your Python 2 and 3 kernels and an R kernel. We explicitly put off dealing with environments, because it had already been a long and tiring discussion, and we wanted to move forwards. So kernelspecs have ended up being used for environments, and they're not a good fit. That's what I'm now trying to address.

@takluyver (Member Author) commented:

(Writing that has helped me further my thinking a bit. I'm going to add a post to my relevant notebook PR.)

@mgeier (Contributor) commented Nov 13, 2018

👍 That sounds good! Removing the kernel name is definitely better than ignoring it!

I don't really grok all that notebook metadata, but it always seemed to me that there is a lot of redundant information in there. It sounds good to me to remove some of it.

@mgeier (Contributor) commented Nov 18, 2018

@takluyver

It may fix #878, but it will quite possibly break something else you rely on. ;-)

It turns out that you were absolutely right!

The extra_arguments argument to nbconvert.preprocessors.ExecutePreprocessor seems to be ignored when using this PR.
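
For reference, extra_arguments is appended to the kernel's command line at launch; e.g., with a standard IPython kernel flag, shown purely as an illustration:

    from nbconvert.preprocessors import ExecutePreprocessor

    # The flag below configures the IPython kernel's inline plotting backend.
    pp = ExecutePreprocessor(extra_arguments=['--InlineBackend.figure_format=svg'])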

@kevin-bates (Member) left a review comment:

@takluyver - I used the PR review to update status relative to the async jupyter_kernel_mgmt PR takluyver/jupyter_kernel_mgmt#23 - in particular since a change was necessary to handle notebooks that already contain provider-prefixed kernel names.

We should probably rebase this PR with master before completing the exercise, but this appears to be a good first step.

    finally:
        for attr in ['nb', 'km', 'kc']:
            delattr(self, attr)

    kernel_name = 'spec/' + nb.metadata.get('kernelspec', {})['name']
Review comment (Member):

This will produce a duplicated 'spec/' prefix when the existing notebook file has been persisted using the new kernel providers. I modified this to the following when investigating how async support in jupyter_kernel_mgmt affects existing clients (per this comment).

        # Check for unset kernel_name first and attempt to pull from metadata.  Only then
        # should we check for provider id prefixes, otherwise we could end up with a double prefix.
        kernel_name = self.kernel_name
        if not kernel_name:
            try:
                kernel_name = nb.metadata.get('kernelspec', {})['name']
            except KeyError:
                kernel_name = 'pyimport/kernel'

        # Ensure kernel_name is of the new form in case of older metadata
        if '/' not in kernel_name:
            kernel_name = 'spec/' + kernel_name

    kernel_name = 'pyimport/kernel'

    self.log.info("Launching kernel %s to execute notebook" % kernel_name)
    conn_info, self.km = kf.launch(kernel_name, cwd=path)
Review comment (Member):

To address the async jupyter_kernel_mgmt changes, the only change required to get things working is replacing this line with the following (along with an accompanying import asyncio statement):

    conn_info, self.km = asyncio.get_event_loop().run_until_complete(kf.launch(kernel_name, cwd=path))

However, what is strange (and independent of this change) is that when I run nbconvert via a debugger (pycharm), I get the following exception:

  File "/Users/kbates/repos/oss/gateway-experiments/nbconvert/nbconvert/exporters/exporter.py", line 315, in _preprocess
    nbc, resc = preprocessor(nbc, resc)
  File "/Users/kbates/repos/oss/gateway-experiments/nbconvert/nbconvert/preprocessors/base.py", line 47, in __call__
    return self.preprocess(nb, resources)
  File "/Users/kbates/repos/oss/gateway-experiments/nbconvert/nbconvert/preprocessors/execute.py", line 318, in preprocess
    nb.metadata['language_info'] = info_dict['language_info']
  File "/opt/anaconda3/envs/kernel-mgmt-dev/lib/python3.6/contextlib.py", line 92, in __exit__
    raise RuntimeError("generator didn't stop")

Since this occurs in either case, I figured it's a side effect of having a yield in a contextmanager - but I'm also a relative novice in Python - so that may not be what's going on. When run from the command line, also in both cases, everything works.

These changes also ignore the case where the kernel manager instance could be passed as an argument. We should remove that parameter if that's no longer applicable.

Review comment (Member):

@takluyver -

Since this occurs in either case, I figured its a side-effect of having a yield in a contextmanager - but I'm also a relative novice in python - so that may not be what's going on. When run from the command line, also in both cases, everything works.

Well, obviously a yield in a context manager is correct (yes, I've learned a little), but I suspect something is being affected as a side effect of the async changes in conjunction with the context manager. If I break things down to not use a context manager, nbconvert --execute no longer encounters the "generator didn't stop" issue.
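
For anyone following along, that error class can be reproduced standalone, independent of nbconvert: contextlib expects the generator to stop after its single yield, and raises exactly this RuntimeError when the generator is resumed and yields again. A minimal demonstration:

    from contextlib import contextmanager

    @contextmanager
    def cm():
        yield 1
        yield 2   # resumed during __exit__, yields again instead of stopping

    try:
        with cm():
            pass
    except RuntimeError as e:
        print(e)  # -> generator didn't stop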

I've gone ahead and created a PR against your branch in case you want to take these changes. If not, just close the PR. Thanks.

@MSeal (Contributor) commented Feb 4, 2020

As an FYI, the execute preprocessor code has been lifted to https://github.com/jupyter/nbclient, and shortly I'll be releasing the first version there and changing the files this PR touches to reference that library instead. I know this severely disrupts this PR, which would need to move to that repo, but that change for the 6.0 release finally got some traction, and this PR has been open for a very long time.

On a side note, any extra input on the code sitting in nbclient before we release would be helpful. For now it's a faithful clone of the functionality in master, with a few minor interface changes.

@kevin-bates (Member) commented:

Thanks for the FYI @MSeal - that's good to know and makes sense. When/if nbclient moves to the new framework, it will need these changes.

@davidbrochart (Member) commented:

What is the status of this PR? Is somebody working on it?

@takluyver (Member Author) commented:

Not at the moment, but there is an enhancement proposal under discussion (jupyter/enhancement-proposals#45) which would include adopting jupyter_kernel_mgmt as an official Jupyter package. That's probably necessary before any serious Jupyter infrastructure relies on it.

@davidbrochart (Member) commented:

Thanks, looks like the JEP is moving forward.

MSeal removed this from the 6.0 milestone Jul 2, 2020
willingc added the status:pending-jep (Needs JEP acceptance or action) label Sep 22, 2020