install kernelspecs to datarootdir #61

Closed
wants to merge 1 commit into from

jdemeyer
Contributor

There is currently no documented standard way of installing Jupyter kernels, so different kernels each do it in their own way. The echo_kernel should be extended into a complete Python package, including a setup.py.

The recommended way to install Jupyter kernel specs should be using distutils' data_files:

setup(...,
    # data_files takes a list of (target directory, [files]) tuples
    data_files=[("share/jupyter/kernels/MyKernel", [...])]
)

We need to figure out how to emulate this behaviour for non-Python kernels.

@minrk
Member

minrk commented Sep 24, 2015

In many cases, this gives the wrong value:

  • homebrew Python
  • debian system Python
  • Python.org Python on OS X

In all the above cases, /usr/local is the right value, and datarootdir is not.

@jdemeyer
Contributor Author

On OS X, the value is

/System/Library/Frameworks/Python.framework/Versions/2.7/share

why is that "wrong"?

Isn't datarootdir meant to be used for installing data files? Surely, using a directory that Python suggests is better than the arbitrary hardcoded choice of /usr/local/share.

@minrk minrk added this to the no action milestone Sep 25, 2015
@minrk
Member

minrk commented Sep 25, 2015

@jdemeyer nothing other than the system should ever write a file in /System on OS X. The same goes for /usr on Linux.

Surely, using a directory that Python suggests is better than the arbitrary hardcoded choice of /usr/local/share.

If you can provide an example where Python provides a reasonable value, that would be great. But I looked for a while (including sys.prefix, etc.) and could not find one that gives a sensible value in even the most common cases.

The issue is that there are many Python installations (e.g. a system Python on OS X or Linux, or homebrew Python) where Python's own installation path should not be the installation path for user-installed things, and there doesn't seem to be a standard way of implementing this. Many of these in fact patch Python itself to change the installation process, to ensure things don't go in these locations.

@jdemeyer
Contributor Author

[...] should not be the installation path for user-installed things

We're talking about system-installed things here, not user-installed things. Maybe that's the main point of the confusion?

@jdemeyer
Contributor Author

If you can provide an example where Python provides a reasonable value, that would be great.

The main motivation for this pull request was Sage. Within Sage, sysconfig.get_config_var("datarootdir") is certainly the right directory to use.
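
To see how the candidate directories in this discussion compare on a given interpreter, a small script along these lines can help. It is only a sketch: `candidate_share_dirs` is a hypothetical helper name introduced here, and `datarootdir` may come back as `None` on platforms where Python was not built from a Makefile (Windows, for example).

```python
import os
import sys
import sysconfig


def candidate_share_dirs():
    """Return the directories debated in this thread, keyed by where they come from."""
    return {
        # Build-time variable; undocumented and possibly None (e.g. on Windows).
        "datarootdir": sysconfig.get_config_var("datarootdir"),
        # The share directory under the running interpreter's prefix.
        "sys.prefix/share": os.path.join(sys.prefix, "share"),
        # The hard-coded default that this pull request questions.
        "/usr/local/share": "/usr/local/share",
    }


if __name__ == "__main__":
    for name, path in candidate_share_dirs().items():
        print(f"{name:>18}: {path}")
```

Running it under a system Python, a Homebrew Python, and a virtualenv side by side shows where the three values agree and where they do not.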

@minrk
Member

minrk commented Sep 25, 2015

@jdemeyer system-installed is distinguished from a system-wide install by a user. That is, /usr should never be touched by a command other than apt on Debian. Similarly, no action should ever modify /System on OS X. In general, no user-initiated system-wide installs should go in /System or /usr. On the vast majority of OS X and Linux systems, the correct prefix for a user-initiated system-wide install is /usr/local.

@jdemeyer
Contributor Author

OK, I understand the problem. There are really 3 cases:

  • distro-initiated system install
  • user-initiated system install
  • user-initiated user install

There is a problem distinguishing in software between the first two cases. Suppose I am a distro developer and I want to install some Jupyter kernel files, then what should I do? Right now, the only solution seems patching the Jupyter sources or some manual copying.

I don't have a simple solution, but it's clear that there is a problem to be solved here.

@minrk
Member

minrk commented Sep 25, 2015

Right now, the only solution seems patching the Jupyter sources or some manual copying.

You can specify --prefix for the installation location if the default isn't what you want. Packagers can use this to install to /usr for the apt-provided version of a package, for instance.

@jdemeyer
Contributor Author

You can specify --prefix for the installation location if the default isn't what you want. Packagers can use this to install to /usr for the apt-provided version of a package, for instance.

The whole point of this pull request is that jupyter_core totally ignores the prefix and just uses /usr/local/share regardless.

@minrk
Member

minrk commented Sep 25, 2015

@jdemeyer I'm not sure what you mean by ignoring it. While the default install path is /usr/local, it does add sys.prefix/share to the search path, specifically for system-provided files. Perhaps datarootdir should be added there as well (or instead), if it ever differs.
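
The distinction between the default install location and the search path can be sketched as follows. This is an illustration only, not the jupyter_core implementation: `kernel_search_path` is a hypothetical name, and both the search order and the Linux-style per-user directory are assumptions.

```python
import os
import sys


def kernel_search_path(user_data_dir=None):
    """Illustrative search order: per-user dir, interpreter prefix, then /usr/local."""
    if user_data_dir is None:
        # Assumption: a Linux-style per-user data directory.
        user_data_dir = os.path.expanduser("~/.local/share/jupyter")
    roots = [
        user_data_dir,
        # Searched so that files installed under the interpreter's prefix are found.
        os.path.join(sys.prefix, "share", "jupyter"),
        # The default install target discussed in this thread.
        "/usr/local/share/jupyter",
    ]
    path = []
    for root in roots:
        kernels = os.path.join(root, "kernels")
        # sys.prefix may itself be /usr/local, so skip duplicates.
        if kernels not in path:
            path.append(kernels)
    return path


if __name__ == "__main__":
    for directory in kernel_search_path():
        print(directory)
```

The point of the sketch is that a directory can be searched without being the place where new kernel specs are written by default.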

@jdemeyer
Contributor Author

So, are you saying that it's wrong for install_kernel_spec() to default to SYSTEM_JUPYTER_PATH[0] for a system install? Because that's the actual problem. I wanted to fix this by changing SYSTEM_JUPYTER_PATH, but it can also be fixed by changing install_kernel_spec() (in jupyter_client, I think).

@minrk
Member

minrk commented Sep 25, 2015

@jdemeyer if a system install should go in a different path, it can specify --prefix. For instance, in building a .deb, one could:

jupyter kernelspec install --prefix=/usr

Or more generally:

jupyter kernelspec install --prefix=`python -c 'import sys; print(sys.prefix)'`

Only if you want to change the default behavior when users install things after the fact would modifying the code make sense.

@minrk
Member

minrk commented Sep 25, 2015

Can you find documentation for whether third-parties should be considering Python's datarootdir as an install path? Because all I find are direct references to sys.prefix/share. The value is the same in every case I could find, but I'd rather match behavior for things like install_data, which uses sys.prefix/share, if we do come across cases where the two differ.

@jdemeyer
Contributor Author

I agree with replacing datarootdir (which seems undocumented anyway) by sys.prefix/share.

@jdemeyer
Contributor Author

See also jupyter/jupyter_client#75

@jdemeyer
Contributor Author

I didn't understand your comment about jupyter kernelspec install. Are users who want to install third-party kernels supposed to manually run that command? Or is it supposed to be run by setup.py (but then, what's the prefix)?

I guess what I'm missing is information for kernel authors on how they should implement the installation of the kernel spec file. Since this is Python, there should be an easy way to do this using distutils. Something like this in setup.py should work and be The Right Thing:

import os

from jupyter_core.paths import SYSTEM_JUPYTER_PATH

kernelpath = os.path.join(SYSTEM_JUPYTER_PATH[0], "kernels", "my_kernel_name")

setup(......,
    data_files=[(kernelpath, ["kernel.json"])]
)

So I see two possibilities:

  1. (this pull request) change SYSTEM_JUPYTER_PATH[0] to be some directory based on sys.prefix
  2. (jupyter/jupyter_client#75, "Use ENV_JUPYTER_PATH as default system path for installing kernels") Keep SYSTEM_JUPYTER_PATH and make ENV_JUPYTER_PATH[0] the default for kernel specs.

@minrk
Member

minrk commented Sep 25, 2015

If you go with data_files, you should use relative paths:

data_files = [
    ('share/jupyter/kernels/mykernel', ['/path/to/src/kernel.json'])
]

and there's nothing to import from Jupyter.

Alternately, you can install with Jupyter APIs, and there are three available choices:

  1. default (system-wide)
  2. user=True
  3. prefix=sys.prefix

prefix=sys.prefix makes sense if you want the kernel to only be available when the notebook server is in an env of some sort (conda, virtualenv, manual install), but it's frequently wrong for non-env cases, especially the case of the kernel being in an env that's different from the notebook server.
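
The three choices can be pictured as one function that maps the options to a target directory. This is a hedged sketch rather than the Jupyter API: `kernelspec_destination`, its `system_dir` and `user_dir` parameters, and the Linux-style per-user directory are all assumptions made for illustration.

```python
import os
import sys


def kernelspec_destination(kernel_name, user=False, prefix=None,
                           system_dir="/usr/local/share/jupyter",
                           user_dir=None):
    """Pick an install directory for one of the three options listed above."""
    if user and prefix:
        raise ValueError("choose either user=True or prefix=..., not both")
    if user:
        # Assumption: a Linux-style per-user data directory.
        base = user_dir or os.path.expanduser("~/.local/share/jupyter")
    elif prefix:
        # Env-local install, e.g. prefix=sys.prefix inside a virtualenv.
        base = os.path.join(prefix, "share", "jupyter")
    else:
        # Default: the system-wide location.
        base = system_dir
    return os.path.join(base, "kernels", kernel_name)


if __name__ == "__main__":
    print(kernelspec_destination("mykernel"))
    print(kernelspec_destination("mykernel", user=True))
    print(kernelspec_destination("mykernel", prefix=sys.prefix))
```

Only the last call ties the kernel spec to the interpreter that ran the installer, which is why it suits envs and is a poor fit when the kernel and the notebook server live in different environments.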

@jdemeyer
Contributor Author

jdemeyer commented Oct 3, 2015

If you go with data_files, you should use relative paths:

data_files = [
    ('share/jupyter/kernels/mykernel', ['/path/to/src/kernel.json'])
]

According to distutils documentation, this might install stuff in sys.exec_prefix. I don't know when prefix and exec_prefix are different, but still...

In any case, I would prefer to get the path from Jupyter such that it's certainly correct.

Alternately, you can install with Jupyter APIs, and there are three available choices:

default (system-wide)
user=True
prefix=sys.prefix

Sure, there are many choices. But we don't need many choices, we need one good choice. I think it's not good if every kernel uses a different system to install the kernel specs.

As long as this isn't resolved, I am just going to use ENV_JUPYTER_PATH[0] for the PARI/GP kernel that I'm working on.

@vbraun

vbraun commented Oct 17, 2015

IMHO /usr/local is never the right value; it's cargo cult from a time before we had package management. Users should never do system-wide installations. Rely on your distributor (who will have to patch out the /local/ for packaging), or do a private install (~/.local). If you are maintaining a multiuser system then you should be packaging & installing as well; that is just standard devops best practice.

On a related note, the R ipython kernel also ignores sys.prefix and installs to /usr/local. So you can't install it into a prefixed Python / virtualenv, yay.

@rgbkrk
Member

rgbkrk commented Oct 17, 2015

On a related note, the R ipython kernel also ignores sys.prefix and installs to /usr/local. So you can't install it into a prefixed Python / virtualenv, yay.

The sys.prefix versions of the paths setup should be deprecated.

@jdemeyer
Contributor Author

We have been discussing some of this stuff at Sage Days 70 and there is a general agreement that sys.prefix is a better default than /usr/local/.

Regarding setup.py, relative paths for data_files actually work well with distutils. The documentation saying that it might install in sys.exec_prefix is simply wrong (https://bugs.python.org/issue25592). So something like this works:

import os
from distutils.core import setup
from glob import glob

kernelpath = os.path.join("share", "jupyter", "kernels", "pari_jupyter")

setup(...,
    data_files=[(kernelpath, glob("spec/*"))],
)

Creating and installing wheels also works this way.

Annoyingly, a plain installation does not work with setuptools. Actually, setuptools documents this as a feature.

A more robust version of setup.py would therefore look like

import os
from glob import glob

from setuptools import setup

kernelpath = os.path.join("share", "jupyter", "kernels", "pari_jupyter")

setup(...,
    data_files=[(kernelpath, glob("spec/*"))],   # For bdist_wheel
)

install_kernel_spec(...)   # For plain install

@jdemeyer
Contributor Author

Here is a write-up of the Sage Days 70 discussion:

Jupyter Kernels

Follow-up on #61

  • Jupyter kernelspecs should default to install to python's sys.prefix.
  • There should be a convenience function in jupyter_client to construct the kernel spec path, which can be used in a package's setup.py, for example.
  • Figure out what works with Python wheels.
  • Document the work here in http://jupyter-client.readthedocs.org/en/latest/
  • Make the EchoKernel into a complete python package with a setup.py that installs the kernelspec, as an example.

Documentation

  • Overview documentation page that guides a user to the appropriate repo docs.

@minrk
Member

minrk commented Nov 10, 2015

That's interesting. I guess we will just special-case all of the instances where /usr/local is actually the right choice (Linux, OS X, etc.).

@jasongrout
Member

But /usr/local is the wrong choice on those platforms for Anaconda, for example. I don't think you can make a platform-wide decision.

@minrk
Member

minrk commented Nov 10, 2015

@jasongrout sorry, I was referring to the system-wide cases (System Python on OS X and debian, Python.org, homebrew), not things like conda, envs, or pythonbrew. It's also certainly not objectively true that putting kernelspecs in the env is strictly preferable to leaving them system-wide.

Here's what I think makes the most sense: rather than using sys.prefix directly, we should ask where data_files would go, since that's actually what will be used to install. In some cases these differ, and everywhere they differ, sys.prefix is the wrong choice. This can be done with a tiny amount of distutils code. I've tested it on a few systems, and while it does give answers I don't like on a few cases, it's strictly better than using sys.prefix or datarootdir directly (same in most cases, always better when they differ), and where it gives answers I don't like, I can come to terms with saying "That's where Python puts data files, don't blame me."

A bonus to asking where data_files will go means we should get --user installs supported without any additional complication.
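
A minimal sketch of "ask where data_files would go" is shown below. It swaps in the standard-library `sysconfig` and `site` modules for the distutils code mentioned above, because distutils has since been removed from newer Python versions; on installations that patch their install schemes the answer may differ from what distutils would have reported. The helper names `data_files_root` and `kernelspec_dir` are hypothetical.

```python
import os
import site
import sysconfig


def data_files_root(user=False):
    """Return the base directory that relative data_files paths resolve against."""
    if user:
        # A --user install places data files under the per-user base directory.
        return site.getuserbase()
    # The "data" path of the interpreter's default install scheme.
    return sysconfig.get_path("data")


def kernelspec_dir(kernel_name, user=False):
    """Directory where a relative data_files kernel spec entry would end up."""
    return os.path.join(data_files_root(user), "share", "jupyter", "kernels", kernel_name)


if __name__ == "__main__":
    print(kernelspec_dir("mykernel"))
    print(kernelspec_dir("mykernel", user=True))
```

Because the same function answers both the default and the `--user` case, the runtime search path can be built from exactly the locations an installer would write to.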

The rest - making a complete working example, simple API for adding to setup.py makes perfect sense to me.

These are the challenges I see of installing to Python's prefix:

  • Kernelspec files included with wheels cannot include the complete path to sys.executable. Conda handles rewriting the magic paths for its packages, but pip is much more primitive, and only does this for the shebang line of scripts. This isn't a change, really, but it limits the scope of how much we can expect from wheels.
  • How do we make a kernelspec for one env available to a notebook server in another? This is made worse by the change, but isn't necessarily difficult. It could just be a matter of documentation. Installing with an alternate --prefix should do it, we just need to make it clear and well documented, since it will no longer work by default. The important thing here is that the env of the kernel and the env of the server should not be tied to each other.

@jdemeyer
Contributor Author

Here's what I think makes the most sense: rather than using sys.prefix directly, we should ask where data_files would go, since that's actually what will be used to install.

Given that this is sys.prefix, I of course agree.

In some cases these differ

Are you sure? I'm not saying it's impossible to make them differ, I'm just wondering if it ever happens in practice. Note that you also need to make sure that Jupyter actually looks in that directory for kernel specs.

@jdemeyer
Contributor Author

Two more arguments for using sys.prefix instead of /usr/local:

  • If you're running ./setup.py install in a Python package, the Python modules will be installed in sys.prefix anyway (that's what distutils/setuptools does). So it seems strange to install everything in sys.prefix except the Jupyter kernel spec.
  • Wheels uses sys.prefix (well, to be more precise, it gets the installation directory from distutils). If you want to be compatible with wheels, you cannot use a hard-coded directory.

@jdemeyer
Contributor Author

Kernelspec files included with wheels cannot include the complete path to sys.executable

Why should kernelspecs include the full path to sys.executable? I don't think that's a good idea. What's wrong with python -m MyKernel?

How do we make a kernelspec for one env available to a notebook server in another?

Are you sure we want to do this? Isn't the whole purpose of envs that they are supposed to be separate installs?

@jdemeyer
Contributor Author

rather than using sys.prefix directly, we should ask where data_files would go

Actually, we don't need to ask distutils, we just need to use data_files:

setup(...,
    # data_files takes a list of (target directory, [files]) tuples
    data_files=[("share/jupyter/kernels/MyKernel", [...])]
)

(by definition, this will install the kernel spec in the directory where distutils installs data_files)

Problem: this doesn't work with setuptools :-( so we need some special-case for this.

@minrk
Member

minrk commented Nov 11, 2015

Are you sure?

Yes. Debian and Homebrew are both examples.

Why should kernelspecs include the full path to sys.executable?

For the same reason that Python does it in scripts: a kernelspec should, generally, correspond to a particular Python. Changing your PATH shouldn't change the interpreter associated with the kernelspec. If all kernelspecs use only python, then all Python kernelspecs are actually the same, defeating most of the point.
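
Pinning a kernel spec to one interpreter can be sketched by generating kernel.json at install time instead of shipping a static file. The module name `MyKernel` comes from the question above; the helper names and the display name are hypothetical, and the `-f {connection_file}` arguments assume the kernel module accepts a connection file the way typical Python kernels do.

```python
import json
import os
import sys
import tempfile


def make_kernelspec(module_name, display_name):
    """Build a kernel spec whose argv names the installing interpreter explicitly."""
    return {
        # Full path, so changing PATH later does not change the interpreter.
        "argv": [sys.executable, "-m", module_name, "-f", "{connection_file}"],
        "display_name": display_name,
        "language": "python",
    }


def write_kernelspec(directory, spec):
    """Write the spec as kernel.json inside the given directory and return its path."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, "kernel.json")
    with open(path, "w") as f:
        json.dump(spec, f, indent=2)
    return path


if __name__ == "__main__":
    target = tempfile.mkdtemp()
    print(write_kernelspec(target, make_kernelspec("MyKernel", "My Kernel")))
```

A static kernel.json shipped inside a wheel cannot contain this path, since it is only known on the machine where the package is installed; that is the limitation described above.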

Actually, we don't need to ask distutils, we just need to use data_files:

This is not true in general, because we need to ask at runtime to build the search path. But you are right that you won't need to ask in setup.py. There is also the fact that most kernelspecs are not Python packages, so setup.py install is not relevant to most kernelspec installation.

Wheels uses sys.prefix (well, to be more precise, it gets the installation directory from distutils). If you want to be compatible with wheels, you cannot use a hard-coded directory.

We've always included the env prefix in the search path, so installing with data_files already works; it's just not the default location. You've always been able to install in the env. The only question here is whether that should be the default behavior or not.

How do we make a kernelspec for one env available to a notebook server in another?

Are you sure we want to do this? Isn't the whole purpose of envs that they are supposed to be separate installs?

Yes, we certainly do want this, and it's why we have the current default behavior. It's quite sensible to run a webserver in one env, and use kernelspecs to run code in another. The primary goal of the kernelspec system is that you run one notebook webserver, and use kernelspecs to specify different execution environments, be they different languages, configurations, or environments. Switching the env for your kernel should not require installing and running multiple copies of the notebook server.

@jdemeyer
Contributor Author

There is also the fact that most kernelspecs are not Python packages, so setup.py install is not relevant to most kernelspec installation.

Right, I always forget that. In any case, asking distutils isn't particularly hard, wheel already does that.

@jdemeyer
Contributor Author

It's quite sensible to run a webserver in one env, and use kernelspecs to run code in another.

Right, but that's already advanced usage. I think the default should aim for the most basic usage, which is just one env (be it system or not) and one Python executable.

@minrk
Member

minrk commented Nov 12, 2015

I started putting together a sketch of the proposal; I'll post something when it's together. I'm really on the fence, and keep leaning one direction or the other. It's certainly worth prototyping, though. And multiple envs is advanced usage, as you said.

Re: setuptools, under no circumstances should a non-pip setuptools install ever happen, so I'm less concerned about that one. This is just one more reason why it's broken.

@jdemeyer
Contributor Author

I started putting together a sketch of the proposal

I'm looking forward to it :-)

<rant>
Let me also add that I got a bit frustrated by the more general mess of package installation in Python. I had to dive deep into these issues to study this ticket. Python packages can be installed using distutils setup.py, setuptools setup.py, easy_install, pip, and wheel, and they all behave slightly differently. I think upstream Python should make up their mind more clearly about what they really want.
</rant>

@jdemeyer
Contributor Author

Whatever you come up with, it should also support a --user installation in $HOME/.local. Again, distutils gets this right if you use data_files with a relative path.

@willingc
Member

@jdemeyer Not to restart the "rant" from an earlier message, upstream CPython does publish an authoritative guide to packaging: Link to repo | Link to guide. The pypa repos should be referred to for packaging in Python. Just an FYI for future reference.

@jdemeyer
Contributor Author

Not to restart the "rant" from an earlier message, upstream CPython does publish an authoritative guide to packaging.

Are you sure this is "upstream CPython" and not some random people publishing something on the internet?

@takluyver
Member

It's somewhere in between, as I understand it. The 'Python Packaging Authority' is not upstream CPython, but it has some amount of official blessing from core developers (and doubtless some overlap). PyPA maintains pip and setuptools, which are the primary packaging and installation tools.

@willingc
Member

@jdemeyer Yes. CPython core devs, Donald Stufft and Nick Coghlan, are the leads of PyPA.

@jdemeyer jdemeyer changed the title Use datarootdir for SYSTEM_JUPYTER_PATH Improve installation of kernel specs Jan 14, 2016
@minrk minrk changed the title Improve installation of kernel specs install kernelspecs to datarootdir Jan 22, 2016
@minrk
Member

minrk commented Jan 22, 2016

We can continue discussion in the proposal in #69.
