Create a supported "high level" programmatic API for pip #3121

Closed
pfmoore opened this issue Sep 19, 2015 · 32 comments
Labels
auto-locked Outdated issues that have been locked by automation C: public api Public API stuff

Comments

@pfmoore
Member

pfmoore commented Sep 19, 2015

This has come up a few times recently now that pip is available in a standard Python install.

Maybe we should formally support a programmatic API for pip that allows the high-level command line operations pip supports to be run from within a Python interpreter? That may simply mean blessing (and documenting) pip.main() as a supported API.

There may be some odd corner cases to take care with, if we're considering people running pip.main() from within a persistent interpreter (e.g. IPython or Idle). I'm thinking of cases where sys.modules caches something you're upgrading, for example. But that may just need documentation saying that those things need an interpreter restart, at least as a starting point.

@Ivoz
Contributor

Ivoz commented Sep 21, 2015

I'd hesitate to suggest an API for anything that needs an interpreter restart, because it seems to me hardly more useful to run code inline if you have to exit and restart your interpreter anyway.

If we were going to do something anyway (perhaps even starting with operations that wouldn't warrant an interpreter restart, like querying package information), I'd suggest putting them behind a facade in a pip.api module or similar.

Also, you'd want to be careful not to rely unconditionally on setuptools functionality for anything (without a very good reason), and even then, to avoid duplicating APIs that setuptools already provides.

@RonnyPfannschmidt
Contributor

If the only blessed API is pip.main, then there is no good reason to do it in-process, since it won't have any advantage over just calling pip as a subprocess.

@dstufft
Member

dstufft commented Sep 21, 2015

I'm not sure if pip should have an API or not, but certainly parts of what you'd use an API for will be shuffled over to packaging as we get new PEPs written.

@pfmoore
Member Author

pfmoore commented Sep 21, 2015

I think there's enough people talking about using pip from within Python (either prompts or scripts) that we should provide something. Even if it's very limited, at least that will get people thinking about what exactly they need, and we'll be able to firm up on details as we go along.

@dstufft for lower-level APIs based around the standards PEPs, we should absolutely be pointing people at packaging.

@RonnyPfannschmidt you have a good point. It's not at all unreasonable to tell people that to call pip they should be using

subprocess.check_call([sys.executable, '-m', 'pip', ...])

If that's not sufficient for them (for example, the format of stdout isn't useful for parsing) we can discuss options (for example, adding a --json argument to informational commands).

On the other hand, I think the big issue is that people expect to be able to do

pip.main(['install', 'foo'])
import foo

from within (say) an Idle session. This is actually no different in principle from running pip in a separate command window and not restarting your Idle session (and indeed it's exactly the same if you use subprocess) but maybe just documenting that is sufficient - after all, it's not really a pip issue that you have to take care if you change the contents of sys.path while your Python process is running.

I think I'll write up a new section for the pip documentation on the basis of subprocess being pip's supported API. That can be a starting point for people wanting to integrate pip into IDEs, and we can then address any issues as people find them.
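As a rough illustration of the subprocess approach described above, the sketch below builds the pip command line from sys.executable and parses machine-readable output. The helper names pip_command and list_installed are hypothetical, and the code assumes a pip release new enough to offer `pip list --format=json` (a JSON output mode along the lines of the --json idea mentioned above).

```python
import json
import subprocess
import sys


def pip_command(*args):
    """Build a command line that runs pip with the current interpreter."""
    return [sys.executable, "-m", "pip", *args]


def list_installed():
    """Return {project_name: version} parsed from pip's JSON listing.

    Assumes a pip new enough to support `pip list --format=json`.
    """
    result = subprocess.run(
        pip_command("list", "--format=json"),
        check=True,               # raise CalledProcessError on a non-zero exit
        stdout=subprocess.PIPE,   # capture stdout; stderr still goes to the console
        universal_newlines=True,  # decode output to str
    )
    return {entry["name"]: entry["version"] for entry in json.loads(result.stdout)}
```

Using sys.executable rather than a bare `pip` means the packages are managed for the interpreter that is actually running, which matters when several Python installations or virtual environments are present.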

@xavfernandez xavfernandez added the C: public api Public API stuff label Oct 8, 2015
@Rosuav

Rosuav commented Jan 12, 2016

I was just tinkering with this kind of thing, and the API that I reached for was:

import pip
pip.install("psycopg2")

If there's a dedicated entrypoint like that, it could even choose whether to use subprocess or to do the work itself. Using pip.main(["install","psycopg2"]) does work (on Python 3.4.3, at least - on 3.5.1, there's no psycopg2 wheel, so my poor Windows VM got confuzzled and gave up), but given the concerns about subprocesses, I'm not sure that's sufficient.

Is there somewhere a summary of why it's important to call pip via subprocess rather than directly?

@dstufft
Member

dstufft commented Jan 12, 2016

@Rosuav The short version is that Python, setuptools, and pip all have various process-level caches for different pieces of information that are affected by installing, upgrading, and uninstalling projects. Forcing pip to be called with a fresh process each time ensures that pip always sees a consistent view of the world instead of a possibly partially cached view of the world where it might get confused about the state of the system. It is likely possible to go through and make sure that these caches get invalidated or otherwise removed so that pip will always see a consistent state and can live longer than 1 process per invocation.

The other issue is that pip sort of purposely has no real public API that can be easily called from Python code. That is again because of the various caches that make it easy for things to go wrong. For example:

import foo.bar  # Assume foo 1.0 is installed, and Python will load and cache "foo" and "foo.bar" from 1.0 into sys.modules
import pip
pip.install("foo", upgrade=True)  # Pretend foo is now upgraded on disk to foo 2.0
import foo.other  # Python will return "foo" 1.0 from the sys.modules cache, and then will load and import "foo.other" from foo 2.0 and cache that in sys.modules

You now have a process that is in what I think most people would call a broken state. You have different modules from different parts of the system in the same process coming from different versions of foo. This situation is of course completely possible today, you just replace the pip.install with a call to subprocess, but one worry is that by making the programmatic API easier, we're paving a cowpath that will incentivize people to do something like that which appears easy to do on the surface but which is very fragile.
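The mixed-version state described above can be reproduced without pip at all. The following is a minimal sketch that stands in for an upgrade by rewriting a throwaway package on disk; the package name demo_pkg and both helper functions are hypothetical, invented only for this illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def write_package(root, version, submodules):
    """Write a tiny package named demo_pkg (hypothetical) at the given version."""
    pkg_dir = root / "demo_pkg"
    pkg_dir.mkdir(exist_ok=True)
    (pkg_dir / "__init__.py").write_text("__version__ = {!r}\n".format(version))
    for name in submodules:
        (pkg_dir / (name + ".py")).write_text("VERSION = {!r}\n".format(version))


def demo_mixed_versions():
    """Import demo_pkg 1.0, 'upgrade' it on disk, then import a new submodule."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        sys.path.insert(0, str(root))
        try:
            write_package(root, "1.0", ["bar"])
            importlib.invalidate_caches()
            # Caches both "demo_pkg" and "demo_pkg.bar" in sys.modules.
            importlib.import_module("demo_pkg.bar")

            # Simulate an upgrade happening behind the interpreter's back.
            write_package(root, "2.0", ["bar", "other"])
            importlib.invalidate_caches()
            # The parent package comes from sys.modules (1.0);
            # the new submodule is read from disk (2.0).
            other = importlib.import_module("demo_pkg.other")

            return sys.modules["demo_pkg"].__version__, other.VERSION
        finally:
            # Leave the interpreter as we found it.
            sys.path.remove(str(root))
            for name in [n for n in sys.modules if n.split(".")[0] == "demo_pkg"]:
                del sys.modules[name]
```

Calling demo_mixed_versions() should return ('1.0', '2.0'): the package object still reports the old version while the freshly imported submodule comes from the new one.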

@pfmoore
Member Author

pfmoore commented Jan 12, 2016

@Rosuav To simplify even further, you should use subprocess because there's no documented or supported in-process pip API. (For why we don't provide a supported API, see @dstufft's answer).

And note that even calling pip via subprocess has risks that may bite you unless you're sure you know what you're doing (again as @dstufft explained)

@dstufft
Member

dstufft commented Jan 12, 2016

To be more explicit, I meant to say (emphasis on the addition):

It is likely possible to go through and make sure that these caches get invalidated or otherwise removed so that pip will always see a consistent state and can live longer than 1 process per invocation, _however that has not been a priority of anyone working on pip, largely due to the fragility of trying to modify the installed packages at runtime_.

@Rosuav

Rosuav commented Jan 12, 2016

The problem you describe would be the same regardless of how pip is invoked, though, wouldn't it? Whether you call pip.main as it currently is, or fire up a completely independent process in a separate terminal, upgrading and then importing can create exactly this problem.

But if pip has its own internal caching, then sure, let it fire off a subprocess. Even on Windows, the cost of starting a process won't be as much as the cost of downloading new software. It'd still be nice to be able to say pip.install(pkg, pkg, pkg) instead of subprocess.call([sys.executable, '-m', 'pip', 'install', pkg, pkg, pkg]) - which could be exactly how the install entrypoint is then implemented.

@Rosuav

Rosuav commented Jan 12, 2016

@dstufft Fair enough re changing installed packages. TBH the only use-case I'm really looking at here is:

>>> import foo
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'foo'
>>> pip.install("foo")
# ... chug chug chug ...
>>> import foo
>>>

That would be the least risky operation. If anything beyond that is declared to be unsupported, fine; and if it's implemented by spawning a subprocess, no big deal. It would still be extremely convenient.

@dstufft
Member

dstufft commented Jan 12, 2016

The problem you describe would be the same regardless of how pip is invoked, though, wouldn't it?

Yes, but by making an official top level API we signal to people that this is a supported thing they can do programmatically from within their own code. I'm of the opinion (and other pip developers may feel differently) that we should not go out of our way to provide APIs that are hard to use in a way that it is actually "safe" (e.g. not broken) to use. Especially when safe usage requires knowing if/how every single module you're importing uses that same API.

An easy thing to do would be for someone (doesn't even have to be a pip developer!) to provide a pip-api package on PyPI that did basically this. It'd be slightly worse since you wouldn't be able to do import pip and you'd need to instead do import pip_api as pip or something. It'd allow experimentation to see if A) many people would use it at all and B) when people do use it, are they able to grok how to use it safely or does it end up being a footgun. If it shows that I am wrong and my fears are unjustified, then it becomes much easier to justify adding it.

As always, I'm just a single pip developer and the other developers may feel differently and if they wanted to do this anyways, I wouldn't block them.

@pfmoore
Member Author

pfmoore commented Jan 12, 2016

You could of course write a helper function for that yourself.

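# Warning, untested code ahead!
import importlib
import subprocess
import sys

def conditional_install(name, project_name=None):
    try:
        mod = importlib.import_module(name)
    except ImportError:
        # Not importable yet: install it with pip in a subprocess, then retry
        subprocess.call([sys.executable, '-m', 'pip', 'install', project_name or name])
        importlib.invalidate_caches()  # let the import system notice the new files
        mod = importlib.import_module(name)
    return mod  # return the module itself, not its name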

# and call it like this
foo = conditional_install('foo')

Whoops, just saw @dstufft's response. I basically agree with him that we don't want to deal with people getting into trouble using such an API, so supporting it within pip isn't something I'm keen on (in spite of me having raised this issue - I've since been persuaded it isn't a good idea). But a 3rd party module for this would be OK with me, it'd help us get a feel for what the support costs are like ;-)

@Rosuav

Rosuav commented Jan 12, 2016

More and more I'm thinking that I need to provide all students with a little utilities module that loads stuff up on startup :) It's all very well to slap a personal function into a personal installation, but when I'm trying to walk someone through something, s/he won't have my toolbox handy. (Anyway, if I want to pip-install something, I'll just hit Ctrl-Alt-T to open up a new terminal, and run pip from there. Job done.)

@edmorley
Contributor

Can't the students do something like...

git clone <tutorial_project>
pip install -r requirements.txt
python
<start tutorial from your slides/whiteboard/...>

@edmorley
Contributor

Teaching them project packaging best practices (ie using a requirements file) seems like a valuable lesson in its own right :-)

@pfmoore
Member Author

pfmoore commented Jan 12, 2016

Indeed - teaching them to install packages while working in an interactive interpreter seems to be leading them towards potential issues. "Hey, I went to do the import foo thing, remembered I hadn't done the install thing so I did that and now my import foo still won't work"...

@Rosuav

Rosuav commented Jan 12, 2016

This is for early tinkering. I'm still debating with some of the other course writers about some points of best-practice, but for the really early tinkering (in interactive Python), there's no requirements.txt. But if we have to say git clone <stuff>; stuff/setup then so be it.

@Rosuav

Rosuav commented Jan 12, 2016

@pfmoore Why would that still not work? If you attempt to import and it fails, wouldn't pip-installing the package make that then succeed?

@pfmoore
Member Author

pfmoore commented Jan 12, 2016

Well, I over-simplified - I was thinking of something like "import foo.bar" / "import foo.baz" as per @dstufft's example above. You may be OK for your specific use cases, but I'm concerned you've taught users that it's OK to install stuff "behind the back of" a running interpreter, and this will later come back to bite them.

But once again, this is more a case of "not supported" than "won't work". You asked above for advice, and basically the advice is "don't do this". But you know your requirements better than we do - if you feel you can make things work in a way that's beneficial to your students, that's fine.

@Rosuav

Rosuav commented Jan 12, 2016

Fair enough. It's not that I'm teaching people that it's a good idea to do this, more that I'm just trying to help people get started without too many hoops to jump through. Once they get a bit of the basics down, I can start explaining how to properly structure a project, how to get their SSH keys set up so they don't need to use passwords, how to refactor code to improve readability, how to migrate a database... and how to install/upgrade packages without causing problems.

The real solution is probably just to provide a ready-to-use system that has a bunch of packages preinstalled for them. Hence the git clone <stuff>; stuff/setup thought. That's probably going to end up the best and safest way to keep trouble away.

@dstufft
Member

dstufft commented Jan 12, 2016

I think the simple case of:

import pip

try:
    import requests
except ImportError:
    pip.install("requests")
    import requests

will correctly work, but only because CPython doesn't cache failed imports in sys.modules. If it ever started to do that (or if another implementation does that) it would start to fail of course. It might be surprising to students (or it might not!) that something like (pretending interactive instead of a script):

import thing_that_can_optionally_use_requests
try:
    thing_that_can_optionally_use_requests.method_that_requires_requests()
except thing_that_can_optionally_use_requests.NeedsRequestsError:
    import pip
    pip.install("requests")
    thing_that_can_optionally_use_requests.method_that_requires_requests()

doesn't work when the first snippet does if the thing_that_can_optionally_use_requests uses the typical pattern for optional dependencies of:

try:
    import requests
    HAS_REQUESTS = True
except ImportError:
    HAS_REQUESTS = False

def method_that_requires_requests():
    if not HAS_REQUESTS:
        raise NeedsRequestsError
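To make the stale-flag behaviour above concrete, here is a small self-contained sketch. It uses two throwaway modules, demo_consumer and demo_optional_dep (both hypothetical names), and simulates the install step by writing a file rather than calling pip. The importlib.reload() step at the end is one possible workaround, shown as an assumption rather than a recommendation from this thread.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# A consumer module using the optional-dependency pattern shown above.
CONSUMER_SOURCE = """\
try:
    import demo_optional_dep
    HAS_DEP = True
except ImportError:
    HAS_DEP = False
"""


def demo_stale_flag():
    """Return HAS_DEP before install, after install, and after a reload."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        sys.path.insert(0, str(root))
        try:
            (root / "demo_consumer.py").write_text(CONSUMER_SOURCE)
            importlib.invalidate_caches()
            consumer = importlib.import_module("demo_consumer")
            before = consumer.HAS_DEP

            # Stand-in for `pip install`: the dependency appears on disk.
            (root / "demo_optional_dep.py").write_text("READY = True\n")
            importlib.invalidate_caches()
            importlib.import_module("demo_optional_dep")  # a direct import now works
            after_install = consumer.HAS_DEP  # still the value computed at import time

            importlib.reload(consumer)  # re-runs the try/except in the consumer
            after_reload = consumer.HAS_DEP
            return before, after_install, after_reload
        finally:
            # Leave the interpreter as we found it.
            sys.path.remove(str(root))
            for name in ("demo_consumer", "demo_optional_dep"):
                sys.modules.pop(name, None)
```

Calling demo_stale_flag() should return (False, False, True): the direct import succeeds after the simulated install, but the consumer keeps reporting the dependency as missing until its module-level code runs again.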

It might be helpful to you to know that pip can (unless I am remembering incorrectly) access a requirements file that is located on a remote server. I don't know if the only setup you need is installing packages with pip or if there's more beyond that, but you could do something like:

$ pip install -r https://awesomeclass.example.com/course-intro-requirements.txt

I think pip will first download that file, then parse it and treat it as if it were a local requirements file. Of course if you have more steps you want to do than just install Python packages then a stuff/setup command might be overall more beneficial since you can do more to set up the environment that way.

@Rosuav

Rosuav commented Jan 12, 2016

It might be helpful to you to know that pip can (unless I am remembering incorrectly) access a requirements file that is located on a remote server.

Ooh, I didn't know that! I did have a few other things in mind (setting up PostgreSQL with autostarting, and others could be added), but if a single copy/paste of a single command can install a bunch of stuff, that might be sufficient! Thanks for the tip.

@Rosuav

Rosuav commented Jan 20, 2016

Hmm. If the recommended interface is to be the subprocess module, I hope that check_call isn't the specific flavour chosen - in the event of an error, all information about what went wrong is discarded. It's the equivalent of try: ... except: raise Exception("pip failed"). Using subprocess.getstatusoutput is more informative; but quite a bit clunkier to use. This suggests that a wrapper is worth adding, and honestly, the obvious name for that wrapper is pip.install('packagename'). Even if all it does is a one-line subprocess call, it'd be of value; if it could turn a failure into an exception that incorporates some of stdout/stderr, that'd be far more helpful.
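A wrapper along the lines suggested above might look like the sketch below. It is only an illustration: CommandFailed, run_checked and install are hypothetical names, not part of pip, and the choice to merge stderr into stdout is one of several reasonable answers to the question of where the output should go.

```python
import subprocess
import sys


class CommandFailed(Exception):
    """Hypothetical exception carrying the exit code and captured output."""

    def __init__(self, cmd, returncode, output):
        super().__init__("{} exited with {}:\n{}".format(cmd, returncode, output))
        self.cmd = cmd
        self.returncode = returncode
        self.output = output


def run_checked(cmd):
    """Run cmd, returning its combined output or raising CommandFailed."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave stderr with stdout
        universal_newlines=True,   # decode output to str
    )
    if result.returncode != 0:
        raise CommandFailed(cmd, result.returncode, result.stdout)
    return result.stdout


def install(*packages):
    """Hypothetical stand-in for pip.install(): one pip subprocess per call."""
    return run_checked([sys.executable, "-m", "pip", "install", *packages])
```

Unlike check_call, a failure here raises an exception whose message and output attribute include what the command printed, so the caller can log or display it.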

@pfmoore
Member Author

pfmoore commented Jan 20, 2016

Agreed, check_call is over-simplified. A more robust wrapper is probably useful, but I'm not 100% convinced there's an "obvious" API for it. For example, with pip.install('packagename') where is the output going to go? How will errors be flagged? (Be careful here, IIRC there are some funky setup.py cases that can fail in ways that pip can't detect, so sometimes the user has to scan the output). You can cover all of these things easily in an application-specific wrapper, but in a library you need more general answers.

IMO, the obvious answer is to have a 3rd-party project that provides an API to run pip in a subprocess. Do the design in that project (which can be documented as experimental, unlike pip which has to consider backward compatibility) and once the project settles on a stable, widely useful API, then propose that API for inclusion into pip.

@reinout

reinout commented Apr 6, 2016

A question that comes up from time to time is "why can't buildout use pip instead of setuptools for installing packages". The answer has always been "because pip doesn't have an api and we won't ever call it on the command line because that would be very weird".

You can debate whether setuptools has a proper api, but that's what buildout has been using till now.

Question: so, from pip's viewpoint, it is perfectly OK to subprocess.call() pip on the commandline? It sounds like this is the way pip is intended to be used?

(I'm leaving aside the question whether packaging should be used)

@Ivoz
Contributor

Ivoz commented Apr 8, 2016

Question: so, from pip's viewpoint, it is perfectly OK to subprocess.call() pip on the commandline? It sounds like this is the way pip is intended to be used?

Sure. IIRC it might still have an issue that it always needs a valid stdout (and stderr?) to run, but apart from that, that's just another way of scripting the running of pip instead of typing in an invocation by hand.

@mauritsvanrees
Contributor

I am experimenting with bringing pip support to buildout, probably not in core, but via a buildout recipe.

I understand from the above why everyone is hesitant about creating and/or blessing an API. So a subprocess call may still be the best.

But in this comment I simply want to present what currently works for me:

import sys

from pip import parseopts
from pip.commands import commands_dict
from pip.exceptions import PipError

# The command name comes first, followed by its options and arguments
pip_args = ['install', '--disable-pip-version-check', 'requests']
try:
    cmd_name, cmd_args = parseopts(pip_args)
except PipError as exc:
    sys.exit(1)
command = commands_dict[cmd_name](isolated=False)
command.main(cmd_args)

See my buildout pip branch though that file contains two conflicting ideas.

This is nicer than subprocess because we can use the parseopts from pip and use it to improve the final command: "Hey, this pip version has --brand-new-option, so let's use it."

So for me, if an api would consist of those three imports (parseopts, commands_dict, and PipError) that would seem to do the trick. Out of these three, PipError and parseopts look like they will remain, but for commands_dict I am more concerned that this should be considered an implementation detail that may change without notice.

If someone now shouts "yes, let's bless these as api" then that would be great. But I won't hold my breath. And I understand the reasons.
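For comparison, version-dependent behaviour can also be approximated from outside the process by asking pip for its version. This is a sketch under assumptions: parse_pip_version and pip_version are hypothetical helpers, and the code assumes `pip --version` output that begins with "pip X.Y". It checks the version rather than the presence of a specific option, so it is coarser than inspecting the option parser.

```python
import re
import subprocess
import sys


def parse_pip_version(version_output):
    """Extract a (major, minor) tuple from `pip --version` style output."""
    match = re.match(r"pip (\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("unrecognised output: {!r}".format(version_output))
    return int(match.group(1)), int(match.group(2))


def pip_version():
    """Ask the pip belonging to the current interpreter for its version."""
    output = subprocess.check_output(
        [sys.executable, "-m", "pip", "--version"],
        universal_newlines=True,  # decode output to str
    )
    return parse_pip_version(output)
```

A caller could then compare the tuple against the release that introduced the option it wants, for example `if pip_version() >= (9, 0): ...`.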

@dstufft
Member

dstufft commented Jan 20, 2017

I suspect PipError will stay, but parseopts and commands_dict are likely going to go away at some point when I get us off our home-grown optparse shenanigans and switch us to using click instead.

@mauritsvanrees
Contributor

Okay, thank you for the warning.

@dstufft
Member

dstufft commented Mar 31, 2017

I'm going to close this issue. There's not really anything actionable on it besides making a decision one way or the other about a public API, and our de facto decision has been to not add one. I don't think holding open an issue any longer is of value even if we revisit that in the future.

@dstufft dstufft closed this as completed Mar 31, 2017
jhkennedy pushed a commit to LIVVkit/LIVVkit that referenced this issue May 9, 2018
The pip API is unsupported and subject to change, see:
    pypa/pip#3121

The recommended way to programmatically install and import a module is
to use subprocess and importlib
@lock lock bot added the auto-locked Outdated issues that have been locked by automation label Jun 3, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Jun 3, 2019
@brainwane
Contributor

The problem you describe would be the same regardless of how pip is invoked, though, wouldn't it?

Yes, but by making an official top level API we signal to people that this is a supported thing they can do programmatically from within their own code. I'm of the opinion (and other pip developers may feel differently) that we should not go out of our way to provide APIs that are hard to use in a way that it is actually "safe" (e.g. not broken) to use. Especially when safe usage requires knowing if/how every single module you're importing uses that same API.

An easy thing to do would be for someone (doesn't even have to be a pip developer!) to provide a pip-api package on PyPI that did basically this. It'd be slightly worse since you wouldn't be able to do import pip and you'd need to instead do import pip_api as pip or something. It'd allow experimentation to see if A) many people would use it at all and B) when people do use it, are they able to grok how to use it safely or does it end up being a footgun. If it shows that I am wrong and my fears are unjustified, then it becomes much easier to justify adding it.

As always, I'm just a single pip developer and the other developers may feel differently and if they wanted to do this anyways, I wouldn't block them.

If I understand correctly, pip-shims ("a set of compatibility access shims to the pip internal API") now exists to work around the current lack of a high-level API for programmatic access to pip internals.

@pradyunsg
Member

That is appropriate, as long as folks using it don't expect it to "just work". Further, as noted in the README of that package:

The authors of pip do not condone the use of this package. Relying on pip’s internals is a dangerous idea for your software as they are broken intentionally and regularly. This package may not always be completely updated [snip], so relying on it may break your code! User beware!
