Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adapt to changing Pip internals #1436

Merged
merged 1 commit into from
Oct 29, 2019
Merged

Conversation

chadwhitacre
Copy link

@chadwhitacre chadwhitacre commented Oct 28, 2019

pypa/pip@09fd200 moved Pip's main function underneath a new pip._internal.main module. Playing a risky game, using pip._internal. ;-)

pypa/pip@09fd200 moved Pip's `main` function underneath a new `pip._internals.main` module. Playing a risky game, using `pip._internals`. ;-)
@pfmoore
Copy link
Member

pfmoore commented Oct 28, 2019

Why not use a subprocess call (the documented way of using pip from another program) instead of calling the internal main function?

@gaborbernat
Copy link
Contributor

Because subprocess is slow...

@pradyunsg
Copy link
Member

Use runpy?

@pradyunsg
Copy link
Member

https://github.com/pradyunsg/pip-cli/blob/master/src/_pip_cli.py#L75

An example of how to do that, if you're interested in doing that. That said, it's still an unsupported way of doing things -- a subprocess is slower but it's easier to maintain state of.

@gaborbernat
Copy link
Contributor

I don't want to make things slower. I think pip should expose it's cli interface directly, rather than just via an entry point, but that ship sailed a while ago.

@pfmoore
Copy link
Member

pfmoore commented Oct 28, 2019

I don't want to make things slower

I can understand that - virtual environment creation is frustratingly slow at times. But is invoking pip really the bottleneck here (as opposed to the actual installs)? I'd suggest doing some measurements. But I can accept the possibility that it might be worth saving the subprocess call (especially on Windows, where extra processes are more costly).

I think pip should expose it's cli interface directly

See pypa/pip#3121 for the last big debate on this.

It is possible to call pip's main() function without hitting issues (the various pip wrappers that exist are an obvious example, and virtualenv is another one). But it's not something we'd want to support, because people won't always use it in a safe manner.

Hmm, here's a thought. Maybe we could expose a pip.main API, that displayed a big warning saying "this usage is unsupported, use at your own risk", and then called pip's actual main function (which would remain internal). Applications could call that API, but they would have to use Python's warning mechanisms to suppress the warning, which acts as their acknowledgement that they understand the risks they are taking. I've no idea if anyone would consider that to be a good idea (I've literally given it less than a minute's thought) but it might be a compromise that would be useful to people who really feel that a subprocess call is too much.

@pradyunsg
Copy link
Member

Why is using runpy not an OK option here? There's very little overhead to it.

@pfmoore
Copy link
Member

pfmoore commented Oct 29, 2019

@pradyunsg Actually, agreed. runpy is a better approach here. I missed that. You're right I don't see any obvious reason why using runpy wouldn't be a (still unsupported, as pip isn't designed to be run in-process) suitable way of running pip if you don't want a subprocess.

@gaborbernat
Copy link
Contributor

still unsupported, as pip isn't designed to be run in-process

why?

@pfmoore
Copy link
Member

pfmoore commented Oct 29, 2019

why?

If you mean "why wasn't it designed that way?" then it just wasn't originally, and it would be a lot of work to change that now. Pip was always intended as a standalone utility rather than a library, for better or worse.

If you mean "why doesn't it work in-process" then see the issue I linked to, but briefly, from what I remember:

  1. It doesn't work if used in a threaded program. I don't recall why (something to do with thread-local values, I think) but it wasn't something we ever got to the bottom of, and as it wasn't a use case we were particularly interested in supporting, other things took priority.
  2. It uses the (global) logging infrastructure, and if you use it in a program that expects to control the logging config, there are conflicts.
  3. Python's import system itself does a lot of caching (as do other tools like pkg_resources), and if you modify site-packages from a running program, those caches get out of sync with the filesystem, which can cause weird behaviour. This isn't so much a problem with pip as a consequence of how Python works, but by not supporting the use of pip in-process, we avoid the need to get into subtle distinctions like that.
  4. Probably other things too. It's more that we assume we're in charge of global state so we don't take care to preserve it for callers. We could change that, but rooting out all the hidden assumptions that get made is a lot of work, and we don't have the resource to do it, even if we decided we wanted to.

@gaborbernat
Copy link
Contributor

gaborbernat commented Oct 29, 2019

I was referring to the later.

  1. It doesn't work if used in a threaded program

Fair enough, though here the expectation wouldn't be to work in parallel.

2. uses the (global) logging infrastructure

This shouldn't be an impediment as long as it uses it's own logging namespace. The only potential improvement could be to allow disable-ing the config setup.

3. Python's import system itself does a lot of caching

This is only an issue, if and only if, you're modifying the same site-package as the one you're running in (which wouldn't necessarily be the case with virtualenv), and furthermore this is just as much as a problem with subprocesses. If you target the same site-packages your imports might be out of date after invocation. Imports within pip shouldn't be an issue as pip vendors all its dependencies, and hopefully no one is altering the standard library with pip.

4. It's more that we assume we're in charge of global state

Now this part is more concerning. Going along this expectation this would make in process tests also a lot harder. And debugging subprocesses is not as straight forward and pleasent as with the in-process ones. I would expect ensure-ing we don't mutate global state to be a beneficial place as far as ease of testing and maintainability goes.

On the topic

That being said I'm not too tied to using either direct pip internals or runpy. The virtualenv rewrite forgoes entirely using pip for bootstrapping, it's just too slow (yeah I profiled it - initial progress on it):

rm perf -rf; time virtualenv --no-download perf --no-wheel
New python executable in /virtualenv/perf/bin/python
Installing setuptools, pip...
done.
        2.72 real         1.80 user         0.86 sys

New way:

rm perf -rf; time .tox/dev/bin/virtualenv --no-download perf                                                                                                                                                                                                                                        
create with venv /.pyenv/versions/3.7.4/bin/python3.7 -m venv --without-pip /virtualenv/perf
[2019-10-15 19:04:56,429] INFO [/virtualenv/src/virtualenv/interpreters/create/api.py:71]
        0.28 real         0.20 user         0.05 sys

Until that gets released I think I'm content to keep the current status quo of doing things.

@pfmoore
Copy link
Member

pfmoore commented Oct 29, 2019

Going along this expectation this would make in process tests also a lot harder.

Tell me about it :-)

The virtualenv rewrite forgoes entirely using pip for bootstrapping, it's just too slow (yeah I profiled it - initial progress on it):

Nice! So I assume you manually unpack the embedded wheels or something like that? That seems like an entirely sensible option, there's no need for the whole of pip's machinery if all you want to do is unpack a wheel. I'd actually like to see a general library that did that, as I think it's often a good choice. (It's a shame wheel decided not to support a programming API as it has this functionality internally - but I'm not in any position to complain about a decision like that 😉)

Until that gets released I think I'm content to keep the current status quo of doing things.

That sounds entirely reasonable.

@gaborbernat
Copy link
Contributor

gaborbernat commented Oct 29, 2019

Nice! So I assume you manually unpack the embedded wheels or something like that?

Yeah, though slightly more complicated (https://github.com/gaborbernat/virtualenv/blob/rewrite/src/virtualenv/seed/link_app_data.py#L33-L41):

  • extract,
  • add INSTALLER file,
  • generate console scripts,
  • extend RECORDS with the pycache files, plus INSTALLER plus console scripts.

I'm also experimenting on just linking the files/folders via symlink/junctions/hard link, rather than copy; though this seems a lot harder, especially on Windows, mostly because pip seems to want to do a lot more than asked, e.g. even if in the RECORDS.txt I say only remove the junction it goes and removes the entire tree the junction entree points too 😢 Also before 19.3 resolves symlinks and removes target, rather than just the symlink. Still thinking about how can I side step those limitations.

@gaborbernat gaborbernat merged commit 43ccf4a into pypa:master Oct 29, 2019
@pfmoore
Copy link
Member

pfmoore commented Oct 29, 2019

generate console scripts,

So for this you presumably need to vendor distlib or similar, to get the required .exe files for Windows? (I'd forgotten, that's where wheel unpacking is a bit tricky, which is why I think a library that handles it would be useful).

@gaborbernat
Copy link
Contributor

I've put the windows console script generation on the TODO list 🤔 from what I saw in pip it's done via distlib that's part of stdlib, not?

@xavfernandez
Copy link
Member

via distlib that's part of stdlib, not?

Not: https://bitbucket.org/pypa/distlib/src/default/ ^^

@gaborbernat
Copy link
Contributor

Oh thanks, no vendoring needed luckily 😄 as the new virtualenv can have install requires 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants