[python3] Do not ensurepip. Provide venv instructions. #24906

Merged

ras0219-msft
Contributor

This PR removes pip entirely from our python3 package because pip is, by design, not relocatable. Instead, we add usage instructions that show users who want pip how to create a full virtual environment, so they can get pip and install packages without modifying the vcpkg installed folder.

Additional Unix usage:

The package python3 provides a python interpreter that supports virtual environments:

    $ tools/python3/python3.10 -m venv /path/to/venv
    $ export VIRTUAL_ENV=/path/to/venv
    $ export PATH=/path/to/venv/bin:$PATH
    $ export -n PYTHONHOME
    $ unset PYTHONHOME

    See https://docs.python.org/3/library/venv.html for more details.

Additional Windows usage:

The package python3 provides a python interpreter that supports virtual environments:

    >tools\python3\python3.10 -m venv c:\path\to\venv
    >set VIRTUAL_ENV=c:\path\to\venv
    >set PATH=c:\path\to\venv\Scripts;%PATH%
    >set PYTHONHOME=

    See https://docs.python.org/3/library/venv.html for more details.
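
The environment changes in the usage text above can also be applied from a script rather than an interactive shell. The following is a minimal sketch, not part of the port: `venv_environment` is a hypothetical helper name, and it assumes the standard venv layout (`bin` on Unix, `Scripts` on Windows).

```python
import os
import sys
from pathlib import Path


def venv_environment(venv_dir, base_env=None, platform=sys.platform):
    """Return a copy of the environment adjusted as the usage text describes."""
    env = dict(os.environ if base_env is None else base_env)
    venv_dir = Path(venv_dir)
    # venv puts executables in Scripts on Windows and in bin elsewhere.
    scripts = venv_dir / ("Scripts" if platform == "win32" else "bin")
    separator = ";" if platform == "win32" else ":"
    env["VIRTUAL_ENV"] = str(venv_dir)
    old_path = env.get("PATH")
    env["PATH"] = str(scripts) if not old_path else str(scripts) + separator + old_path
    # Equivalent of `unset PYTHONHOME` (Unix) or `set PYTHONHOME=` (Windows).
    env.pop("PYTHONHOME", None)
    return env
```

The returned mapping can be passed as `env=` to `subprocess.run` when launching the virtual environment's interpreter.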

This PR is an alternative followup from #22386.

+@mkhon +@Hoikas +@dg0yt for comments on this approach

@JackBoosY added labels May 25, 2022: category:port-bug (The issue is with a library, which is something the port should already support) and info:internal (This PR or Issue was filed by the vcpkg team).
@ras0219-msft
Contributor Author

+@ResearchDaniel to chime in since they're actively using embedded python via vcpkg :)

@ResearchDaniel

ResearchDaniel commented May 26, 2022

So, if I understand everything correctly one would:

  1. During development: Create a virtual environment, for example in the build folder, and make the application use the virtual environment.
  2. During install: Bundle the Python interpreter and use it to create a virtual environment in the installed folder using some kind of post-install script. Then install packages and use the virtual environment. Or perhaps it is not necessary to do the virtual environment in the installed setup since it is not intended to be moved anyway?

@ras0219-msft
Contributor Author

I don't fully know how things would work in the install case. Maybe you can simply take the modules out of the "build" virtual environment and ship them with your application? I assume there's some C API for the Python interpreter that you would call to point it at that folder to find modules.

@ResearchDaniel

Would not the “build” virtual environment contain the non-relocatable pip in that case?

I want to clarify that I think the PR sounds reasonable. I am mainly thinking about how to deal with it in an application. I have made an example setup using CMake, vcpkg and Python that I would like to adjust to properly include pip. My example works on Mac/Windows/Linux, but I do not test adding new packages via pip after the application has been installed so it might not catch the case described in this PR.
https://github.com/ResearchDaniel/python3-embedded-example

@ras0219-msft
Contributor Author

ras0219-msft commented May 26, 2022

> the non-relocatable pip in that case?

I assume that the user doesn't want pip as part of their final install :) If you're developing an application, I assume you want to use pip at build time to install whatever python modules you want to be available at runtime/install time -- then "extract" those out to include them in your installer.

If you actually want your installed product to have a functioning pip, then I do suppose the final application would want to use a venv as well; but that's several leaps outside my experience.

edit:

Just to make sure we're on the same page with vocabulary:

  1. starting from the python3 port
  2. vcpkg builds python3 into packages/
  3. vcpkg puts it into vcpkg_installed/ (or installed/, etc)
  4. The dev, in their dev environment, may use venv -> pip to install things
  5. The dev runs their build (maybe before, maybe after step 4)
  6. The dev tests their build, using the modules from 4
  7. The dev bundles up their application along with some of the files acquired in 4 into some final installer .msi/.rpm/.deb/.zip that they distribute
  8. The end user unpacks/installs the application bundle which contains an embedded copy of python and whatever preinstalled libraries were baked in.
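
Steps 4 and 7 of this list (install modules with pip in a dev venv, then "extract" them for the installer) could look roughly like the sketch below. It is an illustration only: `bundle_site_packages` is a hypothetical helper, the caller is assumed to know the venv's `site-packages` directory, and the exclusion list is an assumption about what a shipped application does not need.

```python
import shutil
from pathlib import Path

# Packaging tools assumed to be unwanted in the final bundle.
# This list is an illustrative assumption; adjust it to the project.
EXCLUDED = ("pip", "pip-*", "setuptools", "setuptools-*", "__pycache__")


def bundle_site_packages(site_packages, bundle_dir):
    """Copy installed modules from a venv's site-packages into bundle_dir.

    Returns the sorted names of the top-level entries that were bundled.
    """
    site_packages = Path(site_packages)
    bundle_dir = Path(bundle_dir)
    shutil.copytree(
        site_packages,
        bundle_dir,
        ignore=shutil.ignore_patterns(*EXCLUDED),
        dirs_exist_ok=True,  # requires Python 3.8 or newer
    )
    return sorted(p.name for p in bundle_dir.iterdir())
```

Leaving pip out of the copy matches the assumption stated above that the end user does not need pip in the final install.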

@ResearchDaniel

ResearchDaniel commented May 27, 2022

> I assume that the user doesn't want pip as part of their final install

Perhaps I missed something, but why is pip not being relocatable a problem in that case? This solution would only move the problem to the virtual environment?

Our use case is probably not the most common one, but we develop a visual analysis tool which comes with pre-existing algorithms, and users can also write new ones (optionally using Python). In some cases they might want to bring in new modules to load or process data, which means that including pip in the installed app is a nice-to-have feature.

@ras0219-msft
Contributor Author

An essential feature of vcpkg is binary caching, which assumes that packages can be moved to different paths on the machine. Any binary that hardcodes paths into its build folder or install prefix is incompatible with that model (without performing "fixups" at install time, which vcpkg strongly wishes to avoid).
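
One rough way to see which files would break when a package is restored to a different path is to search the installed tree for the original prefix. The sketch below is only an illustration of that idea and not how vcpkg itself checks packages; `files_embedding_prefix` is a hypothetical helper, and it reads each file fully into memory, which is fine for a quick check but not for very large trees.

```python
from pathlib import Path


def files_embedding_prefix(root, prefix):
    """Return relative paths of files under root whose bytes contain prefix."""
    root = Path(root)
    needle = str(prefix).encode("utf-8")
    hits = []
    for path in sorted(root.rglob("*")):
        # Only regular files are inspected; directories are skipped.
        if path.is_file() and needle in path.read_bytes():
            hits.append(path.relative_to(root).as_posix())
    return hits
```

Any file this reports contains the absolute prefix somewhere in its contents, so it is a candidate for the kind of install-time fixup described above.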

> This solution would only move the problem to the virtual environment?

In a certain sense, yes: because pip is unrelocatable, we're explaining to the user how to perform the fixups for their environment (i.e. the venv command, which internally calls ensurepip, which fixes up pip for the user's chosen location).

I don't know your application's distribution model, but it would be very strange to have a mechanism that writes back to Program Files. I would guess that in your case you would, on first start and as a "factory reset" option, create a virtual env inside %ProgramData%/%AppDataLocal%/$HOME/.myapp/$XDG_DATA_HOME/??? and instruct users that they can mess with that to extend the system. I don't know the full structure of a python virtual env, but I assume it should be reasonably straightforward to copy some bundled set of packages from your app's data into that user data location.

The bonus of having this be in a user data location is that it enables each user to extend the product without being a system administrator. There might be some much more clever PYTHONPATH manipulation that I don't know about to chain your Program Files location and avoid copying it into the user's private environment.
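
The "create a virtual env in a per-user location on first start, or as a factory reset" idea could be sketched as follows. Everything here is an assumption for illustration: the helper names, the choice of `LOCALAPPDATA`, `~/Library/Application Support` and `XDG_DATA_HOME` as roots, and the premise that the application knows the path of its bundled Python executable.

```python
import os
import shutil
import subprocess
import sys
from pathlib import Path


def user_venv_dir(app_name, environ=None, platform=None):
    """Pick a per-user location for the application's virtual environment."""
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform
    if platform == "win32":
        base = environ.get("LOCALAPPDATA")
        root = Path(base) if base else Path.home() / "AppData" / "Local"
    elif platform == "darwin":
        root = Path.home() / "Library" / "Application Support"
    else:
        base = environ.get("XDG_DATA_HOME")
        root = Path(base) if base else Path.home() / ".local" / "share"
    return root / app_name / "venv"


def ensure_user_venv(python_exe, venv_dir, reset=False):
    """Create the venv on first start, or recreate it for a factory reset.

    Returns True if a venv was created, False if an existing one was kept.
    """
    venv_dir = Path(venv_dir)
    if reset and venv_dir.exists():
        shutil.rmtree(venv_dir)
    if (venv_dir / "pyvenv.cfg").exists():
        return False
    venv_dir.parent.mkdir(parents=True, exist_ok=True)
    # Same command as in the port's usage text, run with the bundled interpreter.
    subprocess.run([str(python_exe), "-m", "venv", str(venv_dir)], check=True)
    return True
```

Because the directory lives under the user's own data folder, no administrator rights are needed to create or reset it.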

@Neumann-A
Contributor

Stupid question, but isn't the pip executable/binary the only thing that has the embedded path?
If you run the pip Python module via python -m pip, it should run the script, which shouldn't contain a full path.
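
The distinction drawn here can be inspected directly. On Unix, an installed `pip` launcher script typically begins with a `#!` line naming the interpreter by absolute path, whereas `python -m pip` only depends on which interpreter is invoked. The helpers below are a hypothetical sketch for looking at both; they do not cover the Windows `.exe` launchers.

```python
from pathlib import Path


def read_shebang(script_path):
    """Return the interpreter named on a script's '#!' line, or None."""
    with open(script_path, "rb") as f:
        first_line = f.readline()
    if not first_line.startswith(b"#!"):
        return None
    return first_line[2:].strip().decode("utf-8", errors="replace")


def pip_via_module(python_exe, *pip_args):
    """Build a command line that runs pip as a module of python_exe."""
    return [str(python_exe), "-m", "pip", *pip_args]
```

The list returned by `pip_via_module` is suitable for `subprocess.run`, and it never consults the launcher script's embedded path.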

@ResearchDaniel

The virtual environment will in some cases point to the vcpkg_installed/tools/python3 directory (if symlinks are used), so I think that moving it would break the virtual environment unless it is possible to force it to copy the binaries.

https://docs.python.org/3/library/venv.html
“It also creates a bin (or Scripts on Windows) subdirectory containing a copy/symlink of the Python binary/binaries (as appropriate for the platform or arguments used at environment creation time)”
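
The standard library does allow forcing copies: `venv` accepts `--copies` on the command line, and `venv.EnvBuilder` takes `symlinks=False`. Note that even then the environment's `pyvenv.cfg` records a `home` key pointing at the base installation, so the venv still depends on that location. The sketch below uses hypothetical helper names and builds the venv from whichever interpreter runs it, which may not be the vcpkg one.

```python
import venv
from pathlib import Path


def create_copied_venv(venv_dir, with_pip=False):
    """Create a venv whose interpreter is copied rather than symlinked."""
    builder = venv.EnvBuilder(symlinks=False, with_pip=with_pip)
    builder.create(venv_dir)
    return Path(venv_dir)


def read_pyvenv_cfg(venv_dir):
    """Parse the 'key = value' lines of a venv's pyvenv.cfg into a dict."""
    cfg = {}
    text = (Path(venv_dir) / "pyvenv.cfg").read_text(encoding="utf-8")
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            cfg[key.strip()] = value.strip()
    return cfg
```

Checking `read_pyvenv_cfg(path)["home"]` shows which base installation a given environment is tied to.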

@ResearchDaniel

> I would guess that in your case you would, on first start and as a "factory reset" option, create a virtual env inside %ProgramData%/%AppDataLocal%/$HOME/.myapp/$XDG_DATA_HOME/??? and instruct users that they can mess with that to extend the system.

Yes, sounds like a good solution. We do something in line with that for other user-created content.

> I don't know the full structure of a python virtual env, but I assume it should be reasonably straightforward to copy some bundled set of packages from your app's data into that user data location.

We will in that case add both the bundled and the user-local Python module directories to the Python sys path, so I don't think copying is necessary, but I have not tested this yet. We are aiming to use isolated mode, so it should not interfere with other Python environments.
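
Adding both directories to the module search path could look like the sketch below. The helper name is hypothetical, and the ordering is a design assumption: the bundled directory is placed ahead of the user-local one so that a user-installed package cannot shadow a module the application ships. Isolated mode itself (the `-I` command-line option) makes Python ignore the `PYTHON*` environment variables and the user site directory.

```python
import sys
from pathlib import Path


def add_module_dirs(bundled_dir, user_dir, path=None):
    """Append the bundled and user-local module directories to the search path."""
    path = sys.path if path is None else path
    # Bundled modules come first so user-installed ones cannot shadow them.
    for directory in (bundled_dir, user_dir):
        entry = str(Path(directory))
        if entry not in path:
            path.append(entry)
    return path
```

If `.pth` files in those directories should also be honoured, `site.addsitedir` is the standard-library alternative to appending to `sys.path` directly.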

@mkhon
Contributor

mkhon commented May 31, 2022

I should mention that the initial version of #22386 did exactly what this PR does (without the instructions): pip installation was postponed, and this is the only adequate thing we can do about pip.

@ras0219-msft merged commit f78f444 into microsoft:master May 31, 2022
@ras0219-msft
Contributor Author

Ok, sounds like this is an improvement over the current situation. Thanks everyone!

@ras0219-msft deleted the dev/roschuma/disable-ensurepip branch May 31, 2022 23:21
Jimmy-Hu added a commit to Jimmy-Hu/vcpkg that referenced this pull request May 31, 2022
[python3] Do not ensurepip. Provide venv instructions. (microsoft#24906)
5 participants