Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distribute a zipapp version of pip #158

Merged
merged 10 commits into from
Sep 18, 2022
2 changes: 1 addition & 1 deletion noxfile.py
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,7 @@ def check(session):
@nox.session
def generate(session):
"""Update the scripts, to the latest versions."""
session.install("packaging", "requests", "cachecontrol[filecache]", "rich")
session.install("packaging", "requests", "cachecontrol[filecache]", "rich", "pkg_metadata")

session.run("python", "scripts/generate.py")

Expand Down
Binary file added public/pip-22.2.2.pyz
Binary file not shown.
Binary file added public/pip.pyz
Binary file not shown.
89 changes: 88 additions & 1 deletion scripts/generate.py
Original file line number Diff line number Diff line change
@@ -1,20 +1,23 @@
"""Update all the get-pip.py scripts."""

import io
import itertools
import operator
import re
import shutil
from base64 import b85encode
from functools import lru_cache
from io import BytesIO
from pathlib import Path
from typing import Dict, Iterable, List, TextIO, Tuple
from zipfile import ZipFile
from zipfile import ZipFile, ZipInfo

import requests
from cachecontrol import CacheControl
from cachecontrol.caches.file_cache import FileCache
from packaging.specifiers import SpecifierSet
from packaging.version import Version
from pkg_metadata import bytes_to_json
from rich.console import Console

SCRIPT_CONSTRAINTS = {
Expand Down Expand Up @@ -268,6 +271,87 @@ def generate_moved(destination: str, *, location: str, console: Console):
with open(destination, "w", newline=newline) as f:
f.write(rendered_template)

ZIPAPP_MAIN = """\
#!/usr/bin/env python
Copy link
Member

@pradyunsg pradyunsg Jul 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this string, to be loaded from a file kept in the templates directory? It’ll need a change in line 122 as well.

It might be worth inlining the version check and allowing None as a value on the variable post-render of the main script, to indicate skipping the version check.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with doing this. However:

  1. The template processing code (including the detect_newline calls) will then be duplicated in 3 places, making it a good candidate for refactoring.
  2. There's a possible bug in the code as it uses a raw open() without a specified encoding. We should probably review and fix this (even if the "fix" is simply to comment that omitting the encoding is OK) when we refactor.

I'd be happier to handle this refactoring as a follow-up PR, so we don't block this change on getting the details of the template handling right.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... I should have said "I'd be happier to handle switching to using a template here and refactoring the common code out as a follow-up PR". In other words, can we leave the code as an inline string until we refactor the template code out (at which point I won't have to work out what parts of the current approach matter, or do my own inconsistent version)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, I’m on board for this being a follow up — I should’ve been more explicit. This isn’t a blocking concern, hence the approval. :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, having played around with this, I did move the code to a template, without the newline-matching code. Let me know what you think, I'm OK with it but I'm neutral on whether it's an improvement.


import os
import runpy
import sys

lib = os.path.join(os.path.dirname(__file__), "lib")
sys.path.insert(0, lib)

{version_check}
runpy.run_module("pip", run_name="__main__")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need some sort of minimum python version compatibility check here ? Or is it done by pip somewhere else ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure what you mean here. Are you concerned about someone running pip.pyz with Python 3.5? I'm fairly tempted to just say "don't do that". I don't know if pip has such a check, because the Python-requires metadata should stop it being installed for an unsupported version. The zipapp wouldn't have that protection, but is that a major problem?

I guess I could extract Requires-Python from the wheel I'm packing into the zipapp, and embed that into the __main__.py script so I could do a test against sys.executable before calling runpy. Is it worth it? I don't know. Reading metadata in a standards-compliant way is pretty messy. I could probably do it by hacking around with semi-documented bits of importlib.metadata. Or I could just add a dependency on pkg_metadata to the generate script, that would only be at build time. Or I could go full-on hack mode and just look for a line in pip-*.dist-info/METADATA that starts with "Requires-Python:" That would look something like

elif info.filename.endswith(".dist-info/METADATA"):
    data = src.read(info).decode("utf-8")
    m = re.search(r"^Requires-Python:\s*(.*)$", data, re.MULTILINE|re.IGNORECASE)
    rp = None
    if m:
        rp = m.group(1)
    console.log(f"  Zipapp requires Python {rp!r}")

The __main__.py script in the zipapp could use pip's vendored copy of packaging to do the actual check.

What do you think? I'm willing to include one of those approaches if you feel it's worthwhile.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: The check itself isn't too hard, and on reflection I'd just parse the metadata file using pkg_metadata, so implementing this isn't a major issue, the main question is whether it's worth it in the first place.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the main question is whether it's worth it in the first place.

Well, even today we sometimes get support requests for users who manage to run pip with an unsupported python.
With this approach I can see myself wrapping the zipapp in a global pip script and launching it from active virtual environments. I'm sure I'll end up myself launching it by accident from older python 2 or python 3.5 environment so...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, fair point. I'll add it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sbidoul coming round to this again, the code for the version check here is not compatible with older Python versions. This is the same issue as we had with __pip-runner__.py. However, in this case, I'm a lot less comfortable trying to hard code the version as a tuple, because we'd have to either keep it in sync with the pip repository, or do something nasty like try to read the tuple from __pip-runner__.py.

What are your thoughts here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a version of the check which looks at Python-Requires, and as long as it has the form ">= X.Y" it adds a compatible version check like we do in __pip-runner__.py. If the requirement is more complex than that (which hopefully it won't ever be) the code simply omits the check with a warning (in the generate script, not in the zipapp itself).

We'd have to revisit this if we ever needed a more complex Python requirement in pip in the future. Maybe we should therefore abort the generate rather than just warn? But given that we don't generate here until release day, it might be a bit late to find out at that point 🙁

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the zipapp use __pip_runner__.py ? (Disclaimer: I've not looked closely at all, just throwing the idea, which maybe complete nonsense)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried that, and I couldn't work out how. Because __pip-runner__.py locates pip from its own location, it has to be run in place. I tried to get runpy.run_script to work, but it wouldn't co-operate, and I honestly don't know why :-( And I don't reqlly want to invoke pip via an extra subprocess, that just seems wasteful.

I think it should be possible to use runpy.run_script for this, but the runpy docs are so obscure I find it really hard to work out how to do anything with them :-( I'll do some further tests, though - I might be able to get it to work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it appears that neither runpy.run_script nor the Python interpreter itself will run a file as python arc.zip/something.py. So I think we have to copy the check.

I've updated the check to be 2.x compatible, by parsing a Requires-Python of ">=X.Y". I'm happy to improve this, but I don't have any particularly good ideas of what would be better.

"""

# /!\ This version compatibility check section must be Python 2 compatible. /!\
VERSION_CHECK = """\
PYTHON_REQUIRES = ({major}, {minor})

def version_str(version): # type: ignore
return ".".join(str(v) for v in version)

if sys.version_info[:2] < PYTHON_REQUIRES:
raise SystemExit(
"This version of pip does not support python " +
version_str(sys.version_info[:2]) +
" (requires >= " +
version_str(PYTHON_REQUIRES) +
")."
)

"""

def generate_zipapp(pip_version: Version, *, console: Console, pip_versions: Dict[Version, Tuple[str, str]]) -> None:
wheel_url, wheel_hash = pip_versions[pip_version]
console.log(f" Downloading [green]{Path(wheel_url).name}")
original_wheel = download_wheel(wheel_url, wheel_hash)

zipapp_name = f"public/pip-{pip_version}.pyz"

console.log(f" Creating [green]{zipapp_name}")
with open(zipapp_name, "wb") as f:
# Write shebang at the start of the file
f.write(b"#!/usr/bin/env python\n")

# Write the remainder of the zipapp as a zipfile
with ZipFile(f, mode="w") as dest:
console.log(" Copying pip from original wheel to zipapp")

version_check = ""
with ZipFile(io.BytesIO(original_wheel)) as src:
for info in src.infolist():
# Ignore all content apart from the "pip" subdirectory
if info.filename.startswith("pip/"):
data = src.read(info)
info.filename = "lib/" + info.filename
dest.writestr(info, data)
elif info.filename.endswith(".dist-info/METADATA"):
data = bytes_to_json(src.read(info))
if "requires_python" in data:
py_req = data["requires_python"]
py_req = py_req.replace(" ", "")
m = re.match(r"^>=(\d+)\.(\d+)$", py_req)
if m:
major, minor = map(int, m.groups())
version_check = VERSION_CHECK.format(major=major, minor=minor)
console.log(f" Zipapp requires Python {py_req}")
else:
console.log(f" Python requirement {py_req} too complex - check skipped")

# Write the main script
# Use a ZipInfo object to ensure reproducibility - otherwise the current time
# is embedded in the file. We also set the create_system to 0 (DOS), as otherwise
# it defaults to a value that depends on the OS we're running on.
main_info = ZipInfo()
main_info.filename = "__main__.py"
main_info.create_system = 0
dest.writestr(main_info, ZIPAPP_MAIN.format(version_check=version_check))

# Make the unversioned pip.pyz
shutil.copyfile(zipapp_name, "public/pip.pyz")


def main() -> None:
console = Console()
Expand All @@ -290,6 +374,9 @@ def main() -> None:
status.update(f"Working on [magenta]{legacy}")
generate_moved(legacy, console=console, location=current)

with console.status("Generating zipapp...") as status:
generate_zipapp(max(pip_versions), console=console, pip_versions=pip_versions)


if __name__ == "__main__":
main()