Distribute a zipapp version of pip #158

pfmoore · 2022-07-11T10:43:23Z

This PR adds generation of a .pyz version of pip to the build script. Two copies of the .pyz are created in the public directory - an unversioned pip.pyz and a versioned pip-22.1.2.pyz. My assumption is that users will typically fetch the unversioned zipapp (and that's what our documentation should suggest) but people who want to pin a specific pip version can fetch the versioned zipapp.

As we don't do any deletion of generated files, older versioned zipapps will accumulate over time.

We could generate older versions, by fetching the wheels and rebuilding the zipapp. I'm not sure if this is worth it, but it might be if we ever plan on adding removal of previously generated files. In that case, we should probably decide on a policy for what zipapp versions we retain (regenerate) - I'd suggest versions for the current and previous year, as a simple rule.

This PR doesn't include a generated zipapp. I wanted to get the generation script sorted out and agreed, and then work on publicising the new method in the pip documentation, before actually publishing anything. Note we should probably not merge this before the pip 22.2 release, to give us time to do the pip side of things (which involves documenting the new distribution option, basically - but getting the messaging right is important so I don't think we want to rush this into 22.2).

sbidoul · 2022-07-11T11:22:58Z

scripts/generate.py

+lib = os.path.join(os.path.dirname(__file__), "lib")
+sys.path.insert(0, lib)
+
+runpy.run_module("pip", run_name="__main__")


Do we need some sort of minimum python version compatibility check here ? Or is it done by pip somewhere else ?

I'm not quite sure what you mean here. Are you concerned about someone running pip.pyz with Python 3.5? I'm fairly tempted to just say "don't do that". I don't know if pip has such a check, because the Python-requires metadata should stop it being installed for an unsupported version. The zipapp wouldn't have that protection, but is that a major problem?

I guess I could extract Requires-Python from the wheel I'm packing into the zipapp, and embed that into the __main__.py script so I could do a test against sys.executable before calling runpy. Is it worth it? I don't know. Reading metadata in a standards-compliant way is pretty messy. I could probably do it by hacking around with semi-documented bits of importlib.metadata. Or I could just add a dependency on pkg_metadata to the generate script, that would only be at build time. Or I could go full-on hack mode and just look for a line in pip-*.dist-info/METADATA that starts with "Requires-Python:" That would look something like

elif info.filename.endswith(".dist-info/METADATA"): data = src.read(info).decode("utf-8") m = re.search(r"^Requires-Python:\s*(.*)$", data, re.MULTILINE|re.IGNORECASE) rp = None if m: rp = m.group(1) console.log(f" Zipapp requires Python {rp!r}")

The __main__.py script in the zipapp could use pip's vendored copy of packaging to do the actual check.

What do you think? I'm willing to include one of those approaches if you feel it's worthwhile.

Update: The check itself isn't too hard, and on reflection I'd just parse the metadata file using pkg_metadata, so implementing this isn't a major issue, the main question is whether it's worth it in the first place.

the main question is whether it's worth it in the first place.

Well, even today we sometimes get support requests for users who manage to run pip with an unsupported python.
With this approach I can see myself wrapping the zipapp in a global pip script and launching it from active virtual environments. I'm sure I'll end up myself launching it by accident from older python 2 or python 3.5 environment so...

OK, fair point. I'll add it.

@sbidoul coming round to this again, the code for the version check here is not compatible with older Python versions. This is the same issue as we had with __pip-runner__.py. However, in this case, I'm a lot less comfortable trying to hard code the version as a tuple, because we'd have to either keep it in sync with the pip repository, or do something nasty like try to read the tuple from __pip-runner__.py.

What are your thoughts here?

I have a version of the check which looks at Python-Requires, and as long as it has the form ">= X.Y" it adds a compatible version check like we do in __pip-runner__.py. If the requirement is more complex than that (which hopefully it won't ever be) the code simply omits the check with a warning (in the generate script, not in the zipapp itself).

We'd have to revisit this if we ever needed a more complex Python requirement in pip in the future. Maybe we should therefore abort the generate rather than just warn? But given that we don't generate here until release day, it might be a bit late to find out at that point 🙁

Could the zipapp use __pip_runner__.py ? (Disclaimer: I've not looked closely at all, just throwing the idea, which maybe complete nonsense)

I tried that, and I couldn't work out how. Because __pip-runner__.py locates pip from its own location, it has to be run in place. I tried to get runpy.run_script to work, but it wouldn't co-operate, and I honestly don't know why :-( And I don't reqlly want to invoke pip via an extra subprocess, that just seems wasteful.

I think it should be possible to use runpy.run_script for this, but the runpy docs are so obscure I find it really hard to work out how to do anything with them :-( I'll do some further tests, though - I might be able to get it to work.

No, it appears that neither runpy.run_script nor the Python interpreter itself will run a file as python arc.zip/something.py. So I think we have to copy the check.

I've updated the check to be 2.x compatible, by parsing a Requires-Python of ">=X.Y". I'm happy to improve this, but I don't have any particularly good ideas of what would be better.

pradyunsg · 2022-07-15T13:34:55Z

Whelp, before we merge this, we need to update the CI checks to account for untracked files being generated by the generate script.

pfmoore · 2022-07-15T15:35:22Z

Thanks, I wasn't aware of that. Added to the list. My current plan is to pick this up after the 22.2 release, at which time we can sort out details like

documenting and publicising the zipapp
finalising the deployment process and merging this PR
deciding whether we want to test the zipapp in pip's CI.

Edit: Wait, CI ran on this PR and didn't fail. So what exactly do you mean by accounting for untracked files?

pradyunsg · 2022-07-15T21:41:12Z

I think the CI should fail, if files that are going to be generated by the generation script are not checked in. Right now, it's not doing that.

In other words, the zipapps generated are not staged/committed in this PR right now. Thus, when running the script would generate it, there are new files generated in the working/public directory that aren't checked in. This means that we can have a discrepency between what's checked in and what's auto-generated, which isn't great.

The CI check is designed to ensure that our copies of get-pip.py are correct/up-to-date, for the version of pip that's been released. It, however, does not ensure that the newly-generated files are also included in the repository. I reckon, it should -- which is what I was describing in the previous comment about it needing an update. The zipapp files that this would generate would be marked as "untracked" by git right now and we should have the CI fail when this happens because it's got untracked files in the working directory after generation (similar to how pip's vendoring would fail if you modified a file in the _vendor directory).

pradyunsg · 2022-07-15T21:44:47Z

FWIW, another example of the sort of situation this would trip on would be when adding 3.7 to the mapping on top (i.e. logic for generating public/3.7/get-pip.py), but not adding the generated file. That's a similar sort of "generation script generates an untracked file and CI does not complain".

pfmoore · 2022-07-15T22:12:51Z

Ah, OK, so you're saying we need to improve CI, not specifically for this PR but in general. And we should do that before merging this because - I'm not quite sure why, but because this PR means we'll have more risk of forgetting to check in the newly generated files?

I think I get this, but I'm somewhat hazy on the release process, and I'm not sure I can safely play around with it because the noxfile seems to actually do a push - plus, it's doing signed releases and I'm not set up with any signing mechanism, so I don't think I can even sign a release in the first place...?

I guess this has changed a lot singe I last did a release, as I don't remember any of this. But as I say, I'll pick this up after 22.2 is out.

pradyunsg · 2022-07-16T08:46:58Z

I'm not quite sure why, but because this PR means we'll have more risk of forgetting to check in the newly generated files?

Because this PR is adding new generated files (to the generation scripts) but not to the actual repository. This would mean that the next time someone runs the release automation, they'll also check in the additional zip app (if we land this before 22.2).

pfmoore · 2022-09-03T10:30:59Z

OK, I'm just coming back to this.

@pradyunsg what do you want to do about untracked files? As things stand, everything is fine (as far as I can tell by simply desk-checking the release scripts - see above regarding how it doesn't seem possible to test them). The generate script creates the new zipapp, the release process will add it (using git add .). The CI doesn't check new files, I understand that. But I don't see that as blocking this PR - instead, I see this PR as illustrating that the approach in #159 needs some extra thought about how we handle new files being created by the generate process. I'm happy to have that discussion, but I see it as independent of this PR.

I'll add some thoughts to #159. But are you going to insist that this PR must wait until #159 is merged in a form that handles new, untracked files?

pfmoore · 2022-09-03T15:26:38Z

Test failures are because of #170.

pfmoore · 2022-09-04T16:35:34Z

Added the zipapps to the PR, as per comments on #159

pfmoore · 2022-09-04T16:39:09Z

Hmm, why are the zipapps different when generated here? Maybe Windows vs Unix weirdness?

pfmoore · 2022-09-04T16:49:33Z

Grr, it's file timestamps.

pfmoore · 2022-09-04T16:56:44Z

Sigh, something else as well. I'll have to look at this later. But I will note that I don't think that checking that the files are binary-identical is a particularly good test here. It may be that it's flagging a real issue, but (as the timestamps show) it's just as likely it's a functionally irrelevant discrepancy.

pradyunsg · 2022-09-04T17:30:41Z

I’d argue that we should also have signatures and checksums on these files, at some point in the future (like we should for get-pip.py, as the open issues suggest), so generating them in a stable/reproducible manner is important. Besides, if we don’t have this be reproducible, it’d mean that we’d regenerate all of these files on every instance that we run the generate command (i.e. new binary files being checked into git for every pip release).

Finally, I’ll remind you that all these checks and mechanisms were originally designed for the existing text files and making sure those are kept up to date (those are generated in a reproducible manner, though they bake in an existing “stable” wheel). I didn’t think of binary blobs when I was writing this CI. :P

pfmoore · 2022-09-04T20:16:18Z

Finally, I’ll remind you that all these checks and mechanisms were originally designed for the existing text files and making sure those are kept up to date (those are generated in a reproducible manner, though they bake in an existing “stable” wheel). I didn’t think of binary blobs when I was writing this CI. :P

Understood, I wasn't suggesting the checks themselves were at fault, just that they don't take into account the subtleties of binary files - and I'd not been aware of the check when I made this change, so I didn't anticipate needing to deal with it.

Regarding signatures and checksums, I see no issue with checksums, but I'm not sure how signatures would work. Do we have a "PyPA/pip" signature that we'd use for this? Expecting individual committers to use a personal signature doesn't seem ideal (I note that I can't currently do a release here, as the release automation does git tag --sign and I don't have a GPG signature). I think this is something that probably needs discussion between the pip committers somewhere - but I don't know what would be a good forum for that.

pfmoore · 2022-09-04T20:24:38Z

OK, that's fixed the reproducibility issue. Sorry about the complaints re the checks. I'm still not sure I follow the logic where we check in the generate script and the files, and then we have CI that checks that the generate produces the files we just added (I don't really understand what we're trying to check against) but that doesn't mean it's OK for me to come in after the fact and complain about it just because I don't like doing what it requires. Blame the fact that I was trying to do things in a hurry and got frustrated (not an excuse, I know).

Let me know if you have any further concerns with this PR.

pfmoore · 2022-09-17T09:47:42Z

As there have been no further comments, and I don't want to leave this until the next pip release is looming, I'd like to merge this as it stands and address any CI questions via followups. @pradyunsg are you OK with that? If so, could you approve the PR? Having requested your review, I'd rather not merge without your approval.

pradyunsg

LGTM, with one note/concern around inlined code in a string.

pradyunsg · 2022-07-15T13:27:13Z

scripts/generate.py

+ZIPAPP_MAIN = """\
+#!/usr/bin/env python


Let's move this string, to be loaded from a file kept in the templates directory? It’ll need a change in line 122 as well.

It might be worth inlining the version check and allowing None as a value on the variable post-render of the main script, to indicate skipping the version check.

I'm OK with doing this. However:

The template processing code (including the detect_newline calls) will then be duplicated in 3 places, making it a good candidate for refactoring.

There's a possible bug in the code as it uses a raw open() without a specified encoding. We should probably review and fix this (even if the "fix" is simply to comment that omitting the encoding is OK) when we refactor.

I'd be happier to handle this refactoring as a follow-up PR, so we don't block this change on getting the details of the template handling right.

... I should have said "I'd be happier to handle switching to using a template here and refactoring the common code out as a follow-up PR". In other words, can we leave the code as an inline string until we refactor the template code out (at which point I won't have to work out what parts of the current approach matter, or do my own inconsistent version)?

Yup, I’m on board for this being a follow up — I should’ve been more explicit. This isn’t a blocking concern, hence the approval. :)

OK, having played around with this, I did move the code to a template, without the newline-matching code. Let me know what you think, I'm OK with it but I'm neutral on whether it's an improvement.

…pp works

pfmoore · 2022-09-17T16:17:51Z

Unfortunately, I just found some issues in manual testing. I could have sworn I tested with 3.9/3.8, but maybe not (and what's more, the pip zipapp tests are only for 3.10).

The problem is in the stdlib zipimport with a subdirectory:

import sys
import importlib.resources as ir

sys.path.insert(0,"../public/pip.pyz/lib")
print(len(ir.read_text("pip._vendor.certifi", "cacert.pem")))

This fails with an exception in Python 3.9 and earlier. As 3.9 is no longer receiving non-security fixes, we're not going to get this fixed.

The issue seems to be with using a "lib" subdirectory in the zipapp and adding that to sys.path. So I've removed that directory layer from the zipfile and the top level now contains __main__.py and pip/, and we just add the pyz to sys.path. I can't think of a way that this will cause an issue.

I've also added a (very simplistic) test that the zipapp works (i.e., runs pip or reports an incompatibility error) on all Python versions we test against, just to ensure we don't break this in the future.

pradyunsg · 2022-09-17T19:52:34Z

then we have CI that checks that the generate produces the files we just added (I don't really understand what we're trying to check against)

I guess I never responded to this and I'll try one last time...

Think of this similarly to pip's vendoring CI check. It's primarily because we aren't the only people who might open a PR here (wherein we know exactly how stuff works) and to reduce the effort on the reviewers' end that the scripts included in the commit match what would be generated + work as advertised.

Over on pip, we check that every PR that modifies vendoring files changes them in a way that's repeatable through our vendoring process which maintains clear patches and has pinned Python versions. This is validated in CI, so that means that any changes made in vendored files are clearly tracked in our patches and vendor.txt file.

It's the same here even though it's useful for somewhat different reasons -- we're effectively embedding blurbs of opaque data into scripts that'll get executed in thousands of machines daily, and it's useful to have those opaque blurbs be generated in a manner that they can be validated and to ensure that the changes are reflected in both the templates and the generated scripts simultaneously. It also ensures that if we change either, we change both and that they evolve in sync.

In effect, the public directory is like pip's own _vendor directory. If we make changes there, we want those to happen because the generation inputs were changed (here: templates and the script, there: vendor.txt and patches) and that changes to those inputs are reflected in the output.

Distribute a zipapp version of pip

0c68c41

pfmoore requested a review from pradyunsg July 11, 2022 10:43

sbidoul reviewed Jul 11, 2022

View reviewed changes

Add a version check for the running python

9b0ec40

pradyunsg mentioned this pull request Jul 15, 2022

Error out on any changes to files in the repository after generation #159

Merged

pfmoore mentioned this pull request Jul 26, 2022

Ship pip as a standalone application pypa/pip#11243

Closed

Make the version check compatible with unsupported Pythons

048fdb1

pfmoore added 2 commits September 4, 2022 14:08

Merge branch 'main' into zipapp

2343a9b

Include the generated zipapps in the PR

ee477d8

Ensure zipapp generation is reproducible

e6725af

pfmoore force-pushed the zipapp branch from ee8a69d to e6725af Compare September 4, 2022 20:19

pradyunsg approved these changes Sep 17, 2022

View reviewed changes

pfmoore added 2 commits September 17, 2022 14:55

Move the zipapp main script to a template file

ed6a418

Remove the lib directory from the zipapp and add a test that the zipa…

91ade98

…pp works

Merge branch 'main' into zipapp

85cdef6

pfmoore force-pushed the zipapp branch 3 times, most recently from 9ffde85 to 6291294 Compare September 17, 2022 17:04

Fix the test script to be compatible with older Pythons

dc01b59

pfmoore force-pushed the zipapp branch from 6291294 to dc01b59 Compare September 17, 2022 19:03

pfmoore merged commit 56695dc into pypa:main Sep 18, 2022

pfmoore deleted the zipapp branch September 18, 2022 09:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Distribute a zipapp version of pip #158

Distribute a zipapp version of pip #158

pfmoore commented Jul 11, 2022

sbidoul Jul 11, 2022

pfmoore Jul 11, 2022

pfmoore Jul 11, 2022

sbidoul Jul 11, 2022

pfmoore Jul 11, 2022

pfmoore Sep 3, 2022

pfmoore Sep 3, 2022

sbidoul Sep 3, 2022

pfmoore Sep 3, 2022

pfmoore Sep 3, 2022

pradyunsg commented Jul 15, 2022

pfmoore commented Jul 15, 2022 •

edited

Loading

pradyunsg commented Jul 15, 2022 •

edited

Loading

pradyunsg commented Jul 15, 2022 •

edited

Loading

pfmoore commented Jul 15, 2022

pradyunsg commented Jul 16, 2022

pfmoore commented Sep 3, 2022

pfmoore commented Sep 3, 2022

pfmoore commented Sep 4, 2022

pfmoore commented Sep 4, 2022

pfmoore commented Sep 4, 2022

pfmoore commented Sep 4, 2022

pradyunsg commented Sep 4, 2022 •

edited

Loading

pfmoore commented Sep 4, 2022

pfmoore commented Sep 4, 2022

pfmoore commented Sep 17, 2022

pradyunsg left a comment

pradyunsg Jul 15, 2022 •

edited

Loading

pfmoore Sep 17, 2022

pfmoore Sep 17, 2022

pradyunsg Sep 17, 2022

pfmoore Sep 17, 2022

pfmoore commented Sep 17, 2022

pradyunsg commented Sep 17, 2022 •

edited

Loading

		ZIPAPP_MAIN = """\
		#!/usr/bin/env python

Distribute a zipapp version of pip #158

Distribute a zipapp version of pip #158

Conversation

pfmoore commented Jul 11, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pradyunsg commented Jul 15, 2022

pfmoore commented Jul 15, 2022 • edited Loading

pradyunsg commented Jul 15, 2022 • edited Loading

pradyunsg commented Jul 15, 2022 • edited Loading

pfmoore commented Jul 15, 2022

pradyunsg commented Jul 16, 2022

pfmoore commented Sep 3, 2022

pfmoore commented Sep 3, 2022

pfmoore commented Sep 4, 2022

pfmoore commented Sep 4, 2022

pfmoore commented Sep 4, 2022

pfmoore commented Sep 4, 2022

pradyunsg commented Sep 4, 2022 • edited Loading

pfmoore commented Sep 4, 2022

pfmoore commented Sep 4, 2022

pfmoore commented Sep 17, 2022

pradyunsg left a comment

Choose a reason for hiding this comment

pradyunsg Jul 15, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pfmoore commented Sep 17, 2022

pradyunsg commented Sep 17, 2022 • edited Loading

pfmoore commented Jul 15, 2022 •

edited

Loading

pradyunsg commented Jul 15, 2022 •

edited

Loading

pradyunsg commented Jul 15, 2022 •

edited

Loading

pradyunsg commented Sep 4, 2022 •

edited

Loading

pradyunsg Jul 15, 2022 •

edited

Loading

pradyunsg commented Sep 17, 2022 •

edited

Loading