-
Notifications
You must be signed in to change notification settings - Fork 296
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Distribute a zipapp version of pip #158
Conversation
scripts/generate.py
Outdated
lib = os.path.join(os.path.dirname(__file__), "lib") | ||
sys.path.insert(0, lib) | ||
|
||
runpy.run_module("pip", run_name="__main__") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need some sort of minimum python version compatibility check here ? Or is it done by pip somewhere else ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not quite sure what you mean here. Are you concerned about someone running pip.pyz
with Python 3.5? I'm fairly tempted to just say "don't do that". I don't know if pip has such a check, because the Python-requires
metadata should stop it being installed for an unsupported version. The zipapp wouldn't have that protection, but is that a major problem?
I guess I could extract Requires-Python
from the wheel I'm packing into the zipapp, and embed that into the __main__.py
script so I could do a test against sys.executable
before calling runpy
. Is it worth it? I don't know. Reading metadata in a standards-compliant way is pretty messy. I could probably do it by hacking around with semi-documented bits of importlib.metadata
. Or I could just add a dependency on pkg_metadata
to the generate script, that would only be at build time. Or I could go full-on hack mode and just look for a line in pip-*.dist-info/METADATA
that starts with "Requires-Python:" That would look something like
elif info.filename.endswith(".dist-info/METADATA"):
data = src.read(info).decode("utf-8")
m = re.search(r"^Requires-Python:\s*(.*)$", data, re.MULTILINE|re.IGNORECASE)
rp = None
if m:
rp = m.group(1)
console.log(f" Zipapp requires Python {rp!r}")
The __main__.py
script in the zipapp could use pip's vendored copy of packaging
to do the actual check.
What do you think? I'm willing to include one of those approaches if you feel it's worthwhile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: The check itself isn't too hard, and on reflection I'd just parse the metadata file using pkg_metadata
, so implementing this isn't a major issue, the main question is whether it's worth it in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the main question is whether it's worth it in the first place.
Well, even today we sometimes get support requests for users who manage to run pip with an unsupported python.
With this approach I can see myself wrapping the zipapp in a global pip
script and launching it from active virtual environments. I'm sure I'll end up myself launching it by accident from older python 2 or python 3.5 environment so...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, fair point. I'll add it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sbidoul coming round to this again, the code for the version check here is not compatible with older Python versions. This is the same issue as we had with __pip-runner__.py
. However, in this case, I'm a lot less comfortable trying to hard code the version as a tuple, because we'd have to either keep it in sync with the pip repository, or do something nasty like try to read the tuple from __pip-runner__.py
.
What are your thoughts here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a version of the check which looks at Python-Requires, and as long as it has the form ">= X.Y" it adds a compatible version check like we do in __pip-runner__.py
. If the requirement is more complex than that (which hopefully it won't ever be) the code simply omits the check with a warning (in the generate script, not in the zipapp itself).
We'd have to revisit this if we ever needed a more complex Python requirement in pip in the future. Maybe we should therefore abort the generate rather than just warn? But given that we don't generate here until release day, it might be a bit late to find out at that point 🙁
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the zipapp use __pip_runner__.py
? (Disclaimer: I've not looked closely at all, just throwing the idea, which maybe complete nonsense)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that, and I couldn't work out how. Because __pip-runner__.py
locates pip from its own location, it has to be run in place. I tried to get runpy.run_script
to work, but it wouldn't co-operate, and I honestly don't know why :-( And I don't reqlly want to invoke pip via an extra subprocess, that just seems wasteful.
I think it should be possible to use runpy.run_script
for this, but the runpy
docs are so obscure I find it really hard to work out how to do anything with them :-( I'll do some further tests, though - I might be able to get it to work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it appears that neither runpy.run_script
nor the Python interpreter itself will run a file as python arc.zip/something.py
. So I think we have to copy the check.
I've updated the check to be 2.x compatible, by parsing a Requires-Python
of ">=X.Y". I'm happy to improve this, but I don't have any particularly good ideas of what would be better.
Whelp, before we merge this, we need to update the CI checks to account for untracked files being generated by the generate script. |
Thanks, I wasn't aware of that. Added to the list. My current plan is to pick this up after the 22.2 release, at which time we can sort out details like
Edit: Wait, CI ran on this PR and didn't fail. So what exactly do you mean by accounting for untracked files? |
I think the CI should fail, if files that are going to be generated by the generation script are not checked in. Right now, it's not doing that. In other words, the zipapps generated are not staged/committed in this PR right now. Thus, when running the script would generate it, there are new files generated in the working/public directory that aren't checked in. This means that we can have a discrepency between what's checked in and what's auto-generated, which isn't great. The CI check is designed to ensure that our copies of get-pip.py are correct/up-to-date, for the version of pip that's been released. It, however, does not ensure that the newly-generated files are also included in the repository. I reckon, it should -- which is what I was describing in the previous comment about it needing an update. The zipapp files that this would generate would be marked as "untracked" by git right now and we should have the CI fail when this happens because it's got untracked files in the working directory after generation (similar to how pip's vendoring would fail if you modified a file in the |
FWIW, another example of the sort of situation this would trip on would be when adding 3.7 to the mapping on top (i.e. logic for generating public/3.7/get-pip.py), but not adding the generated file. That's a similar sort of "generation script generates an untracked file and CI does not complain". |
Ah, OK, so you're saying we need to improve CI, not specifically for this PR but in general. And we should do that before merging this because - I'm not quite sure why, but because this PR means we'll have more risk of forgetting to check in the newly generated files? I think I get this, but I'm somewhat hazy on the release process, and I'm not sure I can safely play around with it because the noxfile seems to actually do a push - plus, it's doing signed releases and I'm not set up with any signing mechanism, so I don't think I can even sign a release in the first place...? I guess this has changed a lot singe I last did a release, as I don't remember any of this. But as I say, I'll pick this up after 22.2 is out. |
Because this PR is adding new generated files (to the generation scripts) but not to the actual repository. This would mean that the next time someone runs the release automation, they'll also check in the additional zip app (if we land this before 22.2). |
OK, I'm just coming back to this. @pradyunsg what do you want to do about untracked files? As things stand, everything is fine (as far as I can tell by simply desk-checking the release scripts - see above regarding how it doesn't seem possible to test them). The generate script creates the new zipapp, the release process will add it (using I'll add some thoughts to #159. But are you going to insist that this PR must wait until #159 is merged in a form that handles new, untracked files? |
Test failures are because of #170. |
Added the zipapps to the PR, as per comments on #159 |
Hmm, why are the zipapps different when generated here? Maybe Windows vs Unix weirdness? |
Grr, it's file timestamps. |
Sigh, something else as well. I'll have to look at this later. But I will note that I don't think that checking that the files are binary-identical is a particularly good test here. It may be that it's flagging a real issue, but (as the timestamps show) it's just as likely it's a functionally irrelevant discrepancy. |
I’d argue that we should also have signatures and checksums on these files, at some point in the future (like we should for get-pip.py, as the open issues suggest), so generating them in a stable/reproducible manner is important. Besides, if we don’t have this be reproducible, it’d mean that we’d regenerate all of these files on every instance that we run the generate command (i.e. new binary files being checked into git for every pip release). Finally, I’ll remind you that all these checks and mechanisms were originally designed for the existing text files and making sure those are kept up to date (those are generated in a reproducible manner, though they bake in an existing “stable” wheel). I didn’t think of binary blobs when I was writing this CI. :P |
Understood, I wasn't suggesting the checks themselves were at fault, just that they don't take into account the subtleties of binary files - and I'd not been aware of the check when I made this change, so I didn't anticipate needing to deal with it. Regarding signatures and checksums, I see no issue with checksums, but I'm not sure how signatures would work. Do we have a "PyPA/pip" signature that we'd use for this? Expecting individual committers to use a personal signature doesn't seem ideal (I note that I can't currently do a release here, as the release automation does |
OK, that's fixed the reproducibility issue. Sorry about the complaints re the checks. I'm still not sure I follow the logic where we check in the generate script and the files, and then we have CI that checks that the generate produces the files we just added (I don't really understand what we're trying to check against) but that doesn't mean it's OK for me to come in after the fact and complain about it just because I don't like doing what it requires. Blame the fact that I was trying to do things in a hurry and got frustrated (not an excuse, I know). Let me know if you have any further concerns with this PR. |
As there have been no further comments, and I don't want to leave this until the next pip release is looming, I'd like to merge this as it stands and address any CI questions via followups. @pradyunsg are you OK with that? If so, could you approve the PR? Having requested your review, I'd rather not merge without your approval. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, with one note/concern around inlined code in a string.
scripts/generate.py
Outdated
ZIPAPP_MAIN = """\ | ||
#!/usr/bin/env python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move this string, to be loaded from a file kept in the templates directory? It’ll need a change in line 122 as well.
It might be worth inlining the version check and allowing None
as a value on the variable post-render of the main script, to indicate skipping the version check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm OK with doing this. However:
- The template processing code (including the
detect_newline
calls) will then be duplicated in 3 places, making it a good candidate for refactoring. - There's a possible bug in the code as it uses a raw
open()
without a specified encoding. We should probably review and fix this (even if the "fix" is simply to comment that omitting the encoding is OK) when we refactor.
I'd be happier to handle this refactoring as a follow-up PR, so we don't block this change on getting the details of the template handling right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... I should have said "I'd be happier to handle switching to using a template here and refactoring the common code out as a follow-up PR". In other words, can we leave the code as an inline string until we refactor the template code out (at which point I won't have to work out what parts of the current approach matter, or do my own inconsistent version)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, I’m on board for this being a follow up — I should’ve been more explicit. This isn’t a blocking concern, hence the approval. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, having played around with this, I did move the code to a template, without the newline-matching code. Let me know what you think, I'm OK with it but I'm neutral on whether it's an improvement.
Unfortunately, I just found some issues in manual testing. I could have sworn I tested with 3.9/3.8, but maybe not (and what's more, the pip zipapp tests are only for 3.10). The problem is in the stdlib zipimport with a subdirectory: import sys
import importlib.resources as ir
sys.path.insert(0,"../public/pip.pyz/lib")
print(len(ir.read_text("pip._vendor.certifi", "cacert.pem"))) This fails with an exception in Python 3.9 and earlier. As 3.9 is no longer receiving non-security fixes, we're not going to get this fixed. The issue seems to be with using a "lib" subdirectory in the zipapp and adding that to I've also added a (very simplistic) test that the zipapp works (i.e., runs pip or reports an incompatibility error) on all Python versions we test against, just to ensure we don't break this in the future. |
9ffde85
to
6291294
Compare
I guess I never responded to this and I'll try one last time... Think of this similarly to pip's vendoring CI check. It's primarily because we aren't the only people who might open a PR here (wherein we know exactly how stuff works) and to reduce the effort on the reviewers' end that the scripts included in the commit match what would be generated + work as advertised. Over on pip, we check that every PR that modifies vendoring files changes them in a way that's repeatable through our vendoring process which maintains clear patches and has pinned Python versions. This is validated in CI, so that means that any changes made in vendored files are clearly tracked in our patches and vendor.txt file. It's the same here even though it's useful for somewhat different reasons -- we're effectively embedding blurbs of opaque data into scripts that'll get executed in thousands of machines daily, and it's useful to have those opaque blurbs be generated in a manner that they can be validated and to ensure that the changes are reflected in both the templates and the generated scripts simultaneously. It also ensures that if we change either, we change both and that they evolve in sync. In effect, the public directory is like pip's own |
This PR adds generation of a
.pyz
version of pip to the build script. Two copies of the.pyz
are created in thepublic
directory - an unversionedpip.pyz
and a versionedpip-22.1.2.pyz
. My assumption is that users will typically fetch the unversioned zipapp (and that's what our documentation should suggest) but people who want to pin a specific pip version can fetch the versioned zipapp.As we don't do any deletion of generated files, older versioned zipapps will accumulate over time.
We could generate older versions, by fetching the wheels and rebuilding the zipapp. I'm not sure if this is worth it, but it might be if we ever plan on adding removal of previously generated files. In that case, we should probably decide on a policy for what zipapp versions we retain (regenerate) - I'd suggest versions for the current and previous year, as a simple rule.
This PR doesn't include a generated zipapp. I wanted to get the generation script sorted out and agreed, and then work on publicising the new method in the pip documentation, before actually publishing anything. Note we should probably not merge this before the pip 22.2 release, to give us time to do the pip side of things (which involves documenting the new distribution option, basically - but getting the messaging right is important so I don't think we want to rush this into 22.2).