Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor venv export #17098

Merged
merged 6 commits into from
Oct 5, 2022
Merged

Refactor venv export #17098

merged 6 commits into from
Oct 5, 2022

Conversation

benjyw
Copy link
Contributor

@benjyw benjyw commented Oct 4, 2022

Unifies the resolve and tool exports, resolving a TODO to that effect.

A future change may re-introduce the venv symlink trick behind an option, for when faster export is required.

Saves a lot of superfluous zipping and unzipping, so
should speed export up quite a bit.

However note that copying a venv to dist/ takes non-trivial
time for larger venvs, so the symlink trick should be re-introduced
in a followup change.

This change also refactors the export code to unify the resolve
and tool exports, resolving a TODO to that effect.
@benjyw benjyw added the category:internal CI, fixes for not-yet-released features, etc. label Oct 4, 2022
@benjyw benjyw requested review from danxmoran and jsirois October 4, 2022 02:46
@jsirois
Copy link
Contributor

jsirois commented Oct 4, 2022

so should speed export up quite a bit

Do you have any anecdotes? Although this definitely should be faster, it'd be pretty embarrasing if not and we overlooked some subtlety. A comparison of cold and warm export times against even pants itself would be re-assuring.

@benjyw
Copy link
Contributor Author

benjyw commented Oct 4, 2022

Do you have any anecdotes? Although this definitely should be faster, it'd be pretty embarrasing if not and we overlooked some subtlety. A comparison of cold and warm export times against even pants itself would be re-assuring.

It is in fact, unexpectedly, a little slower on pants's own deps.

The invocation of python -m pex.tools venv, which is all the work in the warm case, is ~500ms slower with the VenvPex... very consistently so.

I do not understand this.

@benjyw
Copy link
Contributor Author

benjyw commented Oct 4, 2022

I wonder if this is due to materializing many small files from the original requirements VenvPex. That is probably unnecessary since I suppose we can execute pex.tools venv directly against the cached VenvPex? Will try that.

@benjyw
Copy link
Contributor Author

benjyw commented Oct 4, 2022

Yep, that is 1s faster than current main (3.5s instead of 4.5s to run pex.tools venv), so 1.5s faster than the current iteration of this PR. Will put up another commit with this improvement.

Also, optionally allowing the symlink trick will now be a lot easier, since we're already grabbing the full path to the venvpex.

@benjyw
Copy link
Contributor Author

benjyw commented Oct 4, 2022

Oh, hmm, it doesn't actually work properly, so that may be why it's faster...

@benjyw
Copy link
Contributor Author

benjyw commented Oct 4, 2022

@jsirois looks like python -m pex.tools ~/.cache/pants/named_caches/pex_root/venvs/XX/YY venv --pip --collisions-ok /tmp/venv_pex creates a venv in /tmp/venv_pex that does not have any of XX/YY's site-packages in its own site-packages.

Is that expected?

@benjyw
Copy link
Contributor Author

benjyw commented Oct 4, 2022

And note that this is the case when I explicitly use ~/.cache/pants/named_caches/pex_root/venvs/XX/YY/bin/python as the interpreter

@benjyw
Copy link
Contributor Author

benjyw commented Oct 4, 2022

On investigation, this because pex_root/venvs/XX/YY has no .deps, which is where pex.tools venv looks for dists in _populate_deps(). That XX/YY dir does have a PEX-INFO, so pex tools venv can create a PEX object from it and use that to enumerate the deps, it just assumes the deps live in .deps, which they don't for a venv pex.

Do I have that about right?

@jsirois
Copy link
Contributor

jsirois commented Oct 4, 2022

You do, but more generally you can only create a venv from a PEX - be it loose, packed or in zipapp form. You cannot create another venv from an existing venv using PEX_TOOLS. Your attempted use case should be failing fast ideally.

The point of using VenvPex internally would just be to be:

  1. Leveraging all the pre-lmdb'd wheel zips of the packed layout these use.
  2. Leveraging all the pre-installing of the wheel zips in the PEX_ROOT Pants uses.

It's 1 + 2 that should allow for the minimal amount of work creating a venv using PEX_TOOLS and the fully or partially pre-existing components of a packed PEX.

Ideally, you could just do this as a single step that turns a lockfile (or portions thereof) directly into a venv: pex-tool/pex#1752. You're effectively trying to do line 2 of the example here: pex-tool/pex#1752 (comment)

That line is complicated! and 1st creates a full PEX then builds a venv from it, but all in 1 line.

@benjyw
Copy link
Contributor Author

benjyw commented Oct 4, 2022

Yeah, hmm. Another reason to use a VenvPex with all the requirements is that it's somewhat likely that one already exists in the cache. But it sounds like there is no alternative to creating a throwaway real PEX?

@jsirois
Copy link
Contributor

jsirois commented Oct 4, 2022

Yeah, hmm. Another reason to use a VenvPex with all the requirements is that it's somewhat likely that one already exists in the cache.

I think that was only likely in repository.pex days or for tool PEXes. Unless there is a some target that depends on the entirety of a resolve, you will not get a cache hit for the user resolves.

But it sounds like there is no alternative to creating a throwaway real PEX?

Correct. The mitigations are items 1 & 2 I listed above - both packed wheel zips (and the bootstrap zip) and PEX_ROOT wheel installs are all re-used and there is no overall compression, its just a packed directory PEX. The final mitigation comes once pex-tool/pex#1752 is implemented.

@benjyw benjyw changed the title Use a VenvPex when exporting python venvs. Refactor venv export Oct 5, 2022
@benjyw
Copy link
Contributor Author

benjyw commented Oct 5, 2022

OK, this is now back to using a regular Pex, so the purpose of the change is now just the refactoring.

description="Get interpreter path and version",
argv=[
"-c",
"import sys; print(sys.executable); print('.'.join(str(x) for x in sys.version_info[0:3]))",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seemed like a more robust way to get a kosher interpreter then the previous line interpreter = cast(PythonExecutable, requirements_pex.python) and its clarifying comment below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not so sure about that. The internal-onlyness is a critical thing here that is just as obscure still.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The internal-onlyness was critical in the previous method of using requirements_pex.python. Why is it critical here? This must give us some compatible interpreter in any case, surely?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to point out you replaced a comment and re-use of an already computed and correct interpreter with another comment elsewhere about internal onlyness and a re-compute.

As far as criticality goes, it's critical always and until creating and storing a fully zipped PEX is fast and efficient. It never has been either. Zipping is always slow, and packed is needed for remote caching to work ~at all / not be storing both a wheel zip and a copy of that zipped up in 18 different non-packed PEX zips in LMDB.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, fair enough. In this case I interpreted "critical" to mean "for correctness" - i.e., requirements_pex.python is only guaranteed to be populated because of internal-onlyness. This code will blow up otherwise. Getting the path to an interpreter by running the pex instead, as this now does, sidesteps the need for this assumption and won't blow up if the Pex is not internal-only.

You're referring to internal-onlyness being critical for performance, which it definitely is.

But since performance is important, it may be better to blow up, as it indicates a code bug. I'll change this.

)
interpreter_path, py_version = res.stdout.strip().decode().split("\n")

# NOTE: We add a unique prefix to the pex_pex path to avoid conflicts when multiple
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably materialize the Pex PEX once, to some location under .pants.d say, rather than once, in a temporary location, per export? But that can wait.

@jsirois
Copy link
Contributor

jsirois commented Oct 5, 2022

A regular PEX incurs zipping overhead which can be large for large PEXes. It also means that large PEX blob gets LMDB'd. Did you try a layout="packed" regular PEX? That definitely should be faster to build than a regular PEX.

@benjyw
Copy link
Contributor Author

benjyw commented Oct 5, 2022

This is always an internal_only Pex, so it should always be packed.

@benjyw
Copy link
Contributor Author

benjyw commented Oct 5, 2022

I'll add a comment to that effect.

@jsirois
Copy link
Contributor

jsirois commented Oct 5, 2022

Ok, that's pretty indirect, but verified.

description="Get interpreter path and version",
argv=[
"-c",
"import sys; print(sys.executable); print('.'.join(str(x) for x in sys.version_info[0:3]))",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not so sure about that. The internal-onlyness is a critical thing here that is just as obscure still.

@benjyw
Copy link
Contributor Author

benjyw commented Oct 5, 2022

OK, restored to previous logic, with comment.

@benjyw benjyw merged commit fc26779 into pantsbuild:main Oct 5, 2022
@benjyw benjyw deleted the refactor_venv_export branch October 5, 2022 18:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category:internal CI, fixes for not-yet-released features, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants