Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(pip_repository): Support pip parse cycles #1166

Merged

Conversation

arrdem
Copy link
Contributor

@arrdem arrdem commented Apr 12, 2023

This patch reworks the pip_repository machinery to allow users to manually annotate groups of libraries which form packaging cycles in PyPi and must be simultaneously installed.

The strategy here is to transform any dependencies A and B which have dependencies and are mutually dependent

graph LR;
    A-->B;
    A-->D;
    A-->E;
    B-->A;
    B-->F;
    B-->G;
Loading

into a new "dependency group" C which has A* and B* as dependencies, defined as A and B less any direct dependencies which are members of the group. This is viable for python because Python files just need to be emplaced into a runfiles directory for the interpreter. We don't actually have a true hard dependency between the build definition of A requiring the build product B be available which requires that the build product of A be available.

graph LR
     C-->A*;
     A*-->D;
     A*-->E;
     C-->B*;
     B*-->F;
     B*-->G;
Loading

This gets us most of the way there, as a user can now safely write requirement("A") and we can provide them with C, which has the desired effect of pulling in A, B and their respective transitives.

There is one remaining problem - a user writing deps = [requirement("A"), requirement("B")] will take a double direct dependency on C. So we need to insert a layer of indirection, generating C_A and C_B which serve only as unique aliases for C so that we can support the double dependency. Our final dependency graph then is as follows

graph LR
     C_A-->C;
     C_B-->C;
     C-->A*;
     A*-->D;
     A*-->E;
     C-->B*;
     B*-->F;
     B*-->G;
Loading

Addresses #1076, #1188

To do

  • Get rebased
  • Get re-validated manually
  • Buildifier
  • Get CI happy
  • Update documentation
  • Update changelog

@arrdem arrdem requested a review from hrfuller as a code owner April 12, 2023 16:31
@arrdem arrdem changed the title [EE-603] pip parse cycles Support pip parse cycles Apr 12, 2023
@arrdem arrdem force-pushed the arrdem/EE-603-pip-parse-cycles branch 4 times, most recently from a3f2800 to a03018d Compare April 12, 2023 19:26
@arrdem arrdem requested a review from rickeylev as a code owner April 12, 2023 19:27
@arrdem
Copy link
Contributor Author

arrdem commented Apr 12, 2023

Hrm. Maybe I'm missing something here but it looks like bazel-under-bazel for the tests isn't happy with the two new templates. Not sure how to proceed with a fix.

@arrdem arrdem changed the title Support pip parse cycles feat(pip_repository): Support pip parse cycles Apr 17, 2023
@chrislovecnm
Copy link
Collaborator

@arrdem The change you made is causing

(19:31:08) ERROR: error loading package under directory '': no such package '@rules_pythonoverridepip~pip//': in call to file(), parameter 'content' got value of type 'Label', want 'string'

--
  | (19:31:08) ERROR: An error occurred during the fetch of repository 'rules_python~override~pip~pip':
  | Traceback (most recent call last):
  | File "/var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/31068de31d119d275d02c5ef2e251165/external/rules_python~override/python/pip_install/pip_repository.bzl", line 362, column 14, in _pip_repository_bzlmod_impl
  | rctx.file("BUILD.bazel", rctx.attr._build_template, substitutions = {
  | Error in file: in call to file(), parameter 'content' got value of type 'Label', want 'string'
  | (19:31:08) ERROR: <builtin>: fetching pip_repository_bzlmod rule @rules_python~override//python:rules_python~override~pip~pip: Traceback (most recent call last):
  | File "/var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/31068de31d119d275d02c5ef2e251165/external/rules_python~override/python/pip_install/pip_repository.bzl", line 362, column 14, in _pip_repository_bzlmod_impl
  | rctx.file("BUILD.bazel", rctx.attr._build_template, substitutions = {
  | Error in file: in call to file(), parameter 'content' got value of type 'Label', want 'string'
  | (19:31:08) INFO: Repository gazelle~0.29.0 instantiated at:
  | callstack not available

I'm seeing it here: https://buildkite.com/bazel/rules-python-python/builds/4591#018776ef-c472-4122-8b4b-2e30c38bee88

@rickeylev
Copy link
Collaborator

@groodt

@arrdem arrdem force-pushed the arrdem/EE-603-pip-parse-cycles branch 6 times, most recently from 1b46a33 to 3a3b0f5 Compare May 18, 2023 02:09
@arrdem
Copy link
Contributor Author

arrdem commented May 18, 2023

Okay - I think this is ready for a review. Includes test updates, passes the test updates. The bzlmod story for this is ... there isn't one yet. If there's a way to map this strategy onto bzlmod I'm all ears, I'm just not familiar with that machinery and opted to partition it and retain current main behavior.

Copy link
Collaborator

@chrislovecnm chrislovecnm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate your work! We are having problems on windows with this PR. Please take a look at CI - https://buildkite.com/bazel/rules-python-python/builds/4896#01882c9e-c77f-4596-bfbc-8ca5b5c2931a

@chrislovecnm
Copy link
Collaborator

@arrdem please rebase.

@arrdem arrdem force-pushed the arrdem/EE-603-pip-parse-cycles branch from 3a3b0f5 to bad3c20 Compare May 18, 2023 16:33
Copy link
Collaborator

@chrislovecnm chrislovecnm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, thank you for the work. Second, I need your thoughts about how we will do multiple pip support in bzlmod. With these changes, you did you have knowledge that I need to learn about this code.

My overall comments:

  • I love to see over-documented code. You can blame Tim Hockin for this. He taught me we cannot remember why we wrote code changes three months ago.
  • Have we tested this with Gazelle? How does Gazelle interact with this new code? Hopefully, it ignores it, but we may want to document that Gazelle does not support this and file an issue.
  • We have to have test coverage with bzlmod. Please update the existing examples to use this new code.

Kudos on the PR description. That is awesome.

examples/pip_parse_vendored/requirements.bzl Outdated Show resolved Hide resolved
examples/pip_parse_vendored/requirements.bzl Outdated Show resolved Hide resolved
python/pip_install/pip_repository.bzl Outdated Show resolved Hide resolved
python/pip_install/pip_repository.bzl Outdated Show resolved Hide resolved
python/pip_install/pip_repository.bzl Outdated Show resolved Hide resolved
python/pip_install/tools/dependency_resolver/__init__.py Outdated Show resolved Hide resolved
@chrislovecnm
Copy link
Collaborator

Will this address #1188?

@chrislovecnm
Copy link
Collaborator

There is one remaining problem - a user writing deps = [requirement("A"), requirement("B")] will take a double direct dependency on C. So we need to insert a layer of indirection, generating C_A and C_B which serve only as shims to C so that we can support the double dependency.

If we are not addressing this problem, do we have an open issue?

@chrislovecnm
Copy link
Collaborator

@aignas, do you mind taking a look?

@arrdem
Copy link
Contributor Author

arrdem commented May 18, 2023

Will this address #1188?

I believe it will.

There is one remaining problem - a user writing deps = [requirement("A"), requirement("B")] will take a double direct dependency on C. So we need to insert a layer of indirection, generating C_A and C_B which serve only as shims to C so that we can support the double dependency.

If we are not addressing this problem, do we have an open issue?

I'll edit the PR description a bit to clarify. This problem was present in the original PR I submitted and was fixed yesterday. Note that requirement is generating a reference to an alias rule to resolve this.

@arrdem arrdem force-pushed the arrdem/EE-603-pip-parse-cycles branch 2 times, most recently from 6324e15 to addc079 Compare May 18, 2023 19:55
@arrdem
Copy link
Contributor Author

arrdem commented May 18, 2023

Tightened this up a bunch. Removed all unneeded changes that impacted the bzlmod machinery. Backed out some of the formatting changes. Eliminated the generated lib.bzl file. Updated all the code, comments and attrs to consistently refer to requirement clusters.

As far as I can determine at this point the bzlmod CI failures on windows are unrelated. I don't have a windows test environment available and I've eliminated all the parts of this changeset which impact bzlmod machinery. Guidance on getting to green there would be appreciated.

@arrdem arrdem force-pushed the arrdem/EE-603-pip-parse-cycles branch from 4db046a to 4284fe0 Compare November 25, 2023 00:21
@rickeylev
Copy link
Collaborator

This is some pretty solid work. Thanks, @arrdem!

@rickeylev rickeylev added this pull request to the merge queue Nov 28, 2023
Merged via the queue into bazelbuild:main with commit 4725d98 Nov 28, 2023
3 checks passed
@arrdem arrdem deleted the arrdem/EE-603-pip-parse-cycles branch November 28, 2023 21:33
github-merge-queue bot pushed a commit that referenced this pull request Nov 30, 2023
…1588)

This PR updates the docs for
#1166 since there were
some inconsistencies about the naming (`experimental_requirement_cycles`
vs `requirement_cycles`). Also moved the entry in the changelog from
`0.27` to `Unreleased` since it missed `0.27`. Can't wait for this
feature to land!
ahidalgob pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request May 15, 2024
In the last wave of providers #31416 we bumped min-airlfow-version to 2.4 and
added mechanism to verify min-airflow version is ok while importing, but
it turned out that there are cases where installing just old version of
airflow (with no constraints) brings the latest version of those
providers and causes new installation of airflow to fail. This is far
too common to ignore or require to use constraints, unfortunately  We do
not have min-airlfow-version in the preinstalled providers for one
reason only. For some tools that are NOT conforming to standards (such
as bazel), having min-airflow-version for those providers causes
circular dependency (even though technically dependencies in PyPI can -
and often are - circular, because dependencies in Python are not DAG and
can contain cycles.

This was in response to this issue in 2021 #17795.

However, this is really  bazel issue, and on top of it - it's recognized
as such and being fixed very recently
in bazelbuild/rules_python#1188 because there
are other packages that have similar problems (pytorch and triton being
popular couple). Also Bazel is not that popular in the Python world. 

Therefore, rather than trying to workaround the problem of bazel, we
encourage them to merge and release the fix
bazelbuild/rules_python#1166 (comment)
and call it out in our installation instructions, that bazel
installation might lead to problems like that.

If bazel does not fix it, this will only be a problem for Future
installations of airflow in a few months most likely. It will not impact
current bazel users installing old versions of Airflow (actually they
might start having problems now if we do not fix it and yank the 5
providers released yesterday)

GitOrigin-RevId: 547e352578fac92f072b269dc257d21cdc279d97
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Sep 19, 2024
In the last wave of providers #31416 we bumped min-airlfow-version to 2.4 and
added mechanism to verify min-airflow version is ok while importing, but
it turned out that there are cases where installing just old version of
airflow (with no constraints) brings the latest version of those
providers and causes new installation of airflow to fail. This is far
too common to ignore or require to use constraints, unfortunately  We do
not have min-airlfow-version in the preinstalled providers for one
reason only. For some tools that are NOT conforming to standards (such
as bazel), having min-airflow-version for those providers causes
circular dependency (even though technically dependencies in PyPI can -
and often are - circular, because dependencies in Python are not DAG and
can contain cycles.

This was in response to this issue in 2021 #17795.

However, this is really  bazel issue, and on top of it - it's recognized
as such and being fixed very recently
in bazelbuild/rules_python#1188 because there
are other packages that have similar problems (pytorch and triton being
popular couple). Also Bazel is not that popular in the Python world. 

Therefore, rather than trying to workaround the problem of bazel, we
encourage them to merge and release the fix
bazelbuild/rules_python#1166 (comment)
and call it out in our installation instructions, that bazel
installation might lead to problems like that.

If bazel does not fix it, this will only be a problem for Future
installations of airflow in a few months most likely. It will not impact
current bazel users installing old versions of Airflow (actually they
might start having problems now if we do not fix it and yank the 5
providers released yesterday)

GitOrigin-RevId: 547e352578fac92f072b269dc257d21cdc279d97
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Nov 8, 2024
In the last wave of providers #31416 we bumped min-airlfow-version to 2.4 and
added mechanism to verify min-airflow version is ok while importing, but
it turned out that there are cases where installing just old version of
airflow (with no constraints) brings the latest version of those
providers and causes new installation of airflow to fail. This is far
too common to ignore or require to use constraints, unfortunately  We do
not have min-airlfow-version in the preinstalled providers for one
reason only. For some tools that are NOT conforming to standards (such
as bazel), having min-airflow-version for those providers causes
circular dependency (even though technically dependencies in PyPI can -
and often are - circular, because dependencies in Python are not DAG and
can contain cycles.

This was in response to this issue in 2021 #17795.

However, this is really  bazel issue, and on top of it - it's recognized
as such and being fixed very recently
in bazelbuild/rules_python#1188 because there
are other packages that have similar problems (pytorch and triton being
popular couple). Also Bazel is not that popular in the Python world. 

Therefore, rather than trying to workaround the problem of bazel, we
encourage them to merge and release the fix
bazelbuild/rules_python#1166 (comment)
and call it out in our installation instructions, that bazel
installation might lead to problems like that.

If bazel does not fix it, this will only be a problem for Future
installations of airflow in a few months most likely. It will not impact
current bazel users installing old versions of Airflow (actually they
might start having problems now if we do not fix it and yank the 5
providers released yesterday)

GitOrigin-RevId: 547e352578fac92f072b269dc257d21cdc279d97
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.