Skip to content

Commit

Permalink
feat(pip_parse): support patching 'whl_library'
Browse files Browse the repository at this point in the history
Before that the users had to rely on patching the actual wheel files and
uploading them as different versions to internal artifact stores if they
needed to modify the wheel dependencies. This is very common when
breaking dependency cycles in `pytorch` or `apache-airflow` packages.
With this feature we can support patching external PyPI dependencies via
unified patches passed into the `pip.whl_mods` extension and the legacy
`package_annotation` macro.

Fixes #1076.

Add a non-empty patch and show that there can be multiple patches

exp: A different design, that does not require us to put patches to annotations.json

Simplify the design and add extra notes in the implementation

fix: make the legacy WORKSPACE patching compatible with bazel 5.4

chore: update docs

refactor: s/module_override/whl_override

doc: update changelog

doc: improve documentation and code comments on the new features

s/whl_override/whl_library_override/g

refactor wheel installer

feat: support whl_overriding before extraction

Add a note

doc: update changelog

fixup: set better default values for patches

chore: add wheel_repackager.py to the list of pysrcs

fix rebase conflicts

fix: use better defaults for annotations

fixup: update docs

fixup: minor tidy up

add a comment on annotation support for bzlmod

refactor: whl patching to a separate function

finish cleaning up handling of whl_patches for python annotations

Add an empty patch to the pip_repository_annotations example

Move patch argument processing to a single place, next to annotations

Improve the script and add logging

feat!: remove legacy bzlmod patching example

doc: remove changelog entry for legacy patching

feat!: remove the patching support via pip annotations

feat!: remove support for whl_library patching for now
  • Loading branch information
aignas committed Sep 25, 2023
1 parent 0d0e183 commit c2dee2c
Show file tree
Hide file tree
Showing 13 changed files with 348 additions and 46 deletions.
4 changes: 2 additions & 2 deletions .bazelrc
Original file line number Diff line number Diff line change
Expand Up @@ -3,8 +3,8 @@
# This lets us glob() up all the files inside the examples to make them inputs to tests
# (Note, we cannot use `common --deleted_packages` because the bazel version command doesn't support it)
# To update these lines, run tools/bazel_integration_test/update_deleted_packages.sh
build --deleted_packages=examples/build_file_generation,examples/build_file_generation/random_number_generator,examples/bzlmod,examples/bzlmod/entry_points,examples/bzlmod/entry_points/tests,examples/bzlmod/libs/my_lib,examples/bzlmod/other_module,examples/bzlmod/other_module/other_module/pkg,examples/bzlmod/runfiles,examples/bzlmod/tests,examples/bzlmod/tests/other_module,examples/bzlmod/whl_mods,examples/bzlmod_build_file_generation,examples/bzlmod_build_file_generation/other_module/other_module/pkg,examples/bzlmod_build_file_generation/runfiles,examples/multi_python_versions/libs/my_lib,examples/multi_python_versions/requirements,examples/multi_python_versions/tests,examples/pip_install,examples/pip_parse,examples/pip_parse_vendored,examples/pip_repository_annotations,examples/py_proto_library,tests/compile_pip_requirements,tests/compile_pip_requirements_test_from_external_workspace,tests/ignore_root_user_error,tests/pip_repository_entry_points
query --deleted_packages=examples/build_file_generation,examples/build_file_generation/random_number_generator,examples/bzlmod,examples/bzlmod/entry_points,examples/bzlmod/entry_points/tests,examples/bzlmod/libs/my_lib,examples/bzlmod/other_module,examples/bzlmod/other_module/other_module/pkg,examples/bzlmod/runfiles,examples/bzlmod/tests,examples/bzlmod/tests/other_module,examples/bzlmod/whl_mods,examples/bzlmod_build_file_generation,examples/bzlmod_build_file_generation/other_module/other_module/pkg,examples/bzlmod_build_file_generation/runfiles,examples/multi_python_versions/libs/my_lib,examples/multi_python_versions/requirements,examples/multi_python_versions/tests,examples/pip_install,examples/pip_parse,examples/pip_parse_vendored,examples/pip_repository_annotations,examples/py_proto_library,tests/compile_pip_requirements,tests/compile_pip_requirements_test_from_external_workspace,tests/ignore_root_user_error,tests/pip_repository_entry_points
build --deleted_packages=examples/build_file_generation,examples/build_file_generation/random_number_generator,examples/bzlmod,examples/bzlmod/entry_points,examples/bzlmod/entry_points/tests,examples/bzlmod/libs/my_lib,examples/bzlmod/other_module,examples/bzlmod/other_module/other_module/pkg,examples/bzlmod/patches,examples/bzlmod/runfiles,examples/bzlmod/tests,examples/bzlmod/tests/other_module,examples/bzlmod/whl_mods,examples/bzlmod_build_file_generation,examples/bzlmod_build_file_generation/other_module/other_module/pkg,examples/bzlmod_build_file_generation/runfiles,examples/multi_python_versions/libs/my_lib,examples/multi_python_versions/requirements,examples/multi_python_versions/tests,examples/pip_install,examples/pip_parse,examples/pip_parse_vendored,examples/pip_repository_annotations,examples/pip_repository_annotations/patches,examples/py_proto_library,tests/compile_pip_requirements,tests/compile_pip_requirements_test_from_external_workspace,tests/ignore_root_user_error,tests/pip_repository_entry_points
query --deleted_packages=examples/build_file_generation,examples/build_file_generation/random_number_generator,examples/bzlmod,examples/bzlmod/entry_points,examples/bzlmod/entry_points/tests,examples/bzlmod/libs/my_lib,examples/bzlmod/other_module,examples/bzlmod/other_module/other_module/pkg,examples/bzlmod/patches,examples/bzlmod/runfiles,examples/bzlmod/tests,examples/bzlmod/tests/other_module,examples/bzlmod/whl_mods,examples/bzlmod_build_file_generation,examples/bzlmod_build_file_generation/other_module/other_module/pkg,examples/bzlmod_build_file_generation/runfiles,examples/multi_python_versions/libs/my_lib,examples/multi_python_versions/requirements,examples/multi_python_versions/tests,examples/pip_install,examples/pip_parse,examples/pip_parse_vendored,examples/pip_repository_annotations,examples/pip_repository_annotations/patches,examples/py_proto_library,tests/compile_pip_requirements,tests/compile_pip_requirements_test_from_external_workspace,tests/ignore_root_user_error,tests/pip_repository_entry_points

test --test_output=errors

Expand Down
3 changes: 3 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,9 @@ A brief description of the categories of changes:
* (python_repository) Support `netrc` and `auth_patterns` attributes to enable
authentication against private HTTP hosts serving Python toolchain binaries.

* (bzlmod) Added patching support via `patches` and `patch_strip` arguments to
the new `pip.whl_override` tag class.

### Removed

* (bzlmod) The `entry_point` macro is no longer supported and has been removed
Expand Down
3 changes: 2 additions & 1 deletion docs/pip_repository.md

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

9 changes: 9 additions & 0 deletions examples/bzlmod/MODULE.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -113,6 +113,15 @@ pip.parse(
"@whl_mods_hub//:wheel.json": "wheel",
},
)

# You can add patches that will be applied on the extracted whl contents
pip.whl_override(
file = "requests-2.25.1-py2.py3-none-any.whl",
patch_strip = 1,
patches = [
"@//patches:empty.patch",
],
)
use_repo(pip, "pip")

bazel_dep(name = "other_module", version = "", repo_name = "our_other_module")
Expand Down
4 changes: 4 additions & 0 deletions examples/bzlmod/patches/BUILD.bazel
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
exports_files(
srcs = glob(["*.patch"]),
visibility = ["//visibility:public"],
)
Empty file.
95 changes: 92 additions & 3 deletions python/extensions/pip.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -78,7 +78,7 @@ You cannot use both the additive_build_content and additive_build_content_file a
whl_mods = whl_mods,
)

def _create_versioned_pip_and_whl_repos(module_ctx, pip_attr, whl_map):
def _create_versioned_pip_and_whl_repos(module_ctx, pip_attr, whl_map, whl_overrides):
python_interpreter_target = pip_attr.python_interpreter_target

# if we do not have the python_interpreter set in the attributes
Expand All @@ -97,9 +97,10 @@ def _create_versioned_pip_and_whl_repos(module_ctx, pip_attr, whl_map):
))
python_interpreter_target = INTERPRETER_LABELS[python_name]

python_version = version_label(pip_attr.python_version)
pip_name = "{}_{}".format(
hub_name,
version_label(pip_attr.python_version),
python_version,
)
requrements_lock = locked_requirements_label(module_ctx, pip_attr)

Expand Down Expand Up @@ -136,12 +137,17 @@ def _create_versioned_pip_and_whl_repos(module_ctx, pip_attr, whl_map):
# to.
annotation = whl_modifications.get(whl_name)
whl_name = normalize_name(whl_name)

whl_library(
name = "%s_%s" % (pip_name, whl_name),
requirement = requirement_line,
repo = pip_name,
repo_prefix = pip_name + "_",
annotation = annotation,
whl_patches = {
p: json.encode(args)
for p, args in whl_overrides.get(whl_name, {}).items()
},
python_interpreter = pip_attr.python_interpreter,
python_interpreter_target = python_interpreter_target,
quiet = pip_attr.quiet,
Expand All @@ -159,6 +165,42 @@ def _create_versioned_pip_and_whl_repos(module_ctx, pip_attr, whl_map):

whl_map[hub_name][whl_name][pip_attr.python_version] = pip_name + "_"

def _parse_whl_name(file):
if not file.endswith(".whl"):
fail("not a valid wheel: {}".format(file))

file = file[:-len(".whl")]

# Parse the following
# {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl
head, _, platform_tag = file.rpartition("-")
if not platform_tag:
fail("cannot extract platform tag from the whl filename: {}".format(file))
head, _, abi_tag = head.rpartition("-")
if not abi_tag:
fail("cannot extract abi tag from the whl filename: {}".format(file))
head, _, python_tag = head.rpartition("-")
if not python_tag:
fail("cannot extract python tag from the whl filename: {}".format(file))
head, _, version = head.rpartition("-")
if not version:
fail("cannot extract version from the whl filename: {}".format(file))
distribution, _, maybe_version = head.partition("-")

if maybe_version:
version, build_tag = maybe_version, version
else:
build_tag = None

return struct(
distribution = distribution,
version = version,
build_tag = build_tag,
python_tag = python_tag,
abi_tag = abi_tag,
platform_tag = platform_tag,
)

def _pip_impl(module_ctx):
"""Implementation of a class tag that creates the pip hub(s) and corresponding pip spoke, alias and whl repositories.
Expand Down Expand Up @@ -247,6 +289,29 @@ def _pip_impl(module_ctx):
# Build all of the wheel modifications if the tag class is called.
_whl_mods_impl(module_ctx)

_overriden_whl_set = {}
whl_overrides = {}

for module in module_ctx.modules:
for attr in module.tags.whl_override:
whl_name = normalize_name(_parse_whl_name(attr.file).distribution)

if attr.file in _overriden_whl_set:
fail("Duplicate module overrides for '{}'".format(attr.file))
_overriden_whl_set[attr.file] = None

for patch in attr.patches:
if whl_name not in whl_overrides:
whl_overrides[whl_name] = {}

if patch not in whl_overrides[whl_name]:
whl_overrides[whl_name][patch] = struct(
patch_strip = attr.patch_strip,
whls = [],
)

whl_overrides[whl_name][patch].whls.append(attr.file)

# Used to track all the different pip hubs and the spoke pip Python
# versions.
pip_hub_map = {}
Expand Down Expand Up @@ -292,7 +357,12 @@ def _pip_impl(module_ctx):
python_versions = [pip_attr.python_version],
)

_create_versioned_pip_and_whl_repos(module_ctx, pip_attr, hub_whl_map)
_create_versioned_pip_and_whl_repos(
module_ctx,
pip_attr,
hub_whl_map,
whl_overrides,
)

for hub_name, whl_map in hub_whl_map.items():
for whl_name, version_map in whl_map.items():
Expand Down Expand Up @@ -428,6 +498,24 @@ cannot have a child module that uses the same `hub_name`.
}
return attrs

_whl_override_tag = tag_class(
attrs = {
"file": attr.string(
doc = """The Python wheel name which needs to be patched. This will be applied to all repositories that setup this wheel via the pip.parse tag class.""",
mandatory = True,
),
"patch_strip": attr.int(
default = 0,
doc = "The number of leading path segments to be stripped from the file name in the patches.",
),
"patches": attr.label_list(
doc = "A list of patches to apply to the repository *after* 'whl_library' is extracted and BUILD.bazel file is generated.",
mandatory = True,
),
},
doc = "Apply patches to a given Python wheel library defined by other tags in this extension.",
)

pip = module_extension(
doc = """\
This extension is used to make dependencies from pip available.
Expand Down Expand Up @@ -469,6 +557,7 @@ JSON files where referred to as annotations, and were renamed to whl_modificatio
extension.
""",
),
"whl_override": _whl_override_tag,
},
)

Expand Down
90 changes: 87 additions & 3 deletions python/pip_install/pip_repository.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -612,6 +612,51 @@ py_binary(
environ = common_env,
)

def _patch_whl_file(rctx, *, python_interpreter, whl_path, patches, **kwargs):
"""Patch a whl file and repack it to ensure that the RECORD metadata stays correct.
Args:
rctx: repository_ctx
python_interpreter: the host python interpreter used for executing a script.
whl_path: The whl file name to be patched.
patches: a label-keyed-string dict that has
json.encode(struct([whl_file], patch_strip]) as values. This
is to maintain flexibility and correct bzlmod extension interface
until we have a better way to define whl_library and move whl
patching to a separate place.
**kwargs: extras passed to rctx.execute.
"""

# extract files into the current directory for patching as rctx.patch
# does not support patching in another directory.
rctx.extract(whl_path)

whl_file = rctx.path(whl_path).basename[:-len(".orig.zip")]

found_a_match = False
whls_not_found = []
for patch_file, json_args in patches.items():
patch_dst = struct(**json.decode(json_args))
if whl_file in patch_dst.whls:
rctx.patch(patch_file, strip = patch_dst.patch_strip)
found_a_match = True
else:
whls_not_found.extend(patch_dst.whls)

# Should we parse the passed whl_names and match it to `whl_file` for better errors?
if not found_a_match:
fail("Could not find a match for {} in {}".format(whl_file, whls_not_found))

return rctx.execute(
[
python_interpreter,
"-m",
"python.pip_install.tools.wheel_installer.wheel_repackager",
whl_path,
],
**kwargs
)

def _whl_library_impl(rctx):
python_interpreter = _resolve_python_interpreter(rctx)
args = [
Expand All @@ -624,10 +669,41 @@ def _whl_library_impl(rctx):

args = _parse_optional_attrs(rctx, args)

# Manually construct the PYTHONPATH since we cannot use the toolchain here
environment = _create_repository_execution_environment(rctx, python_interpreter)

result = rctx.execute(
args,
args + ["--no-extract"] + (["--rename-to-zip"] if rctx.attr.whl_patches else []),
environment = environment,
quiet = rctx.attr.quiet,
timeout = rctx.attr.timeout,
)
if result.return_code:
fail("whl_library %s failed: %s (%s) error code: '%s'" % (rctx.attr.name, result.stdout, result.stderr, result.return_code))

whl_path = json.decode(rctx.read("whl_file.json"))["whl_file"]
if not rctx.delete("whl_file.json"):
fail("failed to delete the whl_file.json file")

if rctx.attr.whl_patches:
result = _patch_whl_file(
rctx,
python_interpreter = python_interpreter,
whl_path = whl_path,
patches = rctx.attr.whl_patches,
environment = environment,
quiet = rctx.attr.quiet,
timeout = rctx.attr.timeout,
)
if result.return_code:
fail("repackaging .whl %s failed: %s (%s) error code: '%s'" % (rctx.attr.name, result.stdout, result.stderr, result.return_code))

whl_path = whl_path.replace(".orig.zip", "")

result = rctx.execute(
args + ["--whl-file", whl_path],
# Manually construct the PYTHONPATH since we cannot use the toolchain here
environment = _create_repository_execution_environment(rctx, python_interpreter),
environment = environment,
quiet = rctx.attr.quiet,
timeout = rctx.attr.timeout,
)
Expand Down Expand Up @@ -658,6 +734,11 @@ def _whl_library_impl(rctx):
)
entry_points[entry_point_without_py] = entry_point_script_name

annotation = None
if rctx.attr.annotation:
json_contents = json.decode(rctx.read(rctx.attr.annotation))
annotation = struct(**json_contents)

build_file_contents = generate_whl_library_build_bazel(
repo_prefix = rctx.attr.repo_prefix,
dependencies = metadata["deps"],
Expand All @@ -667,7 +748,7 @@ def _whl_library_impl(rctx):
"pypi_version=" + metadata["version"],
],
entry_points = entry_points,
annotation = None if not rctx.attr.annotation else struct(**json.decode(rctx.read(rctx.attr.annotation))),
annotation = annotation,
)
rctx.file("BUILD.bazel", build_file_contents)

Expand Down Expand Up @@ -717,6 +798,9 @@ whl_library_attrs = {
mandatory = True,
doc = "Python requirement string describing the package to make available",
),
"whl_patches": attr.label_keyed_string_dict(
doc = "Patches to be applied after building/downloading the '.whl' file before generating BUILD.bazel files and extracting it. INTERNAL USE ONLY.",
),
"_python_path_entries": attr.label_list(
# Get the root directory of these rules and keep them as a default attribute
# in order to avoid unnecessary repository fetching restarts.
Expand Down
1 change: 1 addition & 0 deletions python/pip_install/private/srcs.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -13,4 +13,5 @@ PIP_INSTALL_PY_SRCS = [
"@rules_python//python/pip_install/tools/wheel_installer:namespace_pkgs.py",
"@rules_python//python/pip_install/tools/wheel_installer:wheel.py",
"@rules_python//python/pip_install/tools/wheel_installer:wheel_installer.py",
"@rules_python//python/pip_install/tools/wheel_installer:wheel_repackager.py",
]
7 changes: 7 additions & 0 deletions python/pip_install/tools/wheel_installer/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,13 @@ py_library(
],
)

py_binary(
name = "wheel_repackager",
srcs = [
"wheel_repackager.py",
],
)

py_binary(
name = "wheel_installer",
srcs = [
Expand Down
14 changes: 14 additions & 0 deletions python/pip_install/tools/wheel_installer/arguments.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@

import argparse
import json
import pathlib
from typing import Any


Expand Down Expand Up @@ -59,6 +60,19 @@ def parser(**kwargs: Any) -> argparse.ArgumentParser:
help="Use 'pip download' instead of 'pip wheel'. Disables building wheels from source, but allows use of "
"--platform, --python-version, --implementation, and --abi in --extra_pip_args.",
)
parser.add_argument(
"--whl-file", type=pathlib.Path, help="The file to be used for extraction."
)
parser.add_argument(
"--no-extract",
action="store_true",
help="Whether to extract the downloaded file.",
)
parser.add_argument(
"--rename-to-zip",
action="store_true",
help="Whether to rename the whl file to zip for easier patching.",
)
return parser


Expand Down
Loading

0 comments on commit c2dee2c

Please sign in to comment.