Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return values in module extensions #17048

Closed
matts1 opened this issue Dec 19, 2022 · 7 comments
Closed

Return values in module extensions #17048

matts1 opened this issue Dec 19, 2022 · 7 comments
Labels
area-Bzlmod Bzlmod-specific PRs, issues, and feature requests P2 We'll consider working on this in future. (Assignee optional) team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. type: feature request

Comments

@matts1
Copy link
Contributor

matts1 commented Dec 19, 2022

Description of the feature request:

Allow a module extension to return a value:

my_extension = use_extension(...)
value = my_extension.my_function(...)
# do something with value

What underlying problem are you trying to solve with this feature?

If a module extension returns a large number of git repositories, it becomes infeasible to use_repo on each one individually.

Here's an example:

bazel_dep(name = "gazelle", version = "0.27.0")
go_deps = use_extension("@gazelle//:extensions.bzl", "go_deps")
go_deps.from_file(go_mod = "//:go.mod")
use_repo(go_deps,
    "com_github_gogo_protobuf",
    ...
)

In this example, we need to specify the dependencies once in the go.mod file, and once again in  the MODULE.bazel file. I've tested, and the following does appear to work:

bazel_dep(name = "gazelle", version = "0.27.0")
go_deps = use_extension("@gazelle//:extensions.bzl", "go_deps")
go_deps.from_file(go_mod = "//:go.mod")
GO_REPOS = ["com_github_gogo_protobuf", ...]
use_repo(go_deps, *GO_REPOS)

Would it be possible to support the following:

bazel_dep(name = "gazelle", version = "0.27.0")
go_deps = use_extension("@gazelle//:extensions.bzl", "go_deps")
go_repos = go_deps.from_file(go_mod = "//:go.mod")
use_repo(go_deps, *go_repos)

This would allow go.mod to be our source of truth for go repositories (and would allow other languages to do the same).

Which operating system are you running Bazel on?

Linux

What is the output of bazel info release?

release 6.0.0-pre.20221012.2

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

N/A (using repo tool, many repos)

Have you found anything relevant by searching the web?

Python appears to take an alternative approach where they write every package to a single repository. This would be a theoretically possible alternative, but:

  • go has historically used one repo per package, and this would make it much harder to upgrade existing go repositories to bzlmod
  • I assume other languages may have similar issues
  • I tried writing a repo_rule for rust recently which did a similar thing, and it's much simpler to work with one package per repository
  • Depending on how this was implemented in bazel, there may be degraded performance for putting all things in a single repository, including potentially:
    • Unable to do a partial download if you only depend upon some of the things declared by the module extension
    • Unable to download these repos in parallel
    • A change to one package version could result in having to re-download every one.

Any other information, logs, or outputs that you want to share?

No response

@matts1
Copy link
Contributor Author

matts1 commented Dec 19, 2022

@Wyverald I can't appear to assign the feature request to you, but since it's a module extension FR I assume you should be the one triaging?

@sgowroji sgowroji added type: feature request team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. untriaged labels Dec 19, 2022
@fmeum
Copy link
Collaborator

fmeum commented Dec 19, 2022

Previous discussion on this topic: https://gist.github.com/fmeum/c87c183ad6ffc55add0ca04232761ea1

go_repos = go_deps.from_file(go_mod = "//:go.mod")
use_repo(go_deps, *go_repos)

can't quite work since tags in MODULE.bazel are just metadata, they don't invoke any actual logic - module extensions are fetched lazily and in parallel, not while MODULE.bazel files are evaluated.

But as far as I understand the implementation, a tag could "return" an opaque object that use_repo accepts and resolves to all repos that the module extension marked as direct deps introduced by that particular tag via new Starlark API.

I am more and more convinced that we essentially get to choose between:

  1. Keep MODULE.bazel capabilities as is and accept that large monorepos will have to resort to tooling (e.g. Gazelle) to manage it. Note that MODULE.bazel can't be segmented, so it won't be possible to extract deps lists into macros as with WORKSPACE.
  2. Keep MODULE.bazel brief and human-editable, but at the cost of introducing new API and (likely rare in practice) cases of "repository has been defined twice: go_repos and other_go_repos" errors.

I would personally prefer 2): Module extensions already have access to all the API and deps information needed to manage external deps without additional, external tooling. Gazelle and other similar tools only operate on the syntax of MODULE.bazel and thus further limit the possible uses of Starlark in MODULE.bazel (variables, list comprehensions, ...).

@Wyverald
Copy link
Member

@matts1 The pattern we've been recommending is to use a "hub" repo that contains alias rules to other "satellite" repos that the module extension generates. Users would use_repo the hub repo only and still get the benefits of partial downloads, caches, etc.

This obviously doesn't work if the satellite repos have a lot of targets in them (which would mean that the hub contains the sum total of all targets in the satellite repos), which is unfortunately the case with rules_go. I don't have a good solution for this.


@fmeum

Keep MODULE.bazel brief and human-editable, but at the cost of introducing new API and (likely rare in practice) cases of "repository has been defined twice: go_repos and other_go_repos" errors.

The problem with that is we'd need to run all module extensions to resolve repo mappings for any given repo. Currently, the repo mapping of a non-module-extension-generated repo can be calculated from parsing MODULE.bazel files, without running any extensions. If we were to have something like use_all_repos, we'd need to run all extensions with use_all_repos eagerly, which is still okay except for the fact that we now allow extensions to use repos from other extensions. Essentially, we need to choose between use_all_repos, and only allowing extensions to see non-extension repos; otherwise, I don't see a way to structure the order of evaluations for extensions.

Gazelle and other similar tools only operate on the syntax of MODULE.bazel and thus further limit the possible uses of Starlark in MODULE.bazel (variables, list comprehensions, ...).

That may not be a bad thing -- we also recommend against extensive usage of Starlark in BUILD files.

@Wyverald Wyverald added P2 We'll consider working on this in future. (Assignee optional) area-Bzlmod Bzlmod-specific PRs, issues, and feature requests and removed untriaged labels Dec 19, 2022
@fmeum
Copy link
Collaborator

fmeum commented Dec 24, 2022

That's a very good point, I didn't think about the implications for extension evaluation.

I thought about alternative solutions and had an idea: How about having a Bazel module auto_use_repo that any module can use to automatically maintain the root module's use_repo calls? It would implement the MODULE.bazel-equivalent of Gazelle's update-repos, but 1) in a way that isn't Go-specific and 2) could use the existing module extension logic and thus be controlled entirely from Starlark.

@matts1 Do you think something like that could work for rules_rust?

@fmeum
Copy link
Collaborator

fmeum commented Jan 4, 2023

I created auto_use_repo. The README links to a Gazelle PR that adds support for it to go_deps.

@matts1
Copy link
Contributor Author

matts1 commented Jan 5, 2023

I don't think we need it for rules_rust - rules_rust should work just fine with the hub-and-spoke model that @Wyverald suggested, as it already is capable of generating spoke repos and a centralized build file that aliases to them, given the right configuration.

@Wyverald
Copy link
Member

This is addressed by https://docs.google.com/document/d/1dj8SN5L6nwhNOufNqjBhYkk5f-BJI_FPYWKxlB3GAmA/edit. The API will be available in 6.2.0.

aherrmann added a commit to tweag/rules_nixpkgs that referenced this issue Sep 29, 2023
This uses the new isolated module extensions feature [1], instead of a
hub-repository approach [2] [3], to distinguish globally unified
repositories and locally specialized repositories.

[1]: bazelbuild/bazel#18529
[2]: bazelbuild/bazel#17048
[3]: bazelbuild/bazel#17493
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-Bzlmod Bzlmod-specific PRs, issues, and feature requests P2 We'll consider working on this in future. (Assignee optional) team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. type: feature request
Projects
None yet
Development

No branches or pull requests

4 participants