Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose "human usable" workspace names for labels #20486

Closed
shs96c opened this issue Dec 11, 2023 · 12 comments
Closed

Expose "human usable" workspace names for labels #20486

shs96c opened this issue Dec 11, 2023 · 12 comments
Assignees
Labels
P1 I'll work on this now. (Assignee required) team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. type: feature request

Comments

@shs96c
Copy link
Contributor

shs96c commented Dec 11, 2023

Description of the feature request:

In rules_jvm_external we stamp jars so that buildifier can give accurate instructions. To do this, we need to know the workspace name that a human will use, rather than the canonical workspace name.

@fmeum has suggested opening up Label#getDisplayForm to Starlark to enable this.

Which category does this issue belong to?

Core, Documentation, External Dependency

What underlying problem are you trying to solve with this feature?

I need to be able to supply the workspace name that a user will use in dialogues and embedded information in artifacts.

Which operating system are you running Bazel on?

macOS

What is the output of bazel info release?

release 7.0.0rc7

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Have you found anything relevant by searching the web?

This conversation on Slack

Any other information, logs, or outputs that you want to share?

No response

@sgowroji sgowroji added the team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. label Dec 11, 2023
@fmeum
Copy link
Collaborator

fmeum commented Dec 11, 2023

This is relevant even for the Java rules. Cc @hvadehra

@meteorcloudy meteorcloudy added P2 We'll consider working on this in future. (Assignee optional) and removed untriaged labels Dec 19, 2023
@aiuto
Copy link
Contributor

aiuto commented Jan 8, 2024

cc: @aiuto

@fmeum
Copy link
Collaborator

fmeum commented Jan 8, 2024

I could implement this as Label#to_display_form(). As @Wyverald described in the linked slack thread, making this a field rather than a method isn't feasible at this time.

It should be noted that this introduces the capability to sniff (and more crucially, branch on) whether something is a direct dependency of the root module, which is probably not something we want to encourage. But I don't see a way to prevent this and consider it acceptable.

fmeum added a commit to fmeum/bazel that referenced this issue Feb 1, 2024
Fixes bazelbuild#20486

Also use `to_display_form()` to generate Clang module names in the
correct form for external repositories. `java_*` rules require more
delicate handling and will be migrated in a follow-up change.

RELNOTES: The `to_display_form()` method on `Label` returns a string
representation of a label optimized for readability by humans.
@fmeum fmeum self-assigned this Feb 1, 2024
fmeum added a commit to fmeum/bazel that referenced this issue Feb 12, 2024
Fixes bazelbuild#20486

Also use `to_display_form()` to generate Clang module names in the
correct form for external repositories. `java_*` rules require more
delicate handling and will be migrated in a follow-up change.

RELNOTES: The `to_display_form()` method on `Label` returns a string
representation of a label optimized for readability by humans.
Wyverald pushed a commit that referenced this issue Feb 12, 2024
Fixes #20486

While `to_display_form()` can be called in all contexts, it only returns apparent names for BUILD threads, which are the most common use case for display form labels. Support for module extensions can be added later, but requires explicit tracking of inverse mappings in the lockfile.

Also use `to_display_form()` to generate Clang module names in the correct form for external repositories. `java_*` rules require more delicate handling and will be migrated in a follow-up change.

RELNOTES: The `to_display_form()` method on `Label` returns a string representation of a label optimized for readability by humans.

Closes #21179.

PiperOrigin-RevId: 606330539
Change-Id: Id6a4ad79fd3d50319320789b88c618aad28db28b
Wyverald pushed a commit that referenced this issue Feb 13, 2024
Fixes #20486

While `to_display_form()` can be called in all contexts, it only returns apparent names for BUILD threads, which are the most common use case for display form labels. Support for module extensions can be added later, but requires explicit tracking of inverse mappings in the lockfile.

Also use `to_display_form()` to generate Clang module names in the correct form for external repositories. `java_*` rules require more delicate handling and will be migrated in a follow-up change.

RELNOTES: The `to_display_form()` method on `Label` returns a string representation of a label optimized for readability by humans.

Closes #21179.

PiperOrigin-RevId: 606330539
Change-Id: Id6a4ad79fd3d50319320789b88c618aad28db28b
github-merge-queue bot pushed a commit that referenced this issue Feb 13, 2024
Fixes #20486

While `to_display_form()` can be called in all contexts, it only returns
apparent names for BUILD threads, which are the most common use case for
display form labels. Support for module extensions can be added later,
but requires explicit tracking of inverse mappings in the lockfile.

Also use `to_display_form()` to generate Clang module names in the
correct form for external repositories. `java_*` rules require more
delicate handling and will be migrated in a follow-up change.

RELNOTES: The `to_display_form()` method on `Label` returns a string
representation of a label optimized for readability by humans.

Closes #21179.

PiperOrigin-RevId: 606330539
Change-Id: Id6a4ad79fd3d50319320789b88c618aad28db28b

Co-authored-by: Fabian Meumertzheim <[email protected]>
@fmeum
Copy link
Collaborator

fmeum commented Feb 21, 2024

@justinhorvitz I initially resolved this issue by adding a to_display_form() method that returns the label as a string with an apparent repo name (if available). In particular, this matches the output of Label#toString() for labels pointing into the main repo.

I am now thinking that this isn't the best design: The current main use case for this label form is to pass an argument to a spawn that can be used for strict deps error messages, where labels should be safe to include in BUILD files, which canonical labels aren't.

However, doing that in a memory-efficient way With Args requires quite a bit of boilerplate. Ideally, all Label arguments to structured command lines would automatically end up formatted in this way, just like for main repo labels.

Do you see a way to get the required RepositoryMapping of the main repo into StarlarkCustomCommandline? A new field would probably result in a regression.

@justinhorvitz
Copy link
Contributor

Can you give me an example of a label getting included in a command line via Args?

I am not too familiar with repository mappings because they don't exist internally. Is there anything in the ctx passed to the rule implementation function that would give you the repository mapping? If not, can we add it, then maybe use map_each to format labels?

@fmeum
Copy link
Collaborator

fmeum commented Feb 22, 2024

A Starlark example: https://cs.opensource.google/bazel/bazel/+/2ae85c16c7c7f6cc69a8412a44db8061d4126bf3:src/main/starlark/builtins_bzl/common/java/java_common_internal_for_builtins.bzl;l=378

A native example:
https://cs.opensource.google/bazel/bazel/+/2ae85c16c7c7f6cc69a8412a44db8061d4126bf3:src/main/java/com/google/devtools/build/lib/rules/java/JavaCompileActionBuilder.java;l=318

In both cases, the label ends up in buildozer fixup commands and should thus use user-specified apparent repository names when available.

The BazelModuleContext has access to the repository mapping of a .bzl file's repository, which powers the Label() constructor. But for the case above, we always need the mapping of the main repository since that has the BUILD files that buildozer can edit. Repository mappings aren't objects that users should have arbitrary access to, so ideally they would just be able to turn a label into a string of this form - this is what Label.to_display_form() does.

However, while this function can be used with map_each, this (at least the native equivalent) appears to cause a memory regression at Google and is in particular wasteful for main repo labels. Thus, correct usage would look something like this:

def _to_display_form(label):
    return label.to_display_form()

def _my_rule_impl(ctx);
    ...
    if some_label.repo_name:
        args.add_all([some_label], map_each = _to_display_form)
    else:
        args.add(some_label)

This is very verbose and makes me think that to_display_form just isn't the right API. Since I don't know of any use case for adding a collection of labels to a command line, my current favorite solution would be to make args.add(some_label) work by type checking the argument and prepending the RepositoryMapping in StarlarkCustomCommandline if it is a label pointing into an external repo. But of course having all labels be subject to this, not just those in scalar args, would be cleaner. What do you think about that?

@justinhorvitz
Copy link
Contributor

this (at least the native equivalent) appears to cause a memory regression at Google and is in particular wasteful for main repo labels

Are you referring to #21180? The significant memory cost for that was the skyframe dep from every configured target on the repository mapping, not the command line. In fact, the benchmark build did not show any significant regression when run with --notrack_incremental_state in which edges are not stored.

We could instead try an approach like what CachingAnalysisEnvironment does for the stamp artifacts - request them from skyframe when the getter method is called, and use MissingDepException if not yet available (although I imagine the main repo mapping would always be available at that point).

my current favorite solution would be to make Args.add(some_label) work by type checking the argument and prepending the RepositoryMapping

I'm concerned about how this would change the behavior of any actions that currently pass a label in such a manner. Someone more familiar would have to weigh in about whether this concern is valid, but if so, you could create a specialized Args.add_display_label method.

Another option is something like Label.to_display_form(ctx) in which the implementation can avoid a skyframe dep altogether if the label is in the main repo, or else request the main repo mapping as a skyframe dep from the AnalysisEnvironment as discussed above.

copybara-service bot pushed a commit that referenced this issue Apr 1, 2024
`Label`s added to `Args` are now formatted in display form, which means that they will be repo-relative if referring to the main repo and use an apparent repository name if possible.

Previously, labels were formatted in repo-relative form if referring to the main repo and using canonical repository names exclusively for external repos. At the same time, `Args` docs explicitly stated that the exact stringification of any type other than `File` is unspecified. The previous behavior was problematic since it neither always used canonical labels (which could be useful for writing tests that check these labels against an allowlist) nor provided labels that could be included in BUILD files (canonical names are explicitly unstable). Furthermore, whether a label resulted in a string prefixed with a single or two `@`s already dependended on a user choice, namely the value of `--enable_bzlmod`.

Thus, this change is not considered to be breaking. It makes it so that existing rulesets relying on default stringification to return a BUILD-file compatible label in WORKSPACE continue to do so with Bzlmod with no code changes.

This change aims to be as memory efficient as possible:
* Single labels or sequences of labels that reference targets in the main repo incur no memory overhead.
* Labels referring to external repos as well as `NestedSet`s of labels result in an additional Skyframe dependency for each target using the command line as well as one additional reference (4 bytes) stored in the command line's `arguments`.

Work towards #20486

Closes #21702.

PiperOrigin-RevId: 620925978
Change-Id: I54aa807c41bf783aee223482d2309f5cee2726b5
copybara-service bot pushed a commit that referenced this issue Apr 9, 2024
The change in #21702 only fixed the buildozer fixup messages for `java_*` actions entirely defined in Starlark. This commit makes the analogous change to the remaining native actions.

Related to #20486 and #21702

Closes #21827.

PiperOrigin-RevId: 623253302
Change-Id: Iadc2b0d8ce64f359b921ad7a7c489111a3a29997
iancha1992 pushed a commit to iancha1992/bazel that referenced this issue Apr 9, 2024
The change in bazelbuild#21702 only fixed the buildozer fixup messages for `java_*` actions entirely defined in Starlark. This commit makes the analogous change to the remaining native actions.

Related to bazelbuild#20486 and bazelbuild#21702

Closes bazelbuild#21827.

PiperOrigin-RevId: 623253302
Change-Id: Iadc2b0d8ce64f359b921ad7a7c489111a3a29997
fmeum added a commit to fmeum/bazel that referenced this issue Apr 10, 2024
`Label`s added to `Args` are now formatted in display form, which means that they will be repo-relative if referring to the main repo and use an apparent repository name if possible.

Previously, labels were formatted in repo-relative form if referring to the main repo and using canonical repository names exclusively for external repos. At the same time, `Args` docs explicitly stated that the exact stringification of any type other than `File` is unspecified. The previous behavior was problematic since it neither always used canonical labels (which could be useful for writing tests that check these labels against an allowlist) nor provided labels that could be included in BUILD files (canonical names are explicitly unstable). Furthermore, whether a label resulted in a string prefixed with a single or two `@`s already dependended on a user choice, namely the value of `--enable_bzlmod`.

Thus, this change is not considered to be breaking. It makes it so that existing rulesets relying on default stringification to return a BUILD-file compatible label in WORKSPACE continue to do so with Bzlmod with no code changes.

This change aims to be as memory efficient as possible:
* Single labels or sequences of labels that reference targets in the main repo incur no memory overhead.
* Labels referring to external repos as well as `NestedSet`s of labels result in an additional Skyframe dependency for each target using the command line as well as one additional reference (4 bytes) stored in the command line's `arguments`.

Work towards bazelbuild#20486

Closes bazelbuild#21702.

PiperOrigin-RevId: 620925978
Change-Id: I54aa807c41bf783aee223482d2309f5cee2726b5
fmeum added a commit to fmeum/bazel that referenced this issue Apr 10, 2024
The change in bazelbuild#21702 only fixed the buildozer fixup messages for `java_*` actions entirely defined in Starlark. This commit makes the analogous change to the remaining native actions.

Related to bazelbuild#20486 and bazelbuild#21702

Closes bazelbuild#21827.

PiperOrigin-RevId: 623253302
Change-Id: Iadc2b0d8ce64f359b921ad7a7c489111a3a29997
fmeum added a commit to fmeum/bazel that referenced this issue Apr 10, 2024
The change in bazelbuild#21702 only fixed the buildozer fixup messages for `java_*` actions entirely defined in Starlark. This commit makes the analogous change to the remaining native actions.

Related to bazelbuild#20486 and bazelbuild#21702

Closes bazelbuild#21827.

PiperOrigin-RevId: 623253302
Change-Id: Iadc2b0d8ce64f359b921ad7a7c489111a3a29997
fmeum added a commit to fmeum/bazel that referenced this issue Apr 10, 2024
The argument was already available on 'print' and is necessary for full control over the formatting of error messages that rely on StarlarkValue's `debugPrint`.

Work towards bazelbuild#20486
fmeum added a commit to fmeum/bazel that referenced this issue Apr 10, 2024
Since `debugPrint` is meant for emitting potentially non-deterministic, it is safe to give it access to `StarlarkThread` information.

This will be used in a follow-up to let `print` and `fail` emit `Label`s in display form without having to track the reverse repo mapping for the sake of invalidation of repo rules or module extensions.

Work towards bazelbuild#20486
@fmeum fmeum added P1 I'll work on this now. (Assignee required) and removed P2 We'll consider working on this in future. (Assignee optional) labels Apr 11, 2024
copybara-service bot pushed a commit that referenced this issue Apr 15, 2024
The argument was already available on 'print' and is necessary for full control over the formatting of error messages that rely on StarlarkValue's `debugPrint`.

Work towards #20486

Closes #21961.

PiperOrigin-RevId: 625022818
Change-Id: I895b5844d8543f936fc31d367b12bc6291a10bf8
bazel-io pushed a commit to bazel-io/bazel that referenced this issue Apr 15, 2024
The argument was already available on 'print' and is necessary for full control over the formatting of error messages that rely on StarlarkValue's `debugPrint`.

Work towards bazelbuild#20486

Closes bazelbuild#21961.

PiperOrigin-RevId: 625022818
Change-Id: I895b5844d8543f936fc31d367b12bc6291a10bf8
github-merge-queue bot pushed a commit that referenced this issue Apr 15, 2024
The argument was already available on 'print' and is necessary for full
control over the formatting of error messages that rely on
StarlarkValue's `debugPrint`.

Work towards #20486

Closes #21961.

PiperOrigin-RevId: 625022818
Change-Id: I895b5844d8543f936fc31d367b12bc6291a10bf8

Commit
6b2e0db

Co-authored-by: Fabian Meumertzheim <[email protected]>
fmeum added a commit to fmeum/bazel that referenced this issue Apr 25, 2024
Since `debugPrint` is meant for emitting potentially non-deterministic, it is safe to give it access to `StarlarkThread` information.

This will be used in a follow-up to let `print` and `fail` emit `Label`s in display form without having to track the reverse repo mapping for the sake of invalidation of repo rules or module extensions.

Work towards bazelbuild#20486
fmeum added a commit to fmeum/bazel that referenced this issue May 9, 2024
Since `debugPrint` is meant for emitting potentially non-deterministic, it is safe to give it access to `StarlarkThread` information.

This will be used in a follow-up to let `print` and `fail` emit `Label`s in display form without having to track the reverse repo mapping for the sake of invalidation of repo rules or module extensions.

Work towards bazelbuild#20486
Kila2 pushed a commit to Kila2/bazel that referenced this issue May 13, 2024
The change in bazelbuild#21702 only fixed the buildozer fixup messages for `java_*` actions entirely defined in Starlark. This commit makes the analogous change to the remaining native actions.

Related to bazelbuild#20486 and bazelbuild#21702

Closes bazelbuild#21827.

PiperOrigin-RevId: 623253302
Change-Id: Iadc2b0d8ce64f359b921ad7a7c489111a3a29997
Kila2 pushed a commit to Kila2/bazel that referenced this issue May 13, 2024
The argument was already available on 'print' and is necessary for full control over the formatting of error messages that rely on StarlarkValue's `debugPrint`.

Work towards bazelbuild#20486

Closes bazelbuild#21961.

PiperOrigin-RevId: 625022818
Change-Id: I895b5844d8543f936fc31d367b12bc6291a10bf8
fmeum added a commit to fmeum/bazel that referenced this issue May 15, 2024
Since `debugPrint` is meant for emitting potentially non-deterministic, it is safe to give it access to `StarlarkThread` information.

This will be used in a follow-up to let `print` and `fail` emit `Label`s in display form without having to track the reverse repo mapping for the sake of invalidation of repo rules or module extensions.

Work towards bazelbuild#20486
fmeum added a commit to fmeum/bazel that referenced this issue May 21, 2024
This is a reland of 30b95e3 with a different approach to emitting display form labels that avoids adding a new `to_display_form()` method to `Label`:
* In action command lines, which are the most frequent use of labels in rule implementation functions, labels are automatically emitted in display form since 9d3a8b0.
* In module extensions and repository rules, if labels can be turned into display form, the inverse of the repository mapping would need to be tracked in lockfiles and marker files for correct incrementality. Furthermore, allowing implementation functions to access apparent names would allow them to "discriminate" against them, thus possibly breaking the user's capability to map repos at will via `use_repo` and `repo_name`. Similar to how providers on a target can't be enumerated, it is thus safer to not provide this information to the implementation functions directly.

This change thus implements `StarlarkValue#debugPrint` for `Label` to allow ruleset authors to emit labels in display form in warnings and error messages while ensuring that Starlark logic doesn't have access to this information. `print("My message", label)` degrades gracefully with older Bazel versions (it just prints a canonical label literal) and can thus be adopted immediately without a need for feature detection.

This requires changing the signature of `StarlarkValue#debugPrint` to receive the `StarlarkThread` instead of just the `StarlarkSemantics`. Since `debugPrint` is meant for emitting potentially non-deterministic information, it is safe to give it access to `StarlarkThread`.

Also improves the Bzlmod cycle reporter so that it prints helpful information on a cycle encountered in a previous iteration of this PR.

Fixes bazelbuild#20486

RELNOTES: `Label` instances passed to `print` or `fail` as positional arguments are now formatted with apparent repository names (optimized for human readability).

Closes bazelbuild#21963.

PiperOrigin-RevId: 635589078
Change-Id: If60fdc632a59f19dad0cb02312690c15a0540c8e
fmeum added a commit to fmeum/bazel that referenced this issue May 21, 2024
This is a reland of 30b95e3 with a different approach to emitting display form labels that avoids adding a new `to_display_form()` method to `Label`:
* In action command lines, which are the most frequent use of labels in rule implementation functions, labels are automatically emitted in display form since 9d3a8b0.
* In module extensions and repository rules, if labels can be turned into display form, the inverse of the repository mapping would need to be tracked in lockfiles and marker files for correct incrementality. Furthermore, allowing implementation functions to access apparent names would allow them to "discriminate" against them, thus possibly breaking the user's capability to map repos at will via `use_repo` and `repo_name`. Similar to how providers on a target can't be enumerated, it is thus safer to not provide this information to the implementation functions directly.

This change thus implements `StarlarkValue#debugPrint` for `Label` to allow ruleset authors to emit labels in display form in warnings and error messages while ensuring that Starlark logic doesn't have access to this information. `print("My message", label)` degrades gracefully with older Bazel versions (it just prints a canonical label literal) and can thus be adopted immediately without a need for feature detection.

This requires changing the signature of `StarlarkValue#debugPrint` to receive the `StarlarkThread` instead of just the `StarlarkSemantics`. Since `debugPrint` is meant for emitting potentially non-deterministic information, it is safe to give it access to `StarlarkThread`.

Also improves the Bzlmod cycle reporter so that it prints helpful information on a cycle encountered in a previous iteration of this PR.

Fixes bazelbuild#20486

RELNOTES: `Label` instances passed to `print` or `fail` as positional arguments are now formatted with apparent repository names (optimized for human readability).

Closes bazelbuild#21963.

PiperOrigin-RevId: 635589078
Change-Id: If60fdc632a59f19dad0cb02312690c15a0540c8e
@shs96c
Copy link
Contributor Author

shs96c commented May 21, 2024

What I'm considering as an alternative is to override StarlarkValue#debugPrint for Label and have that return the display form. Since the return value isn't available to Starlark, the dependency wouldn't need to be tracked.

This would be sufficient for my needs, yes.

github-merge-queue bot pushed a commit that referenced this issue May 21, 2024
…2460)

This is a reland of 30b95e3 with a
different approach to emitting display form labels that avoids adding a
new `to_display_form()` method to `Label`:
* In action command lines, which are the most frequent use of labels in
rule implementation functions, labels are automatically emitted in
display form since 9d3a8b0.
* In module extensions and repository rules, if labels can be turned
into display form, the inverse of the repository mapping would need to
be tracked in lockfiles and marker files for correct incrementality.
Furthermore, allowing implementation functions to access apparent names
would allow them to "discriminate" against them, thus possibly breaking
the user's capability to map repos at will via `use_repo` and
`repo_name`. Similar to how providers on a target can't be enumerated,
it is thus safer to not provide this information to the implementation
functions directly.

This change thus implements `StarlarkValue#debugPrint` for `Label` to
allow ruleset authors to emit labels in display form in warnings and
error messages while ensuring that Starlark logic doesn't have access to
this information. `print("My message", label)` degrades gracefully with
older Bazel versions (it just prints a canonical label literal) and can
thus be adopted immediately without a need for feature detection.

This requires changing the signature of `StarlarkValue#debugPrint` to
receive the `StarlarkThread` instead of just the `StarlarkSemantics`.
Since `debugPrint` is meant for emitting potentially non-deterministic
information, it is safe to give it access to `StarlarkThread`.

Also improves the Bzlmod cycle reporter so that it prints helpful
information on a cycle encountered in a previous iteration of this PR.

Fixes #20486

RELNOTES: `Label` instances passed to `print` or `fail` as positional
arguments are now formatted with apparent repository names (optimized
for human readability).

Closes #21963.

PiperOrigin-RevId: 635589078
Change-Id: If60fdc632a59f19dad0cb02312690c15a0540c8e

Closes #22136
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P1 I'll work on this now. (Assignee required) team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. type: feature request
Projects
None yet
9 participants