Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cquery doesn't filter ruleInputs according to used configurations #14610

Closed
illicitonion opened this issue Jan 20, 2022 · 6 comments
Closed

cquery doesn't filter ruleInputs according to used configurations #14610

illicitonion opened this issue Jan 20, 2022 · 6 comments
Assignees
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Configurability platforms, toolchains, cquery, select(), config transitions type: feature request

Comments

@illicitonion
Copy link
Contributor

Description of the problem / feature request:

Cquery follows selects across the build graph, but does not appear to perform this filtering to the Rule.rule_input field.

Accordingly, when trying to ascertain what files may affect a target in the current configuration, irrelevant files may be returned.

For example, when trying to query the transitive affecting files of this kind of target:

java_test(
    name = "ExampleTest",
    srcs = ["ExampleTest.java"],
)

Following the rule_input fields when evaluating bazel cquery 'deps(//java/example:ExampleTest) --output=jsonproto, there's the chain:

//java/example:ExampleTest -> @bazel_tools//tools/test:xml_writer -> @bazel_tools//tools/test:xml.exe

However, when the currently configured platform is not Windows, there should be no dependency on xml.exe, as that's Window-specific.

I think it would be reasonable to have to perform some logic to do this filtering (though it just being presented completely up-front would be great), but right now I don't think it's possible to follow any algorithm to decide that xml.exe isn't actually a dependency in the current configuration...

Feature requests: what underlying problem are you trying to solve with this feature?

Coming up with a hash for each target so that we can diff graphs across different commits of source code. See also #7962.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

With the above BUILD file, run a bazel cquery for that target, and try determine whether @bazel_tools//tools/test:xml.exe is depended on by the target.

What's the output of bazel info release?

release 4.2.1

/cc @gregestren

@illicitonion
Copy link
Contributor Author

Relatedly, it feels like these ruleInputs should be ConfiguredTargets not just labels - it's impossible to chase the ruleInput graph if a label is present under multiple configurations. For instance, say have a repo containing both a target using Java in the exec configuration:

java_binary(
    name = "jbin",
    srcs = ["JBin.java"],
    main_class = "JBin",
)
 
genrule(
    name = "run_jbin",
    tools = [":jbin"],
    cmd = "$(execpath :jbin) hello world > $@",
    outs = ["out"],
)

As well as a java_test which depends on a Java toolchain in the target configuration - these will both list @bazel_tools//tools/jdk:current_java_runtime as a ruleInput, but without specifying which configuration each is for, it's impossible to know which is affected by changes to host or target java toolchains. We can over-estimate and say that both are affected, but this may lead to significant over-building in the CI case.

@sdtwigg
Copy link
Contributor

sdtwigg commented Jan 31, 2022

So, looking specifically at the ruleInputs issue, this is as-intended. It is dumping the (potential) inputs for a Rule where Rule is a loading phase (post-BUILD file processing) concept and thus has no notion of a configuration. (A ConfiguredTarget is what results from taking a Rule and applying a configuration to it so that select statements can be resolved and an implementation executed.)

Can you clarify the underlying problem? I am not very familiar with the JSON proto output format. On first glance, for cquery, it looks like it just dumps the Rule info and configuration info for each target, which may not be sufficiently disambiguating for your use case.

@illicitonion
Copy link
Contributor Author

Can you clarify the underlying problem?
The root problem we're trying to solve here is to determine which targets Bazel thinks could have changed between two commits, so that we know what tests to run on CI, what binaries to re-deploy, etc - in order to do that, we need a list of files which, if they've changed, cause the target to have changed.

Taking the example from the original issue here, this dep chain:

//java/example:ExampleTest -> @bazel_tools//tools/test:xml_writer -> @bazel_tools//tools/test:xml.exe

By the time cquery is running, when not on Windows, Bazel knows that xml.exe is not a relevant ruleInput, because it has resolved this select:

filegroup(
  name = "xml_writer",
  srcs = select({
    "@bazel_tools//src/conditions:windows": ["@bazel_tools//tools/test:xml.exe"],
    "//conditions:default": ["@bazel_tools//tools/test:xml"],
  }),
)

Right now the code I'm writing to fingerprint each target does more work than it needs (hashing xml.exe) and will return false positives (if xml.exe has changed, it will say that the target has changed and needs re-running) - I'm trying to leverage information that Bazel already knows to avoid these two issues.

illicitonion added a commit to illicitonion/bazel that referenced this issue Mar 14, 2022
When the existing `--transitions` flag is set to a non-None value,
cquery will report the configuration its dependencies are configured in.
This allows for pruning away values from ruleInputs which are not
required in the current configuration.

Example output from running:
`bazel cquery --transitions=lite --output=jsonproto 'deps(...)'`:
```
[...]
        "ruleInput": ["@bazel_tools//src/conditions:host_windows", "@bazel_tools//src/tools/launcher:launcher", "@bazel_tools//tools/launcher:launcher_windows"],
        "configuredRuleInput": [{
          "label": "@bazel_tools//src/tools/launcher:launcher",
          "configurationChecksum": "01ec5513a9fc41b2a15570123817f3c2200ad9aeb21b1181d588a4b4f91d5693",
          "configurationId": 3
        }]
[...]
```

Previously on a non-Windows platform it was not possible to determine
that the two windows-specific ruleInputs are not actually required - now
we can see from the configuredRuleInput section that the
windows-specific dependencies have been pruned by selection, and we can
see that a transition has re-configured the ruleInput into the exec
configuration.

For dependencies which were not subject to transition (e.g. because
they're in a non-transition attribute), or which had no configuration
(e.g. because they're a source file), we add the label as a
ConfiguredRuleInput _without_ any configuration information. This
indicates that the dependency has not been pruned, but that the caller
should determine the correct configuration from context (probably be
determining whether it's a source file, and if so, considering it
un-configured, otherwise propagating the contextual ConfiguredTarget's
configuration).

Fixes bazelbuild#14610
Fixes bazelbuild#14617

Implementation-wise, this took roughly the following shape:
1. Extract TransitionResolver from being inline in TransitionsOutputFormatterCallback
2. Extract KnownTargetsDependencyResolver from TransitionsOutputFormatterCallback.FormatterDependencyResolver
3. Conditionally call these from ProtoOutputFormatterCallback depending
   on the --transitions flag.
@sdtwigg sdtwigg added type: feature request P3 We're not considering working on this, but happy to review a PR. (No assignee) and removed type: bug untriaged labels Mar 25, 2022
@purkhusid
Copy link

@illicitonion I'm curious if the tool you are building is open source? We currently use https://github.com/Tinder/bazel-diff but I'm interested in seeing what approaches people are using to solve the Bazel diff problem.

@illicitonion
Copy link
Contributor Author

@illicitonion I'm curious if the tool you are building is open source? We currently use https://github.com/Tinder/bazel-diff but I'm interested in seeing what approaches people are using to solve the Bazel diff problem.

It's not currently, but we're hoping to open source it soon. Roughly our approach is to, at both the before and after commit, cquery deps(...) and follow all of the ruleInputs edges, hashing rules and files appropriately. Watch this space :) (and very happy to chat, e.g. on the Bazel slack)

@gibfahn
Copy link
Contributor

gibfahn commented May 16, 2022

I'm curious if the tool you are building is open source? We currently use Tinder/bazel-diff but I'm interested in seeing what approaches people are using to solve the Bazel diff problem.

We have now open-sourced this, it's at https://github.com/bazel-contrib/target-determinator

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Configurability platforms, toolchains, cquery, select(), config transitions type: feature request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants