Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bazel query "deps(//:target)" returns //:target #17777

Closed
dgp1130 opened this issue Mar 15, 2023 · 7 comments
Closed

bazel query "deps(//:target)" returns //:target #17777

dgp1130 opened this issue Mar 15, 2023 · 7 comments
Labels
team-Performance Issues for Performance teams type: bug untriaged

Comments

@dgp1130
Copy link
Contributor

dgp1130 commented Mar 15, 2023

Description of the bug:

I'm not sure if this is intended behavior or not, but I personally find it quite annoying and couldn't find any issue or documentation stating that this was intended.

If you run a bazel query and use deps() or rdeps() on some inputs, you will always get the inputs in the result (plus the actual dependencies / reverse dependencies). I don't think this makes logical sense as a target does not have a "dependency" on itself (that would be a self-edge and generally banned).

The common workaround for this is to except the inputs like so:

bazel query "deps(//:target) except //:target"

However, this can lose information when querying the dependencies of multiple inputs. Consider the example:

# /BUILD.bazel

some_rule(
    name = "foo",
    deps = [":bar"],
)

some_rule(
    name = "bar",
    deps = [":baz"],
)

some_rule(
    name = "baz",
)

Here, //:foo has a dependency on //:bar. Therefore deps(//:foo) except //:foo returns //:bar and //:baz, working as expected. However, this model falls apart with multiple inputs such as:

bazel query "deps(set(//:foo //:bar))"

IMHO, the "correct" output here is //:bar and //:baz. At least one input (//:foo) depends on //:bar and is not //:bar itself, therefore //:bar should be included in the output. //:baz is depended upon by both. Currently it actually outputs //:foo, //:bar, and //:baz. That means we need to manually exclude the inputs, so let's try:

bazel query "deps(set(//:foo //:bar)) except set(//:foo //:bar)

However this only returns //:baz and drops //:bar. Dependency information between targets in the input set is lost because this query assumes they do not depend on each other. As a result, deps() and rdeps() are returning incorrect results, and the obvious workaround of except doesn't actually work around the problem.

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

some_rule(
    name = "foo",
    deps = [":bar"],
)

some_rule(
    name = "bar",
    deps = [":baz"],
)

some_rule(
    name = "baz",
)
bazel query "deps(set(//:foo //:bar))"

Should return:

//:bar
//:baz

But actually returns:

//:foo
//:bar
//:baz

Which operating system are you running Bazel on?

WSL2 on Windows

What is the output of bazel info release?

release 6.0.0

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Have you found anything relevant by searching the web?

I was not able to find any existing issue or documentation which explicitly mentioned this "self-edge" in deps() and rdeps().

Any other information, logs, or outputs that you want to share?

No response

@benjaminp
Copy link
Collaborator

https://bazel.build/query/language#deps says "For example, the value of deps(//foo) is the dependency graph rooted at the single node foo, including all its dependencies."

@dgp1130
Copy link
Contributor Author

dgp1130 commented Mar 15, 2023

Ah sorry, I managed to miss that (was looking under the deps() section specifically).

Is there any particular reason for including the input targets? I gave a concrete example above for why this can be difficult to work with. What is the benefit to including the input target as well? Is there historical reasoning for this?

@gregestren gregestren added team-Performance Issues for Performance teams and removed team-Configurability platforms, toolchains, cquery, select(), config transitions labels Mar 17, 2023
@tjgq
Copy link
Contributor

tjgq commented Mar 22, 2023

We can't change the behavior of the deps operator; that would be a massive incompatible change. At best, this would be a feature request to add a new query operator with the desired behavior. To that end, it would help to describe a concrete scenario where the absence of such an operator makes it difficult to accomplish an end goal (and it can't be reasonably worked around).

@fmeum
Copy link
Collaborator

fmeum commented Mar 22, 2023

A type of query I have personally been interested in is "all targets matching a given pattern that aren't dependencies of targets matching that pattern", i.e., getting the root nodes of the build graph. It's something that would be very easy to build if deps had the behavior requested in this issue. I don't know how to realize this query with the currently available operators, but I'm also not very experienced with them.

@pennig
Copy link

pennig commented Mar 22, 2023

I also have run into this issue in the past. The task I was trying to accomplish was get a list of targets of a class that improperly depend (directly, but particularly transitively) on other members of that class. It doesn't use deps, but it does touch on the fundamental issue here—disconnected nodes in the graph are returned unexpectedly in the query. Here's an example that reduces the issue:

Given a subgraph with nodes A, B, C, D, E, but only one edge among them, say B → D; when performing the query: allpaths(set(A B C D E), set(A B C D E)) all nodes are returned in the query instead of just B and D.

The only thing I could think of as a workaround was to perform n+1 queries, one to get the list of items in the class, and then one for each target in the class, using except to remove it from the results. Incredibly inefficient.

@dgp1130
Copy link
Contributor Author

dgp1130 commented Mar 23, 2023

We can't change the behavior of the deps operator; that would be a massive incompatible change. At best, this would be a feature request to add a new query operator with the desired behavior. To that end, it would help to describe a concrete scenario where the absence of such an operator makes it difficult to accomplish an end goal (and it can't be reasonably worked around).

Hey @tjgq! Agreed that this would be a very nuanced breaking change and might be infeasible for that reason. As for one concrete scenario, I can point at dgp1130/rules_prerender#40 (comment), where I was hoping to create a genquery test to identify bad dependency edges, but which is fundamentally impossible because of this "feature" of deps() (see number 2 of the second list).

Ultimately I suspect there are a number of users who simply don't know about this behavior and may even be encountering the bug without realizing it. I know I've done deps($input) exclude $input countless times without realizing that this is a fundamentally flawed operation whenever $input has more than one target. I wouldn't be surprised if a significant number of users actually have bad queries and just haven't noticed.

@zhengwei143
Copy link
Contributor

I don't think it is feasible to change the semantics here, since it will likely break many people who rely on this functionality. Not to mention, the behavior of including the input targets is similar across other query functions (e.g. siblings, rdeps, ...) so it is currently consistent in that regard, changing the behavior of a single function would make that inconsistent unless we do the same for the rest.

W.r.t historical reasons, I don't have any concrete reason as to why this was the case to begin with. deps(//:a) was likely meant to generate the transitive graph rooted at //:a as the documentation does hint at (but not very precisely).

Closing this, but feel free to re-open if you would like to continue the discussion.

@zhengwei143 zhengwei143 closed this as not planned Won't fix, can't repro, duplicate, stale Mar 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team-Performance Issues for Performance teams type: bug untriaged
Projects
None yet
Development

No branches or pull requests

8 participants