Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for peeking at dependency/dependents rules #18112

Merged
merged 13 commits into from
Mar 1, 2023

Conversation

kaos
Copy link
Member

@kaos kaos commented Jan 28, 2023

Related to #17694

By providing --include-dep-rules to the peek goal, a few additional fields will be included for each target:

  • _dependents_rules enumerates all rules that apply for the target's dependents.
  • _dependencies_rules enumerates all rules that apply for the target's dependencies.
  • _applicable_dep_rules enumerates all rules that are in effect for the targets's dependencies.

Currently any DENY actions will cause peek to fail, so for introspection of these rules the work-around is to change them to WARN instead (replace ! with ? in the rule) until we have fixed #17660.

Example (stripped) output for ./pants peek --include-dep-rules 3rdparty/python#uvicorn:

[
  {
    "address": "3rdparty/python#uvicorn",
    "target_type": "python_requirement",
    "_dependencies_rules": null,
    "_dependents_rules": [
      "src/python/pants/explorer/server/**",
      "!*"
    ],
    "_applicable_dep_rules": [
      "3rdparty/python -> 3rdparty/python/BUILD[*] : ALLOW\npython_requirements 3rdparty/python#uvicorn -> python_requirements 3rdparty/python/requirements.txt",
      "3rdparty/python -> 3rdparty/python/BUILD[*] : ALLOW\npython_requirements 3rdparty/python#uvicorn -> _lockfiles 3rdparty/python/user_reqs.lock:python-default"
    ],
    "dependencies": [
      "3rdparty/python/requirements.txt",
      "3rdparty/python/user_reqs.lock:python-default"
    ],
    "requirements": [
      "uvicorn[standard]==0.17.6"
    ],
  }
]

The rules from the BUILD file in effect for the above was:

__dependents_rules__(
    (  # Only the explorer server may depend on these libraries
        (
            "[fastapi]",
            "[starlette]",
            "[strawberry-graphql]",
            "[uvicorn]",
        ),
        "src/python/pants/explorer/server/**",
        "!*",
    ),
    # Free for all on the rest
    ("*", "*"),
)

Another (stripped) example from "the other" end for the above rule ./pants peek --include-dep-rules src/python/pants/explorer/server/graphql/field_types.py:

[
  {
    "address": "src/python/pants/explorer/server/graphql/field_types.py",
    "target_type": "python_source",
    "_dependencies_rules": null,
    "_dependents_rules": [
      "src/python/pants/explorer/server/**",
      "!*"
    ],
    "_applicable_dep_rules": [
      "src/python/pants/explorer/server/graphql -> 3rdparty/python/BUILD[src/python/pants/explorer/server/**] : ALLOW\npython_sources src/python/pants/explorer/server/graphql/field_types.py -> python_requirements 3rdparty/python#strawberry-graphql",
      "src/python/pants/explorer/server/graphql -> src/python/pants : ALLOW\npython_sources src/python/pants/explorer/server/graphql/field_types.py -> python_sources src/python/pants/__init__.py"
    ],
    "dependencies": [
      "3rdparty/python#strawberry-graphql",
      "src/python/pants/__init__.py"
    ],
    "sources": [
      "src/python/pants/explorer/server/graphql/field_types.py"
    ],
  }
]

The commits are reviewable independently.

@kaos kaos requested a review from stuhood January 31, 2023 19:36
@stuhood stuhood requested review from benjyw and AlexTereshenkov and removed request for stuhood February 6, 2023 23:19
@stuhood
Copy link
Member

stuhood commented Feb 6, 2023

I'll bow out, since I'm less familiar with this feature.

for tgt, (family, adaptor) in zip(sorted_targets, family_adaptors)
if family.dependents_rules is not None
}
all_effective_dep_rules = await MultiGet(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's odd that we're introducing this new bit of terminology, "effective", but populating it with something using the terminology "application". Why not be consistent, and call this "applied_dependency_rules" or "applicable_dependency_rules"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, I'm not sure what the purpose of this is? If a rule bans a dependency then that dependency won't exist, and so there will be no rule that applies to it. So this is just for rules that allow dependencies?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It allows you to see what rules are defined, and also, in case you see dependencies you thought would have been denied but aren't, which rules allowed them. So it lets you peek into the defined visibility rules. It's been requested for a way to find out what rules have been defined and what applies to certain targets etc.

Agree on effective vs applicable. I'll change that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think this is a bit confusing? What is in effective_ vs in dependencies/dependents?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in dependencies/dependents you set up a bunch of rules yea. They may or may not "capture" anything. Applied (effective) rules are those that actually hit something. You've got a dependency relation between two targets where some rule(s) is/are in play.

Copy link
Member Author

@kaos kaos Feb 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rules applying to this node as a dest The subset of these that actually applied for some edge

is that not very expensive? or, it shouldn't be worse than pants dependents .. ?
but that would be some extra lookup from what we do in peek now, any way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess so. I'm just trying to wrap my head around the utility of the "applied" data. Most of the time rules will ban edges, so those edges won't exist. So this is only for the case of an edge that was explicitly allowed by a rule, rather than being allowed by default because no rule bans it? And we're not actually putting this information on the edge, but on the source target, and we're not saying which edge it referred to?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There may be rules that use "warn" rather than "deny". We may introduce an option to treat "deny" as "warn", we may also, for peek and lint not exclude denied edges for the purpose of reporting (#17671 (comment)).

we're not saying which edge it referred to?

Oh, but we do.

src/python/pants/explorer/server/graphql -> src/python/pants : ALLOW
python_sources src/python/pants/explorer/server/graphql/field_types.py -> python_sources src/python/pants/__init__.py"

This tells us that for the dependency edge from ../graphql/field_types.py to ../pants/__init__.py was allowed (by default, as there where no rules involved.

The other example, for comparison:

src/python/pants/explorer/server/graphql -> 3rdparty/python/BUILD[src/python/pants/explorer/server/**] : ALLOW
python_sources src/python/pants/explorer/server/graphql/field_types.py -> python_requirements 3rdparty/python#strawberry-graphql",

Edge from ../graphql/field_types.py to ../python#strawberry-graphql was allowed by the rule .../server/** of the dependents rule defined in 3rdparty/python/BUILD, and there were no rules in play on the dependency side of the edge.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also worth noting is, that although this data only shows what has been allowed, the logs will hold information about what was denied, so that is already available. And if you expected something to be denied but wasn't, this will help pin point why.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, gotcha. OK, that is neat.

Copy link
Contributor

@benjyw benjyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, sorry to be difficult, but something is still bugging me about this: Let's say you have just a handful of coarse-grained rules in the repo (which should be the common case). We will repeat them many, many times in the peek output, very redundantly. At least, that is my understanding? This will massively bloat peek output with a ton of redundant text.

What is the use case here? I wonder if maybe this data belongs in some new goal that you can use to query a specific edge? Or maybe it belongs on the paths goal or something?

@kaos
Copy link
Member Author

kaos commented Feb 10, 2023

Hey, sorry to be difficult, but

😂 no worries, I'm happy to work through this until we're both happy about it, where ever that takes us in the end.

We will repeat them many, many times in the peek output

I didn't readily articulate this, but this is one of the motivations I had for the --include-dep-rules option whether to present this data or not in the peek output as I realize it will not always be of interest. Iff we chose going the way presented in this PR we could default the value based on the presence of a visibility backend.

Having said that, I'm not against moving this to paths or else where; I don't consider peek to be the absolute optimal choice here, however I disagree with the notion we would repeat this information redundantly for some common rule at the top any more than it would be to present tags = ["foo", "bar"] in case you'd stick a __defaults__(all={"tags": ("foo", "bar"))) at the top. (there may be some overriding rules halfway down the tree, how would you find that out if we didn't include this for every target?)

I am planning for a visibility lint goal implementation, but that would be more focused on presenting all the violations there may be (#17671), while this on the other hand is good for introspection to see what is actually being used also to allow edges. That part about "introspection" there lead me to think that peek is a suitable goal for this. I also see the reason for suggesting paths as this is edge related--however the dependencies/dependents rules for a target would be awkward there, while the applicable rules would make perfect sense (maybe have that information there as well!?)

@benjyw
Copy link
Contributor

benjyw commented Feb 10, 2023

Let's go back to basics: What is this for? Let's enumerate the important use cases. For example,

  • I see a dependency that I think should be allowed, but isn't. I want to find out why.
  • ?
  • ?

Let's fill in those question marks

@kaos
Copy link
Member Author

kaos commented Feb 10, 2023

Let's go back to basics: What is this for? Let's enumerate the important use cases. For example,

  • I see a dependency that I think should be allowed, but isn't. I want to find out why.
  • ?
  • ?

Let's fill in those question marks

That's basically the linked issue from the PR description: #17694

@benjyw
Copy link
Contributor

benjyw commented Feb 10, 2023

That issue is a little general, but the specific example it gives is "who is allowed to import this particular Python module?".

I'm not sure that question is well-answered by this peek-based implementation? We'll show "the rules that apply to this module", which you can usually also get by reading that module's BUILD file (or an ancestor of it), but we won't list who is allowed to be at the other end of those rules. Even the applied_ data doesn't answer this question. It shows which existing edges were allowed by which rule, but that wasn't the question.

It seems like we need a new goal for this.

@benjyw
Copy link
Contributor

benjyw commented Feb 10, 2023

Or maybe this is a filter subsystem thing?

@benjyw
Copy link
Contributor

benjyw commented Feb 10, 2023

Filter subsets its input targets, so you could do "show me the subset of the input targets that is allowed to depend on target X". So now you can do "show me all targets that are allowed to depend on X" or "show me all targets in foo/bar that are allowed to depend on X" or even down to "show me if Y is allowed to depend on X"

But then the problem is we wouldn't be showing you why, because the output of filter is just a list of targets.

Still, that may be a good start?

@stuhood
Copy link
Member

stuhood commented Feb 10, 2023

This will massively bloat peek output with a ton of redundant text.

Drive by comment: this is also the case for __defaults__, but it seems like in both cases, the answer isn't "contextual information is bad", rather: "show me relevant contextual information by default, but allow for showing me more or less".

@kaos
Copy link
Member Author

kaos commented Feb 10, 2023

The list idea is a different use case in my eyes. The peek goal is a kind of debug tool to expose the internal representation of how Pants sees your project. To me, what visibility rules are defined (and how they interact with the dependencies presented) fits in quite well here.

However, this may be much, and not always interesting, hence a flag to be able to choose if it should be presented or not.

@kaos
Copy link
Member Author

kaos commented Feb 10, 2023

In other words, I'm in favour of reverting 60563c4

@benjyw
Copy link
Contributor

benjyw commented Feb 10, 2023

Yeah, at the very least we will want that option after all.

But note my comment about the fact that this doesn't actually solve the need in the original ticket.

@kaos
Copy link
Member Author

kaos commented Feb 11, 2023

Yeah, at the very least we will want that option after all.

Agreed, then. 👍🏽

But note my comment about the fact that this doesn't actually solve the need in the original ticket.

Hmm.. from the ticket:

I'd like to get information about the visibility rules defined in a BUILD file in this directory (both raw and with expanded paths).

This is presented, but perhaps not at the level of detail suggested. (i.e. not both raw and expanded paths)

When the rule selectors are expanded to real files, one could answer a variety of questions, e.g. "who is allowed to import this particular Python module?".

I didn't read this strictly literally, but considered this detail to satisfy the above:

    "_dependents_rules": [
      "src/python/pants/explorer/server/**",
      "!*"
    ],

In that it shows clearly what paths would be accepted and all the rest not. Note as well that there may very well be rules on the other end affecting the final verdict for the edge, so it's potentially only half the truth here.

This may seem trivial, but given a a large set of rules, potentially with complex logic creating the rules in the BUILD file this may not be obviously apparent simply by reading the sources. Mix-in extend=True to inherit parent rules and this may become even more difficult, so being able to get the end result out may prove very valuable.

Reading something into the emoji reactions from @AlexTereshenkov also shows this is not entirely non-sense ;)
image

@AlexTereshenkov maybe you could chime in with if this does what you'd expect/want to and also what may be missing? :)

@benjyw
Copy link
Contributor

benjyw commented Feb 12, 2023

Not sure I agree. Looking at this data seems not much different from looking at the original rules in the BUILD files. It doesn't answer a question like "who is allowed to import X" or "is the dependency X->Y valid and why". Answering those requires evaluating rules, but that is exactly the non-trivial thing we can do! It seems a lot more useful than just recapitulating the rules from the BUILD files, which are not hard to look up.

@kaos
Copy link
Member Author

kaos commented Feb 13, 2023

Answering those requires evaluating rules, but that is exactly the non-trivial thing we can do!

and the applicable_rules (or effective_rules) are just that, evaluated rules for any given target and its dependencies.

Looking at this data seems not much different from looking at the original rules in the BUILD files.

I disagree--as this assumes you write the rules in a plain straight very non-DRY fashion.
Real example (from wip, but shows that reading a single BUILD file doesn't necessarily give a straight answer):

__dependencies_rules__(
    (
        # Users permissions is an exception to the rules, as it may depend on x models.
        "[/permissions.py]",
        "/../x/models/**",
        APP_RULES,
    ),

    # Other (temporary?) exceptions
    (
        (
            "[/models/*]",
            "[/viewsets/*]",
        ),
        "/../x/models/**",
        "/",
        "/*",
        APP_RULES,
    ),
    ("[/tests/**]", "*"),
    ("[serializers.py]", "../x/models/*", APP_RULES),

    # Common rules for all x apps
    *x_app_rulesets()
)

@AlexTereshenkov
Copy link
Member

So sorry I am late to the party. I was meaning to catch up on this and provide feedback. Will do asap, definitely this week! 😆

@kaos
Copy link
Member Author

kaos commented Feb 13, 2023

It doesn't answer a question like "who is allowed to import X"

This one is interesting if interpreted as "list all files I have that potentially could depend on this target without breaking any rules", and is not covered by this PR, and I think don't belong in peek at all so suggest as leaving for follow up--to be implemented as an option to list perhaps?

@kaos
Copy link
Member Author

kaos commented Feb 13, 2023

It doesn't answer a question like [..] "is the dependency X->Y valid and why".

This is #18112 (comment) unless Y is a hypothetical dep..

@benjyw
Copy link
Contributor

benjyw commented Feb 13, 2023

It doesn't answer a question like "who is allowed to import X"

This one is interesting if interpreted as "list all files I have that potentially could depend on this target without breaking any rules", and is not covered by this PR, and I think don't belong in peek at all so suggest as leaving for follow up--to be implemented as an option to list perhaps?

This is literally the example given in the original ticket :)

I think enumerating the rules is an OK feature, but actually applying them is way cooler and more useful. Then you can say something like ./pants dep-rules --from=foo/bar.py --to=baz/qux.py and see whether that is allowed or denied, and why. And those paths could be globs, etc.

@benjyw
Copy link
Contributor

benjyw commented Feb 13, 2023

It doesn't answer a question like [..] "is the dependency X->Y valid and why".

This is #18112 (comment) unless Y is a hypothetical dep..

Yeah, it is only that if that is an actual edge in the graph, in which case we know it's allowed, so that is less exciting. But actually querying the graph on hypothetical edges is way exciting. For example, it'll tell you why an edge would be banned.

@kaos
Copy link
Member Author

kaos commented Feb 13, 2023

Agreed there's more we want to do here, this just adds the first basic stuff for introspection of what we actually have.

Live querying hypotheticals is def worth exploring too going forward.

@AlexTereshenkov
Copy link
Member

I've given it a go in my open-source project where I experiment with Pants. This all looks great and is very helpful: first and foremost it gives me a chance to extract the rules programmatically. This is already a big one (I don't have to use buildozer or parse Python file).

It also answers the questions I had in mind (given a few months of PoC work enabling visibility rules in the repo):

"who is allowed to import this particular Python module?"

❯ ./pants_from_sources peek cheeseshop/cli/utils/utils.py | jq '.[]._dependents_rules' 
[
  "//cheeseshop/cli/utils/**",
  "//cheeseshop/cli/**",
  "?*"
]

"what can this particular Python module import?"

❯ ./pants_from_sources peek cheeseshop/cli/utils/utils.py | jq '.[]._dependencies_rules'
[
  "cheeseshop/cli/utils/**",
  "?*"
]

Are import statements in this particular module allowed?

The imports in the cheeseshop/repository/package.py:

# 3rd party from requirements
from typing_extensions import TypeAlias
from packaging.version import Version

# forbidden import
from cheeseshop.cli.utils.utils import choices  

# allowed imports
from cheeseshop.repository.parsing.casts import from_int
from cheeseshop.repository.properties import PackageType
echo $(./pants_from_sources peek cheeseshop/repository/package.py | jq '.[]._applicable_dep_rules')
[ "cheeseshop/repository/BUILD[//cheeseshop/cli/**] -> cheeseshop/cli/utils/BUILD[?*] : WARN
python_sources cheeseshop/repository/package.py -> python_sources cheeseshop/cli/utils/utils.py", "cheeseshop/repository/BUILD[cheeseshop/repository/**] -> cheeseshop/repository : ALLOW
python_sources cheeseshop/repository/package.py -> python_sources cheeseshop/repository/properties.py", "cheeseshop/repository/BUILD[cheeseshop/repository/**] -> cheeseshop/repository/parsing : ALLOW
python_sources cheeseshop/repository/package.py -> python_sources cheeseshop/repository/parsing/casts.py", "cheeseshop/repository/BUILD[//requirements#*] -> requirements : ALLOW
python_sources cheeseshop/repository/package.py -> python_requirements requirements#packaging", 
"cheeseshop/repository/BUILD[//requirements#*] -> requirements : ALLOW
python_sources cheeseshop/repository/package.py -> python_requirements requirements#typing-extensions" ]

This already takes me very far.


One thing that would be terrific is to have a parseable data structure in the _applicable_dep_rules field? This

  "cheeseshop/repository/BUILD[cheeseshop/repository/**] -> cheeseshop/repository : ALLOW\npython_sources cheeseshop/repository/package.py -> python_sources cheeseshop/repository/properties.py",

could be represented as

{
  "from": "cheeseshop/repository/BUILD[cheeseshop/repository/**]",
  "to": "cheeseshop/repository",
  "rule": "ALLOW",
  "origin": {
    "target_type": "python_sources",
    "filepath": "cheeseshop/repository/package.py"
  },
  "destination": {
    "target_type": "python_sources",
    "filepath": "cheeseshop/repository/properties.py"
  }
}

The output of the peek is most certainly will be machine parsed, so it would be amazing to have a JSON object.


I am sure there are more things one can mine given the inputs, but this is a solid foundation from my POV.

@@ -143,7 +160,7 @@ async def get_build_file_dependency_rules_implementation(
f"There must be at most one BUILD file dependency rules implementation, got: {impls}"
)
for request_type in request_types:
impl = await Get( # noqa: PNT30: requires triage
impl = await Get( # noqa: PNT30: this for loop will never process more than a single iteration.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kaos
Copy link
Member Author

kaos commented Feb 25, 2023

@AlexTereshenkov close enough?

                    "_applicable_dep_rules": [
                      {
                        "action": "ALLOW",
                        "rule_description": "foo/BUILD[*] -> foo/BUILD[*]",
                        "origin_address": "foo:baz",
                        "origin_type": "files",
                        "dependency_address": "foo/a.txt:baz",
                        "dependency_type": "files"
                      }
                    ],

@kaos kaos requested a review from benjyw February 25, 2023 03:36
@AlexTereshenkov
Copy link
Member

@AlexTereshenkov close enough?

                    "_applicable_dep_rules": [
                      {
                        "action": "ALLOW",
                        "rule_description": "foo/BUILD[*] -> foo/BUILD[*]",
                        "origin_address": "foo:baz",
                        "origin_type": "files",
                        "dependency_address": "foo/a.txt:baz",
                        "dependency_type": "files"
                      }
                    ],

Yes, this is fantastic. I think the information in the rule_description is already available in other keys, so this should be fine. If not, I can always look into improving that later. I don't want to block you on this.

@kaos
Copy link
Member Author

kaos commented Feb 25, 2023

@AlexTereshenkov close enough?

Yes, this is fantastic. I think the information in the rule_description is already available in other keys, so this should be fine. If not, I can always look into improving that later. I don't want to block you on this.

OK, cool. This is simply the source dataclass dumped as-is into the peek data.
Btw, rule_description contains unique information about where the rule was defined (and which rule in that file that was picked for this edge, so the path to the BUILD file may be != the path of the target/file in question when you have a rule spec higher up in your source tree)

@dataclass(frozen=True)
class DependencyRuleApplication:
action: DependencyRuleAction
rule_description: str
origin_address: Address
origin_type: str
dependency_address: Address
dependency_type: str

@kaos kaos added this to the 2.16.x milestone Feb 27, 2023
@kaos
Copy link
Member Author

kaos commented Mar 1, 2023

@benjyw ping. do you have any more feedback for this, or OK to land as-is?

Copy link
Contributor

@benjyw benjyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AlexTereshenkov opened the original ticket, so if he's happy with this being useful, I am too.

Just one question re those underscores.

default=False,
help=softwrap(
"""
Whether to include `_dependencies_rules`, `_dependents_rules` and `_applicable_dep_rules`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the underscore prefix?

Copy link
Member Author

@kaos kaos Mar 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's me trying to be cautious to not get a key conflict with a target field name (in the future). This flat namespace that is the peek data is making me titchy. Hence #18383 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i.e. I propose we adjust the peek schema, and with that we can also get rid of these underscores; but that's for another PR ;)

@kaos kaos merged commit feb4fac into pantsbuild:main Mar 1, 2023
@kaos kaos deleted the 17694/peek-visibility branch March 1, 2023 03:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Report all dependency errors for all targets at once in visibility backend
4 participants