Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Policy attachment update #1565

Merged

Conversation

youngnick
Copy link
Contributor

What type of PR is this?
/kind cleanup
/kind documentation
/kind gep

What this PR does / why we need it:
This updates the Policy Attachment documentation as per the discussion in https://docs.google.com/document/d/1p_kmAOmqebX8zyHnXW5I58cD8ZhNX4QNSa_gw8uAYxw/edit?usp=sharing

There's an update to the original GEP, and to the site documentation.

@k8s-ci-robot k8s-ci-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/documentation Categorizes issue or PR as related to documentation. kind/gep PRs related to Gateway Enhancement Proposal(GEP) cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Nov 29, 2022
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 29, 2022
site-src/geps/gep-713.md Outdated Show resolved Hide resolved
site-src/geps/gep-713.md Outdated Show resolved Hide resolved
site-src/geps/gep-713.md Outdated Show resolved Hide resolved
Copy link
Member

@shaneutt shaneutt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, especially considering we still consider this experimental, I'm 👍 for this in its current state. Would like to have Rob take a look too.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 28, 2023
@shaneutt
Copy link
Member

@youngnick needs a rebase

@robscott please review for final LGTM if you get some cycles

🖖

Comment on lines 108 to 112
* In terms of status, it should be reasonably easy for a user to understand that
everything is working - basically, as long as the targeted object exists, and
the modifications are valid, the metaresource is valid, and this should be
straightforward to communicate in one or two Conditions. Note that at the time
of writing, this is *not* completed.
Copy link
Contributor

@dprotaso dprotaso Feb 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

straightforward to communicate in one or two Conditions.

On the policy or the targeted object?

How would you also signal that the policy is still needs to be reconciled? In other resources we have generation/observedGeneration pairing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is still TBD, and needs to be finished, yes.

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 17, 2023
Copy link
Member

@robscott robscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @youngnick! Sorry it took me so long to get to this!

site-src/geps/gep-1364.md Outdated Show resolved Hide resolved
All ReferenceGrants are metaresources, and all objects that use Policy Attachment
are metaresources, but metaresources can be more than just those two things.

### Targeted metaresources
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it may be simpler to split this into 2 categories:

  1. Hierarchical Policy Attachment (what's currently defined by the GEP)
  2. Direct Policy Attachment (close to what I think you're describing here)

I really want policy attachment to follow the same patterns and structures as closely as possible so that tooling can be built to compute the effective policy. If we leave this too open-ended it's going to be impossible to ever build an ecosystem around this framework that computes the results of any policy attachment. If we don't hat least have a way to reach that point, I think policy attachment may have failed.

As a rough example, these could require identical structures with the following exceptions:

  1. The gateway.networking.k8s.io/policy-attachment label would be set to either Direct or Hierarchical.
  2. When Direct, the resources would not have defaults or overrides and instead just a top level struct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've started this work, but I think the naming needs work. PTAL, keeping in mind we'll probably need to change Hierarchical Policy to something else.

site-src/geps/gep-713.md Outdated Show resolved Hide resolved
site-src/geps/gep-713.md Outdated Show resolved Hide resolved
site-src/geps/gep-713.md Outdated Show resolved Hide resolved
site-src/geps/gep-713.md Outdated Show resolved Hide resolved
site-src/geps/gep-713.md Outdated Show resolved Hide resolved
site-src/geps/gep-713.md Outdated Show resolved Hide resolved
site-src/geps/gep-713.md Outdated Show resolved Hide resolved
site-src/geps/gep-713.md Outdated Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 24, 2023
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 27, 2023
Copy link
Member

@robscott robscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @youngnick! Added lots of comments but really like this direction, lots of great ideas here.

geps/gep-713.md Outdated Show resolved Hide resolved
geps/gep-713.md Outdated Show resolved Hide resolved
geps/gep-713.md Outdated Show resolved Hide resolved
geps/gep-713.md Outdated Show resolved Hide resolved
geps/gep-713.md Outdated Show resolved Hide resolved
geps/gep-713.md Outdated Show resolved Hide resolved
geps/gep-713.md Outdated Show resolved Hide resolved
This kind of standardization not only enables consistent patterns, it allows
future tooling such as kubectl plugins to be able to visualize all policies that
have been applied to a given resource.
# Metaresources and Policy Attachment
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for a separate PR, but reviewing this page reminded me of a broader idea I'd had recently. I think we may want to introduce a broader document/set of documents that cover all the ways the API can be extended. That would include both forms of Policy Attachment as well as ExtensionRefs and GatewayClass Params.

site-src/references/policy-attachment.md Outdated Show resolved Hide resolved
site-src/references/policy-attachment.md Outdated Show resolved Hide resolved
@youngnick
Copy link
Contributor Author

I've pushed another update here, I think the biggest outstanding thing is, of course, naming.

Currently, there are two types of Policy Attachment:

  • Hierarchical: the original form, with defaults and overrides.
  • Direct: a simplified form that can only attach to one object (or a small set).

I think that the Hierarchical name is technically correct but too complex and hard to write. (I've spelled it wrong so many times already).
So, we need another name.

Some thoughts:

  • Direct and Indirect: Simple, basically accurate, but I don't know if it indicates the hierarchical aspect enough.
  • Direct and Tree: Not great, but it really pushes the hierarchy.
  • Something else: I'm out of ideas.

I'll reword the site documentation once we've settled on this name.

TODO for the site docs:

  • Direct attachment example
  • Diagrams that include direct and hierarchical policy attachment
  • Better explanation of the differences

geps/gep-713.md Outdated Show resolved Hide resolved
Copy link
Member

@shaneutt shaneutt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking this over since there's been significant changes since my previous approval (as this PR has been "stuck" now for quite some time), I do see language and things I would like to tweak, however it's predominantly for reasons of brevity which (since we're not moving to standard yet) I feel I could just as easily contribute changes myself in a follow up PR where I feel strongly.

This is Experimental I want to operate within the spirit of that: we're in the experimental stage, the core concepts in here make sense to me, I imagine we'll have plenty more iterations to come before we move it out of experimental but I don't feel any of it's worth holding up this PR. Therefore I approve.

@shaneutt shaneutt added this to the v1.0.0 milestone Mar 8, 2023
@youngnick
Copy link
Contributor Author

@arkodg I think that, for hierarchical policies, the merge semantics depend on whether the settings are in defaults or overrides. However, for non-scalar fields, the merge semantics should be more context-dependent than just merging an entire struct into the same struct in another object.

I'll add in what I think here, and see if that makes sense.

in a hierarchy (this is "Hierarchical Policy Attachment").

Individual policy APIs:
- must be their own CRDs (e.g. `TimeoutPolicy`, `RetryPolicy` etc),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could gateway-api provide such policy api?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will probably end up supplying some Policy CRDs, I suspect. But it's not certain yet.

geps/gep-713.md Outdated
- must be their own CRDs (e.g. `TimeoutPolicy`, `RetryPolicy` etc),
- can be included in the Gateway API API group and installation or be defined by
implementations
- and must include a common `TargetRef` struct in their specification to identify how and where to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does bi-directional reference need?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean, sorry. There's no bidirectional reference here, just a mandated struct.

Implementations should not copy the higher-level structs directly into the
affected object.

In the case that the field in the Policy affects a struct that is a member of a list,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets take an example of a Hierarchical policy

  • Only specify defaults
    Applied at Gateway
RetryPolicy
default:
  retryOn:
  - "501"
  - "502"
  - "503"

Applied at xRoute

RetryPolicy
default:
  retryOn:
  - "504"

is this the final state (appending/merging defaults) ?

retryOn:
  - "501"
  - "502"
  - "503"
  - "504"
  • Top level specifies a default and child level specifies a override
    Applied at Gateway
RetryPolicy
default:
  retryOn:
  - "501"
  - "502"
  - "503"

Applied at xRoute

RetryPolicy
override:
  retryOn:
  - "504"

is this the final state (bottom level override wins) ?

retryOn:
  - "504"
  • Top level specifies a override and child level specifies a default
    Applied at Gateway
RetryPolicy
override:
  retryOn:
  - "501"
  - "502"
  - "503"

Applied at xRoute

RetryPolicy
default:
  retryOn:
  - "504"

is this the final state (top level override wins) ?

retryOn:
  - "501"
  - "502"
  - "503"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just my 2c, it probably makes sense to be implementation specific how merging is done. For this case, I would recommend implementations replace.

If we do prescribe replace vs merge it should be replace, IMO, since that covers more use cases (you can always add more with replace, but cannot take away with merge).

Copy link
Contributor

@arkodg arkodg Mar 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah agreed replace for all cases is a good solution, would prefer if this was a recommendation from this project

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, there's some confusion here about the intent of defaults and overrides, and it's also intersecting with a particular type of element that makes merging very difficult.

For the latter, merging lists is always going to be hard, that's why Kubernetes (and kubebuilder) have the listMergeType directives, so that the apiserver and other api machinery get some clues about how to merge things. A list of strings is probably close to the worst-case for merge complexity, because there are no clues about priority in the string itself, unlike if there was a structured object there.

That said, I can see that in the example here, it does make sense. But we need to understand that we should try to avoid lists-of-strings in the same way we should avoid map[string]string - the lack of structure makes doing things with the field harder in the long term. (That isn't to say we should never do it, just that it has a cost we should consider).

For this specific setting, you could achieve the same aim by using a single string that contained comma-separated values. To that end, when a Policy changes the retryOn field, it will replace the field, not be merged into it. Effectively, the value of the list is the set of strings included in the list, and the Policy will replace the value, the same as it would on a key: value field inside a struct. The same applies for value: map[string]string fields.

In this case, the other rule that's important is the differences between defaults and overrides.

The overall rule is that only one instance of a Policy should be in effect on the final object at any one time. If two Policy objects of the same type end up affecting the same leaf node, then only one can win.

defaults are a "more specific wins" construct, so if multiple Policy objects are attached, the one that's closer to the thing being modified will win.

overrides are a "less specific wins" construct, so if multiple Policy objects are attached, the ones that's higher in the hierarchy will win.

In the event a default and an override both apply, no matter what level they've come from, then the override will win.

This means that in the first example, the defaults from the two Policy objects should not be merged. The defaults should be merged into the thing they're modifying, but only one instance of the same Policy will be the winner.
That means that the final result should look like

RetryPolicy
default:
  retryOn:
  - "504"

This is the same as if the actual Route had specified the setting.

For the second example, top level specifies a default and a lower an override, the override wins, because overrides beat defaults.

For the third, top level specifies an override, and a lower a default, the override wins, because overrides beat defaults.

Some other examples:
Top level specifies an override, and a lower a different override, the top-level override wins, because overrides are "less specific beats more specific".

This comment was me working through my thoughts, I'll take it and fold it into the GEP.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave this thread here for a bit so everyone can see it, but I've folded this language into both the GEP and the doc page now, so I consider this thread resolved.

geps/gep-713.md Outdated Show resolved Hide resolved
Copy link
Member

@sunjayBhatia sunjayBhatia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left some small comments but I'm generally on board with the changes here 👍🏽

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 16, 2023
@youngnick
Copy link
Contributor Author

I started writing some updates to address @arkodg's feedback, and it got a bit out of hand. I've added a bunch of extra description and examples to further describe:

  • how interactions between overrides and defaults should work
  • how fields should be overridden, including for non-scalar fields
  • when and how merging should be done (it's complicated).

I'm open to discussing if we should simplify this by only allowing one Policy of each Kind to be in effect on any object at once, but I think that it would substantially decrease the usefulness of this pattern.

I've got pretty detailed to try and eliminate edge cases, but you can sum up the rules for Hierarchical Policy interactions like this:

  • overrides override everything from the top down
  • defaults set defaults, with more specific beating less specific defaults
  • If a Policy affects a field inside an object, then that field is the most specific possible value, and so will beat any default from anywhere.

@youngnick
Copy link
Contributor Author

Oh, I've also folded most of the new changes into the doc page as well as the GEP, as I think that it should make the doc page clearer.

Also, I've specifically left the section about adding a sectionName to targetRef alone for now, although I do think we should change it, this PR is big enough already.

Copy link
Member

@robscott robscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the work on this @youngnick! There are some really great improvements in this. I ran out of time to make it all the way through this, but added a bunch of small comments. Overall, I'm in agreement with all of the major changes/additions I've read so far.

geps/gep-713.md Outdated Show resolved Hide resolved
geps/gep-713.md Outdated Show resolved Hide resolved
geps/gep-713.md Outdated Show resolved Hide resolved
geps/gep-713.md Outdated Show resolved Hide resolved
geps/gep-713.md Outdated Show resolved Hide resolved
geps/gep-713.md Outdated Show resolved Hide resolved
geps/gep-713.md Show resolved Hide resolved
geps/gep-713.md Show resolved Hide resolved
Comment on lines +839 to +848
Note that this will not be very discoverable for Gateway owners in the absence of
a solution to the Policy status problem. This is being worked on and this GEP will
be updated once we have a design.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a massive GEP. It's easy to lose track of what we still have in progress. It would be very helpful to prominently track open questions/work tracks somewhere in this GEP.

geps/gep-713.md Outdated Show resolved Hide resolved
Copy link
Contributor

@howardjohn howardjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, just left a few nitpicks. Nice!

geps/gep-713.md Outdated Show resolved Hide resolved
geps/gep-713.md Outdated Show resolved Hide resolved
geps/gep-713.md Outdated Show resolved Hide resolved
_all_ of the specified fields should take effect on the affected object.

Examples to further illustrate these rules are given below.

## API
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One example that I don't know is covered directly by Direct or Hierarchical would be additive policies. For example, we have an AuthorizationPolicy; you can have a rule "allow GET requests" and another "allow requests to /public" or whatever. There isn't one result, the resulting behavior is all of the policies merged together. These can be applied cluster-wide, namespace-wide, or per-workload.

This is probably not Hierarchical since there is no overrides and defaults, but Direct seems to suggest it should only apply to a single type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they can be applied on multiple levels, then they should be an Inherited (renamed from Hierarchical) Policy. I agree there's no good way to represent this exact use case for now though, so can we come back and follow this one up? Maybe we can add another reserved top-level key that makes something an Inherited Policy?

geps/gep-713.md Outdated Show resolved Hide resolved
each existing item in the list in the affected object should have each of its
fields compared to the corresponding fields in the Policy.

For non-scalar field _values_, like a list of strings, or a `map[string]string`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I keep making the mistake of coming back to this GEP in the morning before coffee, but I am having trouble 100% following this.

A little table of

|Top level config|Lowest level config|Result|
|-|-|

Would be super helpful I think

(not blocking)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its described a bit more further down which made sense, table may still be nice to have though

Copy link
Contributor Author

@youngnick youngnick Mar 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a small table, which hopefully helps. The key is that there's no merging in the non-scalar case anyway. (which is particularly relevant if a Policy wants to control annotations on something, since those are map[string]string).

a solution to the Policy status problem. This is being worked on and this GEP will
be updated once we have a design.

Conceivably, a security or admin team may want to _force_ Gateways to have at least
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be a bit of a nitpick, but technically with RBAC limitations they probably cannot achieve their desired results like this, since its namespace bound. Unless they do super fine-grained RBAC privileges and not just "give user X TLSMinimumVersionPolicy create in default namespace", but IIRC you cannot do that for CREATE privilege.

Basically if they have this privilege they can just remove the namespace-wide override?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I know, but I was running out of good-example brain juice. PRs to improve this welcomed once this goes in? 😄

geps/gep-713.md Show resolved Hide resolved
geps/gep-713.md Outdated Show resolved Hide resolved
Copy link
Contributor

@arkodg arkodg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for incorporating the "Direct" policy type and clarifying the semantics for Hierarchical policies

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: arkodg, shaneutt, sunjayBhatia, youngnick

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@youngnick
Copy link
Contributor Author

Okay, I think I've got both those rounds, thanks @robscott and @howardjohn!

The big change in this last revision is that I picked "Inherited" for the other Policy name, so we now have "Direct" or "Inherited" Policy.

@howardjohn
Copy link
Contributor

LGTM, since this was a pretty popular PR and just changed I'll let someone else give the official approval though to give some time to review. Ping me if that never happens 🙂

Copy link
Member

@robscott robscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @youngnick! This is a really great improvement. Had a few areas where I'm still confused but agree with direction of everything here.

site-src/references/policy-attachment.md Outdated Show resolved Hide resolved
site-src/references/policy-attachment.md Outdated Show resolved Hide resolved
site-src/references/policy-attachment.md Outdated Show resolved Hide resolved
geps/gep-713.md Show resolved Hide resolved
geps/gep-713.md Outdated Show resolved Hide resolved
geps/gep-713.md Outdated Show resolved Hide resolved
geps/gep-713.md Show resolved Hide resolved
geps/gep-713.md Outdated Show resolved Hide resolved
Copy link
Member

@robscott robscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Focused on docs page this time, I think this is mostly nits at this point, but a couple larger questions about if we can reduce the scope of this PR a bit.

site-src/references/policy-attachment.md Outdated Show resolved Hide resolved
site-src/references/policy-attachment.md Outdated Show resolved Hide resolved
site-src/references/policy-attachment.md Outdated Show resolved Hide resolved
site-src/references/policy-attachment.md Outdated Show resolved Hide resolved
site-src/references/policy-attachment.md Outdated Show resolved Hide resolved
site-src/references/policy-attachment.md Outdated Show resolved Hide resolved
site-src/references/policy-attachment.md Outdated Show resolved Hide resolved
site-src/references/policy-attachment.md Outdated Show resolved Hide resolved
site-src/references/policy-attachment.md Outdated Show resolved Hide resolved
Also distinguish between Direct and Inherited Policy Attachment.

Signed-off-by: Nick Young <[email protected]>
@robscott
Copy link
Member

Thanks for the massive amount of work on this @youngnick! This is a huge step forward for our policy attachment model.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 29, 2023
@k8s-ci-robot k8s-ci-robot merged commit 004eb1b into kubernetes-sigs:main Mar 29, 2023
@shaneutt shaneutt modified the milestones: v1.0.0, v0.7.0 Apr 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/documentation Categorizes issue or PR as related to documentation. kind/gep PRs related to Gateway Enhancement Proposal(GEP) lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

10 participants