
KEP-2136: NetworkPolicy Versioning and Status #2137

Closed

Conversation

danwinship (Contributor) commented Nov 9, 2020:

I've been thinking about this for a few reasons lately:

The KEP template isn't fully filled out yet; I wanted some feedback before continuing.

/cc @jayunit100 @aojea @rikatz @thockin

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 9, 2020
k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: danwinship
To complete the pull request process, please assign caseydavenport after the PR has been reviewed.
You can assign the PR to them by writing /assign @caseydavenport in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/network Categorizes an issue or PR as relevant to SIG Network. labels Nov 9, 2020
The `"Enforcing"` condition indicates whether the plugin is currently
enforcing the policy described by a NetworkPolicy.

- If the plugin is intentionally not enforcing the policy (eg, because
Member:

This is great for the loopback policy condition, which may or may not be enforced based on whether kube-proxy is being used in one manner or another. cc @mattfenwick

It would also possibly allow us to actually define whether loopback policy is supported or not without breaking anything in the validation.

danwinship (Contributor Author) commented Nov 9, 2020:

(Clarifying: the issue Jay is talking about is that it's unspecified whether an isolated pod is implicitly allowed to connect to itself, or if it can only connect to itself when there is a policy explicitly allowing that. With most plugins, self-connection is implicitly allowed, because NPs are enforced outside of the pod's namespace, so a self-connection never hits the NP rules. But with some plugins, like Cilium, isolated pods are not allowed to connect to themselves unless there is a policy allowing it.)

I'm not sure anything currently in this proposal is a good way of explaining this though. For one, if we did expect plugins to indicate whether loopback connections were or were not being blocked, then they would have to set that condition on basically every NetworkPolicy object, since it is not a property of any specific NetworkPolicy. (eg, if you have a policy that says "allow from X to Y" then that also either does or does not imply "block from Y to Y" depending on NP implementation.)

This feels more like a global implementation detail than a per-policy status. I had thought about whether we should have some global status in addition to the per-policy statuses. eg, then you could know for sure if this was a plugin that implements NetworkPolicyStatus or not... Also, maybe it could set a supportedVersion field so you could know ahead of time what things were supported...
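A rough sketch of what such a global, per-implementation status object could look like; every type and field name here is hypothetical, invented only to illustrate the idea of a supportedVersion plus implementation-wide behavior flags (nothing like this exists in the current API):

```go
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// NetworkPolicyImplementation is a hypothetical cluster-scoped object a
// network plugin could publish once, instead of repeating
// implementation-wide facts on every NetworkPolicy's status.
type NetworkPolicyImplementation struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Status NetworkPolicyImplementationStatus `json:"status,omitempty"`
}

// NetworkPolicyImplementationStatus reports implementation-wide facts.
type NetworkPolicyImplementationStatus struct {
	// SupportedVersion is the newest NetworkPolicy feature version the
	// plugin implements (eg "1.21").
	SupportedVersion string `json:"supportedVersion,omitempty"`

	// AllowsImplicitLoopback reports whether an isolated pod can connect
	// to itself without an explicit allow rule (the loopback question
	// discussed above).
	AllowsImplicitLoopback *bool `json:"allowsImplicitLoopback,omitempty"`
}
```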

jayunit100 (Member) commented Nov 9, 2020:

Ah, yeah, so I guess publishing "global provider metadata" as part of this policy status field is an impedance mismatch...

...Selfishly, I would like that global metadata (or just hacking the metadata into this stuff as-is) from that "user story: netpol validator guy" perspective, just because then we could validate loopback in netpol matrices.

But if you decide that doesn't fit in here, that's fine too. Just being selfish :)

Of course, I realize that there's a lot more to "validating" loopback than just supporting it or not, because the behaviour is unspecified. So maybe that's a semantic we could model somehow... maybe there are other unspecified behaviours we may want to publish info about?

Just food for thought, no strong opinion.

danwinship (Contributor Author):

But if you decide that doesn't fit in here

well, I was saying I don't think it fits into the API that's currently in the KEP, but there are definitely arguments for adding something to cover that as well

Member:

The loopback loophole seems orthogonal - we should define what we expect it to do and then push that to implementations, with a CVE if needed

but will not become "Ready" until it is subject to the policy.

```
<<[UNRESOLVED pod-readiness ]>>
Member:

Hmmm... I wonder if the readiness thing is really required here. It might add semantics that can confuse people. Enforcing is clear - it's a CNI provider's way of saying "to the best of my knowledge, I've implemented the rules for this policy".

Readiness is more an eye-of-the-beholder thing, which is really tough to reason about (loopback use case again), or maybe some masquerading use cases might trip that up?

danwinship (Contributor Author):

But what does "enforcing" mean without "readiness"?

If I create a Pod and a NetworkPolicy at about the same time, and the Pod becomes Ready and the NetworkPolicy becomes Enforcing, then is it guaranteed that the Pod is subject to the NetworkPolicy? If you say "no", then "Enforcing" is much less useful. If you say "yes" then that implies a mandatory connection between NP Enforcing and Pod Readiness.

I think people generally want Pod Readiness to track NP enforcement when possible...

jayunit100 (Member) commented Nov 9, 2020:

I guess this hinges on what we mean by being "subject" to a policy?

  • Am I subject to a policy iff the policy is correctly implemented and no unauthorized entities can communicate with me (call it NUE for short)?
    OR
  • Am I subject to a policy iff my CNI provider, to the best of its abilities, has tried to update its firewall/iptables/OVS/whatever rules to enforce "NUE"?

danwinship (Contributor Author):

The former. If Enforcing True only means "I'm pretty sure that the rules are working now", then the e2e suite can't trust it, and so it still needs to just retry for some arbitrary length of time after that, and if it's going to do that, there's really not much point in having the status at all.

For iptables, once the rules are in the kernel, they're in effect. So if you know you have called iptables or iptables-restore on each relevant node, and gotten back success, then you know NUE. If you've merely set a flag indicating that you plan to update iptables the next time a certain timer goes off, then you don't claim Enforcing yet.
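To make that distinction concrete, here is a rough sketch of the decision a plugin controller might make, assuming a per-node agent that reports back only after its iptables-restore (or equivalent) call has returned successfully. The per-node ack bookkeeping and helper are invented for illustration, and metav1.Condition is just a convenient stand-in for whatever status type this KEP ends up proposing:

```go
package plugin

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// enforcingCondition implements the "NUE" interpretation: report
// Enforcing=True only once every relevant node has confirmed that its
// dataplane rules for this policy are actually in place, not merely
// scheduled for a future sync.
//
// nodeAcks maps node name -> whether that node has successfully applied
// the rules for the policy's current generation (hypothetical bookkeeping).
func enforcingCondition(policyGeneration int64, nodeAcks map[string]bool) metav1.Condition {
	for node, ok := range nodeAcks {
		if !ok {
			return metav1.Condition{
				Type:               "Enforcing",
				Status:             metav1.ConditionFalse,
				ObservedGeneration: policyGeneration,
				Reason:             "Programming",
				Message:            "waiting for dataplane programming on node " + node,
			}
		}
	}
	return metav1.Condition{
		Type:               "Enforcing",
		Status:             metav1.ConditionTrue,
		ObservedGeneration: policyGeneration,
		Reason:             "RulesProgrammed",
		Message:            "all relevant nodes have confirmed dataplane programming",
	}
}
```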

jayunit100 (Member) commented Nov 9, 2020:

once the rules are in the kernel they're in effect

What if there's a bug in iptables, or a CVE, or something? I know we're going down the rabbit hole here, but it feels like it's worth probing...

danwinship (Contributor Author):

If there's a bug then buggy behavior will occur. That goes without saying. The spec is defining what correct behavior is, not magically obligating you to deliver only correct code.

In this particular case, if there was some bug in iptables that caused a plugin to report Enforcing True when it wasn't actually enforcing, then that would cause the e2e tests to flake, and this would be noticed, and the plugin author would need to decide whether to (a) remain flaky, (b) start reporting { type: Enforcing, status: Unknown, reason: IPTables, message: "Unable to determine enforcing status due to iptables bugs" } instead, (c) find a workaround for the iptables bug, (d) debug and fix the iptables bug.

Member:

Acknowledged. Worth clarifying in this KEP IMO, but your argument is sound.

Contributor:

I may be going far from the original KEP proposal here, but maybe this concern can be solved with something like a Pod that verifies whether the current Enforcing status reflects what is expected (like: "the API is reporting status.Enforcing = true, so I can try to connect to whatever rule has been created"), and if it can't, exits with 1, leading the Pod to an error state that can be reported to the e2e test.

This approach seems reasonable to me, so you could have a "common network policy tester image" that waits for the condition, tests, and checks whether the plugin is reporting correctly.

jayunit100 (Member) commented Nov 9, 2020:

So, if this data was reliably published by CNI providers, then it would be much easier for end users to really get to the bottom of what their policies were doing, and much easier for us to write validation tests. The one thing I wonder is, when we start allowing CNI providers to implement bits and pieces of the API, defining "compliance" is tricky... but I suppose at some point we'll need to do this anyway if we expand the API.

danwinship (Contributor Author):

when we start allowing CNI providers to implement bits and pieces of the API, defining "compliance" is tricky... but I suppose at some point we'll need to do this anyway if we expand the API

Yeah, my argument here was:

  1. Plugins are already implementing bits and pieces of the API, it's just that currently there's no transparency around it. (Though maybe this is less true now that there are more e2e tests; eg, named ports were almost universally unimplemented before we added a test case for them. I guess you may have more data on this?)
  2. It seems totally reasonable to allow it to happen in the future if multiple features are added at once. eg, like if 1.21 adds both port ranges and namespaces-by-name, but some plugin only has time to implement one of the two features before its next release, then it seems reasonable to say it can ship with support for one but not the other.

Comment on lines +222 to +228
It's not clear how big a problem this is, especially if we suggest that
implementations should create an "empty" `status` right away if it's
going to take them a while to determine the final `status`.


danwinship (Contributor Author):

ah, I was looking for that doc but couldn't find it.

Contributor:

Yeah, I think this could be an issue, especially in the e2e network policy tests. If the tests wait for network policy status to be enforcing before validating connectivity/no-connectivity with client pods, but the plugin never sets it... then what?

Would it be an option to make the networkPolicyStatus mandatory?

danwinship (Contributor Author):

I mean, I assume that if we define this, most people would want to implement it, but there will still be some length of time where you have to deal with people running older versions of plugins...
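For the e2e-test side of this, a hedged sketch of the fallback being discussed: wait for the proposed Enforcing condition, but keep the old arbitrary wait for plugins that never set status at all. The condition shape and accessor here are the KEP's proposal (represented by a local struct and a hypothetical getConditions callback), not anything in the current API:

```go
package e2e

import (
	"time"
)

// condition mirrors the minimal shape of the KEP's proposed
// status.conditions entries (hypothetical, not current API).
type condition struct {
	Type   string
	Status string
}

// waitForEnforcing polls getConditions (a hypothetical accessor for the
// policy's status.conditions) until Enforcing=True, and gives up after
// fallback if the plugin never reports a true Enforcing condition.
func waitForEnforcing(getConditions func() []condition, fallback time.Duration) {
	deadline := time.Now().Add(fallback)
	for time.Now().Before(deadline) {
		for _, c := range getConditions() {
			if c.Type == "Enforcing" && c.Status == "True" {
				return
			}
		}
		time.Sleep(time.Second)
	}
	// No (or never-true) status: assume an older plugin and proceed,
	// accepting the old-style arbitrary wait.
}
```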

vpickard (Contributor) left a comment:

Great writeup on this KEP!

meaning I want the test code to be able to tell at what point a
newly-created NetworkPolicy is expected to be in effect, so that I don't
time out the test case before the network plugin has managed to process
the policy.
Contributor:

Is there a way to determine if a particular network plugin supports the status field? Just thinking about the e2e test cases in network_policy.go. Once this KEP is approved, I would imagine the e2e test would wait for the net policy status to be enforcing before spinning up the client pods to validate connectivity. Ah... I see that section below... let me look there.


rikatz (Contributor) commented Nov 12, 2020:

@danwinship Thank you very much for raising this :)

I've been thinking a little bit about how the "glue" works between Kubernetes and other non-core components.

In our last Network Policy meeting, @andrewsykim raised the question of how this proposal conflicts with the idea of "shouldn't this be a feature gate?", and I have some opinions about this:

  • As far as I know, feature gates are more about "let's have an enable/disable switch for the cluster because the feature can break things" than about "what capabilities exist in this cluster".

As an example, the existence of a feature gate could be "broadcast" in a cluster-capabilities object describing what capabilities the cluster has enabled, so other components could make use of it instead of relying on a version matrix. Once that feature gate is removed, the idea of "a feature gate that represents capabilities" dies, so in my head it makes sense that the cluster should announce through its API what capabilities it has enabled (by default for a stable feature, or when a feature gate is enabled).

Also, I think it's an old problem that the apiserver does not have a way to announce whether a feature gate is enabled (I remember there was a thread about that in Slack), so a component relying on a feature gate has to just assume it exists and throw an error when some field isn't found, unless it could know BEFORE using a feature whether it's enabled (or whether the cluster has that capability).

EDIT: found the thread

  • The same problem for CSI?
    I'm wondering how CSI solves this problem. CSI snapshots shipped in Kubernetes 1.12 as alpha, with a second, breaking alpha in v1.13, and then went beta in v1.17 (and still are).

Because CSI drivers are external components, each one created by a storage provider, it seems hard for the user to know whether a snapshot has been executed, and for the CSI provider to know which capabilities the cluster has (which is also a case for your KEP being broader than NetworkPolicy).

As a real-world example: Ceph RBD snapshots are alpha in the CSI driver and only supported in Ceph Nautilus, so creating a snapshot in a Kubernetes v1.17 cluster might have a lot of issues (like the CSI driver or the backend storage not supporting it). The reverse path, of the CSI driver "knowing" whether a field is supported, also needs to be taken care of.

Honestly, I haven't taken a look at the CSI code (and its Kubernetes integration), but IMO if CSI solves this problem in some way, NetworkPolicy providers should solve it the same way; otherwise we should look into a solution for both (and for anything else pluggable that needs to report status and know the cluster's capabilities).

I'll keep reviewing the KEP doc; I don't know if my opinion makes sense and is clear enough :)

jayunit100 (Member):

Devil's advocate: just an idea...

This solves a concrete problem of variant conformance to the NetworkPolicy API.

It creates a potential usability issue in that users now need to know the minVersion of their policies.

The solution to this new problem might be leapfrogging... moving NetworkPolicy v2 out of tree so people could easily update their policy APIs? AND so that we could rapidly evolve new policy APIs over time without necessarily forcing any complexity onto the k8s API.

THEN you'd solve the problem of users having to know what minVersion they need, because updating the policy API would be so easy.

Haven't thought through all the corners here, but just putting it out there...

danwinship (Contributor Author):

As far as I know, feature gates are more about "let's have an enable/disable switch for the cluster because the feature can break things" than about "what capabilities exist in this cluster".

Yes. In particular, if you were defining a new NetworkPolicy feature where there was a chance the syntax or semantics might change, then you'd put that new feature behind a feature gate. But that's totally separate from "what percentage of network plugins support this NP feature", and even from "does the current NP support this feature".

danwinship (Contributor Author) commented Nov 17, 2020:

This solves a concrete problem of variant conformance to the Netpol API.

I think at the moment the fact that it solves the problem of NP API versioning is maybe more interesting. Eg, if we had already implemented this in the past, then there wouldn't need to be any discussion of the backward-compatible way to add namespaceNames in #2113; you could just add it however you wanted, and use minVersion to ensure that old plugins didn't get confused by new policies.

It creates a potential usability issue that now users need to know the minVersion of their policies.

I guess actually the apiserver ought to be able to figure this out itself; if your policy contains protocol: SCTP anywhere then it's minVersion: 1.12. If it uses namespaceNames, it's minVersion: 1.21, etc.

So rather than requiring the user to set spec.minVersion, the apiserver could fill it in itself at creation time. Though this still doesn't solve the problem of knowing whether the network plugin actually implements the versioning spec...

(EDIT: updated the spec to have the apiserver fill in minVersion)

danwinship (Contributor Author):

The solution to this new problem might be leapfrogging... moving NetworkPolicy v2 out of tree so people could easily update their policy APIs? AND so that we could rapidly evolve new policy APIs over time without necessarily forcing any complexity onto the k8s API.

I assume what you mean there is making NetworkPolicy (or rather, its successor) be a CRD, where the CRD would be added to the cluster by the network plugin, not by the kubernetes core? And then as long as it defines proper validation with the "no undefined fields" flag, then users would be unable to define a policy that didn't match the NetworkPolicy CRD provided by the plugin.

k8s-ci-robot (Contributor):

@danwinship: The following test failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| pull-enhancements-verify | 289f136 | link | /test pull-enhancements-verify |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

```
<<[UNRESOLVED snowflake ]>>

Just how special/unusual is NetworkPolicy in this regard? Have any
Member:

Micro-versioning of APIs has been discussed, but it comes with its own set of non-trivial costs. So far we've decided to avoid it and instead rely on more careful API evolution.

danwinship (Contributor Author):

I meant particularly the fact that "what the apiserver knows" and "what the actual implementation of the API knows" are not necessarily the same.

Like, with... [scans 1.19 release notes looking for a feature] "Immutable ConfigMaps", I know that if I'm in a 1.19 cluster, the feature is there, and if I'm in an older cluster, it's not. (And even if I don't bother checking the cluster version, I can tell after creating a pod whether the cluster recognized the feature or not.)

But with NetworkPolicy, the fact that the cluster is of a particular version doesn't tell me whether the network plugin supports a particular feature.

Member:

That's not even true. A "cluster" can have apiservers that accept a field and kubelets which don't know about it.

danwinship (Contributor Author):

That's not even true. A "cluster" can have apiservers that accept a field and kubelets which don't know about it.

That's a transient problem, and we go to great lengths to mitigate the problems that could result from it. Eg, you are required to consider the problems that might occur due to version skew when writing a KEP for a new feature.

With NetworkPolicy, the potential skew is completely unbounded. You could be running a cluster in which every kubernetes component is version 1.21, but in which the network plugin doesn't implement some NetworkPolicy feature that was added to the API in kubernetes 1.15.

is added in 1.21 but then changed in 1.22, then a 1.22 cluster with a
network plugin implementing the 1.21 version would probably be unable
to use either version of the feature. We might want to establish a
somewhat more restrictive set of alpha API rules for NetworkPolicy
Member:

Maybe. I want to DTRT as much as possible, but Alpha stuff is Alpha for a reason. The guarantees are very low.

// conforms to the specified version. If it is not specified, the apiserver
// will fill in the correct minVersion based on the features used by the policy.
// +optional
MinVersion NetworkPolicyVersion `json:"minVersion,omitempty" protobuf:"bytes,5,name=minVersion"`
Member:

Absent a general mechanism to do this automatically, I don't think this is really going to work.

danwinship (Contributor Author):

We have to modify ValidateNetworkPolicySpec every time we add a new feature anyway. It would not be difficult at all to modify it to keep track of what features it has seen at the same time.

eg, if there is a NetworkPolicyPort.Protocol, it must be one of the valid v1.Protocol values. And, if that value is "SCTP" then the minVersion is now at least "1.12"

if there is a NetworkPolicyPort.EndPort, then there must also be NetworkPolicyPort.Port, and Port must be smaller than EndPort. And minVersion is now at least "1.21"

etc
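A minimal sketch of that accumulation, using the real networking/v1 field names; the minVersion bookkeeping itself is purely hypothetical (the KEP's proposal, not something the apiserver does today), and the two version bumps are just the ones mentioned above:

```go
package validation

import (
	corev1 "k8s.io/api/core/v1"
	networkingv1 "k8s.io/api/networking/v1"
)

// computeMinMinorVersion walks the spec much as validation already does,
// recording the newest feature it sees, returned as a "1.x" minor number.
func computeMinMinorVersion(spec *networkingv1.NetworkPolicySpec) int {
	minMinor := 7 // networking.k8s.io/v1 NetworkPolicy itself (GA in 1.7)

	bump := func(minor int) {
		if minor > minMinor {
			minMinor = minor
		}
	}

	var ports []networkingv1.NetworkPolicyPort
	for _, r := range spec.Ingress {
		ports = append(ports, r.Ports...)
	}
	for _, r := range spec.Egress {
		ports = append(ports, r.Ports...)
	}
	for _, p := range ports {
		if p.Protocol != nil && *p.Protocol == corev1.ProtocolSCTP {
			bump(12) // SCTP support: 1.12
		}
		if p.EndPort != nil {
			bump(21) // port ranges: 1.21
		}
	}
	return minMinor
}
```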

Member:

Anything is possible, it's just code. But the code around that becomes complicated quickly. "I saw field foo, so it must be at least 1.22, but it had value bar which means it is at least 1.23."

I'm not even against the idea overall, it might be cleaner to validate in discrete steps. I just a) don't want to open code it; and b) don't want to walk this road alone. This is what IDLs are for.


```
// NetworkPolicyStatus contains information about the processing of a NetworkPolicy
type NetworkPolicyStatus struct {
Member:

I am tentatively OK with adding conditions, but we need to define what each condition really means. What happens when Enforcing is true, then I update the policy? Do we auto-reset it to false, or do we leave it stale? What happens if one node out of 5000 is out to lunch, and is not implementing the policy? What if I have different versions of the agent on different subsets of nodes?

danwinship (Contributor Author):

What happens when Enforcing is true, then I update the policy? Do we auto-reset it to false, or do we leave it stale?

Hm... yeah, I was only thinking about creation time. What happened with pod ReadinessGates? I feel like people agreed that NetworkPolicy programming should block pod readiness at creation time, but pods should not become unready later on just because new policies were added.

Even if we say Enforcing doesn't track updates, if you really cared about that case you could know when it was ready by creating a new policy rather than updating the old one...

What happens if one node out of 5000 is out to lunch, and is not implementing the policy? What if I have different versions of the agent on different subsets of nodes?

I feel like those questions must apply to the sorts of things people wanted Pod ReadinessGates for too. Did we come up with answers there?

Member:

We decided that readiness was exclusively about the initial readiness. But that's driven by small-scale signals. E.g. "is my LB programmed" not "is every single node in my 5K cluster reporting ready"

When a user creates a NetworkPolicy, and is using a network plugin that
implements this specification:

- If the NetworkPolicy uses API fields which are not known to the
Member:

The APIserver will just discard fields it doesn't know

danwinship (Contributor Author):

Ah, so it's just kubectl create magic that rejects unknown fields?

OK, so in that case:

  • If the user explicitly specifies a minVersion that is newer than the apiserver, then we can recognize that at validation time and reject it
  • If the user doesn't explicitly specify minVersion and lets the apiserver default it, then the apiserver may fail to notice dropped fields and the policy may end up wrong.

So... meh, that's not useful

Member:

Ah, so it's just kubectl create magic that rejects unknown fields?

Yeah - it pulls the openapi doc and does some validation.

The `"Enforcing"` condition indicates whether the plugin is currently
enforcing the policy described by a NetworkPolicy.

- If the plugin is intentionally not enforcing the policy (eg, because
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loopback loophole seems orthogonal - we should define what we expect it to do and then push that to implementations, with a CVE if needed

and `message`. This indicates to the user that the enforcing status is
not known and is not going to become known.

- A plugin which is able to determine when a NetworkPolicy is fully in
Member:

The update-race applies here. At time t0, the Enforcing condition was "true". At time t1, user changes the policy. At time t2, something updates the Enforcing condition to "false". Between t1 and t2, the API is lying.
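One conventional way to narrow that window, sketched here against the proposed (not current) status field, is for consumers to trust the Enforcing condition only when its observedGeneration matches the policy's current generation; this also connects to the ObservedGeneration question raised further down. metav1.Condition is used as a stand-in for the KEP's NetworkPolicyStatusCondition:

```go
package consumer

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// enforcingAndCurrent treats the Enforcing condition as meaningful only if
// it was computed for the generation of the spec we are looking at. Between
// t1 (spec update) and t2 (status update) the generations differ, so a
// stale "True" is ignored rather than believed.
func enforcingAndCurrent(policyGeneration int64, conds []metav1.Condition) bool {
	for _, c := range conds {
		if c.Type == "Enforcing" {
			return c.Status == metav1.ConditionTrue &&
				c.ObservedGeneration == policyGeneration
		}
	}
	return false
}
```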

`message`. In particular, it should do this if it thinks there may be
a noticeable delay before it is able to set the condition to `"True"`.

- A plugin which is able to determine when a NetworkPolicy is fully in
Member:

Here there be dragons. What if a new node joins which isn't enforcing yet? What if a policy comes undone (e.g. someone flushes iptables). Especially around security, we need to be careful what we claim to be truth.

vpickard added a commit to vpickard/kubernetes that referenced this pull request Nov 23, 2020
It takes some time, especially in scaled environments, for network
policy to be programmed. This PR adds 2 retries for pod connectivity
test failures when a network policy is created, so that the test
isn't marked as failed during the window of time it takes
for the network policy to be enforced.

There is a KEP for adding a "status" field to the network
policy object. Once that is available, the test code can be
enhanced to check the status of the network policy to be
"enforcing" before spinning up client pods to validate
connectivity/no-connectivity.

KEP: kubernetes/enhancements#2137

Signed-off-by: vpickard <[email protected]>
jayunit100 (Member):

I guess either could work. In my imagination, the CRD would be versioned, with some basic functionality related to managing its own API, and installed by its own operator, which might also handle other things that are "API heavy" and not as reliant on the underlying CNI - like DNS policy operators or port range operators.

I might be overthinking, though... maybe this is out of bounds for this specific KEP.

fejta-bot:

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 10, 2021
rikatz (Contributor) commented Mar 23, 2021:

I know this PR is almost stale, but another proposal caught my attention:

#2582

As far as I can see, there's an intent to label namespaces with the Pod Isolation Policy and the baseline version that is supported, so, as an example, if between v1.19 and v1.20 it is decided to block users starting with "jay" on the restricted baseline, the cluster admin can opt in to that just by changing the label version.

Why am I bringing this up here? Because apparently some precedent is appearing for announcing and selecting feature versions through well-known labels. In that KEP this is done by the cluster admin. I think we

This might solve, or at least make it easier to solve or justify, the NetworkPolicy versioning part of this KEP. We've been struggling to extend/evolve the NetworkPolicy selectors because of the fail-open problem: if we add some unknown selector, the CNI might simply ignore the field and allow the traffic.

Announcing the version of NetworkPolicy features (through labels, the same way we did with namespace-by-name defaulting) does not solve this problem for old CNIs that are not aware of the versioning (they will still ignore the label, for example), but it could be used in the future by the major CNIs to know that a NetworkPolicy should be ignored because it is not compliant with the Kubernetes version that the CNI supports. If this is a coordinated effort with the major CNI providers, so they can be aware of the new way of announcing features, this could be doable.


### User Stories

#### Story 1 - Testing Features of Different NetworkPolicy Implementations

Kubernetes core, the existing versioning mechanisms in Kubernetes do
not work well for it; instead of having a two-way version skew between
"what features the user wants to use" and "what features the cluster
supports", there is a three-way split between "what features the user

jayunit100 (Member):

We discussed this in the NetworkPolicy subproject today ~ it seems like it might converge with Matt's ab-initio Cyclonus suite ~ https://github.com/mattfenwick/cyclonus ~ i.e. we could use the results from that to categorize CNIs in terms of support matrices somehow; discussing this with @cantbewong and other folks.

danwinship (Contributor Author) commented Apr 13, 2021:

So I was trying to squish two somewhat-but-not-totally-related things into this KEP, and I feel like they've ended up having mostly non-overlapping sets of issues:

  • The versioning stuff turns out to not really solve the problem I want it to, as discussed in #2137 (comment) above. (Though we could still potentially say "as long as you use kubectl create --validate=true or equivalent, then you get full checking, but if you just blindly create an object, eg using client-go, then there are no guarantees"; and maybe we could also provide library code to do the equivalent of kubectl create --validate=true?) It needs more thinking-about.
  • The Enforcing status stuff runs into all sorts of tricky edge cases and probably can't be made to support even all of the cases that the e2e tests care about. (eg, create a pod and a NetworkPolicy, and then change the pod labels; how do you know when the NP applies to the pod?) So we may want to just drop that?

fejta-bot:

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 13, 2021
// +patchMergeKey=type
// +patchStrategy=merge
Conditions []NetworkPolicyStatusCondition `json:"conditions,omitempty" patchStrategy:"merge" patchMergeKey:"type" protobuf:"bytes,3,rep,name=conditions"`
}

Any reason to not add ObservedGeneration?

fejta-bot:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

k8s-ci-robot (Contributor):

@fejta-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

srampal (Contributor) commented Sep 15, 2021:

@danwinship I see this KEP has been closed, but a discussion came up again in the network policy subgroup meeting about revisiting this and asking for additional feedback, hence this comment...

#2137 (comment)

  1. I agree the versioning piece is a different function, so it should be separate in any case.
  2. There is some value in having just the "Supported" field (which needs to also cover the case of partial support).
  3. Wrt the issues around "Enforcing" status accuracy, one could consider providing 3 possible status values. The Enforcing status for an instance of a policy would be either (a) False, (b) In-progress/Partial, or (c) True. "False" is more or less equivalent to not supported. "In-progress/Partial" lets someone see that at least some traffic is certainly getting affected, which may help with troubleshooting; the keyword "Partial" also covers cases where the CNI plugin only partially implements the logic of the policy (for instance, supports IPBlock but not IPBlock exceptions). "True" implies the policy is fully being enforced on all applicable pods and nodes. If the e2e tests see the value as True, they need not retry testing, but if they see "In-progress" they may choose to retry with a timeout, as they do today.

In most common cases it will be possible for a CNI plugin to determine whether the Enforcing status is True. A guideline for controllers could be: if no dataplane updates were performed for 3 reconciliation loops or 10 seconds (whichever is greater), and all features of the policy instance are supported, then the Enforcing status can be declared True; otherwise it remains "In-progress/Partial".
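A rough sketch of that three-state suggestion and the settle heuristic, purely illustrative; the state names, thresholds, and helper just restate the comment above and are not part of any specified API:

```go
package plugin

import "time"

// EnforcingState is the suggested three-valued Enforcing status.
type EnforcingState string

const (
	EnforcingFalse   EnforcingState = "False"   // effectively not supported
	EnforcingPartial EnforcingState = "Partial" // in progress, or only part of the policy implemented
	EnforcingTrue    EnforcingState = "True"    // fully enforced on all applicable pods and nodes
)

// settled applies the suggested guideline: declare True only if the
// dataplane has been quiet for both three reconciliation loops and ten
// seconds (i.e. whichever threshold is greater) and every feature of
// the policy is supported.
func settled(allFeaturesSupported bool, loopsSinceLastUpdate int, sinceLastUpdate time.Duration) EnforcingState {
	if !allFeaturesSupported {
		return EnforcingPartial
	}
	if loopsSinceLastUpdate >= 3 && sinceLastUpdate >= 10*time.Second {
		return EnforcingTrue
	}
	return EnforcingPartial
}
```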

Anyway... just wanted to add my 2c to the KEP in case there is a motion to reopen it in the future. I can't say I have a super strong position on it.
