Optimizing Gatekeeper policy with large inventory #563
Hi Paul! This is the main page I'm aware of WRT Rego performance optimization: https://www.openpolicyagent.org/docs/latest/policy-performance/

Here is the entry point for the Rego code that the constraint framework generates, in case that affects optimizations (note the use of the …). As far as I can tell, Rego does not perform any kind of indexing over cached data itself, merely over Rego code. Happy to be corrected if I'm wrong there.

Another idea could be to write your own TargetHandler. That would trade the infrastructure complexity of maintaining a separate external data cache, watches, etc. for the complexity of maintaining a Gatekeeper fork. It's not clear to me which would be less complexity overall over time.
Hi, folks.
I have implemented a Gatekeeper Constraint Template that allows us to create Constraints that prevent tenants of a Kubernetes cluster from creating Pods if they have too many Pods in a Pending state, or too many Pods in a Running state, etc.
In order to get the counts, the Rego must (as best I can tell) be written such that it enumerates the entire pod inventory. On our large multi-tenant clusters that can have several hundred namespaces and tens of thousands of Pods, this runs very slowly, taking several seconds to execute in the best case, and a few tens of seconds to execute in the worst case (and thus falls afoul of our validating webhook timeout).
I'd like to understand how best to optimize this. Here's a snippet of the policy I have now:
As best I can tell, the evaluation and assignment of `project_pods` is what takes the longest. But as you can see, the objective is not to actually look at any specific details of the Pods (aside from seeing what phase they're in). The goal is just to count the Pods and trigger a violation if those counts exceed limits defined in the Constraint.
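For reference, a counting rule of this shape over Gatekeeper's cached inventory looks roughly like the sketch below. This is illustrative only, not the exact snippet from the policy: the `input.parameters.limits` name is assumed, the per-project aggregation is left out, and the `data.inventory.namespace[<ns>]["v1"]["Pod"]` layout follows Gatekeeper's data-replication docs.

```rego
package k8spodphaselimits

# Illustrative sketch, not the exact policy from this discussion.
# Assumes Pods are synced into Gatekeeper's cache, so they appear under
# data.inventory.namespace[<namespace>]["v1"]["Pod"][<name>], and that the
# Constraint supplies limits such as {"Pending": 10, "Running": 50} in
# input.parameters.limits.

# Every cached Pod in the given namespace that is in the given phase.
# Walking the whole inventory here is the expensive part (the equivalent
# of the `project_pods` assignment mentioned above).
pods_in_phase(ns, phase) = pods {
    pods := [p |
        p := data.inventory.namespace[ns]["v1"]["Pod"][_]
        p.status.phase == phase
    ]
}

violation[{"msg": msg}] {
    some phase
    limit := input.parameters.limits[phase]
    ns := input.review.namespace
    count(pods_in_phase(ns, phase)) >= limit
    msg := sprintf("namespace %v already has %v or more pods in phase %v", [ns, limit, phase])
}
```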
So I'm considering setting up an external data provider that supplies these pod counts directly. The provider would periodically (probably once a minute) fetch the list of all Pods from the kube-apiserver, generate per-project and per-namespace counts, and serve those (cached) values as key-value pairs that Gatekeeper can retrieve. Gatekeeper would further cache the provider's API responses for a minute or two, so the provider would not need to be particularly powerful. The policy could then be simplified to a single external data lookup for the counts it needs, making it run in constant time.
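Sketching that idea, the policy side would collapse to something like the following. Hedged heavily: the provider name `pod-count-provider`, its `"<namespace>/<phase>"` key format, and the parameter names are all hypothetical; only the `external_data` built-in and its response fields (`responses`, `errors`, `system_error`) follow Gatekeeper's documented external data interface.

```rego
package k8spodphaselimits

# Illustrative sketch. "pod-count-provider" and its key/value format are
# hypothetical; the external_data call and response fields follow
# Gatekeeper's external data interface.

violation[{"msg": msg}] {
    some phase
    limit := input.parameters.limits[phase]
    ns := input.review.namespace

    # Ask the provider for the pre-computed count, e.g. key "team-a/Pending".
    resp := external_data({
        "provider": "pod-count-provider",
        "keys": [sprintf("%v/%v", [ns, phase])]
    })
    count(resp.system_error) == 0
    count(resp.errors) == 0

    # resp.responses is a list of [key, value] pairs from the provider;
    # assume the value is the count, possibly returned as a string.
    current := to_number(resp.responses[_][1])
    current >= limit
    msg := sprintf("namespace %v already has %v pods in phase %v (limit %v)", [ns, current, phase, limit])
}
```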
Of course, the downside to this approach is the complexity of adding and maintaining an external data provider. There's also the non-trivial extra load on the kube-apiserver and etcd from a controller querying all the Pods every minute. I could perhaps have the controller establish a watch on Pods and increment/decrement counters at runtime to make that more efficient.
I would prefer not to go to all this trouble if there's a Rego trick that would optimize this sort of computation. Any suggestions?
Aside: you know ResourceQuotas exist, right? Why not just deploy ResourceQuotas in every namespace to control this?
Yes, I'm aware of ResourceQuotas. The problem is, the quota for pod counts is just a static count. You can't set more complex limits like "you may have only 10 Pending Pods in this namespace, after which no new Pods may be created; you may run up to 50 Pods if they are all Running; you may not create any more Pods if you have more than 5 Pods in CrashLoopBackOff" -- these sorts of rules are not expressible in a ResourceQuota, but they are essential for keeping our multi-tenant, on-prem, bare-metal clusters healthy.
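The CrashLoopBackOff case in particular can only be expressed by inspecting container statuses, which is roughly why a policy engine is needed here at all. A sketch, building on the same assumed inventory layout as above, with `max_crashloop_pods` as a hypothetical Constraint parameter:

```rego
package k8spodphaselimits

# Sketch of the CrashLoopBackOff case, which ResourceQuota cannot express.
# Same assumed inventory layout and hypothetical parameter names as the
# earlier sketches.

# A Pod is considered crash-looping if any of its containers is waiting
# with reason CrashLoopBackOff.
pod_in_crashloop(p) {
    cs := p.status.containerStatuses[_]
    cs.state.waiting.reason == "CrashLoopBackOff"
}

violation[{"msg": msg}] {
    ns := input.review.namespace
    crashing := [p |
        p := data.inventory.namespace[ns]["v1"]["Pod"][_]
        pod_in_crashloop(p)
    ]
    count(crashing) > input.parameters.max_crashloop_pods
    msg := sprintf("namespace %v has %v pods in CrashLoopBackOff (max %v)", [ns, count(crashing), input.parameters.max_crashloop_pods])
}
```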