Automatic resource pruning EP #109

ryantking · 2022-01-28T21:12:35Z

Signed-off-by: Ryan King <[email protected]>

jmrodri

First pass review. A few typos. I will do a more thorough review next week.

enhancements/automatic-resource-pruning.md

fgiloux

Good work. Some input from my side.

enhancements/automatic-resource-pruning.md

fgiloux · 2022-01-29T09:24:37Z

enhancements/automatic-resource-pruning.md

+
+#### Story 4
+
+As an operator author, I want to prune a custom resources with specific status information when there is a certain


"a custom resource"

I am not sure that we want to limit it to "status" information. Fields in spec may also be taken into account, for instance a class for VolumeSnapshotContent.

I did not take status to mean the Kubernetes status field, but a more generic status, as in a specific condition or state. @ryantking thoughts?

I think in practice it will be contained in the status field, but the best practice for where to store operator-maintained data is out of scope for this EP so yes this can refer to any generic state information stored on the resource.

enhancements/automatic-resource-pruning.md

fgiloux · 2022-01-29T09:38:12Z

enhancements/automatic-resource-pruning.md

+type IsPruneableFunc func(obj *runtime.Object) error
+
+// RegisterIsPruneableFunc registers a function to check whether it is safe to prune a resources of a certain type.
+func RegisterIsPrunableFunc(gvk schema.GroupVersionKind, isPruneable IsPruneableFunc) { /* ... */ }


I am not sure why IsPruneableFunc gets registered and StrategyFunc does not.

I think it is because IsPrunableFunc has intimate knowledge of the GVK, while the Strategy is agnostic.

enhancements/automatic-resource-pruning.md

jmrodri

Finished my review. A few suggested rephrasings. Nicely done.

enhancements/automatic-resource-pruning.md

jmrodri · 2022-02-02T16:03:35Z

@ryantking the API proposed in Appendix A was easier for me to rationalize and understand. I was able to easily picture a workflow for that API. The API proposed in Appendix B was definitely more flexible but the workflow felt more awkward to me. Like it was more complicated but for little gain in the short term.

Basically I could not think of a scenario that was not doable with the API from Appendix A.

fgiloux · 2022-02-02T17:01:29Z

@ryantking
One thing that I am missing is that operators are often used in multi-tenant environments. Each tenant may want to have, if not a separate retention policy, at least the possibility to parameterize the retention policy.
Let's take a CI/CD pipeline operator as an example. You may have a Pipeline custom resource, which creates pods for the processing. For nightly runs these pods (with their logs) may be kept for 3 days. For the execution of a Pipeline for a final release you may want to keep them for longer. Another scenario is that teams may want different retention periods for different projects.
With the approach in appendix A, my understanding is that the IsPruneableFunc would need to be re-registered each time a new Pipeline is created, possibly using a closure holding the retention period for each Pipeline. Is that the way you see it?

joelanford · 2022-02-02T17:21:21Z

enhancements/automatic-resource-pruning.md

+
+### Implementation-specific
+
+- What type of Kubernetes object should we generically work with? E.g. `metav1.Object`or `runtime.Object`?


My vote would be client.Object from controller-runtime, which is:

type Object interface {
    metav1.Object
    runtime.Object
}

That interface supports all well-behaved root API types and users get compile-time checks rather than having to resort to runtime type assertions when they need to access the other half of the object.

@joelanford will this work with an operator that was written with library-go?

I don't know for sure, but I would think so. You can pass a client.Object to any function that accepts a metav1.Object or a runtime.Object, and all API root types (e.g. custom objects of CRDs and all built-in API types that are apply-able on-cluster) implement both interfaces.
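
The compile-time argument can be sketched without any Kubernetes dependency. In the example below, `MetaObject` and `RuntimeObject` are simplified, hypothetical stand-ins for `metav1.Object` and `runtime.Object`, and `Object` mirrors the shape of `client.Object` by embedding both; `Widget` is an invented type.

```go
package main

// MetaObject is a simplified stand-in for metav1.Object.
type MetaObject interface{ GetName() string }

// RuntimeObject is a simplified stand-in for runtime.Object.
type RuntimeObject interface{ DeepCopyObject() RuntimeObject }

// Object mirrors the shape of client.Object: it embeds both interfaces.
type Object interface {
	MetaObject
	RuntimeObject
}

// Widget is a hypothetical API type implementing both halves.
type Widget struct{ Name string }

func (w *Widget) GetName() string { return w.Name }

func (w *Widget) DeepCopyObject() RuntimeObject {
	c := *w
	return &c
}

// Compile-time assertion: the build fails if *Widget stops satisfying Object.
var _ Object = (*Widget)(nil)

// nameOf only needs the metadata half.
func nameOf(o MetaObject) string { return o.GetName() }

// copyOf only needs the runtime half.
func copyOf(o RuntimeObject) RuntimeObject { return o.DeepCopyObject() }

// describe accepts the combined interface and hands it to both helpers
// without any runtime type assertion.
func describe(o Object) (string, RuntimeObject) {
	return nameOf(o), copyOf(o)
}
```

A value typed as the combined interface is accepted anywhere either narrower interface is expected, which is the property the reply above relies on.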

joelanford · 2022-02-02T17:29:01Z

enhancements/automatic-resource-pruning.md

+An alternative approach would be adding this logic to the core SDK and scaffolding it optionally during operation
+generation. The primary drawbacks with this approach are the increased complexity to the implementation and adding it to
+existing operators.


A second alternative would be to implement a brand new operator that exposes a new set of prune APIs, e.g.:

- PruneStrategy
- ClusterPruneStrategy

Where the spec could look something like:

spec:
  objects:
  - group: example.com
    version: v1
    kind: MyKind
    matchers:
    - selector:
        matchLabels:
          example.com/is-completed: true
    - maxAge: 10h
  default:
    matchers:
    - selector:
        matchLabels:
          example.com/is-completed: true
    - maxCount: 50

@joelanford would this operator be deployed with each operator? I can't see how a new operator would help individual operators cleanup their resources.

It would likely be deployed once per cluster, and then two scenarios would be in play:

- each operator could self-configure pruning by laying down one of the prune strategy objects, either as part of the operator deployment itself or associated with an operand
- a cluster admin could provide their own prune strategy objects to cover their specific needs.

The feature already exists for jobs and TTL. Here is the KEP, which foresees that it can be extended to pods.

IMHO a drawback of a generic controller is that it is either limited to the greatest common factor, which basically means the resource metadata or you need to implement special logic for each resource.
A good example of that is the quota controller. The generic part is a quota on the count of instances. Then you have specialized components for pods, services and PVCs with specific logic for resource requests and limits.

Having the specialized logic together with the controller that created the resources may offer more flexibility to the operator author.

@fgiloux Yep, all good points. I'm not necessarily suggesting we should actually pursue the alternative, but it may be helpful to consider it in whatever design we come up with.

The main motivation for something like a separate controller is to make it extremely easy for an operator author to say, "here, prune this stuff in this way" without having to worry about the details and mechanics of how all that happens.

We could implement this as a library that operator authors plug into their controllers as well. For example, perhaps they would just add something in main.go like this:

maxAge := time.Minute * 60
if err := prune.Prune(ctx, mgr, prune.Config{
    Selectors: prune.Selectors{
        &myapi.OtherKind{}: prune.MaxAge(maxAge),
    },
    PruneInterval: 5 * time.Minute,
}); err != nil {
    setupLog.Error(err, "pruner failed")
    os.Exit(1)
}

where the prune package has something like:

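package prune

import (
    "context"
    "time"

    "sigs.k8s.io/controller-runtime/pkg/client"
)

// Selectors maps an object type to the Selector that picks which of its instances to prune.
type Selectors map[client.Object]Selector

// Selector returns the subset of in that should be pruned.
type Selector interface {
    Select(ctx context.Context, in []client.Object) ([]client.Object, error)
}

// SelectorFunc adapts a plain function to the Selector interface.
type SelectorFunc func(ctx context.Context, in []client.Object) ([]client.Object, error)

func (f SelectorFunc) Select(ctx context.Context, in []client.Object) ([]client.Object, error) {
    return f(ctx, in)
}

// MaxAge selects every object created more than maxAge ago.
func MaxAge(maxAge time.Duration) SelectorFunc {
    return SelectorFunc(func(ctx context.Context, in []client.Object) ([]client.Object, error) {
        cutoff := time.Now().Add(-maxAge)
        var out []client.Object
        for _, obj := range in {
            // metav1.Time wraps time.Time; compare on the embedded value.
            if obj.GetCreationTimestamp().Time.Before(cutoff) {
                out = append(out, obj)
            }
        }
        return out, nil
    })
}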

fgiloux · 2022-04-05T07:41:57Z

enhancements/automatic-resource-pruning.md

+// IsPruneableFunc is a function that checks a the data of an object to see whether or not it is safe to prune it.
+// It should return `nil` if it is safe to prune, `ErrUnpruneable` if it is unsafe, or another error.
+// It should safely assert the object is the expected type, otherwise it might panic.
+type IsPruneableFunc func(obj *runtime.Object) error


It would be nice to be able to pass a logger. Either explicitly (my preference) or through the context as controller-runtime does. A logger should be passed to StrategyFunc and Prune which have context as parameter, in the same way.

fgiloux · 2022-04-05T07:45:00Z

@ryantking I have had a look at the latest version of the enhancement proposal and it looks good to me. The only things that I have slight concerns with are:

- the tenancy aspect as mentioned previously. I don't see how different strategies can be configured by tenant for the same resource type.
- getting a logger externally configured and passed to StrategyFunc and IsPrunableFunc

ryantking · 2022-04-05T15:37:23Z

@fgiloux As far as the tenancy aspect goes, in the example of nightly pipelines vs release pipelines, I think that you could solve it with an IsPruneable func?

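func pipelineIsPruneable(obj client.Object) error {
    pipeline, ok := obj.(*pipelinesv1.Pipeline)
    if !ok {
        return fmt.Errorf("unexpected type %T, want *pipelinesv1.Pipeline", obj)
    }
    // Keep release pipelines for three weeks (Go has no time.Week constant).
    if isRelease(pipeline) && age(pipeline) < 3*7*24*time.Hour {
        return ErrUnpruneable
    }
    // additional logic
    return nil
}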

Would that pseudo-code work in theory? Another option for multi-tenancy is to create multiple Pruner objects, maybe one per project? I'll integrate the registry type from appendix B into A so each pruner can have its own set of IsPruneable funcs if that is desirable. I'm imagining a pattern similar to http.ServeMux where new http.Server objects start with http.DefaultServeMux, but the user can swap that out to a custom one.
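
A rough sketch of that `http.ServeMux`-style pattern, with invented names (`Registry`, `DefaultRegistry`, the `Pruner.Registry` field) and a simplified string key in place of a GVK; it is meant only to show the default-plus-override idea, not the eventual API.

```go
package main

// IsPruneableFunc returns nil when the named object is safe to prune.
type IsPruneableFunc func(name string) error

// Registry holds checks keyed by kind (a plain string here for brevity).
type Registry struct{ checks map[string]IsPruneableFunc }

func NewRegistry() *Registry { return &Registry{checks: map[string]IsPruneableFunc{}} }

// Register associates fn with kind.
func (r *Registry) Register(kind string, fn IsPruneableFunc) { r.checks[kind] = fn }

// Lookup returns the check for kind, if any.
func (r *Registry) Lookup(kind string) (IsPruneableFunc, bool) {
	fn, ok := r.checks[kind]
	return fn, ok
}

// DefaultRegistry plays the role http.DefaultServeMux plays for HTTP handlers.
var DefaultRegistry = NewRegistry()

// Pruner uses DefaultRegistry unless a custom Registry is set,
// much like an http.Server with a nil Handler falls back to http.DefaultServeMux.
type Pruner struct {
	Registry *Registry
}

// registry resolves the registry this pruner should consult.
func (p *Pruner) registry() *Registry {
	if p.Registry != nil {
		return p.Registry
	}
	return DefaultRegistry
}
```

Under this sketch, a pruner created per project would set its own `Registry`, while pruners that leave the field nil keep sharing the package-level default.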

Re: logging, I'll make sure the proposal incorporates the logging changes you made to the current implementation.

fgiloux · 2022-04-06T07:56:07Z

@ryantking I think it is important to consider personas:

- The operator author designs capabilities and leverages this library for pruning. The author does not decide whether the user is in a multi-tenant environment or how each namespace or custom resource is configured. There is no upfront knowledge of something like isRelease: in cluster A they may decide to have only CI pipelines and in cluster B only release pipelines. A smaller company may have only a single cluster with dev, CI, test, and release pipelines all on it. They may also want to have different options for product 1 and product 2.
- All the scenarios described above need to be configurable by a user or by a cluster or namespace administrator.

The question is how we design this library to make it easy for the operator author to have different parameters (retention time or number of instances) taken into consideration during pruning. These parameters are only known at runtime, so something statically coded like "isRelease" is not an option. The approach of creating multiple Pruners, one per namespace, could work, but I am concerned about the cost when there are 10,000 namespaces. If the retention parameters are directly configured in the resource or retrievable, for instance through an ownerRef, the logic could be embedded into pipelineIsPruneable something like here.

fgiloux · 2022-04-06T08:00:23Z

enhancements/automatic-resource-pruning.md

+
+// Pruner is an object that runs a prune job.
+type Pruner struct {
+  // ...


Having had a closer look at the EP I would very much like to see what you intend to have in type Pruner struct to understand what can be configurable through type PrunerOption func(p *Pruner)

ryantking · 2022-04-06T15:40:23Z

@fgiloux I'm still struggling to understand which specific design choices in the proposed API limit the use cases you are laying out. If you are giving me use cases that you want to make sure are covered, then can you present them as user stories so I can add them to the EP? If there are use cases that you think are not possible with the proposed API then I think we should talk offline in greater detail to figure out what changes I need to make. The way I look at it, there are two real ways the user implements prune logic:

- The IsPruneable function registry. This allows the operator author to define per-GVK when a resource is eligible for pruning. This function can know about namespaces or read configuration from a different resource or do basically anything that can be determined from the object itself, the type of the object, and whatever information it can retrieve from external sources including the cluster itself.
- The StrategyFunc. This allows the operator author to define how many pruneable objects should be removed, and how to select the specific objects from the set.

A concrete example would be an IsPruneable function that determines if a pipeline object is a release or nightly job then a strategy that keeps the latest 32 nightly runs. You could even create two pruners, one for releases that keeps only the supported versions' pipelines and one for nightly that keeps the last 30 days worth.

Let me know your thoughts or feel free to reach out if you want to talk in more detail.

fgiloux · 2022-05-03T18:25:25Z

@ryantking I am fine with the proposal. Mind that I have little karma on this repository as I am not part of the Operator Framework organisation.

add auto-pruning EP

d0b6bf6

Signed-off-by: Ryan King <[email protected]>

ryantking requested review from jmrodri, joelanford and gallettilance January 28, 2022 21:12

jmrodri reviewed Jan 28, 2022

fgiloux reviewed Jan 29, 2022

jmrodri reviewed Jan 29, 2022

jmrodri reviewed Jan 29, 2022

jmrodri requested changes Jan 29, 2022

update based on PR comments

c74b8d9

- Swap word "ephemeral" out for "unbounded"
- Add alternative Go API proposal.

Signed-off-by: Ryan King <[email protected]>

fgiloux reviewed Feb 1, 2022

ryantking added 3 commits February 1, 2022 14:37

fix typo

6b88da2

Signed-off-by: Ryan King <[email protected]>

add open question

f5db387

Signed-off-by: Ryan King <[email protected]>

fix function names

cf5e9aa

Signed-off-by: Ryan King <[email protected]>

jmrodri approved these changes Feb 2, 2022

joelanford reviewed Feb 2, 2022

fgiloux mentioned this pull request Mar 21, 2022

Prune logging cannot be set operator-framework/operator-lib#99

Closed

fgiloux mentioned this pull request Mar 29, 2022

Closes #99: Make logging configurable. operator-framework/operator-lib#100

Merged

ryantking added 2 commits March 29, 2022 10:41

Add additional alternative proposal

687d618

Merge branch 'master' into autopruning

15d5381

fgiloux reviewed Apr 5, 2022

fgiloux reviewed Apr 6, 2022

everettraven mentioned this pull request Apr 8, 2022

Feature/auto pruning operator-framework/operator-lib#105

Merged

Remove unused API

02cd266

ryantking merged commit fe91f5e into operator-framework:master May 3, 2022

ryantking deleted the autopruning branch May 3, 2022 20:25
