Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Custom Health Checks for Kustomization using Common Expression Language(CEL) #4528

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

souleb
Copy link
Member

@souleb souleb commented Jan 5, 2024

This RFC proposes to add support for custom heath checks during the health check phase of a kustomization reconciliation using CEL expressions.

@souleb
Copy link
Member Author

souleb commented Jan 5, 2024

cc @tmmorin

@stefanprodan stefanprodan added the area/rfc Feature request proposals in the RFC format label Jan 11, 2024
// Kind of the custom health check.
// +required
Kind string `json:"kind"`
// InProgress is the CEL expression that verifies that the status
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking that it would be nice to here have a Name field, to allow definitng a healthCheck applicable to a specific resource, with a finer grained granularity than the CRD.

The kind of scenario where this would be relevant would be situations where a Kustomization would produce multiple resources of the same kind but the right healthcheck to do would be different among those resources.

An optional Name field would be interpreted as: the healthCheck to use for a resource of Kind Y and Name X, would be the item in CustomHealthCheckExprs that specifies Kind Y and Name X if any, or if there is none an item for Kind Y and Name unspecified.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The health check expression is common for all instances of a Kind version, as it targets the status conditions which are the same for all. Can you give an example where you would write different CEL expressions for the same Kind?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will gladly admit not having any exact use case in mind.

The idea expressed here might be one, though, but whether or not it's relevant is certainly debatable. The intersection of "I want this Deployment to be healhcheck'd in a specific way" and "my Kustomization produces multiple Deploments" is perhaps empty.

From a user standpoint, it simply feels like "if this is going to be extensible, why not make it fully flexible". But from an implementation and code maintenance standpoint, I would understand that this idea might be perceived as over-engineering.

I just wanted to share this to see how relevant and implementable the idea might be.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you group objects based on expected behaviour? Or chain Kustomizations?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for the slow feedback

yes @souleb there would be workarounds

and this is overall a very useful evolution to what we have today which works pretty well for many cases already, so the wise path is of course, to keep it simple

@tmmorin
Copy link

tmmorin commented Jan 23, 2024

Thanks for starting this RFC, which looks very promising. 👍


### Non-Goals

- We do not plan to support custom `healthChecks` for core resources.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious why this restriction ?

It seems perhaps arbitrary to not allow custom heathChecks for core resources. For instance for a Deployment, depending on the situation we might sometimes want ready to mean "all pods are ready", or "at least one pod is ready", or "at least minAvailable pods are ready".

Copy link
Member

@stefanprodan stefanprodan Jan 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for this is that kstatus covers all the builtin types, excluding them on a per kind bases is not possible, so we would need to drop them all and make users write hundreds of lines of CEL.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seen from a user standpoint it feel like this might be implemented as "use kstatus for kind XXX unless CustomHealthChecksExprs defines something for XXX".

Looking closer, I understand that the proposal is to reuse the kstatus statusreaders extension point.

However, after looking at kstatus code, I was wondering if the built-in statusreaders couldn't be all dropped and then the ones not covered by any CustomHealthChecksExprs would be added back (NewReplicaSetStatusReader/NewDeploymentResourceReader/NewStatefulSetResourceReader).

(Of course, don't delay this RFC just based on this topic, I haven't digged enough to have any idea of what is reasonably feasible with kstatus.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scope of this RFC is to allow defining health checks for custom resources. Once this is implemented, we could consider expanding CEL to native resources in a followup RFC, if it deems to be feasible and if Flux users will ask for it. So far, no one asked for more than what kstatus does for builtin kinds.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To expand on Stefan answer, the default poller uses a slice of readers:

  statusReaders = append(statusReaders,
    deploymentStatusReader,
    statefulSetStatusReader,
    replicaSetStatusReader,
    defaultStatusReader,
  )

There is a Support() method that will be used to find a reader that support a given objet Kind. The defaultStatusReader is the one that is used for all custom resources. It is able to compute the status if the CR is compliant with Kstatus. The actual proposal is to expand this list. So in order to support native kinds we would need to remove the other Readers.

Another thing to consider is the SSA. Objects are applied in stages, and we perform a normalization of all native kinds mainly working around various upstream Kubernetes API bugs. This implies that we expect a certain behaviour. Also normalization could invalidate a CEL expression as we pass the whole object to the evaluator.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to keeping the thing simple

my ideas of refinements may prove relevant later... or not ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/rfc Feature request proposals in the RFC format
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants