The Agent/Gateway healthy condition is more reliable #1236
Labels
area/manager
Manager or module changes
kind/feature
Categorizes issue or PR as related to a new feature.
Milestone
Description
As part of #728, users should be able to define alerts on the Kyma module status. At the moment, there are situations where the telemetry module state is "unhealthy" by design although nothing is actually unhealthy. This happens mainly during upgrade procedures that involve node evictions, where pods are replaced gracefully. The rollout is non-disruptive, but the module state indicates a disruption and an alert is fired.
The goal is to handle these situations differently so that the module becomes "unhealthy" only in problematic situations where the user should react.
To achieve this, the gateway/agentHealthy condition needs to be improved so that it does not just check whether the pods are running, but checks for unhealthy conditions instead.
Criteria
Hints
The manager should no longer compare desired vs. available replicas. Instead, it should check whether a minimal number of pods is available and whether all pods are in a healthy state. We might additionally have to watch the pods of the components in order to react to pod status changes.
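
A minimal sketch of how such a check could look, assuming a simplified pod model. The `Pod` type, the `checkHealthy` function, the `minAvailable` parameter and the selection of problematic waiting reasons are all hypothetical and only illustrate the idea; a real implementation would read the pod status from the Kubernetes API instead.

```go
package main

import "fmt"

// Pod is a simplified, hypothetical stand-in for the parts of a Kubernetes
// pod status that matter for this sketch.
type Pod struct {
	Name          string
	Ready         bool   // pod passes its readiness check
	Terminating   bool   // deletion was requested, e.g. by a rollout or node eviction
	WaitingReason string // container waiting reason, empty if none
}

// problematicReasons lists container waiting reasons that are treated as
// unhealthy. Which reasons belong here is an assumption of this sketch.
var problematicReasons = map[string]bool{
	"CrashLoopBackOff": true,
	"ImagePullBackOff": true,
	"ErrImagePull":     true,
}

// checkHealthy reports whether a component counts as healthy: no pod is in a
// problematic state and at least minAvailable pods are ready. Pods that are
// terminating gracefully are ignored, so a rollout alone does not flip the
// result. The second return value explains a negative result.
func checkHealthy(pods []Pod, minAvailable int) (bool, string) {
	available := 0
	for _, p := range pods {
		if p.Terminating {
			continue // graceful replacement is not a problem by itself
		}
		if problematicReasons[p.WaitingReason] {
			return false, fmt.Sprintf("pod %s is in state %s", p.Name, p.WaitingReason)
		}
		if p.Ready {
			available++
		}
	}
	if available < minAvailable {
		return false, fmt.Sprintf("only %d of at least %d pods available", available, minAvailable)
	}
	return true, ""
}

func main() {
	// Rollout in progress: one pod is terminating, its replacement is starting.
	rollout := []Pod{
		{Name: "gateway-a", Ready: true},
		{Name: "gateway-b", Terminating: true},
		{Name: "gateway-c", WaitingReason: "ContainerCreating"},
	}
	healthy, reason := checkHealthy(rollout, 1)
	fmt.Println("rollout:", healthy, reason)

	// A crash-looping pod is a situation where the user should react.
	crashing := []Pod{
		{Name: "gateway-a", Ready: true},
		{Name: "gateway-b", WaitingReason: "CrashLoopBackOff"},
	}
	healthy, reason = checkHealthy(crashing, 1)
	fmt.Println("crashing:", healthy, reason)
}
```

With this approach, a comparison of desired vs. available replicas is not needed: the rollout case stays healthy as long as the minimum is met, while a crash-looping pod marks the component as unhealthy even if enough other pods are ready.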