Add webhook to trigger status reconciliations faster #903

Closed
a-thaler opened this issue Mar 18, 2024 · 1 comment

@a-thaler
Collaborator

a-thaler commented Mar 18, 2024

Description
As part of #425, when Prometheus detects a bad situation, the status of the related Kubernetes resource should change immediately. Because the manager polls Prometheus, that can currently take up to one minute. The goal is to introduce a push notification, leveraging the Alertmanager protocol, so that the manager reacts immediately.

Details

  • Have a dedicated path exposing a webhook
  • Configure Prometheus to use the webhook as its Alertmanager endpoint
  • The webhook will trigger reconciliation
  • Have a metric counting the invocations, labeled with the alert type
  • Have a test ensuring that the webhook gets called, verified by metrics

Criteria

  • An alert in Prometheus gets propagated immediately (previously you had to wait for the next reconciliation)
  • An operator can see usage of the webhook via metrics
  • The self-monitor is allowed to access only the manager port that serves the webhook, no other port

Links
https://github.com/kyma-project/telemetry-manager/pull/753/files

@a-thaler a-thaler added area/manager Manager or module changes kind/feature Categorizes issue or PR as related to a new feature. labels Mar 18, 2024
@skhalash skhalash self-assigned this Mar 19, 2024
@skhalash
Collaborator

All acceptance criteria were implemented except the last one:

  • Since it's not feasible to whitelist the Kubernetes API server IP in the self-monitor network policy, the existing "loose" policy is left unchanged.
