Add Prometheus metrics cache events and stale events #9826

Merged
9 commits merged into master on Feb 11, 2022

@rcanderson23 rcanderson23 commented Jan 18, 2022

This adds two Prometheus metrics, teleport_cache_events and teleport_cache_stale_events, each with a single label (cache_component) indicating the service.

# HELP teleport_cache_events Number of events received by a Teleport service cache
# TYPE teleport_cache_events counter
teleport_cache_events{cache_component="apps"} 8
teleport_cache_events{cache_component="auth"} 33
teleport_cache_events{cache_component="proxy"} 25
teleport_cache_events{cache_component="remote-proxy"} 25
teleport_cache_events{cache_component="db"} 17
teleport_cache_events{cache_component="kube"} 15

# HELP teleport_cache_stale_events Number of stale events received by a Teleport service cache. A high percentage of stale events can indicate a degraded backend.
# TYPE teleport_cache_stale_events counter
teleport_cache_stale_events{cache_component="auth"} 5
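
To make the "high percentage of stale events" note concrete, here is a minimal Go sketch that reads the sample values above and computes the stale share per cache_component. It is not code from this PR: parseCounters and stalePercent are hypothetical helpers, the parser only handles the single-label lines shown here, and it assumes stale events are also counted in teleport_cache_events.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// sample holds the counter values quoted in the PR description.
const sample = `
teleport_cache_events{cache_component="apps"} 8
teleport_cache_events{cache_component="auth"} 33
teleport_cache_events{cache_component="proxy"} 25
teleport_cache_events{cache_component="remote-proxy"} 25
teleport_cache_events{cache_component="db"} 17
teleport_cache_events{cache_component="kube"} 15
teleport_cache_stale_events{cache_component="auth"} 5
`

// parseCounters (hypothetical helper) returns the sample value per
// cache_component for one metric name. It only understands lines of the form
// name{cache_component="x"} value, which is all the sample contains.
func parseCounters(exposition, metric string) (map[string]float64, error) {
	out := make(map[string]float64)
	prefix := metric + `{cache_component="`
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, prefix) {
			continue // skips comments, blank lines, and other metrics
		}
		rest := line[len(prefix):]
		end := strings.Index(rest, `"}`)
		if end < 0 {
			return nil, fmt.Errorf("malformed sample: %q", line)
		}
		v, err := strconv.ParseFloat(strings.TrimSpace(rest[end+2:]), 64)
		if err != nil {
			return nil, fmt.Errorf("bad value in %q: %w", line, err)
		}
		out[rest[:end]] = v
	}
	return out, sc.Err()
}

// stalePercent (hypothetical helper) returns, per component, the percentage
// of received events that were stale. Components without a stale sample
// report 0; components with no events at all are omitted.
func stalePercent(events, stale map[string]float64) map[string]float64 {
	out := make(map[string]float64, len(events))
	for component, total := range events {
		if total == 0 {
			continue // avoid dividing by zero for idle caches
		}
		out[component] = 100 * stale[component] / total
	}
	return out
}

func main() {
	events, err := parseCounters(sample, "teleport_cache_events")
	if err != nil {
		panic(err)
	}
	stale, err := parseCounters(sample, "teleport_cache_stale_events")
	if err != nil {
		panic(err)
	}
	pct := stalePercent(events, stale)

	// Sort component names so the report order is stable between runs.
	components := make([]string, 0, len(pct))
	for c := range pct {
		components = append(components, c)
	}
	sort.Strings(components)
	for _, c := range components {
		fmt.Printf("%-12s %5.1f%% stale\n", c, pct[c])
	}
}
```

With the sample values, auth has 5 stale events out of 33, roughly 15%, and every other component reports 0. In a real deployment this ratio would more likely be computed over a time window in the monitoring system than from raw counter totals.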

Also changes the stale event log statement to debug level, as requested in the issue.

Part of https://github.com/gravitational/cloud/issues/946 and closes #8802

Review thread on lib/cache/cache.go (outdated, resolved)
@rcanderson23 rcanderson23 requested review from ptgott and r0mant February 2, 2022 21:24
@russjones russjones added the cloud Cloud label Feb 10, 2022
@russjones russjones requested review from fspmarshall, rosstimothy and espadolini and removed request for xinding33 and r0mant February 10, 2022 00:20
@russjones

@rosstimothy @espadolini Can either of you review and approve?

@espadolini espadolini left a comment

LGTM but wouldn't it be worth it to also add a metric for cache startup/reset?

@rcanderson23 rcanderson23 requested a review from r0mant February 10, 2022 17:07
@rcanderson23 rcanderson23 merged commit edff372 into master Feb 11, 2022
@rcanderson23 rcanderson23 deleted the carson/cache-stale-event-metric branch February 11, 2022 16:14
rcanderson23 added a commit that referenced this pull request Feb 11, 2022
rcanderson23 added a commit that referenced this pull request Feb 16, 2022
@webvictim webvictim mentioned this pull request Mar 4, 2022
Linked issue: Make stale event warning a Prometheus metric