Add Metrics #490

Merged
merged 26 commits into from
Oct 21, 2022

raghu-nandan-bs
Contributor

@raghu-nandan-bs raghu-nandan-bs commented Sep 26, 2022

Fixes #

Changes proposed on the PR:

  • Add metrics instrumentation.

Metrics

example:

# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 3.0958e-05
go_gc_duration_seconds{quantile="0.25"} 0.0001185
go_gc_duration_seconds{quantile="0.5"} 0.000258375
go_gc_duration_seconds{quantile="0.75"} 0.000497416
go_gc_duration_seconds{quantile="1"} 0.096007292
go_gc_duration_seconds_sum 0.552302211
go_gc_duration_seconds_count 122
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 29
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.17.13"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 5.786408e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 3.0117592e+08
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.577539e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 2.044792e+06
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 5.6802e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 5.786408e+06
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 8.011776e+06
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 7.815168e+06
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 32390
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 6.71744e+06
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 1.5826944e+07
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.665748876690159e+09
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 2.077182e+06
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 6000
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 16384
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 123488
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 163840
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 9.646672e+06
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 1.033597e+06
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 950272
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 950272
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 2.5248776e+07
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 11
# HELP kooper_controller_event_in_queue_duration_seconds The duration of an event in the queue.
# TYPE kooper_controller_event_in_queue_duration_seconds histogram
kooper_controller_event_in_queue_duration_seconds_bucket{controller="redisfailover",le="0.01"} 33
kooper_controller_event_in_queue_duration_seconds_bucket{controller="redisfailover",le="0.05"} 33
kooper_controller_event_in_queue_duration_seconds_bucket{controller="redisfailover",le="0.1"} 33
kooper_controller_event_in_queue_duration_seconds_bucket{controller="redisfailover",le="0.25"} 33
kooper_controller_event_in_queue_duration_seconds_bucket{controller="redisfailover",le="0.5"} 33
kooper_controller_event_in_queue_duration_seconds_bucket{controller="redisfailover",le="1"} 33
kooper_controller_event_in_queue_duration_seconds_bucket{controller="redisfailover",le="3"} 33
kooper_controller_event_in_queue_duration_seconds_bucket{controller="redisfailover",le="10"} 33
kooper_controller_event_in_queue_duration_seconds_bucket{controller="redisfailover",le="20"} 33
kooper_controller_event_in_queue_duration_seconds_bucket{controller="redisfailover",le="60"} 33
kooper_controller_event_in_queue_duration_seconds_bucket{controller="redisfailover",le="150"} 33
kooper_controller_event_in_queue_duration_seconds_bucket{controller="redisfailover",le="300"} 33
kooper_controller_event_in_queue_duration_seconds_bucket{controller="redisfailover",le="+Inf"} 33
kooper_controller_event_in_queue_duration_seconds_sum{controller="redisfailover"} 0.001836207
kooper_controller_event_in_queue_duration_seconds_count{controller="redisfailover"} 33
# HELP kooper_controller_event_queue_length Length of the controller resource queue.
# TYPE kooper_controller_event_queue_length gauge
kooper_controller_event_queue_length{controller="redisfailover"} 0
# HELP kooper_controller_processed_event_duration_seconds The duration for an event to be processed.
# TYPE kooper_controller_processed_event_duration_seconds histogram
kooper_controller_processed_event_duration_seconds_bucket{controller="redisfailover",success="true",le="0.005"} 0
kooper_controller_processed_event_duration_seconds_bucket{controller="redisfailover",success="true",le="0.01"} 0
kooper_controller_processed_event_duration_seconds_bucket{controller="redisfailover",success="true",le="0.025"} 0
kooper_controller_processed_event_duration_seconds_bucket{controller="redisfailover",success="true",le="0.05"} 0
kooper_controller_processed_event_duration_seconds_bucket{controller="redisfailover",success="true",le="0.1"} 0
kooper_controller_processed_event_duration_seconds_bucket{controller="redisfailover",success="true",le="0.25"} 0
kooper_controller_processed_event_duration_seconds_bucket{controller="redisfailover",success="true",le="0.5"} 0
kooper_controller_processed_event_duration_seconds_bucket{controller="redisfailover",success="true",le="1"} 6
kooper_controller_processed_event_duration_seconds_bucket{controller="redisfailover",success="true",le="2.5"} 33
kooper_controller_processed_event_duration_seconds_bucket{controller="redisfailover",success="true",le="5"} 33
kooper_controller_processed_event_duration_seconds_bucket{controller="redisfailover",success="true",le="10"} 33
kooper_controller_processed_event_duration_seconds_bucket{controller="redisfailover",success="true",le="+Inf"} 33
kooper_controller_processed_event_duration_seconds_sum{controller="redisfailover",success="true"} 42.105479689999996
kooper_controller_processed_event_duration_seconds_count{controller="redisfailover",success="true"} 33
# HELP kooper_controller_queued_events_total Total number of events queued.
# TYPE kooper_controller_queued_events_total counter
kooper_controller_queued_events_total{controller="redisfailover",requeue="false"} 33
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 15.05
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 13
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 3.9358464e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.66574720408e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 7.60573952e+08
# HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
# TYPE process_virtual_memory_max_bytes gauge
process_virtual_memory_max_bytes 1.8446744073709552e+19
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 84
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0
# HELP redis_operator_controller_cluster_ok Number of failover clusters managed by the operator.
# TYPE redis_operator_controller_cluster_ok gauge
redis_operator_controller_cluster_ok{name="redisfailover",namespace="default"} 1
# HELP redis_operator_controller_ensure_resource number of successful 'ensure' operations on a resource performed by the controller.
# TYPE redis_operator_controller_ensure_resource counter
redis_operator_controller_ensure_resource{object_kind="ConfigMap",object_name="rfr-readiness-redisfailover",object_namespace="default",resource_name="redisfailover",status="SUCCESS"} 33
redis_operator_controller_ensure_resource{object_kind="ConfigMap",object_name="rfr-redisfailover",object_namespace="default",resource_name="redisfailover",status="SUCCESS"} 33
redis_operator_controller_ensure_resource{object_kind="ConfigMap",object_name="rfr-s-redisfailover",object_namespace="default",resource_name="redisfailover",status="SUCCESS"} 33
redis_operator_controller_ensure_resource{object_kind="ConfigMap",object_name="rfs-redisfailover",object_namespace="default",resource_name="redisfailover",status="SUCCESS"} 33
redis_operator_controller_ensure_resource{object_kind="Deployment",object_name="rfs-redisfailover",object_namespace="default",resource_name="redisfailover",status="SUCCESS"} 33
redis_operator_controller_ensure_resource{object_kind="PodDisruptionBudget",object_name="rfr-redisfailover",object_namespace="default",resource_name="redisfailover",status="SUCCESS"} 33
redis_operator_controller_ensure_resource{object_kind="PodDisruptionBudget",object_name="rfs-redisfailover",object_namespace="default",resource_name="redisfailover",status="SUCCESS"} 33
redis_operator_controller_ensure_resource{object_kind="Service",object_name="rfs-redisfailover",object_namespace="default",resource_name="redisfailover",status="SUCCESS"} 33
redis_operator_controller_ensure_resource{object_kind="StatefulSet",object_name="rfr-redisfailover",object_namespace="default",resource_name="redisfailover",status="SUCCESS"} 33
# HELP redis_operator_controller_k8s_operations number of operations performed on k8s
# TYPE redis_operator_controller_k8s_operations counter
redis_operator_controller_k8s_operations{err="NA",kind="ConfigMap",namespace="default",object="rfr-readiness-redisfailover",operation="CREATE",status="SUCCESS"} 1
redis_operator_controller_k8s_operations{err="NA",kind="ConfigMap",namespace="default",object="rfr-readiness-redisfailover",operation="GET",status="SUCCESS"} 32
redis_operator_controller_k8s_operations{err="NA",kind="ConfigMap",namespace="default",object="rfr-readiness-redisfailover",operation="UPDATE",status="SUCCESS"} 32
redis_operator_controller_k8s_operations{err="NA",kind="ConfigMap",namespace="default",object="rfr-redisfailover",operation="CREATE",status="SUCCESS"} 1
redis_operator_controller_k8s_operations{err="NA",kind="ConfigMap",namespace="default",object="rfr-redisfailover",operation="GET",status="SUCCESS"} 32
redis_operator_controller_k8s_operations{err="NA",kind="ConfigMap",namespace="default",object="rfr-redisfailover",operation="UPDATE",status="SUCCESS"} 32
redis_operator_controller_k8s_operations{err="NA",kind="ConfigMap",namespace="default",object="rfr-s-redisfailover",operation="CREATE",status="SUCCESS"} 1
redis_operator_controller_k8s_operations{err="NA",kind="ConfigMap",namespace="default",object="rfr-s-redisfailover",operation="GET",status="SUCCESS"} 32
redis_operator_controller_k8s_operations{err="NA",kind="ConfigMap",namespace="default",object="rfr-s-redisfailover",operation="UPDATE",status="SUCCESS"} 32
redis_operator_controller_k8s_operations{err="NA",kind="ConfigMap",namespace="default",object="rfs-redisfailover",operation="CREATE",status="SUCCESS"} 1
redis_operator_controller_k8s_operations{err="NA",kind="ConfigMap",namespace="default",object="rfs-redisfailover",operation="GET",status="SUCCESS"} 32
redis_operator_controller_k8s_operations{err="NA",kind="ConfigMap",namespace="default",object="rfs-redisfailover",operation="UPDATE",status="SUCCESS"} 32
redis_operator_controller_k8s_operations{err="NA",kind="Deployment",namespace="default",object="rfs-redisfailover",operation="CREATE",status="SUCCESS"} 1
redis_operator_controller_k8s_operations{err="NA",kind="Deployment",namespace="default",object="rfs-redisfailover",operation="GET",status="SUCCESS"} 93
redis_operator_controller_k8s_operations{err="NA",kind="Deployment",namespace="default",object="rfs-redisfailover",operation="UPDATE",status="SUCCESS"} 32
redis_operator_controller_k8s_operations{err="NA",kind="Pod",namespace="default",object="rfr-redisfailover-0",operation="GET",status="SUCCESS"} 28
redis_operator_controller_k8s_operations{err="NA",kind="Pod",namespace="default",object="rfr-redisfailover-0",operation="PATCH",status="SUCCESS"} 1
redis_operator_controller_k8s_operations{err="NA",kind="Pod",namespace="default",object="rfr-redisfailover-1",operation="GET",status="SUCCESS"} 28
redis_operator_controller_k8s_operations{err="NA",kind="Pod",namespace="default",object="rfr-redisfailover-2",operation="GET",status="SUCCESS"} 28
redis_operator_controller_k8s_operations{err="NA",kind="PodDisruptionBudget",namespace="default",object="rfr-redisfailover",operation="CREATE",status="SUCCESS"} 1
redis_operator_controller_k8s_operations{err="NA",kind="PodDisruptionBudget",namespace="default",object="rfr-redisfailover",operation="GET",status="SUCCESS"} 32
redis_operator_controller_k8s_operations{err="NA",kind="PodDisruptionBudget",namespace="default",object="rfr-redisfailover",operation="UPDATE",status="SUCCESS"} 32
redis_operator_controller_k8s_operations{err="NA",kind="PodDisruptionBudget",namespace="default",object="rfs-redisfailover",operation="CREATE",status="SUCCESS"} 1
redis_operator_controller_k8s_operations{err="NA",kind="PodDisruptionBudget",namespace="default",object="rfs-redisfailover",operation="GET",status="SUCCESS"} 32
redis_operator_controller_k8s_operations{err="NA",kind="PodDisruptionBudget",namespace="default",object="rfs-redisfailover",operation="UPDATE",status="SUCCESS"} 32
redis_operator_controller_k8s_operations{err="NA",kind="RedisFailover",namespace="",object="NA",operation="LIST",status="SUCCESS"} 1
redis_operator_controller_k8s_operations{err="NA",kind="RedisFailover",namespace="",object="NA",operation="WATCH",status="SUCCESS"} 2
redis_operator_controller_k8s_operations{err="NA",kind="Service",namespace="default",object="rfs-redisfailover",operation="CREATE",status="SUCCESS"} 1
redis_operator_controller_k8s_operations{err="NA",kind="Service",namespace="default",object="rfs-redisfailover",operation="GET",status="SUCCESS"} 32
redis_operator_controller_k8s_operations{err="NA",kind="Service",namespace="default",object="rfs-redisfailover",operation="UPDATE",status="SUCCESS"} 32
redis_operator_controller_k8s_operations{err="NA",kind="StatefulSet",namespace="default",object="rfr-redisfailover",operation="CREATE",status="SUCCESS"} 1
redis_operator_controller_k8s_operations{err="NA",kind="StatefulSet",namespace="default",object="rfr-redisfailover",operation="GET",status="SUCCESS"} 335
redis_operator_controller_k8s_operations{err="NA",kind="StatefulSet",namespace="default",object="rfr-redisfailover",operation="UPDATE",status="SUCCESS"} 32
redis_operator_controller_k8s_operations{err="RESOURCE_NOT_FOUND",kind="ConfigMap",namespace="default",object="rfr-readiness-redisfailover",operation="GET",status="FAIL"} 1
redis_operator_controller_k8s_operations{err="RESOURCE_NOT_FOUND",kind="ConfigMap",namespace="default",object="rfr-redisfailover",operation="GET",status="FAIL"} 1
redis_operator_controller_k8s_operations{err="RESOURCE_NOT_FOUND",kind="ConfigMap",namespace="default",object="rfr-s-redisfailover",operation="GET",status="FAIL"} 1
redis_operator_controller_k8s_operations{err="RESOURCE_NOT_FOUND",kind="ConfigMap",namespace="default",object="rfs-redisfailover",operation="GET",status="FAIL"} 1
redis_operator_controller_k8s_operations{err="RESOURCE_NOT_FOUND",kind="Deployment",namespace="default",object="rfs-redisfailover",operation="GET",status="FAIL"} 1
redis_operator_controller_k8s_operations{err="RESOURCE_NOT_FOUND",kind="PodDisruptionBudget",namespace="default",object="rfr-redisfailover",operation="GET",status="FAIL"} 1
redis_operator_controller_k8s_operations{err="RESOURCE_NOT_FOUND",kind="PodDisruptionBudget",namespace="default",object="rfs-redisfailover",operation="GET",status="FAIL"} 1
redis_operator_controller_k8s_operations{err="RESOURCE_NOT_FOUND",kind="RedisFailover",namespace="",object="NA",operation="LIST",status="FAIL"} 21
redis_operator_controller_k8s_operations{err="RESOURCE_NOT_FOUND",kind="Service",namespace="default",object="rfr-redisfailover",operation="GET",status="FAIL"} 33
redis_operator_controller_k8s_operations{err="RESOURCE_NOT_FOUND",kind="Service",namespace="default",object="rfs-redisfailover",operation="GET",status="FAIL"} 1
redis_operator_controller_k8s_operations{err="RESOURCE_NOT_FOUND",kind="StatefulSet",namespace="default",object="rfr-redisfailover",operation="GET",status="FAIL"} 1
# HELP redis_operator_controller_redis_check indicates any error encountered in managed redis instance(s)
# TYPE redis_operator_controller_redis_check counter
redis_operator_controller_redis_check{indicator="APPLY_REDIS_CONFIG",instance="NA",namespace="default",resource="redisfailover",status="HEALTHY"} 28
redis_operator_controller_redis_check{indicator="MASTER_COUNT_IS_NOT_ONE",instance="NA",namespace="default",resource="redisfailover",status="HEALTHY"} 27
redis_operator_controller_redis_check{indicator="MASTER_COUNT_IS_NOT_ONE",instance="NA",namespace="default",resource="redisfailover",status="UNHEALTHY"} 6
redis_operator_controller_redis_check{indicator="REDIS_STATEFULSET_REPLICAS_MISMATCH",instance="NA",namespace="default",resource="redisfailover",status="HEALTHY"} 66
redis_operator_controller_redis_check{indicator="SLAVE_IS_CONFIGURED_WITH_WRONG_MASTER_IP",instance="NA",namespace="default",resource="redisfailover",status="HEALTHY"} 28
# HELP redis_operator_controller_redis_operations number of operations performed on redis
# TYPE redis_operator_controller_redis_operations counter
redis_operator_controller_redis_operations{IP="10.244.0.13",err="NA",kind="REDIS",operation="APPLY_REDIS_CONFIG",status="SUCCESS"} 28
redis_operator_controller_redis_operations{IP="10.244.0.13",err="NA",kind="REDIS",operation="CHECK_IF_INSTANCE_IS_MASTER",status="SUCCESS"} 116
redis_operator_controller_redis_operations{IP="10.244.0.13",err="NA",kind="REDIS",operation="CHECK_IF_SLAVE_IS_READY",status="SUCCESS"} 28
redis_operator_controller_redis_operations{IP="10.244.0.13",err="NA",kind="REDIS",operation="GET_MASTER_OF_GIVEN_SLAVE_INSTANCE",status="SUCCESS"} 28
redis_operator_controller_redis_operations{IP="10.244.0.13",err="NA",kind="REDIS",operation="MAKE_SLAVE_OF_GIVEN_MASTER_INSTANCE",status="SUCCESS"} 1
redis_operator_controller_redis_operations{IP="10.244.0.14",err="NA",kind="REDIS",operation="APPLY_REDIS_CONFIG",status="SUCCESS"} 28
redis_operator_controller_redis_operations{IP="10.244.0.14",err="NA",kind="REDIS",operation="CHECK_IF_INSTANCE_IS_MASTER",status="SUCCESS"} 116
redis_operator_controller_redis_operations{IP="10.244.0.14",err="NA",kind="REDIS",operation="CHECK_IF_SLAVE_IS_READY",status="SUCCESS"} 28
redis_operator_controller_redis_operations{IP="10.244.0.14",err="NA",kind="REDIS",operation="GET_MASTER_OF_GIVEN_SLAVE_INSTANCE",status="SUCCESS"} 28
redis_operator_controller_redis_operations{IP="10.244.0.14",err="NA",kind="REDIS",operation="MAKE_SLAVE_OF_GIVEN_MASTER_INSTANCE",status="SUCCESS"} 1
redis_operator_controller_redis_operations{IP="10.244.0.15",err="NA",kind="REDIS",operation="APPLY_REDIS_CONFIG",status="SUCCESS"} 28
redis_operator_controller_redis_operations{IP="10.244.0.15",err="NA",kind="REDIS",operation="CHECK_IF_INSTANCE_IS_MASTER",status="SUCCESS"} 144
redis_operator_controller_redis_operations{IP="10.244.0.15",err="NA",kind="REDIS",operation="MAKE_INSTANCE_AS_MASTER",status="FAIL"} 1
redis_operator_controller_redis_operations{IP="10.244.0.15",err="SENTINEL_REGEX_NOT_FOUND",kind="REDIS",operation="GET_MASTER_OF_GIVEN_SLAVE_INSTANCE",status="FAIL"} 28
redis_operator_controller_redis_operations{IP="10.244.0.16",err="NA",kind="REDIS",operation="SET_SENTINEL_TO_MONITOR_REDIS_WITH_GIVEN_PORT",status="SUCCESS"} 1
redis_operator_controller_redis_operations{IP="10.244.0.16",err="NA",kind="SENTINEL",operation="APPLY_SENTINEL_CONFIG",status="SUCCESS"} 56
redis_operator_controller_redis_operations{IP="10.244.0.16",err="NA",kind="SENTINEL",operation="GET_NUMBER_OF_REDIS_SLAVES_IN_MEMORY",status="SUCCESS"} 28
redis_operator_controller_redis_operations{IP="10.244.0.16",err="NA",kind="SENTINEL",operation="GET_NUMBER_OF_SENTINELS_IN_MEMORY",status="SUCCESS"} 28
redis_operator_controller_redis_operations{IP="10.244.0.16",err="NA",kind="SENTINEL",operation="RESET_ALL_SENTINEL_CONFIG",status="SUCCESS"} 1
redis_operator_controller_redis_operations{IP="10.244.0.16",err="NA",kind="SENTINEL",operation="SENTINEL_GET_MASTER_INSTANCE",status="SUCCESS"} 28
redis_operator_controller_redis_operations{IP="10.244.0.17",err="NA",kind="REDIS",operation="SET_SENTINEL_TO_MONITOR_REDIS_WITH_GIVEN_PORT",status="SUCCESS"} 1
redis_operator_controller_redis_operations{IP="10.244.0.17",err="NA",kind="SENTINEL",operation="APPLY_SENTINEL_CONFIG",status="SUCCESS"} 56
redis_operator_controller_redis_operations{IP="10.244.0.17",err="NA",kind="SENTINEL",operation="GET_NUMBER_OF_REDIS_SLAVES_IN_MEMORY",status="SUCCESS"} 28
redis_operator_controller_redis_operations{IP="10.244.0.17",err="NA",kind="SENTINEL",operation="GET_NUMBER_OF_SENTINELS_IN_MEMORY",status="SUCCESS"} 28
redis_operator_controller_redis_operations{IP="10.244.0.17",err="NA",kind="SENTINEL",operation="RESET_ALL_SENTINEL_CONFIG",status="SUCCESS"} 2
redis_operator_controller_redis_operations{IP="10.244.0.17",err="NA",kind="SENTINEL",operation="SENTINEL_GET_MASTER_INSTANCE",status="SUCCESS"} 28
redis_operator_controller_redis_operations{IP="10.244.0.18",err="NA",kind="REDIS",operation="SET_SENTINEL_TO_MONITOR_REDIS_WITH_GIVEN_PORT",status="SUCCESS"} 1
redis_operator_controller_redis_operations{IP="10.244.0.18",err="NA",kind="SENTINEL",operation="APPLY_SENTINEL_CONFIG",status="SUCCESS"} 56
redis_operator_controller_redis_operations{IP="10.244.0.18",err="NA",kind="SENTINEL",operation="GET_NUMBER_OF_REDIS_SLAVES_IN_MEMORY",status="SUCCESS"} 28
redis_operator_controller_redis_operations{IP="10.244.0.18",err="NA",kind="SENTINEL",operation="GET_NUMBER_OF_SENTINELS_IN_MEMORY",status="SUCCESS"} 28
redis_operator_controller_redis_operations{IP="10.244.0.18",err="NA",kind="SENTINEL",operation="RESET_ALL_SENTINEL_CONFIG",status="SUCCESS"} 1
redis_operator_controller_redis_operations{IP="10.244.0.18",err="NA",kind="SENTINEL",operation="SENTINEL_GET_MASTER_INSTANCE",status="SUCCESS"} 28
# HELP redis_operator_controller_sentinel_check indicates any error encountered in managed sentinel instance(s)
# TYPE redis_operator_controller_sentinel_check counter
redis_operator_controller_sentinel_check{indicator="10.244.0.16",instance="NA",namespace="default",resource="redisfailover",status="HEALTHY"} 27
redis_operator_controller_sentinel_check{indicator="10.244.0.16",instance="NA",namespace="default",resource="redisfailover",status="UNHEALTHY"} 1
redis_operator_controller_sentinel_check{indicator="10.244.0.17",instance="NA",namespace="default",resource="redisfailover",status="HEALTHY"} 27
redis_operator_controller_sentinel_check{indicator="10.244.0.17",instance="NA",namespace="default",resource="redisfailover",status="UNHEALTHY"} 1
redis_operator_controller_sentinel_check{indicator="10.244.0.18",instance="NA",namespace="default",resource="redisfailover",status="HEALTHY"} 27
redis_operator_controller_sentinel_check{indicator="10.244.0.18",instance="NA",namespace="default",resource="redisfailover",status="UNHEALTHY"} 1
redis_operator_controller_sentinel_check{indicator="APPLY_SENTINEL_CONFIG",instance="10.244.0.16",namespace="default",resource="redisfailover",status="HEALTHY"} 28
redis_operator_controller_sentinel_check{indicator="APPLY_SENTINEL_CONFIG",instance="10.244.0.17",namespace="default",resource="redisfailover",status="HEALTHY"} 28
redis_operator_controller_sentinel_check{indicator="APPLY_SENTINEL_CONFIG",instance="10.244.0.18",namespace="default",resource="redisfailover",status="HEALTHY"} 28
redis_operator_controller_sentinel_check{indicator="REDIS_SLAVES_NUMBER_IN_MEMORY_MISMATCH",instance="10.244.0.16",namespace="default",resource="redisfailover",status="HEALTHY"} 28
redis_operator_controller_sentinel_check{indicator="REDIS_SLAVES_NUMBER_IN_MEMORY_MISMATCH",instance="10.244.0.17",namespace="default",resource="redisfailover",status="HEALTHY"} 27
redis_operator_controller_sentinel_check{indicator="REDIS_SLAVES_NUMBER_IN_MEMORY_MISMATCH",instance="10.244.0.17",namespace="default",resource="redisfailover",status="UNHEALTHY"} 1
redis_operator_controller_sentinel_check{indicator="REDIS_SLAVES_NUMBER_IN_MEMORY_MISMATCH",instance="10.244.0.18",namespace="default",resource="redisfailover",status="HEALTHY"} 28
redis_operator_controller_sentinel_check{indicator="SENTINEL_DEPLOYMENT_REPLICAS_MISMATCH",instance="NA",namespace="default",resource="redisfailover",status="HEALTHY"} 33
redis_operator_controller_sentinel_check{indicator="SENTINEL_NUMBER_IN_MEMORY_MISMATCH",instance="10.244.0.16",namespace="default",resource="redisfailover",status="HEALTHY"} 27
redis_operator_controller_sentinel_check{indicator="SENTINEL_NUMBER_IN_MEMORY_MISMATCH",instance="10.244.0.16",namespace="default",resource="redisfailover",status="UNHEALTHY"} 1
redis_operator_controller_sentinel_check{indicator="SENTINEL_NUMBER_IN_MEMORY_MISMATCH",instance="10.244.0.17",namespace="default",resource="redisfailover",status="HEALTHY"} 27
redis_operator_controller_sentinel_check{indicator="SENTINEL_NUMBER_IN_MEMORY_MISMATCH",instance="10.244.0.17",namespace="default",resource="redisfailover",status="UNHEALTHY"} 1
redis_operator_controller_sentinel_check{indicator="SENTINEL_NUMBER_IN_MEMORY_MISMATCH",instance="10.244.0.18",namespace="default",resource="redisfailover",status="HEALTHY"} 27
redis_operator_controller_sentinel_check{indicator="SENTINEL_NUMBER_IN_MEMORY_MISMATCH",instance="10.244.0.18",namespace="default",resource="redisfailover",status="UNHEALTHY"} 1
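
A dump like the one above is easier to reason about once it is aggregated. The following is a minimal sketch, not part of this PR, of how the sample lines could be parsed and a counter summed by one label using only the Python standard library. The helper names `parse_samples` and `sum_by_label` are hypothetical, and the parser assumes the simple `name{labels} value` form shown in the dump (lines carrying a trailing timestamp are skipped).

```python
import re
from collections import defaultdict

# One sample line: metric name, optional {label="value",...} block, numeric value.
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
# One label pair inside the braces; escaped characters in the value are tolerated.
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Yield (metric_name, labels_dict, value) for each sample line in text."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        match = SAMPLE_RE.match(line)
        if match is None:  # unsupported form, e.g. a line with a timestamp
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        yield name, labels, float(value)


def sum_by_label(text, metric, label):
    """Sum the values of one metric, grouped by the value of one label."""
    totals = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name == metric:
            totals[labels.get(label, "")] += value
    return dict(totals)


# Three lines copied from the dump above (Service operations only).
SAMPLE = '''
# TYPE redis_operator_controller_k8s_operations counter
redis_operator_controller_k8s_operations{err="NA",kind="Service",namespace="default",object="rfs-redisfailover",operation="GET",status="SUCCESS"} 32
redis_operator_controller_k8s_operations{err="RESOURCE_NOT_FOUND",kind="Service",namespace="default",object="rfr-redisfailover",operation="GET",status="FAIL"} 33
redis_operator_controller_k8s_operations{err="RESOURCE_NOT_FOUND",kind="Service",namespace="default",object="rfs-redisfailover",operation="GET",status="FAIL"} 1
'''

if __name__ == "__main__":
    print(sum_by_label(SAMPLE, "redis_operator_controller_k8s_operations", "status"))
```

Grouping by `err` or `kind` instead of `status` works the same way, which is a quick way to spot lines such as the repeated `RESOURCE_NOT_FOUND` failures for the `rfr-redisfailover` Service.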

@raghu-nandan-bs raghu-nandan-bs requested a review from a team as a code owner September 26, 2022 06:03
@raghu-nandan-bs
Contributor Author

raghu-nandan-bs commented Sep 30, 2022

Redis failover health dashboard

Gives details on various aspects of a given RedisFailover CR (a "tenant"), from the operator's perspective.

JSON of the dashboard: link

Screenshot: Redis failover dashboard, redis-operator-details (captured 2022-10-14)
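
The dashboard JSON is not reproduced in this thread, so the following is not its actual query. It is only a small sketch of the kind of per-tenant figure that can be derived from the `redis_operator_controller_redis_check` counter: the fraction of checks that came back `UNHEALTHY` for each indicator. The function name `unhealthy_ratio` is hypothetical; the numbers are copied from the example metrics in the PR description.

```python
def unhealthy_ratio(counts):
    """Map each indicator to the fraction of its checks reported UNHEALTHY.

    counts maps (indicator, status) to the cumulative counter value.
    """
    totals = {}
    for (indicator, status), value in counts.items():
        healthy, unhealthy = totals.get(indicator, (0.0, 0.0))
        if status == "UNHEALTHY":
            unhealthy += value
        else:
            healthy += value
        totals[indicator] = (healthy, unhealthy)
    return {
        indicator: (unhealthy / (healthy + unhealthy) if healthy + unhealthy else 0.0)
        for indicator, (healthy, unhealthy) in totals.items()
    }


# Values taken from the redis_check lines for resource="redisfailover".
REDIS_CHECKS = {
    ("APPLY_REDIS_CONFIG", "HEALTHY"): 28,
    ("MASTER_COUNT_IS_NOT_ONE", "HEALTHY"): 27,
    ("MASTER_COUNT_IS_NOT_ONE", "UNHEALTHY"): 6,
}

if __name__ == "__main__":
    for indicator, ratio in unhealthy_ratio(REDIS_CHECKS).items():
        print(f"{indicator}: {ratio:.3f}")
```

These counters are cumulative since the operator started, so a ratio computed this way covers the whole process lifetime. A dashboard panel would more typically compare increases over a time window.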

@samof76
Contributor

samof76 commented Oct 1, 2022

@raghu-nandan-bs looks good. Is this still WIP?
cc: @ese

@raghu-nandan-bs raghu-nandan-bs changed the title from "[WIP] Add Metrics" to "Add Metrics" Oct 1, 2022
@raghu-nandan-bs
Contributor Author

Hi @samof76, I'm done with the changes and have updated the title.

@samof76 @ese please review.

@ese
Member

ese commented Oct 10, 2022

Great! Thanks. I'm OK with it; just check the CI jobs, since there seems to be some dead code.

@samof76
Contributor

samof76 commented Oct 17, 2022

@ese please review this. This looks good, and will add a lot of value to the prod deployments.

@samof76
Contributor

samof76 commented Oct 21, 2022

@ese all the checks seem to have passed. Can this be merged?

@ese
Member

ese commented Oct 21, 2022

Thank you so much!

@ese ese merged commit c224cad into spotahome:master Oct 21, 2022
@raghu-nandan-bs
Contributor Author

@ese thanks for merging this. I will shortly share a small change to garbage-collect stale metrics, along with an updated dashboard.
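
The follow-up change is not shown in this thread. As a rough illustration of one common reading of "GC stale metrics", the sketch below drops every labeled series that belongs to a resource that no longer exists, so that it stops being exported. The `LabeledCounter` class and its methods are hypothetical stand-ins, not the operator's code or any client library's API, and the resource name `other` is made up for the example.

```python
class LabeledCounter:
    """Toy labeled counter: one float per unique tuple of label values."""

    def __init__(self, label_names):
        self.label_names = tuple(label_names)
        self._series = {}

    def inc(self, amount=1.0, **labels):
        """Increment the series identified by the given label values."""
        key = tuple(labels[name] for name in self.label_names)
        self._series[key] = self._series.get(key, 0.0) + amount

    def delete_matching(self, **labels):
        """Drop every series whose labels include all given pairs.

        Returns the number of series removed.
        """
        index = {name: i for i, name in enumerate(self.label_names)}
        stale = [
            key for key in self._series
            if all(key[index[name]] == value for name, value in labels.items())
        ]
        for key in stale:
            del self._series[key]
        return len(stale)

    def series(self):
        """Return a copy of the current series map."""
        return dict(self._series)


if __name__ == "__main__":
    checks = LabeledCounter(["indicator", "namespace", "resource", "status"])
    checks.inc(indicator="APPLY_REDIS_CONFIG", namespace="default",
               resource="redisfailover", status="HEALTHY")
    checks.inc(indicator="APPLY_REDIS_CONFIG", namespace="default",
               resource="other", status="HEALTHY")  # hypothetical second tenant
    # Suppose the "redisfailover" CR was deleted: remove its series.
    removed = checks.delete_matching(namespace="default", resource="redisfailover")
    print(removed, checks.series())
```

Without a step like this, series for deleted resources keep being exported with their last value until the process restarts.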

@jonathon2nd

Hello!

This looks awesome! I just updated the operator and CRD and checked to make sure metrics were flowing.
image

However, I am having an issue with the Grafana dashboard:
image

What do I need to adjust in my config to fix it?

@raghu-nandan-bs
Contributor Author

raghu-nandan-bs commented Oct 22, 2022

Hi @jonathon2nd, really sorry about that. I will share an updated dashboard today.

Also, which version of Grafana are you on? The error seems to be something different.

@jonathon2nd

I am using the Grafana that is built into Rancher Monitoring, which is v7.5.11.

@raghu-nandan-bs
Contributor Author

@jonathon2nd

No change, unfortunately.
image

@raghu-nandan-bs
Contributor Author

@jonathon2nd unfortunately, I created this dashboard on Grafana v9.1.5. :(

@jonathon2nd

I am looking into upgrading the Grafana image that is part of the Rancher Monitoring stack.
