Kiam exports both Prometheus and StatsD metrics that can be used to determine the health of the system, check the timing of each RPC call, and monitor the size of the credentials cache. By default, Prometheus metrics are exported on localhost:9620.
StatsD metrics are disabled by default; read below for how to enable them.
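A quick way to confirm that the Prometheus endpoint is serving Kiam's metrics is to fetch it and keep only the kiam_ series. This is a minimal sketch using only the Python standard library. It assumes the default localhost:9620 address with the /metrics path, and that label values contain no spaces. The function names kiam_samples and fetch_kiam_samples are hypothetical.

```python
import urllib.request

# Assumed default endpoint: the default listen address plus the /metrics path.
METRICS_URL = "http://localhost:9620/metrics"


def kiam_samples(exposition_text):
    """Return {series: value} for kiam_* samples in Prometheus text format.

    Comment lines (# HELP / # TYPE), blank lines and non-Kiam series are
    skipped. This simplified parser assumes label values contain no spaces
    and ignores any optional trailing timestamp.
    """
    samples = {}
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line.startswith("kiam_"):
            continue
        parts = line.split()
        samples[parts[0]] = float(parts[1])  # float() also accepts "+Inf" and "NaN"
    return samples


def fetch_kiam_samples(url=METRICS_URL, timeout=5.0):
    """Fetch the metrics page and return only the kiam_* samples."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return kiam_samples(resp.read().decode("utf-8"))


if __name__ == "__main__":
    for series, value in sorted(fetch_kiam_samples().items()):
        print(series, value)
```

Because the default address binds to localhost, a Prometheus server running outside the pod will generally need prometheus-listen-addr set to a reachable address (for example 0.0.0.0:9620) before it can scrape Kiam.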
An example Grafana dashboard that uses Prometheus as its data source is provided here. It displays the basic metrics and, if available, includes DaemonSet status from kube-state-metrics and container metrics from cAdvisor.
- The statsd flag controls the address to which StatsD metrics are sent. It defaults to "" (an empty address), which leaves StatsD disabled. To enable StatsD, provide a server address, for example 127.0.0.1:8125.
- The statsd-prefix flag controls the prefix that is prepended to Kiam's StatsD metric names. It defaults to kiam.
- The statsd-interval flag controls how frequently the in-memory metrics buffer is flushed to the specified StatsD endpoint. Metrics are not aggregated in this buffer; the raw counts are flushed to the underlying StatsD sink. It defaults to 100ms.
- The prometheus-listen-addr flag controls the address on which Kiam creates a Prometheus endpoint. It defaults to localhost:9620. The metrics themselves can be accessed at <prometheus-listen-addr>/metrics.
- The prometheus-sync-interval flag controls how frequently Prometheus metrics are updated. It defaults to 5s.
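To check that the statsd flag is taking effect, a throwaway UDP listener on the configured address can print whatever Kiam sends. This is a minimal debugging sketch that assumes Kiam was started with the statsd flag set to 127.0.0.1:8125. StatsD packets are plain text in the form name:value|type, and a single datagram may carry several newline-separated metrics. The function names split_packet and listen are hypothetical. Stop any real StatsD agent on that port first, since only one process can bind it.

```python
import socket


def split_packet(datagram):
    """Split one StatsD datagram into its individual metric lines."""
    text = datagram.decode("utf-8", errors="replace")
    return [line for line in text.splitlines() if line.strip()]


def listen(host="127.0.0.1", port=8125):
    """Print every StatsD line received on host:port until interrupted."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    try:
        while True:
            datagram, _addr = sock.recvfrom(65535)
            for line in split_packet(datagram):
                print(line)  # StatsD line format: <name>:<value>|<type>
    finally:
        sock.close()


if __name__ == "__main__":
    listen()
```

Since statsd-interval flushes raw, unaggregated counts, lines should appear roughly every 100ms while Kiam is handling requests.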
Metadata subsystem:
- kiam_metadata_handler_latency_seconds - Bucketed histogram of handler timings, tagged by handler.
- kiam_metadata_credential_fetch_errors_total - Number of errors fetching the credentials for a pod.
- kiam_metadata_credential_encode_errors_total - Number of errors encoding credentials for a pod.
- kiam_metadata_find_role_errors_total - Number of errors finding the role for a pod.
- kiam_metadata_empty_role_total - Number of empty roles returned.
- kiam_metadata_success_total - Number of successful responses from a handler.
- kiam_metadata_responses_total - Responses from mocked-out metadata handlers.
- kiam_metadata_proxy_requests_blocked_total - Number of access requests to the proxy handler that were blocked by the regexp.
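Because kiam_metadata_handler_latency_seconds is a bucketed histogram, percentiles have to be estimated from cumulative bucket counts. PromQL's histogram_quantile() does this on the Prometheus server. The sketch below reimplements the same linear-interpolation idea in Python for illustration only; it is a simplification, not Prometheus's exact code. It assumes the buckets for one handler have already been collected as (upper bound, cumulative count) pairs ending in the +Inf bucket. The function name estimate_quantile is hypothetical.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count) sorted by upper_bound
    and ending with the +Inf bucket, as Prometheus histograms expose them.
    The value is linearly interpolated inside the bucket holding the target
    rank; the lowest bucket is assumed to start at 0.
    """
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("expected buckets ending with +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Target lies in the +Inf bucket: report the highest finite bound.
                return prev_bound
            if count == prev_count:
                return bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound
```

For example, with 50 of 100 requests under 0.1s and 90 under 0.5s, the estimated 70th percentile is 0.3s. The estimate can only be as precise as the bucket boundaries allow.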
STS subsystem:
- kiam_sts_cache_hit_total - Number of cache hits to the metadata cache.
- kiam_sts_cache_miss_total - Number of cache misses to the metadata cache.
- kiam_sts_issuing_errors_total - Number of errors issuing credentials.
- kiam_sts_assumerole_timing_seconds - Bucketed histogram of AssumeRole timings.
- kiam_sts_assumerole_current - Number of AssumeRole calls currently executing.
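The hit and miss counters are most useful as a ratio over a recent window, because a lifetime ratio hides sudden changes in cache effectiveness. This is a minimal sketch, assuming two scrapes of kiam_sts_cache_hit_total and kiam_sts_cache_miss_total from the same process with no restart in between. The function name and the "hit"/"miss" dictionary keys are hypothetical.

```python
def interval_hit_ratio(before, after):
    """Cache hit ratio between two scrapes of the STS cache counters.

    before, after: dicts with "hit" and "miss" counter values taken from
    kiam_sts_cache_hit_total and kiam_sts_cache_miss_total.
    Returns None when no lookups happened in the interval.
    Assumes no counter reset (e.g. a server restart) between the scrapes.
    """
    hits = after["hit"] - before["hit"]
    misses = after["miss"] - before["miss"]
    lookups = hits + misses
    if lookups <= 0:
        return None
    return hits / lookups
```

In Prometheus itself the same idea can be expressed as sum(rate(kiam_sts_cache_hit_total[5m])) / (sum(rate(kiam_sts_cache_hit_total[5m])) + sum(rate(kiam_sts_cache_miss_total[5m]))), which also copes with counter resets.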
Kubernetes subsystem:
- kiam_k8s_dropped_pods_total - Number of pods dropped because of a full buffer.
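Since kiam_k8s_dropped_pods_total counts pods discarded because a buffer was full, any increase is worth alerting on. Counters restart from zero when a process restarts, so a raw difference between two readings can be misleading. This is a minimal sketch that totals the increase across a series of readings and treats any drop in value as a counter reset. It is similar in spirit to PromQL's increase(), without the extrapolation increase() performs. The function name counter_increase is hypothetical.

```python
def counter_increase(samples):
    """Total increase of a monotonically increasing counter across samples.

    samples: counter readings in time order. A reading lower than the one
    before it is treated as a counter reset, so the new reading is counted
    from zero rather than producing a negative difference.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += (cur - prev) if cur >= prev else cur
    return total
```

For readings [0, 2, 5, 1, 3] this reports 8 dropped pods: 5 before the reset and 3 after it.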
gRPC server metrics:
- grpc_server_handled_total - Total number of RPCs completed on the server, regardless of success or failure.
- grpc_server_msg_received_total - Total number of RPC stream messages received on the server.
- grpc_server_msg_sent_total - Total number of gRPC stream messages sent by the server.
- grpc_server_started_total - Total number of RPCs started on the server.
gRPC client metrics:
- grpc_client_handled_total - Total number of RPCs completed by the client, regardless of success or failure.
- grpc_client_msg_received_total - Total number of RPC stream messages received by the client.
- grpc_client_msg_sent_total - Total number of gRPC stream messages sent by the client.
- grpc_client_started_total - Total number of RPCs started on the client.
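Subtracting the handled counter from the started counter estimates how many RPCs are currently in flight, which helps spot calls that hang. These metric names match the go-grpc-prometheus library, which labels started_total by grpc_service and grpc_method and adds grpc_code to handled_total. That Kiam uses this library, and therefore these labels, is an assumption here. This is a minimal sketch over already-parsed snapshots; the function name and the dictionary shapes are hypothetical.

```python
from collections import defaultdict


def in_flight(started, handled):
    """Estimate in-flight RPCs per (service, method).

    started: {(service, method): count} from grpc_*_started_total
    handled: {(service, method, code): count} from grpc_*_handled_total
    Both snapshots should come from the same scrape of the same process.
    """
    done = defaultdict(float)
    for (service, method, _code), count in handled.items():
        done[(service, method)] += count  # sum completions across all status codes
    return {key: count - done[key] for key, count in started.items()}
```

The same function applies to both the grpc_server_* and grpc_client_* pairs. A value that keeps growing for one method suggests calls that start but never complete.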
Timing metrics:
- gateway.rpc.GetRole - Observed client-side latency of the GetRole RPC.
- gateway.rpc.GetCredentials - Observed client-side latency of the GetCredentials RPC.
- server.rpc.GetRoleCredentials - Observed server-side latency of the GetRoleCredentials RPC.
- server.rpc.IsAllowedAssumeRole - Observed server-side latency of the IsAllowedAssumeRole RPC.
- server.rpc.GetHealth - Observed server-side latency of the GetHealth RPC.
- server.rpc.GetPodRole - Observed server-side latency of the GetPodRole RPC.
- handler.role_name - Observed latency of the role_name handler.
- handler.health - Observed latency of the health handler.
- handler.credentials - Observed latency of the credentials handler.
- aws.assume_role - Observed latency of the AWS AssumeRole request.
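When these timings are captured from the StatsD stream, a quick per-metric summary makes slow RPCs or handlers stand out. This is a minimal sketch, assuming the timings arrive as standard StatsD timers (the ms type). How exactly the statsd-prefix is joined to names such as server.rpc.GetPodRole is also not specified here, so the p. prefix in the example is hypothetical, as are the function names parse_timer and summarize.

```python
from collections import defaultdict


def parse_timer(line):
    """Parse a StatsD timer line "name:value|ms" into (name, value), else None.

    An optional sample-rate suffix such as "|@0.5" is ignored; values are not
    rescaled for sampling.
    """
    name, sep, rest = line.partition(":")
    if not sep:
        return None
    value, _, kind = rest.partition("|")
    if kind.split("|")[0] != "ms":
        return None  # not a timer (counter, gauge, ...)
    return name, float(value)


def summarize(lines):
    """Return {name: (count, mean_ms, max_ms)} for the timer lines given."""
    values = defaultdict(list)
    for line in lines:
        parsed = parse_timer(line)
        if parsed:
            values[parsed[0]].append(parsed[1])
    return {name: (len(v), sum(v) / len(v), max(v)) for name, v in values.items()}
```

For example, two p.server.rpc.GetPodRole timers of 10ms and 30ms summarize to a count of 2, a mean of 20ms and a maximum of 30ms.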