feat(kuma-cp) metrics #993

Merged
merged 12 commits into from
Sep 9, 2020

jakubdyszkiewicz
Contributor

Summary

This PR introduces Control Plane metrics:

  • Latencies and response codes etc. for API Server/Admin Server/Bootstrap Server/DNS Server/SDS/XDS/KDS
  • XDS: summary of XDS generation (time, count)
  • XDS: active connections
  • SDS: summary of SDS generation (time, count)
  • SDS: cert generations
  • KDS: summary of KDS generation (time, count)
  • KDS: client-side stats
  • Store: latencies of underlying storage
  • Store cache: number of hits and misses for cache
  • Static Info about the CP
  • Leader election
  • Go (GC, threads etc) and process info

It does not include dashboards.

Implementation

My first approach was to use promauto with the global default Prometheus registry and MustRegister, which panics, but it was a disaster in tests. Therefore I explicitly pass the registry in the Metric object.
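
The trade-off described above can be sketched without the Prometheus client itself. The block below uses a hypothetical, stdlib-only `Registry` type (it is not the Prometheus API) whose `MustRegister` panics on a duplicate name, as a stand-in for registering against a shared global registry. `NewComponentGlobal`, `NewComponent`, and the metric name are illustrative assumptions, not code from this PR.

```go
package main

import "fmt"

// Registry is a hypothetical stand-in for a metrics registry.
// It is NOT the Prometheus client API; it only mimics the
// "duplicate registration panics" behaviour mentioned above.
type Registry struct {
	names map[string]bool
}

func NewRegistry() *Registry {
	return &Registry{names: map[string]bool{}}
}

// MustRegister panics when the same metric name is registered twice.
func (r *Registry) MustRegister(name string) {
	if r.names[name] {
		panic(fmt.Sprintf("duplicate metric: %s", name))
	}
	r.names[name] = true
}

// globalRegistry mimics a process-wide default registry.
var globalRegistry = NewRegistry()

// NewComponentGlobal registers against shared global state, so building
// the component twice in one process (e.g. in two test cases) panics.
func NewComponentGlobal() {
	globalRegistry.MustRegister("xds_generation")
}

// NewComponent receives its registry from the caller, so every test
// can hand in a fresh one.
func NewComponent(r *Registry) {
	r.MustRegister("xds_generation")
}

// panics reports whether f panicked.
func panics(f func()) (didPanic bool) {
	defer func() {
		if recover() != nil {
			didPanic = true
		}
	}()
	f()
	return false
}

func main() {
	NewComponentGlobal()
	fmt.Println("second global setup panics:", panics(NewComponentGlobal))

	// Two independent "test cases", each with its own registry.
	fmt.Println("fresh registry panics:", panics(func() { NewComponent(NewRegistry()) }))
	fmt.Println("fresh registry panics:", panics(func() { NewComponent(NewRegistry()) }))
}
```

Passing the registry in keeps each test isolated, at the cost of threading one more dependency through constructors.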

I try to register metrics as high up as possible (in Setup/component.go etc.) to avoid spreading Prometheus code across the codebase.

For latencies, I use a Summary rather than a Histogram. You can read about the differences here: https://prometheus.io/docs/practices/histograms/. As long as there is no aggregation, Summaries are just easier to use IMHO.
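
The aggregation caveat is easy to see with a little arithmetic. The sketch below is stdlib-only and does not use the Prometheus client; the sample latencies, bucket bounds, and the helpers `quantile` and `bucketCounts` are all made up for illustration. It shows that averaging per-instance medians (roughly what aggregating Summary quantiles amounts to) can be far from the median of the combined data, while per-instance bucket counts (what a Histogram exposes) can simply be summed.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// quantile returns the q-quantile of samples using the nearest-rank method.
// It assumes samples is non-empty. Illustration only; this is not how
// Prometheus computes Summary quantiles.
func quantile(samples []float64, q float64) float64 {
	s := append([]float64(nil), samples...)
	sort.Float64s(s)
	idx := int(math.Ceil(q*float64(len(s)))) - 1
	if idx < 0 {
		idx = 0
	}
	return s[idx]
}

// bucketCounts returns cumulative counts: counts[i] is the number of
// samples less than or equal to bounds[i].
func bucketCounts(samples, bounds []float64) []int {
	counts := make([]int, len(bounds))
	for _, s := range samples {
		for i, b := range bounds {
			if s <= b {
				counts[i]++
			}
		}
	}
	return counts
}

func main() {
	// Hypothetical latencies in milliseconds from two instances.
	a := []float64{10, 10, 10, 10, 10, 10}
	b := []float64{500, 500}
	merged := append(append([]float64(nil), a...), b...)

	avgOfMedians := (quantile(a, 0.5) + quantile(b, 0.5)) / 2
	fmt.Println("average of per-instance medians:", avgOfMedians)
	fmt.Println("median of merged samples:", quantile(merged, 0.5))

	bounds := []float64{50, 1000}
	ca, cb := bucketCounts(a, bounds), bucketCounts(b, bounds)
	summed := make([]int, len(bounds))
	for i := range bounds {
		summed[i] = ca[i] + cb[i]
	}
	fmt.Println("summed bucket counts:", summed)
	fmt.Println("bucket counts of merged samples:", bucketCounts(merged, bounds))
}
```

With a single instance exposing the metric, this difference does not come up, which is the condition the paragraph above names.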

Documentation

  • todo After I introduce dashboards

Signed-off-by: Jakub Dyszkiewicz <[email protected]>
@jakubdyszkiewicz jakubdyszkiewicz requested a review from a team as a code owner August 24, 2020 14:43
 }

 func (h *SimpleDNSServer) parseQuery(m *dns.Msg) {
 	for _, q := range m.Question {
 		switch q.Qtype {
 		case dns.TypeA:
-			serverLog.Info("Query for " + q.Name)
+			serverLog.V(1).Info("query for " + q.Name)
Contributor

V(1)? Do we need it?

Contributor Author

I think it's overkill to log every DNS request at the info level. It's like the API Server: we don't log every single request. With many services that use DNS, the logs would be spammed with "Query for..."

@jakubdyszkiewicz jakubdyszkiewicz merged commit 4eb1773 into master Sep 9, 2020
@jakubdyszkiewicz jakubdyszkiewicz deleted the feat/cp-metrics branch September 9, 2020 10:30