-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add CoreDNS metrics for cache plugin and runtime #11738
Add CoreDNS metrics for cache plugin and runtime #11738
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -2590,7 +2590,7 @@ Contains statistics related to the coreDNS service | |||
|
|||
|
|||
|
|||
*`coredns.stats.panic.count.total`*:: | |||
*`coredns.stats.panic.count`*:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you didn't mention this change, I understand it's intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so true,
I'm following this convention here:
https://www.elastic.co/guide/en/beats/devguide/current/event-conventions.html#abbreviations
If you think this is ok, I'll go on and add it to the PR description
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First worried about the change as it's breaking but realised the CoreDNS module is not release yet. So all good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add tests for the new fields?
@@ -2590,7 +2590,7 @@ Contains statistics related to the coreDNS service | |||
|
|||
|
|||
|
|||
*`coredns.stats.panic.count.total`*:: | |||
*`coredns.stats.panic.count`*:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First worried about the change as it's breaking but realised the CoreDNS module is not release yet. So all good.
@odacremolbap To get the tests green you need to run Good to see our test system works :-) |
jenkins, test this |
@@ -1,5 +1,5 @@ | |||
# Start from coredns base Docker image | |||
FROM coredns/coredns:latest | |||
FROM coredns/coredns:1.5.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
"coredns_cache_hits_total": prometheus.Metric("dns.cache.hits.count"), | ||
"coredns_cache_misses_total": prometheus.Metric("dns.cache.misses.count"), | ||
|
||
// go runtime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we expect these also in other places, I wonder if we should introduce runtime
on the root level instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if I'm on the same page.
Right now event path is coredns.stats
which makes the event look like
{
"coredns": {
"stats": {
"panic": {
"count": 0
},
"runtime": {
"go": {
"routines": {
"count": 21
}
},
"memory": {
"alloc": {
"bytes": 1843744
}
},
"process": {
"threads": {
"count": 18
}
}
}
}
},
"event": {
"dataset": "coredns.stats",
"duration": 115000,
"module": "coredns"
},
"metricset": {
"name": "stats"
},
"service": {
"address": "127.0.0.1:55555",
"type": "coredns"
}
},
do you registering creating a new metricset by the name coredns.runtime
? and parsing only the runtime data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree runtime
should be in the root level of the event, that means:
{
"runtime": {
"go": {
"routines": {
"count": 21
},
},
},
"coredns": {
"stats": {
"panic": {
"count": 0
},
}
},
...
},
That said, I think the Prometheus helper framework doesn't support this? We have done tricks like this in the past:
beats/metricbeat/module/kubernetes/state_pod/state_pod.go
Lines 109 to 117 in 2499449
var moduleFieldsMapStr common.MapStr | |
moduleFields, ok := event[mb.ModuleDataKey] | |
if ok { | |
moduleFieldsMapStr, ok = moduleFields.(common.MapStr) | |
if !ok { | |
m.Logger().Errorf("error trying to convert '%s' from event to common.MapStr", mb.ModuleDataKey) | |
} | |
} | |
delete(event, mb.ModuleDataKey) |
Also, getting runtime
metrics from the stats
metricset may sound wrong? It could make sense to add a runtime
metricset, common to all modules supporting this, to make things uniform.
Perhaps we should move runtime
metrics to a different issue to unblock this, then add support for them the right way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
about having a different metricset
, I take that back, as that would mean double requesting the same info from the exporter. I'm ok with this 👍
About moving things to the root level, do you foresee this becoming an ECS field @ruflin?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes for ECS. I remember we had somewhere else a discussion about this with @webmat .
For having things on the root level: Can we extend the prometheus helper to make this possible? I assume now that all is based on the reporter interface this should be easier.
My take to unblock this PR would be to add for now runtime
inside the metricset with all the prefixes and abstract it out later. Not optimal but keeps us moving.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@exekias Interesting question if 1 request should be able to create 2 events or if metricsets are ok. To keep complexity low I would argue having 2 requests for the same thing is ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
abstracting it out later would imply deprecating these metrics in favor of the new names, is that correct? If we are discussing adding these for all services I think they deserve a better solution, so we can share mapping. How about leaving them out of this PR and open an issue to tackle runtime metrics with prometheus helper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ruflin @exekias
I removed runtime metrics before reading these new comments.
If you are ok I'll move forward, runtime
should come from here #11836
I don't think it will take us very long to come to an agreement, and then I will add back the metrics. In the meantime I'll open an issue for adding the root namespaced items at the prometheus helper.
Is that ok with you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of a runtime
fieldset, please ping me on the PR whenever you work on it 👀 @odacremolbap
mbtest "github.com/elastic/beats/metricbeat/mb/testing" | ||
) | ||
|
||
func TestData(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this? CoreDNS uses the new testing framework to generate the data.json.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should have an integration test instead?
Cache misses count for the cache plugin | ||
|
||
|
||
- name: runtime.memory.alloc.bytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leftover?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes! thanks for pointing that out
jenkins, test this |
jenkins, test this looks like quay is funny |
@odacremolbap One thing I just realised: We have dashboards for CoreDNS. Can you double check if the field changes did not break the dashboards? |
I think you mean this yes, I was waiting for this merge to dive in there. |
A fairly common
CoreDNS
setting is activating thecache
plugin.That plugin has a couple useful metrics that we should include.
There are also some runtime
go
metrics that we should be adding.