Add CoreDNS metrics for cache plugin and runtime #11738

odacremolbap · 2019-04-10T06:46:37Z

A fairly common CoreDNS setting is activating the cache plugin.

That plugin has a couple of useful metrics that we should include.
There are also some Go runtime metrics that we should be adding.

exekias

LGTM

exekias · 2019-04-10T08:38:33Z

metricbeat/docs/fields.asciidoc

@@ -2590,7 +2590,7 @@ Contains statistics related to the coreDNS service



-*`coredns.stats.panic.count.total`*::
+*`coredns.stats.panic.count`*::


you didn't mention this change, I understand it's intended?

so true,

I'm following this convention here:
https://www.elastic.co/guide/en/beats/devguide/current/event-conventions.html#abbreviations

If you think this is ok, I'll go on and add it to the PR description

I was first worried about the change as it's breaking, but realised the CoreDNS module is not released yet. So all good.

exekias

Could you add tests for the new fields?

ruflin · 2019-04-11T08:00:37Z

metricbeat/docs/fields.asciidoc

@@ -2590,7 +2590,7 @@ Contains statistics related to the coreDNS service



-*`coredns.stats.panic.count.total`*::
+*`coredns.stats.panic.count`*::


I was first worried about the change as it's breaking, but realised the CoreDNS module is not released yet. So all good.

ruflin · 2019-04-11T08:03:22Z

@odacremolbap To get the tests green you need to run go test . -generate in https://github.com/elastic/beats/tree/master/metricbeat/mb/testing/data so that the golden files are updated with the new metrics.

Good to see our test system works :-)

odacremolbap · 2019-04-11T09:42:01Z

Thanks so much for your early review, @ruflin.

I need to sort out #11759 (prometheus tests) before continuing here,
and then move on to #11619 (Kibana dashboards for CoreDNS).

odacremolbap · 2019-04-12T21:57:46Z

jenkins, test this

odacremolbap · 2019-04-13T06:37:13Z

@ruflin @exekias
I re-requested a review since I added a couple of commits, mostly a test.

ruflin · 2019-04-15T08:43:32Z

metricbeat/module/coredns/_meta/Dockerfile

@@ -1,5 +1,5 @@
 # Start from coredns base Docker image
-FROM coredns/coredns:latest
+FROM coredns/coredns:1.5.0


ruflin · 2019-04-15T08:45:39Z

metricbeat/module/coredns/stats/stats.go

+		"coredns_cache_hits_total":   prometheus.Metric("dns.cache.hits.count"),
+		"coredns_cache_misses_total": prometheus.Metric("dns.cache.misses.count"),
+
+		// go runtime


As we expect these also in other places, I wonder if we should introduce runtime on the root level instead.
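As background for the mapping in the hunk above, here is a rough sketch of how a Prometheus metric name can end up at a dotted field path inside a nested event. It uses plain Go maps and does not call the Beats prometheus helper; the mapping variable, setPath, buildEvent, and the sample values are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"strings"
)

// mapping mirrors the two entries from the diff above. It is a plain map,
// not the prometheus.Metric helper used in the module.
var mapping = map[string]string{
	"coredns_cache_hits_total":   "dns.cache.hits.count",
	"coredns_cache_misses_total": "dns.cache.misses.count",
}

// setPath stores value in event under a dotted path, creating nested maps
// as needed. It overwrites any non-map value found along the path.
func setPath(event map[string]interface{}, path string, value interface{}) {
	keys := strings.Split(path, ".")
	current := event
	for _, key := range keys[:len(keys)-1] {
		next, ok := current[key].(map[string]interface{})
		if !ok {
			next = map[string]interface{}{}
			current[key] = next
		}
		current = next
	}
	current[keys[len(keys)-1]] = value
}

// buildEvent converts scraped samples (metric name -> value) into a nested
// event, silently skipping metrics that have no mapping.
func buildEvent(samples map[string]float64) map[string]interface{} {
	event := map[string]interface{}{}
	for name, value := range samples {
		if path, ok := mapping[name]; ok {
			setPath(event, path, value)
		}
	}
	return event
}

func main() {
	// Sample values are made up for illustration.
	samples := map[string]float64{
		"coredns_cache_hits_total":   12,
		"coredns_cache_misses_total": 3,
		"unmapped_metric":            1,
	}
	fmt.Println(buildEvent(samples))
}
```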

Not sure if I'm on the same page.

Right now the event path is coredns.stats, which makes the event look like:

{
  "coredns": {
    "stats": {
      "panic": { "count": 0 },
      "runtime": {
        "go": { "routines": { "count": 21 } },
        "memory": { "alloc": { "bytes": 1843744 } },
        "process": { "threads": { "count": 18 } }
      }
    }
  },
  "event": { "dataset": "coredns.stats", "duration": 115000, "module": "coredns" },
  "metricset": { "name": "stats" },
  "service": { "address": "127.0.0.1:55555", "type": "coredns" }
}

Do you mean creating a new metricset named coredns.runtime and parsing only the runtime data?

I agree runtime should be at the root level of the event. That means:

{
  "runtime": {
    "go": { "routines": { "count": 21 } }
  },
  "coredns": {
    "stats": { "panic": { "count": 0 } }
  },
  ...
}

That said, I think the Prometheus helper framework doesn't support this? We have done tricks like this in the past:

beats/metricbeat/module/kubernetes/state_pod/state_pod.go

Lines 109 to 117 in 2499449

	var moduleFieldsMapStr common.MapStr
	moduleFields, ok := event[mb.ModuleDataKey]
	if ok {
		moduleFieldsMapStr, ok = moduleFields.(common.MapStr)
		if !ok {
			m.Logger().Errorf("error trying to convert '%s' from event to common.MapStr", mb.ModuleDataKey)
		}
	}
	delete(event, mb.ModuleDataKey)

but I would prefer to keep such tricks to a minimum; we should implement a better way to do this.

Also, getting runtime metrics from the stats metricset may sound wrong? It could make sense to add a runtime metricset, common to all modules supporting this, to make things uniform.

Perhaps we should move runtime metrics to a different issue to unblock this, then add support for them the right way.
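To make the root-level layout under discussion concrete, here is a minimal sketch that moves a runtime subtree from coredns.stats to the root of an event. It uses plain Go maps instead of common.MapStr, and the Event type and liftToRoot function are hypothetical names, not part of the Beats Prometheus helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Event is a stand-in for a Metricbeat event. The real type in Beats is
// common.MapStr, which this sketch deliberately does not import.
type Event map[string]interface{}

// liftToRoot moves event[module][metricset][key] to event[key], if present.
// It reports whether anything was moved. (Hypothetical helper.)
func liftToRoot(event Event, module, metricset, key string) bool {
	mod, ok := event[module].(map[string]interface{})
	if !ok {
		return false
	}
	ms, ok := mod[metricset].(map[string]interface{})
	if !ok {
		return false
	}
	value, ok := ms[key]
	if !ok {
		return false
	}
	delete(ms, key)
	event[key] = value
	return true
}

func main() {
	// Values taken from the example event quoted earlier in this thread.
	event := Event{
		"coredns": map[string]interface{}{
			"stats": map[string]interface{}{
				"panic": map[string]interface{}{"count": 0},
				"runtime": map[string]interface{}{
					"go": map[string]interface{}{
						"routines": map[string]interface{}{"count": 21},
					},
				},
			},
		},
	}
	liftToRoot(event, "coredns", "stats", "runtime")
	out, _ := json.MarshalIndent(event, "", "  ")
	fmt.Println(string(out))
}
```

A post-processing step like this is the kind of trick the comment above would rather avoid repeating per module, which is the argument for supporting root-level fields in the helper itself.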

about having a different metricset, I take that back, as that would mean double requesting the same info from the exporter. I'm ok with this 👍

About moving things to the root level, do you foresee this becoming an ECS field @ruflin?

Yes for ECS. I remember we had somewhere else a discussion about this with @webmat .

For having things on the root level: Can we extend the prometheus helper to make this possible? I assume now that all is based on the reporter interface this should be easier.

My take to unblock this PR would be to add for now runtime inside the metricset with all the prefixes and abstract it out later. Not optimal but keeps us moving.

@exekias Interesting question if 1 request should be able to create 2 events or if metricsets are ok. To keep complexity low I would argue having 2 requests for the same thing is ok.

Abstracting it out later would imply deprecating these metrics in favor of the new names, is that correct? If we are discussing adding these for all services I think they deserve a better solution, so we can share the mapping. How about leaving them out of this PR and opening an issue to tackle runtime metrics with the prometheus helper?

@ruflin @exekias
I removed runtime metrics before reading these new comments.

If you are OK with it, I'll move forward; runtime metrics should come from #11836.

I don't think it will take us very long to come to an agreement, and then I will add the metrics back. In the meantime I'll open an issue for adding root-namespaced items to the prometheus helper.

Is that ok with you?

I like the idea of a runtime fieldset, please ping me on the PR whenever you work on it 👀 @odacremolbap

ruflin · 2019-04-15T08:46:25Z

metricbeat/module/coredns/stats/stats_integration_test.go

+	mbtest "github.com/elastic/beats/metricbeat/mb/testing"
+)
+
+func TestData(t *testing.T) {


Why do we need this? CoreDNS uses the new testing framework to generate the data.json.

Perhaps we should have an integration test instead?

exekias · 2019-04-16T13:49:18Z

metricbeat/module/coredns/stats/_meta/fields.yml

+        Cache misses count for the cache plugin
+
+
+    - name: runtime.memory.alloc.bytes


yes! thanks for pointing that out

odacremolbap · 2019-04-16T14:30:59Z

jenkins, test this

odacremolbap · 2019-04-16T17:32:01Z

jenkins, test this

looks like quay is funny

ruflin · 2019-04-17T09:35:20Z

@odacremolbap One thing I just realised: We have dashboards for CoreDNS. Can you double check if the field changes did not break the dashboards?

odacremolbap · 2019-04-17T10:01:27Z

I think you mean this
#11619

yes, I was waiting for this merge to dive in there.
thanks for reminding!

add CoreDNS metrics and adapt some existing names

3ab5f70

odacremolbap added the in progress, Metricbeat, and Team:Integrations labels Apr 10, 2019

odacremolbap requested a review from a team as a code owner April 10, 2019 06:46

odacremolbap added 2 commits April 10, 2019 09:20

add new fields to fields.yml

935ef96

add generated docs

62678b5

odacremolbap requested a review from a team as a code owner April 10, 2019 07:32

exekias approved these changes Apr 10, 2019


exekias reviewed Apr 10, 2019


exekias self-requested a review April 10, 2019 08:41

odacremolbap added 2 commits April 10, 2019 12:07

set CoreDNS docker tag to 1.5.0, simplify conf

36fd113

add cache module for CoreDNS

ddf21f2

ruflin approved these changes Apr 11, 2019


odacremolbap added 4 commits April 12, 2019 18:34

Merge branch 'master' into task/add-more-coredns-metrics

b8300c8

move go metrics to what seems to be a better namespace

8f1adc3

add all generated and golden files

358c1e3

add CoreDNS integration tests

8b4c9b8

odacremolbap added the review label and removed the in progress label Apr 12, 2019

odacremolbap requested a review from ruflin April 12, 2019 20:36

ruflin reviewed Apr 15, 2019


odacremolbap added 2 commits April 15, 2019 19:40

remove data.json generation from coredns test

136cb2f

remove go runtime stats for now

db2e413

exekias reviewed Apr 16, 2019


remove runtime metrics from fields.yml at coredns

f9c8c16

exekias approved these changes Apr 16, 2019


odacremolbap merged commit 872d7fc into elastic:master Apr 16, 2019

odacremolbap deleted the task/add-more-coredns-metrics branch April 16, 2019 18:55

odacremolbap mentioned this pull request Apr 17, 2019

[Metricbeat] CoreDNS module: Add Kibana Dashboard #11619

Merged
