Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CoreDNS metrics for cache plugin and runtime #11738

Merged

Conversation

odacremolbap
Copy link
Contributor

A fairly common CoreDNS setting is activating the cache plugin.

That plugin has a couple useful metrics that we should include.
There are also some runtime go metrics that we should be adding.

@odacremolbap odacremolbap added in progress Pull request is currently in progress. Metricbeat Metricbeat Team:Integrations Label for the Integrations team labels Apr 10, 2019
@odacremolbap odacremolbap requested a review from a team as a code owner April 10, 2019 06:46
@odacremolbap odacremolbap requested a review from a team as a code owner April 10, 2019 07:32
Copy link
Contributor

@exekias exekias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -2590,7 +2590,7 @@ Contains statistics related to the coreDNS service



*`coredns.stats.panic.count.total`*::
*`coredns.stats.panic.count`*::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you didn't mention this change, I understand it's intended?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so true,

I'm following this convention here:
https://www.elastic.co/guide/en/beats/devguide/current/event-conventions.html#abbreviations

If you think this is ok, I'll go on and add it to the PR description

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First worried about the change as it's breaking but realised the CoreDNS module is not release yet. So all good.

Copy link
Contributor

@exekias exekias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add tests for the new fields?

@exekias exekias self-requested a review April 10, 2019 08:41
@@ -2590,7 +2590,7 @@ Contains statistics related to the coreDNS service



*`coredns.stats.panic.count.total`*::
*`coredns.stats.panic.count`*::
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First worried about the change as it's breaking but realised the CoreDNS module is not release yet. So all good.

@ruflin
Copy link
Member

ruflin commented Apr 11, 2019

@odacremolbap To get the tests green you need to run go test . -generate in https://github.com/elastic/beats/tree/master/metricbeat/mb/testing/data It will update the golden files with the new metrics.

Good to see our test system works :-)

@odacremolbap
Copy link
Contributor Author

so much thanks for your early review @ruflin

I need to sort this out #11759 (prometheus tests) before continuing here
and then move on to #11619 (kibana dashboards for CoreDNS)

@odacremolbap odacremolbap added review and removed in progress Pull request is currently in progress. labels Apr 12, 2019
@odacremolbap odacremolbap requested a review from ruflin April 12, 2019 20:36
@odacremolbap
Copy link
Contributor Author

jenkins, test this

@odacremolbap
Copy link
Contributor Author

@ruflin @exekias
I re-requested review since I added a couple commits, mostly a test.

@@ -1,5 +1,5 @@
# Start from coredns base Docker image
FROM coredns/coredns:latest
FROM coredns/coredns:1.5.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

"coredns_cache_hits_total": prometheus.Metric("dns.cache.hits.count"),
"coredns_cache_misses_total": prometheus.Metric("dns.cache.misses.count"),

// go runtime
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we expect these also in other places, I wonder if we should introduce runtime on the root level instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if I'm on the same page.

Right now event path is coredns.stats which makes the event look like

    {
        "coredns": {
            "stats": {
                "panic": {
                    "count": 0
                },
                "runtime": {
                    "go": {
                        "routines": {
                            "count": 21
                        }
                    },
                    "memory": {
                        "alloc": {
                            "bytes": 1843744
                        }
                    },
                    "process": {
                        "threads": {
                            "count": 18
                        }
                    }
                }
            }
        },
        "event": {
            "dataset": "coredns.stats",
            "duration": 115000,
            "module": "coredns"
        },
        "metricset": {
            "name": "stats"
        },
        "service": {
            "address": "127.0.0.1:55555",
            "type": "coredns"
        }
    },

do you registering creating a new metricset by the name coredns.runtime ? and parsing only the runtime data?

Copy link
Contributor

@exekias exekias Apr 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree runtime should be in the root level of the event, that means:

    {
      "runtime": {
        "go": {
          "routines": {
            "count": 21
          },
        },
      },
      "coredns": {
          "stats": {
              "panic": {
                  "count": 0
              },
          }
      },
        ...
    },

That said, I think the Prometheus helper framework doesn't support this? We have done tricks like this in the past:

var moduleFieldsMapStr common.MapStr
moduleFields, ok := event[mb.ModuleDataKey]
if ok {
moduleFieldsMapStr, ok = moduleFields.(common.MapStr)
if !ok {
m.Logger().Errorf("error trying to convert '%s' from event to common.MapStr", mb.ModuleDataKey)
}
}
delete(event, mb.ModuleDataKey)
but I would prefer to keep them to a minimum, we should implement a better way to get this.

Also, getting runtime metrics from the stats metricset may sound wrong? It could make sense to add a runtime metricset, common to all modules supporting this, to make things uniform.

Perhaps we should move runtime metrics to a different issue to unblock this, then add support for them the right way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

about having a different metricset, I take that back, as that would mean double requesting the same info from the exporter. I'm ok with this 👍

About moving things to the root level, do you foresee this becoming an ECS field @ruflin?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes for ECS. I remember we had somewhere else a discussion about this with @webmat .

For having things on the root level: Can we extend the prometheus helper to make this possible? I assume now that all is based on the reporter interface this should be easier.

My take to unblock this PR would be to add for now runtime inside the metricset with all the prefixes and abstract it out later. Not optimal but keeps us moving.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@exekias Interesting question if 1 request should be able to create 2 events or if metricsets are ok. To keep complexity low I would argue having 2 requests for the same thing is ok.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

abstracting it out later would imply deprecating these metrics in favor of the new names, is that correct? If we are discussing adding these for all services I think they deserve a better solution, so we can share mapping. How about leaving them out of this PR and open an issue to tackle runtime metrics with prometheus helper?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ruflin @exekias
I removed runtime metrics before reading these new comments.

If you are ok I'll move forward, runtime should come from here #11836

I don't think it will take us very long to come to an agreement, and then I will add back the metrics. In the meantime I'll open an issue for adding the root namespaced items at the prometheus helper.

Is that ok with you?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of a runtime fieldset, please ping me on the PR whenever you work on it 👀 @odacremolbap

mbtest "github.com/elastic/beats/metricbeat/mb/testing"
)

func TestData(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this? CoreDNS uses the new testing framework to generate the data.json.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should have an integration test instead?

Cache misses count for the cache plugin


- name: runtime.memory.alloc.bytes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftover?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes! thanks for pointing that out

@odacremolbap
Copy link
Contributor Author

jenkins, test this

@odacremolbap
Copy link
Contributor Author

jenkins, test this

looks like quay is funny

@odacremolbap odacremolbap merged commit 872d7fc into elastic:master Apr 16, 2019
@odacremolbap odacremolbap deleted the task/add-more-coredns-metrics branch April 16, 2019 18:55
@ruflin
Copy link
Member

ruflin commented Apr 17, 2019

@odacremolbap One thing I just realised: We have dashboards for CoreDNS. Can you double check if the field changes did not break the dashboards?

@odacremolbap
Copy link
Contributor Author

I think you mean this
#11619

yes, I was waiting for this merge to dive in there.
thanks for reminding!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Metricbeat Metricbeat review Team:Integrations Label for the Integrations team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants