Merge `state_` and common metricsets for node, pod & container #7381

exekias · 2018-06-20T17:03:22Z

This PR is a rework of #6158, closes #6637

Merging state_pod & pod metricsets is tricky, mainly because state_* metricsets get events for the whole cluster (a single Metricbeat instance for a Kubernetes cluster), while the pod metricset gets events only for the current node. This is why we need both a DaemonSet and a Deployment when using Metricbeat.

The previous PR was flawed because it didn't take that into account, so the only way to get pct fields was to run state_* metricsets on all nodes. That would send the same events for a container from every node!

This change introduces kube-state-metrics fetching to the pod, container and node metricsets, so they can use both the kubelet /stats/summary/ endpoint and kube-state-metrics together, merging their metrics into a single event per entity.

Config looks like this:

- module: kubernetes
  metricsets:
    - pod
    - container
    - node
  hosts: ["192.168.99.100:10255"]  # kubelet
  period: 10s
  state_metrics:
    host: "kube-state-metrics:8080"

This change narrows down the supported pct fields to the following (for the moment):

kubernetes.container.cpu.usage.limit.pct
kubernetes.container.memory.usage.limit.pct
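
For illustration only, a limit pct field is the usage reported by the kubelet divided by the limit reported by kube-state-metrics. The sketch below is not code from this PR: `limitPct` is a hypothetical helper, and skipping the field when no limit is set is an assumption about how a missing limit might be handled.

```go
package main

import "fmt"

// limitPct is a hypothetical helper, not code from this PR. It returns
// usage/limit as a ratio, and false when no positive limit is set so the
// caller can omit the pct field instead of dividing by zero.
func limitPct(usage, limit float64) (float64, bool) {
	if limit <= 0 {
		return 0, false
	}
	return usage / limit, true
}

func main() {
	// 256 MiB used out of a 1 GiB memory limit.
	if pct, ok := limitPct(256*1024*1024, 1024*1024*1024); ok {
		fmt.Printf("memory.usage.limit.pct = %.2f\n", pct)
	}
	// No CPU limit configured (assumed to show up as 0): the field is omitted.
	if _, ok := limitPct(0.5, 0); !ok {
		fmt.Println("cpu.usage.limit.pct omitted")
	}
}
```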

I find the pod ones unnecessary, as they are a sum of container pct's. node.pct can be added, but I'm leaving that for a follow-up PR, as this one is already big.

I think we should deprecate state_node, state_pod and state_container in favor of this (7.0).

jsoriano

It LGTM in general, only some questions and comments.

jsoriano · 2018-06-21T09:46:20Z

metricbeat/mb/parse/url.go

+	if ok {
+		user, ok = t.(string)
+		if !ok {
+			return mb.HostData{}, errors.Errorf("'username' config for module %v is not a string", module)


Could this configuration be Unpacked to a struct so we don't have to check types here?

Well, I guess we do it this way to check if these settings are set at all.
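
To make the trade-off concrete, here is a minimal sketch of the "check if set, then check type" pattern on a raw settings map. `optionalString` and the plain `map[string]interface{}` input are hypothetical stand-ins, not the actual code or config types in metricbeat/mb/parse/url.go.

```go
package main

import "fmt"

// optionalString is a hypothetical helper. It reports whether key is present
// in the raw settings, and returns an error only when the value is present
// but is not a string.
func optionalString(raw map[string]interface{}, key string) (value string, set bool, err error) {
	t, ok := raw[key]
	if !ok {
		// Not configured at all: not an error, the caller keeps its default.
		return "", false, nil
	}
	s, ok := t.(string)
	if !ok {
		return "", true, fmt.Errorf("'%s' config is not a string", key)
	}
	return s, true, nil
}

func main() {
	raw := map[string]interface{}{"username": "elastic", "password": 1234}

	user, set, err := optionalString(raw, "username")
	fmt.Println(user, set, err)

	_, _, err = optionalString(raw, "password")
	fmt.Println(err)
}
```

Unpacking into a struct with plain string fields would remove the type assertions, but an empty string would then be indistinguishable from an unset option unless pointer fields or a similar mechanism were used.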

jsoriano · 2018-06-21T10:14:15Z

metricbeat/module/kubernetes/container/data.go

+
+	// Calculate pct fields
+	for _, event := range events {
+		memLimit := util.GetFloat64(event, "memory.limit.bytes")


Are memory and cpu limits of type float?

jsoriano · 2018-06-21T11:29:42Z

metricbeat/module/kubernetes/util/mapstr.go

+		skip := false
+		for k, v := range filter {
+			if GetString(event, k) != v {
+				skip = true


Could we `continue` with a label directly here?
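
A minimal sketch of what the labeled `continue` could look like. It uses plain `map[string]string` events instead of `common.MapStr` and `GetString`, and the function name `matching` is hypothetical.

```go
package main

import "fmt"

// matching returns the events whose fields equal every key/value pair in
// filter. The labeled continue replaces the skip flag: as soon as one field
// differs, the outer loop moves on to the next event.
func matching(events []map[string]string, filter map[string]string) []map[string]string {
	var out []map[string]string
nextEvent:
	for _, event := range events {
		for k, v := range filter {
			if event[k] != v {
				continue nextEvent
			}
		}
		out = append(out, event)
	}
	return out
}

func main() {
	events := []map[string]string{
		{"name": "nginx", "node.name": "node-1"},
		{"name": "redis", "node.name": "node-2"},
	}
	fmt.Println(matching(events, map[string]string{"node.name": "node-1"}))
}
```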

jsoriano · 2018-06-21T11:30:27Z

metricbeat/module/kubernetes/util/mapstr.go

+// MergeEvents from b events into a. The process will ensure that:
+//   - only events matching the given filter are processed.
+//   - fields in the delete list will be removed from the event
+//   - match fields will be used to match events from a & b, if all fields are equal, they will be merged


Could we separate this into three functions? one for merging, another one for filtering and another one to delete fields?
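
A rough sketch of how such a split might look. Everything here is hypothetical: the `event` type uses flat keys rather than nested `common.MapStr` values, the function names are invented, and dropping state events that have no kubelet counterpart is an assumption rather than the documented behaviour of `MergeEvents`.

```go
package main

import "fmt"

type event map[string]interface{}

// filterEvents keeps the events whose string fields equal every filter value.
func filterEvents(events []event, filter map[string]string) []event {
	var out []event
next:
	for _, e := range events {
		for k, v := range filter {
			if s, _ := e[k].(string); s != v {
				continue next
			}
		}
		out = append(out, e)
	}
	return out
}

// sameEntity reports whether a and b hold equal string values for all match fields.
func sameEntity(a, b event, match []string) bool {
	for _, k := range match {
		as, aok := a[k].(string)
		bs, bok := b[k].(string)
		if !aok || !bok || as != bs {
			return false
		}
	}
	return true
}

// mergeEvents copies the fields of each b event into the first a event that
// matches it on all match fields. Unmatched b events are dropped (assumption).
func mergeEvents(a, b []event, match []string) []event {
	for _, eb := range b {
		for _, ea := range a {
			if sameEntity(ea, eb, match) {
				for k, v := range eb {
					ea[k] = v
				}
				break
			}
		}
	}
	return a
}

// deleteFields removes the listed fields from every event.
func deleteFields(events []event, fields []string) []event {
	for _, e := range events {
		for _, f := range fields {
			delete(e, f)
		}
	}
	return events
}

func main() {
	kubelet := []event{{"name": "nginx", "memory.usage.bytes": 100.0}}
	state := []event{
		{"name": "nginx", "node.name": "node-1", "memory.limit.bytes": 400.0},
		{"name": "redis", "node.name": "node-2", "memory.limit.bytes": 800.0},
	}
	local := filterEvents(state, map[string]string{"node.name": "node-1"})
	merged := mergeEvents(kubelet, local, []string{"name"})
	fmt.Println(deleteFields(merged, []string{"node.name"}))
}
```

With the three steps separated, each one can be tested on its own and callers that need no filtering or field removal can skip those steps.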

jsoriano · 2018-06-21T11:51:40Z

metricbeat/module/kubernetes/container/data.go

+	events := util.MergeEvents(containers, stateMetrics,
+		map[string]string{
+			mb.ModuleDataKey + ".node.name": node.NodeName,
+		},


I guess the /metrics endpoint doesn't offer any filtering... There can be a huge list here with big clusters.

Yes, I have some concerns about this approach. I'm trying something else, so let's keep this PR unmerged in the meantime.

Thanks for the review!

Hi, let me chime in because I'm being affected by this right now 😄

Yes, so my best bet was to "silently" run state_* metricsets to populate the perfMetrics only for consumption by the pod metricset. In this mode, state_* metricsets shouldn't publish anything:

- module: kubernetes
  metricsets:
    - pod
    - container
    - node
  hosts: ["192.168.99.100:10255"]  # kubelet
  period: 10s
  state_metrics:
    host: "kube-state-metrics:8080"

And state_* metricsets are enabled for publishing only when you explicitly list them under metricsets:

- module: kubernetes
  metricsets:
    - state_container
    - state_node
  period: 10s
  state_metrics:
    host: "kube-state-metrics:8080"

Then you can deploy the former one in a k8s daemonset, whereas the latter is deployed in a k8s deployment.

But obviously my suggested implementation doesn't resolve #6637.

My current plan is:

Once #7470 is merged I can fill perfMetrics from the enrichment process, then they should work for all nodes without special tricks.

I'm closing this PR now as this new approach is more scalable

#7470 seems awesome! But not sure what you mean by the perfMetrics part yet.

Would you be enabling an enhanced version of NodeMetadataEnricher on the container metricset here, in addition to the ContainerMetadataEnricher, in order to fill perfMetrics.NodeCoresAllocatable that is used for calculating container.cpu.usage.node.pct?
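
As a rough illustration of the calculation being discussed, the sketch below divides a container's CPU usage by the node's allocatable cores. `nodePct` is a hypothetical helper, and the nanocore input is an assumption based on the `usageNanoCores` value in the kubelet summary; it is not the code from this PR or from #7470.

```go
package main

import "fmt"

// nodePct is a hypothetical helper. It converts a CPU usage given in
// nanocores to cores and divides it by the node's allocatable cores.
// It returns false when the allocatable core count is not known yet.
func nodePct(usageNanoCores uint64, nodeCoresAllocatable float64) (float64, bool) {
	if nodeCoresAllocatable <= 0 {
		return 0, false
	}
	return float64(usageNanoCores) / 1e9 / nodeCoresAllocatable, true
}

func main() {
	// Half a core used on a node with 4 allocatable cores.
	if pct, ok := nodePct(500000000, 4); ok {
		fmt.Printf("container.cpu.usage.node.pct = %.3f\n", pct)
	}
}
```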

Thanks anyway for your efforts as always! I'll keep eyes on #7470 👍

yes, that's the plan 😺

jsoriano · 2018-06-21T12:03:41Z

metricbeat/module/kubernetes/container/container.go

@@ -61,7 +76,15 @@ func (m *MetricSet) Fetch() ([]common.MapStr, error) {
 		return nil, err
 	}

-	events, err := eventMapping(body, util.PerfMetrics)
+	var stateMetrics []common.MapStr
+	if m.state != nil {


How can this be nil? Or is it only for testing?

exekias · 2018-07-11T10:00:53Z

Closed in favor of the new approach in #7470

Carlos Pérez-Aradros Herce added 4 commits June 18, 2018 10:44

Make HTTP helper reusable in custom code

043f8bc

Make Prometheus helper reusable in custom code

8cf460d

Make URLHostParserBuilder helper reusable in custom code

5db2600

Merge state and common metricsets for node, pod and container

cd377bb

exekias added the enhancement, in progress, review, Metricbeat, needs_docs and containers labels Jun 20, 2018

exekias requested review from jsoriano and ruflin June 20, 2018 17:08

Update docs

4442831

jsoriano approved these changes Jun 21, 2018

exekias removed the review label Jun 25, 2018

exekias closed this Jul 11, 2018

exekias removed the in progress and needs_docs labels Aug 20, 2018

exekias Jun 21, 2018

mumoshu Jul 11, 2018

exekias Jul 11, 2018

mumoshu Jul 11, 2018

exekias Jul 12, 2018
