Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge state_ and common metricsets for node, pod & container #7381

Closed
wants to merge 5 commits into from

Conversation

exekias
Copy link
Contributor

@exekias exekias commented Jun 20, 2018

This PR is a rework of #6158, closes #6637

Merging state_pod & pod metricsets is tricky, mainly because state_* metricsets get events for the whole cluster (a single metricbeat instance for a kubernetes cluster), and pod metricset get's events for the current node. This is why we need both a DaemonSet and a Deployment when using Metricbeat.

Previous PR was flawed because it didn't have that into account, so the only way to get pct fields was running state_* metricsets in all nodes. That would send same events for a container from all nodes!

This change introduces kube-state-metrics fetching to pod, container and node metricsets, so they can use both kubelet /stats/summary/ endpoint and kube-sate-metrics together, merging their metrics into a single event per entity.

Config looks like this:

- module: kubernetes
  metricsets:
    - pod
    - container
    - node
  hosts: ["192.168.99.100:10255"]  # kubelet
  period: 10s
  state_metrics:
    host: "kube-state-metrics:8080"

This change narrows down the supported pct fields to (by the moment):

kubernetes.container.cpu.usage.limit.pct
kubernetes.container.memory.usage.limit.pct

I find pod ones not needed, as they are a sum of container pct's. node.pct can be added, but I leave that out for a follow up PR, as this one is big already.

I think we should deprecate state_node, state_pod and state_container in favor of this (7.0).

@exekias exekias added enhancement in progress Pull request is currently in progress. review Metricbeat Metricbeat needs_docs containers Related to containers use case labels Jun 20, 2018
@exekias exekias requested review from jsoriano and ruflin June 20, 2018 17:08
Copy link
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It LGTM in general, only some questions and comments.

if ok {
user, ok = t.(string)
if !ok {
return mb.HostData{}, errors.Errorf("'username' config for module %v is not a string", module)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this configuration be Unpacked to a struct so we don't have to check types here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I guess we do it this way to check if these settings are set at all.


// Calculate pct fields
for _, event := range events {
memLimit := util.GetFloat64(event, "memory.limit.bytes")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are memory and cpu limits of type float?

skip := false
for k, v := range filter {
if GetString(event, k) != v {
skip = true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

continue with label directly here?

// MergeEvents from b events into a. The process will ensure that:
// - only events matching the given filter are processed.
// - fields in the delete list will be removed from the event
// - match fields will be used to match events from a & b, if all fields are equal, they will be merged
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we separate this into three functions? one for merging, another one for filtering and another one to delete fields?

events := util.MergeEvents(containers, stateMetrics,
map[string]string{
mb.ModuleDataKey + ".node.name": node.NodeName,
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the /metrics endpoint doesn't offer any filtering... There can be a huge list here with big clusters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have some concerns about this approach, I'm trying something else, let's keep this PR unmerged in the meanwhile.

Thanks for the review!

Copy link

@mumoshu mumoshu Jul 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, let me chime in because I'm being affected by this right now 😄


Yes, so my best bet was to "silently" run state_* metricsets to populate the perfMetrics only for consumption by the pod metricset. In this mode, state_* metricsets shouldn't publish anything:

- module: kubernetes
  metricsets:
    - pod
    - container
    - node
  hosts: ["192.168.99.100:10255"]  # kubelet
  period: 10s
  state_metrics:
    host: "kube-state-metrics:8080"

And state_* metricsets are enabled for publishing only when you explicitly listed them under the metricsets:

- module: kubernetes
  metricsets:
    - state_container
    - state_node
  period: 10s
  state_metrics:
    host: "kube-state-metrics:8080"

Then you can deploy the former one in a k8s daemonset, whereas the latter is deployed in a k8s deployment.

But obviously my suggested implementation doesn't resolve #6637.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My current plan is:

Once #7470 is merged I can fill perfMetrics from the enrichment process, then they should work for all nodes without special tricks.

I'm closing this PR now as this new approach is more scalable

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#7470 seems awesome! But not sure what you mean by the perfMetrics part yet.

Would you be enabling an enhanced version of NodeMetadataEnricher on the container metricset here, in addition to the ContainerMetadataEnricher, in order to fill perfMetrics.NodeCoresAllocatable that is used for calculating container.cpu.usage.node.pct?

Thanks anyway for your efforts as always! I'll keep eyes on #7470 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that's the plan 😺

@@ -61,7 +76,15 @@ func (m *MetricSet) Fetch() ([]common.MapStr, error) {
return nil, err
}

events, err := eventMapping(body, util.PerfMetrics)
var stateMetrics []common.MapStr
if m.state != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can this be nil? or it is only for testing?

@exekias exekias removed the review label Jun 25, 2018
@exekias
Copy link
Contributor Author

exekias commented Jul 11, 2018

Closed in favor of the new approach in #7470

@exekias exekias closed this Jul 11, 2018
@exekias exekias removed in progress Pull request is currently in progress. needs_docs labels Aug 20, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
containers Related to containers use case enhancement Metricbeat Metricbeat
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consider combining "<type>" and "state_<type>" kubernetes metricsets into a single metricset/event.
3 participants