-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge state_
and common metricsets for node, pod & container
#7381
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It LGTM in general, only some questions and comments.
if ok { | ||
user, ok = t.(string) | ||
if !ok { | ||
return mb.HostData{}, errors.Errorf("'username' config for module %v is not a string", module) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this configuration be Unpacked to a struct so we don't have to check types here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I guess we do it this way to check if these settings are set at all.
|
||
// Calculate pct fields | ||
for _, event := range events { | ||
memLimit := util.GetFloat64(event, "memory.limit.bytes") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are memory and cpu limits of type float?
skip := false | ||
for k, v := range filter { | ||
if GetString(event, k) != v { | ||
skip = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
continue
with label directly here?
// MergeEvents from b events into a. The process will ensure that: | ||
// - only events matching the given filter are processed. | ||
// - fields in the delete list will be removed from the event | ||
// - match fields will be used to match events from a & b, if all fields are equal, they will be merged |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we separate this into three functions? one for merging, another one for filtering and another one to delete fields?
events := util.MergeEvents(containers, stateMetrics, | ||
map[string]string{ | ||
mb.ModuleDataKey + ".node.name": node.NodeName, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the /metrics
endpoint doesn't offer any filtering... There can be a huge list here with big clusters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I have some concerns about this approach, I'm trying something else, let's keep this PR unmerged in the meanwhile.
Thanks for the review!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, let me chime in because I'm being affected by this right now 😄
Yes, so my best bet was to "silently" run state_*
metricsets to populate the perfMetrics
only for consumption by the pod
metricset. In this mode, state_*
metricsets shouldn't publish anything:
- module: kubernetes
metricsets:
- pod
- container
- node
hosts: ["192.168.99.100:10255"] # kubelet
period: 10s
state_metrics:
host: "kube-state-metrics:8080"
And state_*
metricsets are enabled for publishing only when you explicitly listed them under the metricsets
:
- module: kubernetes
metricsets:
- state_container
- state_node
period: 10s
state_metrics:
host: "kube-state-metrics:8080"
Then you can deploy the former one in a k8s daemonset, whereas the latter is deployed in a k8s deployment.
But obviously my suggested implementation doesn't resolve #6637.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My current plan is:
Once #7470 is merged I can fill perfMetrics from the enrichment process, then they should work for all nodes without special tricks.
I'm closing this PR now as this new approach is more scalable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#7470 seems awesome! But not sure what you mean by the perfMetrics part yet.
Would you be enabling an enhanced version of NodeMetadataEnricher
on the container metricset here, in addition to the ContainerMetadataEnricher, in order to fill perfMetrics.NodeCoresAllocatable
that is used for calculating container.cpu.usage.node.pct
?
Thanks anyway for your efforts as always! I'll keep eyes on #7470 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that's the plan 😺
@@ -61,7 +76,15 @@ func (m *MetricSet) Fetch() ([]common.MapStr, error) { | |||
return nil, err | |||
} | |||
|
|||
events, err := eventMapping(body, util.PerfMetrics) | |||
var stateMetrics []common.MapStr | |||
if m.state != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can this be nil? or it is only for testing?
Closed in favor of the new approach in #7470 |
This PR is a rework of #6158, closes #6637
Merging
state_pod
&pod
metricsets is tricky, mainly becausestate_*
metricsets get events for the whole cluster (a single metricbeat instance for a kubernetes cluster), andpod
metricset get's events for the current node. This is why we need both aDaemonSet
and aDeployment
when using Metricbeat.Previous PR was flawed because it didn't have that into account, so the only way to get
pct
fields was runningstate_*
metricsets in all nodes. That would send same events for a container from all nodes!This change introduces
kube-state-metrics
fetching topod
,container
andnode
metricsets, so they can use both kubelet/stats/summary/
endpoint andkube-sate-metrics
together, merging their metrics into a single event per entity.Config looks like this:
This change narrows down the supported
pct
fields to (by the moment):I find
pod
ones not needed, as they are a sum ofcontainer
pct's.node.pct
can be added, but I leave that out for a follow up PR, as this one is big already.I think we should deprecate
state_node
,state_pod
andstate_container
in favor of this (7.0).