Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[k8scluster] switch k8s.node.condition to mdatagen #27617

Closed
povilasv opened this issue Oct 11, 2023 · 28 comments
Closed

[k8scluster] switch k8s.node.condition to mdatagen #27617

povilasv opened this issue Oct 11, 2023 · 28 comments

Comments

@povilasv
Copy link
Contributor

povilasv commented Oct 11, 2023

Component(s)

receiver/k8scluster

Describe the issue you're reporting

As noted in #27586 (comment) comment, we should switch k8s.node.condition_* metrics to mdatagen.

I think we would need to deprecate this config:

k8s_cluster:
  node_conditions_to_report:
    - Ready
    - MemoryPressure

We could keep existing default, i.e k8s.node_condition_ready enabled, other conditions disabled.

@povilasv povilasv added the needs triage New item requiring triage label Oct 11, 2023
@github-actions
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@jinja2
Copy link
Contributor

jinja2 commented Oct 12, 2023

Will this work with the existing metric naming convention for the condition metrics? The current metric names have the condition in it. So for the condition type MemoryPressure, we generate the metric name k8s.node.condition_memory_pressure. But if I am not mistaken, to move these to mdatagen we'll need an exhaustive list of node conditions to list all the metrics(?). Fwiw k8s node's conditions can be set by external controllers, and there is no single exhaustive list for us to reference for the possible conditions. I guess we could make the condition an attribute instead and name the metric k8s.node.condition.

@povilasv
Copy link
Contributor Author

I see, I guess we can leave it as is, given that we don't have full list of node conditions.

@crobert-1
Copy link
Member

I'm going to close since it seems like we can't do this with our current functionality, but please let me know if I've misunderstood something here.

@crobert-1 crobert-1 closed this as not planned Won't fix, can't repro, duplicate, stale Oct 16, 2023
@TylerHelmuth TylerHelmuth reopened this Oct 16, 2023
@TylerHelmuth
Copy link
Member

TylerHelmuth commented Oct 16, 2023

I would like to continue considering what it would take to do this. Since it is a scraper receiver it would be nice if it can use mdatagen for everything, even if that means updating mdatagen. At the moment these metrics cannot be enabled/disabled via the metrics config and the docs are not auto-generating. Both are inconsistencies with the rest of the metrics and present a degraded user experience.

@jinja2
Copy link
Contributor

jinja2 commented Oct 16, 2023

We can update mdatagen for such use case or we can re-evaluate how node conditions are reported.

What do we think about reporting the condition as metric k8s.node.condition with the condition sent in as attribute condition?

I would also propose removing the explicit list of conditions to provide with the config node_conditions_to_report and instead generate metrics for all conditions seen on a node object. I think reporting all conditions is more appropriate here since managed clusters could have conditions added to nodes with upgrades which an admin might not be aware of and might not add to the list. Speaking as someone who has run prod-grade k8s clusters, I find it a better user experience to have all conditions reported instead of listing out each one in the config. When I did update conditions to report with frameworks like node-problem-detector, the 2nd step to update the collector config felt unnecessary, and just having a single flag to indicate whether to report conditions would be better. We could optionally have the inverse config here, so node_conditions_to_exclude but if we go with the change in metric, this might be not necessary since dropping metric with condition attribute check is a simple enough config addition.

Additionally, I noticed that the receiver will report any condition mentioned in node_conditions_to_report even when it is not found in node.status.condition as Unknown which imo shouldn't be how we handle it. A node condition, especially the ones reported by kubelet like Ready|Memory|PID|DiskPressure when marked as Unknown is indicative of the kubelet not being able to reach the api server, and the unknown state here means that the api server hasn't heard from the kubelet on the node, and is usually followed by actions like marking the node unschedulable. So the state unknown has meaning to it and does not mean that this condition was not found in the node's status. For e.g. passing in a non-existent condition Fake to the receiver, not reporting the metric or a 4th type NotFound might be more appropriate here.

image

@TylerHelmuth
Copy link
Member

@jinja2 I agree with everything you said. We'd have to follow a deprecation process, but since the condition value isn't really an enum I think handling it as a datapoint attribute is appropriate.

@povilasv
Copy link
Contributor Author

povilasv commented Oct 18, 2023

Yeah, this sounds good to me too.

To wrap everything up:

  1. we want to add new metric:
k8s.node.condition{condition="Ready|Memory|PID|DiskPressure|<all other available on Node Status>"}

I think we should keep the same values as we have now:
1 - True
0 - False
-1 - Unknown

Regarding Unknown Kubernetes docs state:

"ConditionUnknown" means kubernetes
// can't decide if a resource is in the condition or not. 

It looks like these are the possible values K8S API returns -> https://github.com/kubernetes/api/blob/master/core/v1/types.go#L2705-L2716

Enabled by default, reports all conditions. If people don't want some condition they can write some ottl to remove the uneeded attributes or keep the ones they want.

  1. Deprecate old configuration / metrics:
k8s_cluster:
  node_conditions_to_report:
    - Ready
    - MemoryPressure

If this config is set throw a warning that this is deprecated and tell users to use the new metric.

@povilasv
Copy link
Contributor Author

FYI: Opened a PR with 1st part #27838 let me know what you think :)

jpkrohling pushed a commit that referenced this issue Oct 24, 2023
**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->

Add new k8s.node.condition metric, so that we can deprecate
k8s.node.condition_* metrics.


**Link to tracking Issue:** <Issue number if applicable>


#27617

**Testing:** <Describe what testing was performed and which tests were
added.>

- added unit tests

**Documentation:** <Describe the documentation added.>

- added docs
@povilasv povilasv changed the title switch k8s.node.condition to mdatagen [k8scluster] switch k8s.node.condition to mdatagen Oct 25, 2023
sigilioso pushed a commit to carlossscastro/opentelemetry-collector-contrib that referenced this issue Oct 27, 2023
)

**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->

Add new k8s.node.condition metric, so that we can deprecate
k8s.node.condition_* metrics.


**Link to tracking Issue:** <Issue number if applicable>


open-telemetry#27617

**Testing:** <Describe what testing was performed and which tests were
added.>

- added unit tests

**Documentation:** <Describe the documentation added.>

- added docs
@crobert-1 crobert-1 added enhancement New feature or request and removed needs triage New item requiring triage labels Oct 27, 2023
@sirianni
Copy link
Contributor

sirianni commented Nov 3, 2023

I think we should keep the same values as we have now:
1 - True
0 - False
-1 - Unknown

Can we 🙏 please 🙏 stop perpetuating this anti-pattern of encoding states as metric values? See k8s.pod.phase for another (worse) instance of this. Maybe this works well for Splunk but I don't think it is idiomatic in general, and doesn't work well for other backends that I have used (Datadog, New Relic, etc.).

Instead, the status should be another attribute the way kube-state-metrics models it. 👇

Metric name Metric type Description Unit (where applicable) Labels/tags Status
kube_node_status_condition Gauge The condition of a cluster node node=<node-address>
condition=<node-condition>
status=<true|false|unknown>
STABLE

As an example, if I want to get the count of ready nodes in the cluster, how do I do it given that -1 is now polluting the metric values for "unknowns"?

@povilasv
Copy link
Contributor Author

povilasv commented Nov 3, 2023

I don't follow your comment. The actual suggestion and refactor of k8s.node.condition does this:

k8s.node.condition{condition="Ready|Memory|PID|DiskPressure|<all other available on Node Status>"} = 1

The plan is to remove ``node_conditions_to_report` config and the old metric. So what is the issue with this?

@sirianni
Copy link
Contributor

sirianni commented Nov 3, 2023

This = 1 is exactly the problem. Many metrics backends don't let you filter on metric value. Instead you

  • filter on dimension
  • apply aggregations to the value (sum, max, min, etc.).

For example, in Datadog with its kube-state-metrics integration I do this:

sum:kubernetes_state.node.by_condition{condition:ready, status:true} by {kube_cluster_name}

@sirianni
Copy link
Contributor

sirianni commented Nov 3, 2023

Let me ask the question a different way...

Why not follow the same pattern as kube-state-metrics? kube-state-metrics is very widely used and practically a "de facto standard" here. It's also idiomatic in OpenMetrics/Prometheus to use a StateSet for this pattern instead of encoding enums as integers like -1 (or even worse 1, 2, 3, 4 like k8s.pod.phase does).

From the OpenMetrics spec:

# TYPE foo stateset
foo{entity="controller",foo="a"} 1.0
foo{entity="controller",foo="bb"} 0.0
foo{entity="controller",foo="ccc"} 0.0
foo{entity="replica",foo="a"} 1.0
foo{entity="replica",foo="bb"} 0.0
foo{entity="replica",foo="ccc"} 1.0

@sirianni
Copy link
Contributor

sirianni commented Nov 3, 2023

The plan is to remove ``node_conditions_to_report` config and the old metric. So what is the issue with this?

I agree with this change ☝️ . My issue is with encoding the status in the value instead of as a dimension.

@povilasv
Copy link
Contributor Author

povilasv commented Nov 3, 2023

I see, I'm not against encoding status = true|false to 0 and 1 values, but we really need an otel way if defining enumerations and then implement it same way in all metrics. Reference issue: open-telemetry/opentelemetry-specification#1712

Also similar discussion and arguments against having one state per metric: #24425 (comment)

@povilasv
Copy link
Contributor Author

povilasv commented Nov 3, 2023

Edit: we actually merged PR which encodes status = True| False | Unknown to numeric values:

var nodeConditionValues = map[corev1.ConditionStatus]int64{
	corev1.ConditionTrue:    1,
	corev1.ConditionFalse:   0,
	corev1.ConditionUnknown: -1,
}

So it's actually fine.

@sirianni
Copy link
Contributor

sirianni commented Nov 3, 2023

merged PR which encodes status = True| False | Unknown to numeric values:
So it's actually fine.

I'm not following. Isn't this encoding what I was arguing against?

@sirianni
Copy link
Contributor

sirianni commented Nov 3, 2023

@TylerHelmuth @jinja2 @crobert-1 - any thoughts on this?

@sirianni
Copy link
Contributor

sirianni commented Nov 3, 2023

Also similar discussion and arguments against having one state per metric: #24425 (comment)

I disagree with those arguments

This will introduce 5x bigger payload to deliver the same information.

The payload doesn't have to be larger if you do not emit metrics with 0 values. The OpenMetrics StateSet semantics require this, and kube-state-metrics does this but I don't see a reason it must work this way. I don't see much of a semantic difference between nulls and 0 for this use case.

When you look at the chart, it's pretty convenient to see what states the pod has been in over time using just one time series.

Since there is no inherent semantic meaning to the numeric values, this seems like a pretty weak argument. For k8s.pod.phase is a user really going to interpret a change in value from 2 to 4 means that the pod wen't from running to failed?

This seems like a pretty intuitive way to view changes in conditions over time.

sum:kubernetes_state.node.by_condition{status:true, kube_cluster_name:k8s-dev4xqmk} by {condition}

image

@povilasv
Copy link
Contributor Author

povilasv commented Nov 6, 2023

I'm not following. Isn't this encoding what I was arguing against?

Sorry, misread things now I think I understand what you are trying to say. I would suggest you move this discussion into actual issue into OTEL enums and statesets.

@sirianni
Copy link
Contributor

sirianni commented Nov 6, 2023

One last example on why the current formulation is problematic. Many metrics backends will store pre-aggregated time-based rollups to reduce storage and increase performance for queries over longer time intervals.

Consider these two scenarios of an hour's worth of k8s.node.condition metrics where the status changes over the course of a 1hr interval:

Formulation 1 - encoding status as metric value:

k8s.node.condition{node="my-node", condition="DiskPressure"} 1
k8s.node.condition{node="my-node", condition="DiskPressure"} 1
... for 01:00 - 00:29
k8s.node.condition{node="my-node", condition="DiskPressure"} -1
k8s.node.condition{node="my-node", condition="DiskPressure"} -1
... for 01:30 - 00:59

Formulation 2 - encoding status as dimension:

k8s.node.condition{node="my-node", condition="DiskPressure", status="true"} 1
k8s.node.condition{node="my-node", condition="DiskPressure", status="false"} 0
k8s.node.condition{node="my-node", condition="DiskPressure", status="unknown"} 0
k8s.node.condition{node="my-node", condition="DiskPressure", status="true"} 1
k8s.node.condition{node="my-node", condition="DiskPressure", status="false"} 0
k8s.node.condition{node="my-node", condition="DiskPressure", status="unknown"} 0
... for 01:00 - 00:29
k8s.node.condition{node="my-node", condition="DiskPressure", status="true"} 0
k8s.node.condition{node="my-node", condition="DiskPressure", status="false"} 0
k8s.node.condition{node="my-node", condition="DiskPressure", status="unknown"} 1
k8s.node.condition{node="my-node", condition="DiskPressure", status="true"} 0
k8s.node.condition{node="my-node", condition="DiskPressure", status="false"} 0
k8s.node.condition{node="my-node", condition="DiskPressure", status="unknown"} 1
... for 01:30 - 00:59

With Formulation 1, if the backend stores a pre-aggregated rollup for the 1 hour 01:00 - 02:00 the following value will be stored

Metric Attributes Min Max Sum Count
k8s.node.condition {node="my-node", condition="DiskPressure"} -1 1 0 60

At query time, most frontends will apply an average time aggregation, dividing Sum by Count resulting in a 0 value for the metric:

k8s.node.condition{node="my-node", condition="DiskPressure"} 0.0

This gives the false impression that DiskPressure was false for the hour, when in fact it in the true and unknown states each for 30 minutes in that hour.

Contrast this with the suggested Formulation 2

Metric Attributes Min Max Sum Count
k8s.node.condition {node="my-node", condition="DiskPressure", status="true"} 0 0 30 60
k8s.node.condition {node="my-node", condition="DiskPressure", status="false"} 0 0 0 60
k8s.node.condition {node="my-node", condition="DiskPressure", status="unknown"} 0 0 30 60

With this encoding, the query on the rolled-up data with average aggregation returns the expected data

k8s.node.condition{node="my-node", condition="DiskPressure", status="true"} 0.5
k8s.node.condition{node="my-node", condition="DiskPressure", status="unknown"} 0.5

@sirianni
Copy link
Contributor

sirianni commented Nov 6, 2023

I would suggest you move this discussion into actual issue into OTEL enums and statesets.

Fair.... but I don't understand why you're assuming that the way it's encoded here is the current best practice and we need a committee decision / spec change to use the suggested alternative? Can you point to any other metrics in OTel Collector (other than k8sclusterreceiver and k8s.pod.phase) where enums are encoded into numerical metric values like this PR is doing?

My suggested alternative has precedent in kube-state-metrics, which has far more adoption than k8sclusterreceiver. You seem to be assuming that the current k8sclusterreceiver pattern is a blessed standard that we need to blindly continue following.

@sirianni
Copy link
Contributor

sirianni commented Nov 6, 2023

Take a look at the hw.status metric in OTel Semantic Conventions which uses the value-as-attribute approach I'm suggesting. In particular, see this note

hw.status is currently specified as an UpDownCounter but would ideally be represented using a StateSet as defined in OpenMetrics. This semantic convention will be updated once StateSet is specified in OpenTelemetry. This planned change is not expected to have any consequence on the way users query their timeseries backend to retrieve the values of hw.status over time.

image

@sirianni
Copy link
Contributor

sirianni commented Nov 6, 2023

This metric is actually pretty close to the recommended formulation. I think if you just remove the -1 for Unknown then it would have the desired aggregation semantics.

@sirianni
Copy link
Contributor

sirianni commented Nov 6, 2023

The time based rollup example I gave is a bit trickier, but a simpler example is spatial aggregation. Let's say I want to get a count of nodes in my cluster having DiskPressure condition. The desired query would sum up the k8s.node.condition metric where condition = "DiskPressure". However, the -1s for unknown will actually subtract from the total, which is not what we want.

jmsnll pushed a commit to jmsnll/opentelemetry-collector-contrib that referenced this issue Nov 12, 2023
)

**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->

Add new k8s.node.condition metric, so that we can deprecate
k8s.node.condition_* metrics.


**Link to tracking Issue:** <Issue number if applicable>


open-telemetry#27617

**Testing:** <Describe what testing was performed and which tests were
added.>

- added unit tests

**Documentation:** <Describe the documentation added.>

- added docs
Copy link
Contributor

github-actions bot commented Jan 8, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Jan 8, 2024
@crobert-1 crobert-1 removed the Stale label Jan 8, 2024
Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Mar 11, 2024
Copy link
Contributor

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale May 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants