
Add profile label to existing scheduling metrics #92189

Closed
alculquicondor opened this issue Jun 16, 2020 · 17 comments
Labels
sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling.

Comments

@alculquicondor
Member

alculquicondor commented Jun 16, 2020

/sig scheduling

Ref kubernetes/enhancements#1451

As described in the KEP for Multiple Profiles, we decided to add a profile label to the following metrics:

  • "schedule_attempts_total"
  • "e2e_scheduling_duration_seconds"
  • "scheduling_algorithm_duration_seconds"
  • "scheduling_algorithm_preemption_evaluation_seconds"
  • "binding_duration_seconds"

@dashpole pointed me to the metrics stability classes, which indicate that such a change wouldn't be backwards compatible for non-ALPHA metrics.

All of our metrics are ALPHA, so we should be good to add the label, with an appropriate release note.
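
For concreteness, here is a rough sketch of the shape of the change for one of these metrics (a sketch only, not the actual diff; schedule_attempts_total already has a "result" label today, so only "profile" is new, and the help text and helper names below are illustrative):

    // Sketch: the metric becomes a *Vec so each profile gets its own series.
    package metrics

    import "k8s.io/component-base/metrics"

    const SchedulerSubsystem = "scheduler"

    var scheduleAttempts = metrics.NewCounterVec(
        &metrics.CounterOpts{
            Subsystem:      SchedulerSubsystem,
            Name:           "schedule_attempts_total",
            Help:           "Number of attempts to schedule pods, by result and profile.",
            StabilityLevel: metrics.ALPHA,
        },
        []string{"result", "profile"})

    // At recording time the scheduler picks the series for the profile that
    // handled the pod, e.g.:
    //   scheduleAttempts.WithLabelValues("scheduled", profileName).Inc()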

Any concerns? GKE foresees no major risk.

@k8s-ci-robot added the sig/scheduling label Jun 16, 2020
@alculquicondor
Member Author

cc @ahg-g @Huang-Wei

@liggitt
Member

liggitt commented Jun 16, 2020

cc @logicalhan

@alculquicondor
Member Author

cc @ingvagabund

@ingvagabund
Contributor

Both CounterOpts [1] and HistogramOpts have a ConstLabels field, which I presume you plan to use to set the profile label?

The code under clusterloader2/pkg/measurement/common/scheduler_latency.go does not take histogram labels into account. Also, the schedulingMetrics type will need to be extended to support multiple profiles (at least for the default one). That includes changes in the perf dashboard as well.

[1] https://github.com/kubernetes/kubernetes/blob/master/pkg/scheduler/metrics/metrics.go#L52

@alculquicondor
Member Author

Both CounterOpts [1] and HistogramOpts have a ConstLabels field, which I presume you plan to use to set the profile label?

I'm migrating to the Vec counterparts, like HistogramVec
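
To sketch the difference (hypothetical code, not the actual PR): ConstLabels fixes the label value when the collector is created, so a single collector can only ever describe one profile, whereas a Vec lets each observation pick the profile. The metric name and label value below are made up for illustration:

    package main

    import (
        "time"

        "k8s.io/component-base/metrics"
        "k8s.io/component-base/metrics/legacyregistry"
    )

    var exampleDuration = metrics.NewHistogramVec(
        &metrics.HistogramOpts{
            Subsystem:      "scheduler",
            Name:           "example_duration_seconds", // hypothetical name
            Help:           "Example latency, partitioned by profile.",
            Buckets:        metrics.ExponentialBuckets(0.001, 2, 15),
            StabilityLevel: metrics.ALPHA,
        },
        []string{"profile"})

    func main() {
        legacyregistry.MustRegister(exampleDuration)

        start := time.Now()
        // ... run a scheduling attempt for a given profile ...

        // The profile is chosen at observation time, unlike with ConstLabels,
        // where it would be fixed once at registration time.
        exampleDuration.WithLabelValues("default-scheduler").Observe(time.Since(start).Seconds())
    }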

Thanks for the pointers. What's the easiest way to run the perf tests to make sure they work?

@ingvagabund
Contributor

I'm migrating to the Vec counterparts, like HistogramVec

Yeah, that is more natural.

Thanks for the pointers. What's the easiest way to run the perf tests to make sure they work?

Through a PR against perf-tests and checking the CI artifacts. Feel free to let me know if you need any help.

@alculquicondor
Member Author

It looks like the measurements will continue to work with the new profile label. Basically, the code would merge all the values together at this point:

https://github.com/kubernetes/perf-tests/blob/f3cf096d437805dd51e27b406f32f1af2e42307d/clusterloader2/pkg/measurement/common/scheduler_latency.go#L179

And that is fine even if we choose to have multiple profiles by default, unless we really need to analyze the data per profile. But today we only have one, so no problem.

The problem could be with the new result label for e2e_scheduling_duration_seconds
We are now tracking latency for any scheduling result, whereas before we only recorded it for successful scheduling attempts.

Note that successful scheduling attempts run through pretty much the same steps (although some pods might go through preemption). Unschedulable attempts could finish after filtering, after running preemption, or during the Permit extension point, which no default plugin uses. Errors could happen at any time, except that we shouldn't actually expect them to happen.

So the question is: in the perf dashboard, do we want to track (1) the latency of all attempts regardless of the result, (2) each attempt result separately, or (3) only the successful attempts?

If we do nothing, we are going with (1). If we want to (roughly) preserve the current behavior, we need to do (3). Note that I say roughly because we were still recording latency for the cases where binding failed. But I don't think we were expecting that to happen during perf tests. I'm not sure if we expected unschedulable pods.
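
If we ever wanted (3), the perf-tests side could filter on the new result label, roughly like this (a hypothetical sketch; it assumes the measurement works with Prometheus samples the way scheduler_latency.go does, and that the label value for successful attempts is "scheduled"):

    // Hypothetical helper: keep only samples from successful attempts.
    package measurement

    import "github.com/prometheus/common/model"

    func onlyScheduled(samples model.Samples) model.Samples {
        var scheduled model.Samples
        for _, sample := range samples {
            if sample.Metric["result"] == "scheduled" {
                scheduled = append(scheduled, sample)
            }
        }
        return scheduled
    }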

Thoughts, @ahg-g @ingvagabund? Personally, given the nature of perf-tests, I say (1) is fine.

@ahg-g
Member

ahg-g commented Jun 18, 2020

Aggregating all attempts under one graph is fine; we have the postfilter extension point latency, which should give us a good idea of the impact of preemption if it gets executed.

I would like to expand the perf tests we have for kubemark to cover cases that include preemption, so once those are in, we can reevaluate the metrics.

@ingvagabund
Contributor

It looks like the measurements will continue to work with the new profile label. Basically the code would merge all the values together at this point

If I read the github.com/prometheus/client_golang/prometheus code correctly, after extending the labels with the profile label, i.e.:

FrameworkExtensionPointDuration = metrics.NewHistogramVec(
		&metrics.HistogramOpts{
			Subsystem: SchedulerSubsystem,
			Name:      "framework_extension_point_duration_seconds",
			Help:      "Latency for running all plugins of a specific extension point.",
			// Start with 0.1ms with the last bucket being [~200ms, Inf)
			Buckets:        metrics.ExponentialBuckets(0.0001, 2, 12),
			StabilityLevel: metrics.ALPHA,
		},
		[]string{"extension_point", "status", "profile"})

scheduler_latency.go will still accept the histogram, though you will not be able to distinguish the extension points of individual profiles.
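
In other words, after the change the measurement would effectively collapse the profile dimension, something like this hypothetical sketch that sums sample values per extension point across all profiles:

    // Hypothetical sketch: aggregate per extension_point, ignoring "profile".
    package measurement

    import "github.com/prometheus/common/model"

    func sumByExtensionPoint(samples model.Samples) map[string]float64 {
        totals := map[string]float64{}
        for _, sample := range samples {
            ep := string(sample.Metric["extension_point"])
            totals[ep] += float64(sample.Value)
        }
        return totals
    }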

And that is fine even if we choose to have multiple profiles by default, unless we really need to analyze the data per profile. But today we only have one, so no problem.

Which is the case, so I'm taking all my concerns back.

The problem could be with the new result label for e2e_scheduling_duration_seconds
We are now tracking latency for any scheduling result, whereas before we only recorded it for successful scheduling attempts.

"We are now tracking" you mean your PR is, right? Do you need to add the result label and have the metric updated for the failed attempts as well?

@alculquicondor
Member Author

"We are now tracking" you mean your PR is, right?

Correct

Do you need to add the result label and have the metric updated for the failed attempts as well?

If we leave the perf code as is, they will be aggregated together. Do you think we should distinguish?
This describes our options:

So the question is: in the perf dashboard, do we want to track (1) the latency of all attempts regardless of the result, (2) each attempt result separately, or (3) only the successful attempts?

If we do nothing, we are going with (1). If we want to (roughly) preserve the current behavior, we need to do (3). Note that I say roughly because we were still recording latency for the cases where binding failed. But I don't think we were expecting that to happen during perf tests. I'm not sure if we expected unschedulable pods.

@ingvagabund
Contributor

If we leave the perf code as is, they will be aggregated together.

I don't think that's to be discussed in this PR (Add profile label to existing scheduling metrics).
Though, aggregating the attempts assumes you will observe even the failed attempts. Is there a specific case where those are useful? That said, I am not opposed to it, in which case the result label is needed.

@ingvagabund
Contributor

ingvagabund commented Jun 22, 2020

My interpretation of e2eSchedulingLatency is the time spent on successfully scheduling a pod plus the time spent on evaluating the bind operation (no matter whether the bind operation succeeds or not), in order to know the worst/average/min time of a single, completed scheduling cycle. If the scheduling fails in the middle (e.g. due to a score/filter error), we don't care much unless the errors happen too frequently, in which case there's probably a bug or some transient/permanent error. So I don't see much use in recording latency for the failed cases.

@alculquicondor
Member Author

Recording all latencies is important to understand and estimate the throughput of the scheduler in a running cluster.

If you only check successful attempts and you have a significant number of unschedulable pods, the latency reported by the metric is not representative.

@ahg-g
Member

ahg-g commented Jun 22, 2020

@ingvagabund you will be able to break it down by result (scheduled, unschedulable, and error), so the information is not lost.

@ingvagabund
Contributor

Given that #92202 merged, it might be useful to update the help text of e2e_scheduling_duration_seconds. Right now it is "E2e scheduling latency in seconds (scheduling algorithm + binding)" [1], which historically meant scheduling_algorithm_duration_seconds + binding_duration_seconds. That is no longer accurate after #92202, given that scheduling_algorithm_duration_seconds is reported only when the scheduling algorithm returns a nil error, but e2e_scheduling_duration_seconds is now reported even when the algorithm returns a non-nil error.

[1] https://github.com/kubernetes/kubernetes/blob/master/pkg/scheduler/metrics/metrics.go#L75
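
For illustration, the definition might be updated along these lines (wording is a sketch only, not proposed final text; only the Help field changes):

    // Hypothetical help-text update; everything except Help stays as-is.
    e2eSchedulingLatency = metrics.NewHistogramVec(
        &metrics.HistogramOpts{
            Subsystem: SchedulerSubsystem,
            Name:      "e2e_scheduling_duration_seconds",
            Help:      "E2e scheduling latency in seconds (scheduling algorithm + binding), recorded for all scheduling results",
            // Buckets, StabilityLevel, etc. unchanged.
        },
        []string{"result", "profile"})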

@alculquicondor
Member Author

The work done for 1.19 is enough

/close

@k8s-ci-robot
Contributor

@alculquicondor: Closing this issue.

In response to this:

The work done for 1.19 is enough

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
