
KEP-4381: DRA: PRR and API for beta of structured parameters in 1.32 #4869

Merged (1 commit, Oct 8, 2024)

Conversation

@pohly (Contributor) commented Sep 24, 2024:

  • One-line PR description: PRR and API for beta of structured parameters in 1.32
  • Other comments:

Much of the PRR text that was originally written for "classic DRA" also applies to "structured parameters". It gets moved from #3063 to #4381, with some minor adaptations. The placeholder comments get restored in #3063 because further work on that KEP would be needed to move it forward, if it gets moved forward at all instead of being abandoned.

The v1beta1 API will be almost identical to the v1alpha3 API, with just some minor tweaks to fix oversights.

The kubelet gRPC API gets bumped with no changes. Nonetheless, drivers should be updated, which can be done by updating the Go dependencies and optionally changing the API import.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 24, 2024
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Sep 24, 2024
@pohly (Contributor Author) commented Sep 24, 2024:

/assign @johnbelamaric @klueska

@pohly (Contributor Author) commented Sep 24, 2024:

/cc @mrunalp

@@ -2166,6 +2215,10 @@ k8s.io/kubelet/pkg/apis/dra gRPC interface. It was inspired by
[CSI](https://github.com/container-storage-interface/spec/blob/master/spec.md),
with “volume” replaced by “resource” and volume specific parts removed.

Versions v1alpha4 and v1beta1 are supported by kubelet. Both are identical.
DRA drivers should implement both because support for v1alpha4 might get
removed.
pohly (Contributor Author):

If you are curious about the technical aspect of implementing this, see the "DRA kubelet: add v1beta1 gRPC API" commit.
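Concretely, the "optionally changing the API import" part of a driver update might look like the before/after below. The exact package paths are assumptions based on the KEP text (the generated packages live under k8s.io/kubelet/pkg/apis/dra); since both gRPC versions are identical, the version segment of the import is the only source change:

```go
// Before (driver built against the 1.31 kubelet API; path assumed):
import drapb "k8s.io/kubelet/pkg/apis/dra/v1alpha4"

// After (driver built against the 1.32 kubelet API; path assumed):
// everything else comes from updating the Go dependencies.
import drapb "k8s.io/kubelet/pkg/apis/dra/v1beta1"
```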

# - my_feature_metric
- resource_controller_create_total
- resource_controller_create_failures_total
- resource controller workqueue with name="resource_claim"
Contributor:

I tried to search for these metrics but I did not find them in the code base.

Maybe resourceclaim_controller_create_attempts_total
and resourceclaim_controller_create_failures_total.

I don't know what the last metric corresponds to.

pohly (Contributor Author):

Good catch, will update the KEP.

pohly (Contributor Author):

About "resource controller workqueue": Those are the workqueue_* metrics from https://kubernetes.io/docs/reference/instrumentation/metrics inside the kube-controller-manager.

// the value from there if this field is not set.
//
// +required
AdminAccess *bool

// of logical steps. Validation against those limits happens only when
// setting an expression for the first time or when changing
// it. Therefore it is possible to change these limits without
// affecting stored expressions. Those remain valid.
pohly (Contributor Author):

@jpbetz, @cici37: can you help review this proposal for a CEL cost limit in DRA? See also below for more comments on CELSelectorExpressionMaxCost.

The corresponding code is currently in kubernetes/kubernetes#127511.
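A minimal sketch of the ratcheting validation the comment above describes: the cost limit is enforced only for new or changed expressions, so raising or lowering the limit later cannot invalidate stored objects. estimateCELCost is a hypothetical helper standing in for the real estimation in kubernetes/kubernetes#127511:

```go
package validation

import "fmt"

// estimateCELCost is a hypothetical stand-in for the real static CEL cost
// estimation wired up in kubernetes/kubernetes#127511.
func estimateCELCost(expr string) (uint64, error) {
	return 0, nil // placeholder
}

// validateSelectorCost enforces the cost limit only when an expression is
// set for the first time or changed. Stored expressions are never
// re-validated, so changing the limit leaves them valid.
func validateSelectorCost(newExpr, oldExpr string, maxCost uint64) error {
	if newExpr == oldExpr {
		return nil // unchanged stored expression remains valid
	}
	cost, err := estimateCELCost(newExpr)
	if err != nil {
		return fmt.Errorf("compiling CEL expression: %w", err)
	}
	if cost > maxCost {
		return fmt.Errorf("estimated CEL cost %d exceeds limit %d", cost, maxCost)
	}
	return nil
}
```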

pohly (Contributor Author):

IMHO this is not a blocker for merging the KEP. We can still review the details in the code PR.

Member:

agreed


Do we need to enforce such a policy by default? If so, with which API?

<<[/UNRESOLVED]>>
pohly (Contributor Author):

This was raised by David Eads when reviewing the VAP that we used in 1.31 (see above).

cc @ritazh

pohly (Contributor Author):

Options:

  • Keep the field as alpha with a separate feature gate (DRAAdminAccess) while promoting the rest of the KEP to beta. Defer the decision on how to do permission checks.
  • Same as before, but describe the field in a separate KEP.
  • Implement access control via an admin-access.kubernetes.io namespace label (name to be decided) with an in-tree admission controller in 1.32, and promote it to beta together with this KEP.
  • Create a separate AdminAccessResourceClaim top-level type and use RBAC permissions to decide who's allowed to create them.

We discussed this in today's WG Device Management meeting and lean towards the first option. A second KEP seemed overkill. But we are aware that this stretches the KEP process a bit. The third option is doable, but it might not be the right API for this kind of access control. Perhaps some future mechanism in Kubernetes will allow doing this in a better way. The last option doesn't scale when the number of claim aspects that need special permission grows.
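The first option follows the usual pattern for an alpha field inside a beta API: the apiserver strategy drops the field on create unless the gate is on, but preserves it on update when it is already in use. A rough sketch of that pattern (the helper names and exact wiring are assumptions; the approach is the standard dropDisabledFields one):

```go
import (
	resourceapi "k8s.io/api/resource/v1beta1"
	utilfeature "k8s.io/apiserver/pkg/util/feature"
	"k8s.io/kubernetes/pkg/features"
)

// dropDisabledDRAAdminAccessFields clears adminAccess on create (oldClaim ==
// nil) or on update when the field was not already in use, unless the
// DRAAdminAccess feature gate is enabled.
func dropDisabledDRAAdminAccessFields(claim, oldClaim *resourceapi.ResourceClaim) {
	if utilfeature.DefaultFeatureGate.Enabled(features.DRAAdminAccess) {
		return // gate on: keep whatever the user set
	}
	if oldClaim != nil && adminAccessInUse(oldClaim) {
		return // preserve data written while the gate was enabled
	}
	for i := range claim.Spec.Devices.Requests {
		claim.Spec.Devices.Requests[i].AdminAccess = nil
	}
}

func adminAccessInUse(claim *resourceapi.ResourceClaim) bool {
	for _, request := range claim.Spec.Devices.Requests {
		if request.AdminAccess != nil {
			return true
		}
	}
	return false
}
```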

Member:

The consensus was to leave the AdminAccess field in alpha. This will mean either a separate KEP, or combining that with another set of small enhancements under a single feature gate. Not for any technical reason, but just because our release process doesn't have a way to move KEPs partially to another stage. At least, I think that's true - @jrrickard does that sound right? The question is - could we have two feature gates in one KEP, and have one feature gate advance to beta and the other stay in alpha? I am assuming this will confuse our processes that track features.

pohly (Contributor Author):

> combining that with another set of small enhancements under a single feature gate

That feels wrong to me. Each KEP should have its own use cases and proceed according to its own graduation criteria. We don't want to hold back one "sub-feature" because its sibling isn't ready and they share the same feature gate.

pohly (Contributor Author), Oct 2, 2024:

"AdminAccess" probably deserves its own KEP. I have a slight tendency towards keeping it here with a separate feature gate for 1.32 to avoid causing more confusion and work shortly before the KEP freeze, and then split it out after the 1.32 release for "proper" tracking in >= 1.33.

Member:

SGTM

pohly (Contributor Author):

Added text and feature gate.

gate explicitly gets enabled. The feature gate remains off by default because
DRA depends on a new API group which, following the
[convention](https://github.com/kubernetes/enhancements/tree/master/keps/sig-architecture/3136-beta-apis-off-by-default),
is off by default.
Member:

Shouldn't the feature gate be on by default, but just the API off by default? This is a significant distinction for managed k8s which may allow different controls for feature gates versus API enablement.

pohly (Contributor Author), Oct 2, 2024:

The problem is that the code which gets enabled because of the feature gate then tries to use the API, which causes failures if the API is disabled. For example, kube-scheduler fails to start because the ResourceSlice informer fails to sync the cache.

Member:

Hmph. That's what discovery is for...

pohly (Contributor Author):

But what's the advantage of enabling the feature gate when the API is off?

I can see how it might simplify the cluster administration, but that's all. It adds more code and thus additional complexity, too.

pohly (Contributor Author):

And a need for more testing. "Feature gate on, API off" becomes another supported configuration that must work.

pohly (Contributor Author):

> 1. validate user doesn't enable the gate and not the API

I'm not aware of any code which implements that. @aojea, you already had to deal with this in 1.31 for MultiCIDRServiceAllocator. The comment was about that feature. Did you implement such validation?

Member:

I believe this is the last remaining PRR item; once it's resolved I can approve. If you can add this validation as the plan-of-record, we can merge it now. If it turns out to be too difficult, we can skip it and update the KEP to reflect that.

pohly (Contributor Author):

I've added one paragraph about this to the scheduler section, ending with "How to implement such a check reliably still needs to be determined."

I'm surprised that this is now required because I've not seen something like that before and a bit worried that this additional check itself will be a source of failures. I'll start a new discussion in #sig-architecture about this.

@aojea (Member), Oct 9, 2024:

> 1. validate user doesn't enable the gate and not the API

> I'm not aware of any code which implements that. @aojea, you already had to deal with this in 1.31 for MultiCIDRServiceAllocator. The comment was about that feature. Did you implement such validation?

@pohly no, I didn't. Enabling the APIs in the multicidr feature without the feature gate is a no-op, and enabling the feature without the APIs does not start the cluster at all. It seems that now is too late, but if I have to do it again I will implement the cross validation; it seems the best option, and we should probably document it for future cases if we find a standard, reliable way of doing it.
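A sketch of such a cross-check, assuming a discovery-based probe at component startup: if the feature gate is on but the required group version is not served, fail fast with an actionable error instead of crash-looping on informer sync.

```go
import (
	"fmt"

	"k8s.io/client-go/discovery"
)

// checkDRAAPIAvailable verifies that the API group version required by the
// DynamicResourceAllocation feature gate is actually served. Intended to be
// called during component startup, before informers are constructed.
func checkDRAAPIAvailable(dc discovery.DiscoveryInterface) error {
	const gv = "resource.k8s.io/v1beta1"
	if _, err := dc.ServerResourcesForGroupVersion(gv); err != nil {
		return fmt.Errorf("feature gate DynamicResourceAllocation is enabled, but API %s is not available: %w", gv, err)
	}
	return nil
}
```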


how much hardware is installed in the cluster. Typically, each driver will
publish one ResourceSlice per node where it manages hardware.

Kubernetes itself will not impose specific limitations for the number of these
Contributor:

Any etcd scale limits that we need to worry about, especially if the ResourceSlices get bigger with partitionable devices support?

pohly (Contributor Author):

I'm not aware of any besides the usual maximum size of a single object. The limits imposed on slices and string lengths must ensure that the object can be stored when encoded, which is why we did those back-of-the-envelope calculations about the maximum size.
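For a sense of scale (illustrative numbers, not the KEP's exact limits): 128 devices per slice × 32 attributes per device × roughly 128 bytes per encoded attribute ≈ 512 KiB, which stays below etcd's default request size limit of about 1.5 MiB. The KEP's actual limits were chosen with this kind of calculation.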

@pohly (Contributor Author) commented Oct 4, 2024:

> Can we capture some of the points re: scheduling/allocation perf and scale testing as GA criteria (as discussed in the wg meeting today) or a note so we don't forget when we go to GA :)

I added:

  • Scalability and performance analysis done and published

@pohly (Contributor Author) commented Oct 4, 2024:

I think I have addressed all review comments and there's nothing unresolved left.

@mrunalp (Contributor) commented Oct 4, 2024:

@johnbelamaric I have added the sig-node approval. Leaving the PRR approval/lgtm to you.

@johnbelamaric (Member) left a review comment:

I don't think the proposed handling of the alpha field in the beta API is correct.

# because it remains in alpha. It controls whether the "AdminAccess"
# field can be set. If set, other components support it
# because the field might have been set when enabled.
# Therefore other components don't need to and shouldn't
Member:

So, this field should be omitted from the beta1 API, unless the feature gate is set, shouldn't it?

Also, clients should also have the feature gate (kube-scheduler in particular), and only take action if the feature gate is set. It doesn't matter if the value was set while the gate was enabled; the feature should not have an effect when the gate is disabled.

pohly (Contributor Author):

A while back @thockin started a discussion about how we handle feature gates. I can't find it again now, but if I remember correctly, it boiled down to the observation that we handle feature gates inconsistently. We either:

  • Put all code related to it behind an if check.
  • Check at the boundary, typically the apiserver.

I am using the second approach here because I think the first one leads to additional error scenarios. For example:

  • A user was able to request admin access (either by using 1.31, or 1.32 with feature gate enabled) when deploying a monitoring solution.
  • Feature gate gets disabled before pod scheduling.
  • Scheduler treats those claims like normal claims.
  • They now prevent using hardware that was meant for production workloads. To the user it is not obvious that this happened because they see the monitoring solution as deployed with admin access in the request.

pohly (Contributor Author), Oct 7, 2024:

Found it: https://docs.google.com/document/d/1roVAHyF7eWZAccmCKw7MXYUNgx4BCDSXF2IMS8h10oY/edit?resourcekey=0-x6Tw2qz1SpCIPhbec6Qa2A. From that document:

> It would be very bad if their stored object started failing validation or worse, their cluster started acting badly.

It's this "acting badly" that happens when ignoring the admin access field because of a feature gate check in code which is supposed to honor that field.

> We do not have good guidelines about whether implementation logic should itself be gate-controlled or not.

In particular, "only take action if the feature gate is set" is not a general guideline.

As explained in the doc, not putting code behind an if check comes with risk that the code is faulty and then gets invoked despite the feature gate being off. This is a real problem which has no good answer in the general case.

Coming back to admin access: the code which would have to be put behind an if check is pretty short. I think the risk that it's faulty is lower than the problems that arise when not executing it, so I stand by my proposal to always honor the field if it was set.

Member:

So, if the code in kube-scheduler that handles admin access is causing it to crash loop, then doing it "only at the boundary" means that turning off the feature gate won't fix the crashing. You have to turn off the feature gate, and then delete or update the claim with admin access. I don't like that.

pohly (Contributor Author):

Having said that, I don't care enough to fight over this. If you want the if checks, then I can add them.

pohly (Contributor Author):

The alternative to "ignore the field" is to explicitly cause allocation to fail when it is set and the feature gate is off.

Member:

@pohly let's do that - your last suggestion. Refuse to allocate if the field is set. In particular, this is the lowest risk option from a security standpoint, while still avoiding an "incorrect" (allocated NOT in admin mode) outcome.

For posterity: https://kubernetes.slack.com/archives/CPNHUMN74/p1728317533313819

pohly (Contributor Author):

Done.
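The agreed behavior, sketched: during allocation, a claim that requests admin access is rejected outright when the DRAAdminAccess gate is off, instead of being allocated as if it were a normal claim. Field names follow the v1beta1 API; the surrounding allocator wiring is assumed.

```go
import (
	"fmt"

	resourceapi "k8s.io/api/resource/v1beta1"
)

// checkAdminAccess refuses to allocate a claim that requests admin access
// while the DRAAdminAccess feature gate is disabled. Silently allocating it
// as a normal claim would be the "incorrect outcome" discussed above.
func checkAdminAccess(claim *resourceapi.ResourceClaim, adminAccessEnabled bool) error {
	for _, request := range claim.Spec.Devices.Requests {
		if request.AdminAccess != nil && *request.AdminAccess && !adminAccessEnabled {
			return fmt.Errorf("claim %s: admin access requested, but the DRAAdminAccess feature is disabled", claim.Name)
		}
	}
	return nil
}
```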

gate explicitly gets enabled. The feature gate remains off by default because
DRA depends on a new API group which, following the
[convention](https://github.com/kubernetes/enhancements/tree/master/keps/sig-architecture/3136-beta-apis-off-by-default),
is off by default.
Member:

Hmm. @deads2k a little help here?

<!--
This section must be completed when targeting beta to a release.
-->
The container runtime must support CDI.
Member:

What errors should we expect if this is not the case? Where would they appear? Would this show up as: pod schedules, but errors out?

pohly (Contributor Author):

Added some text for this.

need to watch additional resources:
- kube-controller-manager: ResourceClaimTemplate
- kube-scheduler: ResourceClaim, DeviceClass, ResourceSlice
- kubelet: ResourceClaim
Member:

presumably only the ones relevant for that node?

pohly (Contributor Author):

Actually, "kubelet: ResourceClaim" was wrong. We replaced that with doing a GET precisely because a watch on all claims by all kubelets would have been a scalability problem. The GET adds latency, but it scales linearly with the number of nodes and claims.

Removed.
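The replacement pattern, roughly: instead of a cluster-wide watch in every kubelet, each kubelet fetches exactly the claims referenced by the pods it runs. The v1beta1 clientset method name below follows the usual client-go generation pattern and is an assumption; error handling is trimmed.

```go
import (
	"context"

	resourceapi "k8s.io/api/resource/v1beta1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// getResourceClaim fetches a single claim referenced by a pod on this node.
// A targeted GET adds a little latency per claim, but unlike a watch on all
// claims by all kubelets it scales linearly with nodes and claims.
func getResourceClaim(ctx context.Context, client kubernetes.Interface, namespace, name string) (*resourceapi.ResourceClaim, error) {
	return client.ResourceV1beta1().ResourceClaims(namespace).Get(ctx, name, metav1.GetOptions{})
}
```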


- Testing: An E2E test covers the expected retry mechanism in kubelet when
`NodePrepareResources` fails intermittently.


###### What steps should be taken if SLOs are not being met to determine the problem?
Member:

Once the SLOs are clearer, you should fill this out.


Workloads not using ResourceClaims should not be impacted because the new code
Member:

Why was this removed?

pohly (Contributor Author):

This is classic DRA. Some of the answers there became outdated when pivoting the two KEPs, so in this PR I moved those answers to where they belong now (the structured parameters KEP) and restored the placeholder KEP comments in the classic DRA KEP for future editing of that KEP.

But classic DRA is dead. I suppose I can just leave that KEP in whatever state it was and focus exclusively on structured parameters here: I'll remove the changes to keps/sig-node/3063-dynamic-resource-allocation/README.md.


###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?

Merely enabling the feature is not expected to increase resource usage much.
Member:

Why is this removed? Shouldn't this be (at least) "no"?


###### What are other known failure modes?

- DRA driver does not or cannot allocate a resource claim.
- DRA driver controller does not or cannot allocate a resource claim.
Member:

Any metrics we can refer to?

removed.

The following metric mirrors the `csi_operations_seconds`:
- Metric name: `dra_operations_seconds`
Contributor:

Would it make sense to name it so that it's clear that this metric is about gRPC, e.g. dra_grpc_operations_seconds or something similar?
We may decide to add other metrics for DRA operations, so it would be nice if they don't conflict.

pohly (Contributor Author):

That's then not consistent with other metrics. Let's ask our SIG Instrumentation reviewers during the implementation review what they prefer.

Contributor:

Well, even dra_operations_seconds is not consistent with similar metrics in pkg/kubelet/metrics/metrics.go. We should probably decide which naming convention we're going to follow: CSI or kubelet.
As we're going to add new metrics to the kubelet, I tend to prefer the kubelet naming.

Member:

We don't need to resolve that here, we can update the KEP based on what SIG instrumentation recommends during the implementation phase.

@bart0sh (Contributor), Oct 8, 2024:

So for Kubelet naming convention the metrics could be named this way:

  • dra_operations_duration_seconds
  • dra_operations_errors_total
  • dra_grpc_operations_duration_seconds
  • dra_grpc_operations_errors_total

And following the CSI convention they would mirror csi_operations_seconds (e.g. dra_operations_seconds and dra_grpc_operations_seconds, as proposed above).

I'd propose to use the kubelet naming scheme as it's much less confusing than CSI from my point of view.

Member:

I thought though that we didn't need the separate errors_total because we have it in the duration_seconds histograms? Or is that an aggregation (not sure we need that?)?

Contributor:

> We don't need to resolve that here, we can update the KEP based on what SIG instrumentation recommends during the implementation phase.

absolutely

> I thought though that we didn't need the separate errors_total because we have it in the duration_seconds histograms? Or is that an aggregation (not sure we need that?)?

That's what kubelet uses for its metrics, e.g. runtime_operations_duration_seconds and runtime_operations_errors_total
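If the kubelet convention wins, the histogram/counter pair might look roughly like this with k8s.io/component-base/metrics (names as proposed above; buckets, label names, and Help strings are illustrative):

```go
import (
	"k8s.io/component-base/metrics"
	"k8s.io/component-base/metrics/legacyregistry"
)

var (
	// Mirrors runtime_operations_duration_seconds, keyed by gRPC method.
	draOperationsDuration = metrics.NewHistogramVec(&metrics.HistogramOpts{
		Subsystem:      "dra",
		Name:           "operations_duration_seconds",
		Help:           "Duration in seconds of DRA driver gRPC operations",
		Buckets:        metrics.DefBuckets, // illustrative
		StabilityLevel: metrics.ALPHA,
	}, []string{"operation_type"})

	// Mirrors runtime_operations_errors_total.
	draOperationsErrors = metrics.NewCounterVec(&metrics.CounterOpts{
		Subsystem:      "dra",
		Name:           "operations_errors_total",
		Help:           "Number of failed DRA driver gRPC operations",
		StabilityLevel: metrics.ALPHA,
	}, []string{"operation_type"})
)

func init() {
	legacyregistry.MustRegister(draOperationsDuration, draOperationsErrors)
}
```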

pohly (Contributor Author):

> I'd propose to use the kubelet naming scheme as it's much less confusing than CSI from my point of view.

csi_operations_seconds is also a kubelet metric.

Let's continue the discussion in your PR or in #wg-device-management. You can present the alternatives and perhaps some future users of these metrics will have an opinion.

Contributor:

ok, let's continue there

@johnbelamaric (Member) left a review comment:

Almost there :)

gate explicitly gets enabled. The feature gate remains off by default because
DRA depends on a new API group which, following the
[convention](https://github.com/kubernetes/enhancements/tree/master/keps/sig-architecture/3136-beta-apis-off-by-default),
is off by default.
Member:

In Jordan's comment you linked above, Jordan suggests validating this. Perhaps there is an example in the code where this is done?

// of logical steps. Validation against those limits happens only when
// setting an expression for the first time or when changing
// it. Therefore it is possible to change these limits without
// affecting stored expressions. Those remain valid.
Member:

agreed

###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

Yes. Applications that were already deployed and are running will continue to
work, but they will stop working when containers get restarted because those
restarted containers won't have the additional resources.

DRAAdminAccess: No. Workloads which were deployed with admin access will continue
Member:

But if it's in a ResourceClaimTemplate, then Pod recreation will fail, right?

pohly (Contributor Author):

If the claims still need to be generated, then we have a different failure scenario: the template has admin access set, but the generated claim wouldn't.

This calls for yet another feature gate check, this time in the claim controller: if a template has admin access set and the feature gate is off, refuse to generate the claim (see the sketch below). Added.

So the answer here becomes "it depends". Updated.
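A sketch of that claim controller check (helper shape and names assumed; field paths follow the v1beta1 API): when a template requests admin access and the gate is off, the controller refuses to generate the claim rather than silently generating one with weaker semantics than the template asked for.

```go
import (
	"fmt"

	resourceapi "k8s.io/api/resource/v1beta1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// claimFromTemplate refuses to generate a ResourceClaim when the template
// requests admin access while the DRAAdminAccess feature gate is disabled.
func claimFromTemplate(template *resourceapi.ResourceClaimTemplate, adminAccessEnabled bool) (*resourceapi.ResourceClaim, error) {
	for _, request := range template.Spec.Spec.Devices.Requests {
		if request.AdminAccess != nil && *request.AdminAccess && !adminAccessEnabled {
			return nil, fmt.Errorf("template %s requests admin access, but the DRAAdminAccess feature is disabled", template.Name)
		}
	}
	// Normal generation: copy the spec from the template into a new claim.
	return &resourceapi.ResourceClaim{
		ObjectMeta: metav1.ObjectMeta{GenerateName: template.Name + "-"},
		Spec:       template.Spec.Spec,
	}, nil
}
```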

Much of the PRR text that was originally written for "classic DRA" applies also
to "structured parameters". It gets copied from kubernetes#3063 to kubernetes#4381, with some
adaptations.

The v1beta1 API will be almost identical to the v1alpha3 API, with just some
minor tweaks to fix oversights.

The kubelet gRPC gets bumped with no changes. Nonetheless, drivers should get
updated, which can be done by updating the Go dependencies and optionally
changing the API import.
@alculquicondor (Member):
/cc

@johnbelamaric (Member):

/approve

But we missed sig-scheduling, so leaving the LGTM to @alculquicondor

/assign @alculquicondor

@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johnbelamaric, mrunalp, pohly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 8, 2024
@@ -2518,6 +2648,20 @@ For Beta and GA, add links to added tests together with links to k8s-triage for
https://storage.googleapis.com/k8s-triage/index.html
-->

The scheduler plugin and resource claim controller are covered by the workloads
Member:

Please update the coverage percentages for all involved packages (new and old)

IIRC, the coverage was pretty low for alpha.

Let's make it a goal for beta graduation to reach 80%

pohly (Contributor Author):

Several PRs have already gone in to increase coverage. I'll create a KEP update with the latest numbers that we can review before merging the beta promotion PR.

@thockin (Member) left a review comment:

/lgtm
/approve

@@ -1234,8 +1258,22 @@ type DeviceAttribute struct {

// DeviceAttributeMaxValueLength is the maximum length of a string or version attribute value.
const DeviceAttributeMaxValueLength = 64

// DeviceCapacity describes a quantity associated with a device.
type DeviceCapacity struct {
Member:

I'm going to OK this, since I am late to the conversation, but I will be looking to understand this change more :)
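For context on the change flagged here: per the diff above, capacity values move from a bare quantity into a small wrapper struct. Roughly (comment wording assumed, not quoted from the merged API):

```go
// DeviceCapacity wraps a single quantity today. Using a struct as the map
// value type, instead of a bare resource.Quantity, leaves room to add fields
// later without changing the map's value type again.
type DeviceCapacity struct {
	// Value defines how much of a certain device capacity is available.
	Value resource.Quantity
}

// Capacities are then published as map[QualifiedName]DeviceCapacity
// rather than map[QualifiedName]resource.Quantity.
```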

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 8, 2024
@k8s-ci-robot k8s-ci-robot merged commit 9a23114 into kubernetes:master Oct 8, 2024
4 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.32 milestone Oct 8, 2024
@thockin (Member) commented Oct 8, 2024:

Oops! Sorry, Aldo. @pohly will follow up, I am confident.

@alculquicondor (Member):

@pohly please update the coverage numbers in the KEP ASAP.

@pohly (Contributor Author) commented Oct 9, 2024:

See #4914
