From ea571068c4ef05fbb0e58deb96c5214be5446f27 Mon Sep 17 00:00:00 2001
From: John Belamaric <jbelamaric@google.com>
Date: Tue, 24 Sep 2024 08:48:15 -0700
Subject: [PATCH 01/20] Initial draft of summary, motivation, goals, non-goals

---
 keps/prod-readiness/sig-node/4816.yaml        |   3 +
 .../4816-dra-prioritized-list/README.md       | 842 ++++++++++++++++++
 .../4816-dra-prioritized-list/kep.yaml        |  51 ++
 3 files changed, 896 insertions(+)
 create mode 100644 keps/prod-readiness/sig-node/4816.yaml
 create mode 100644 keps/sig-node/4816-dra-prioritized-list/README.md
 create mode 100644 keps/sig-node/4816-dra-prioritized-list/kep.yaml

diff --git a/keps/prod-readiness/sig-node/4816.yaml b/keps/prod-readiness/sig-node/4816.yaml
new file mode 100644
index 00000000000..f891036d62c
--- /dev/null
+++ b/keps/prod-readiness/sig-node/4816.yaml
@@ -0,0 +1,3 @@
+kep-number: 4816
+alpha:
+  approver: "@jpbetz"
diff --git a/keps/sig-node/4816-dra-prioritized-list/README.md b/keps/sig-node/4816-dra-prioritized-list/README.md
new file mode 100644
index 00000000000..088a07f9ff0
--- /dev/null
+++ b/keps/sig-node/4816-dra-prioritized-list/README.md
@@ -0,0 +1,842 @@
+<!--
+**Note:** When your KEP is complete, all of these comment blocks should be removed.
+
+To get started with this template:
+
+- [x] **Pick a hosting SIG.**
+  Make sure that the problem space is something the SIG is interested in taking
+  up. KEPs should not be checked in without a sponsoring SIG.
+- [x] **Create an issue in kubernetes/enhancements**
+  When filing an enhancement tracking issue, please make sure to complete all
+  fields in that template. One of the fields asks for a link to the KEP. You
+  can leave that blank until this KEP is filed, and then go back to the
+  enhancement and add the link.
+- [x] **Make a copy of this template directory.**
+  Copy this template into the owning SIG's directory and name it
+  `NNNN-short-descriptive-title`, where `NNNN` is the issue number (with no
+  leading-zero padding) assigned to your enhancement above.
+- [x] **Fill out as much of the kep.yaml file as you can.**
+  At minimum, you should fill in the "Title", "Authors", "Owning-sig",
+  "Status", and date-related fields.
+- [ ] **Fill out this file as best you can.**
+  At minimum, you should fill in the "Summary" and "Motivation" sections.
+  These should be easy if you've preflighted the idea of the KEP with the
+  appropriate SIG(s).
+- [ ] **Create a PR for this KEP.**
+  Assign it to people in the SIG who are sponsoring this process.
+- [ ] **Merge early and iterate.**
+  Avoid getting hung up on specific details and instead aim to get the goals of
+  the KEP clarified and merged quickly. The best way to do this is to just
+  start with the high-level sections and fill out details incrementally in
+  subsequent PRs.
+
+Just because a KEP is merged does not mean it is complete or approved. Any KEP
+marked as `provisional` is a working document and subject to change. You can
+denote sections that are under active debate as follows:
+
+```
+<<[UNRESOLVED optional short context or usernames ]>>
+Stuff that is being argued.
+<<[/UNRESOLVED]>>
+```
+
+When editing KEPs, aim for tightly-scoped, single-topic PRs to keep discussions
+focused. If you disagree with what is already in a document, open a new PR
+with suggested changes.
+
+One KEP corresponds to one "feature" or "enhancement" for its whole lifecycle.
+You do not need a new KEP to move from beta to GA, for example. If
+new details emerge that belong in the KEP, edit the KEP. Once a feature has become
+"implemented", major changes should get new KEPs.
+
+The canonical place for the latest set of instructions (and the likely source
+of this file) is [here](/keps/NNNN-kep-template/README.md).
+
+**Note:** Any PRs to move a KEP to `implementable`, or significant changes once
+it is marked `implementable`, must be approved by each of the KEP approvers.
+If none of those approvers are still appropriate, then changes to that list
+should be approved by the remaining approvers and/or the owning SIG (or
+SIG Architecture for cross-cutting KEPs).
+-->
+# [KEP-4816](https://github.com/kubernetes/enhancements/issues/4816): DRA: Prioritized Alternatives in Device Requests
+
+<!--
+This is the title of your KEP. Keep it short, simple, and descriptive. A good
+title can help communicate what the KEP is and should be considered as part of
+any review.
+-->
+
+<!--
+A table of contents is helpful for quickly jumping to sections of a KEP and for
+highlighting any additional information provided beyond the standard KEP
+template.
+
+Ensure the TOC is wrapped with
+  <code>&lt;!-- toc --&gt;&lt;!-- /toc --&gt;</code>
+tags, and then generate with `hack/update-toc.sh`.
+-->
+
+<!-- toc -->
+- [Release Signoff Checklist](#release-signoff-checklist)
+- [Summary](#summary)
+- [Motivation](#motivation)
+  - [Goals](#goals)
+  - [Non-Goals](#non-goals)
+- [Proposal](#proposal)
+  - [User Stories (Optional)](#user-stories-optional)
+    - [Story 1](#story-1)
+    - [Story 2](#story-2)
+  - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
+  - [Risks and Mitigations](#risks-and-mitigations)
+- [Design Details](#design-details)
+  - [Test Plan](#test-plan)
+      - [Prerequisite testing updates](#prerequisite-testing-updates)
+      - [Unit tests](#unit-tests)
+      - [Integration tests](#integration-tests)
+      - [e2e tests](#e2e-tests)
+  - [Graduation Criteria](#graduation-criteria)
+  - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
+  - [Version Skew Strategy](#version-skew-strategy)
+- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
+  - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
+  - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
+  - [Monitoring Requirements](#monitoring-requirements)
+  - [Dependencies](#dependencies)
+  - [Scalability](#scalability)
+  - [Troubleshooting](#troubleshooting)
+- [Implementation History](#implementation-history)
+- [Drawbacks](#drawbacks)
+- [Alternatives](#alternatives)
+- [Infrastructure Needed (Optional)](#infrastructure-needed-optional)
+<!-- /toc -->
+
+## Release Signoff Checklist
+
+<!--
+**ACTION REQUIRED:** In order to merge code into a release, there must be an
+issue in [kubernetes/enhancements] referencing this KEP and targeting a release
+milestone **before the [Enhancement Freeze](https://git.k8s.io/sig-release/releases)
+of the targeted release**.
+
+For enhancements that make changes to code or processes/procedures in core
+Kubernetes—i.e., [kubernetes/kubernetes], we require the following Release
+Signoff checklist to be completed.
+
+Check these off as they are completed for the Release Team to track. These
+checklist items _must_ be updated for the enhancement to be released.
+-->
+
+Items marked with (R) are required *prior to targeting to a milestone / release*.
+
+- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
+- [ ] (R) KEP approvers have approved the KEP status as `implementable`
+- [ ] (R) Design details are appropriately documented
+- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
+  - [ ] e2e Tests for all Beta API Operations (endpoints)
+  - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) 
+  - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
+- [ ] (R) Graduation criteria is in place
+  - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) 
+- [ ] (R) Production readiness review completed
+- [ ] (R) Production readiness review approved
+- [ ] "Implementation History" section is up-to-date for milestone
+- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
+- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
+
+<!--
+**Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone.
+-->
+
+[kubernetes.io]: https://kubernetes.io/
+[kubernetes/enhancements]: https://git.k8s.io/enhancements
+[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
+[kubernetes/website]: https://git.k8s.io/website
+
+## Summary
+
+<!--
+This section is incredibly important for producing high-quality, user-focused
+documentation such as release notes or a development roadmap. It should be
+possible to collect this information before implementation begins, in order to
+avoid requiring implementors to split their attention between writing release
+notes and implementing the feature itself. KEP editors and SIG Docs
+should help to ensure that the tone and content of the `Summary` section is
+useful for a wide audience.
+
+A good summary is probably at least a paragraph in length.
+
+Both in this section and below, follow the guidelines of the [documentation
+style guide]. In particular, wrap lines to a reasonable length, to make it
+easier for reviewers to cite specific portions, and to minimize diff churn on
+updates.
+
+[documentation style guide]: https://github.com/kubernetes/community/blob/master/contributors/guide/style-guide.md
+-->
+
+
+The [DRA Structured
+Parameters](https://git.k8s.io/enhancements/keps/sig-node/4381-dra-structured-parameters)
+feature has added the ability to make requests for very specific types of
+devices using a `ResourceClaim`. However, the current API does not allow the
+user to indicate any priority when multiple types or configurations of devices
+may meet the needs of the workload. This feature allows the user to specify
+alternative requests that satisfy the workload's needs, giving the scheduler
+more flexibility in scheduling the workload.
+
+## Motivation
+
+<!--
+This section is for explicitly listing the motivation, goals, and non-goals of
+this KEP.  Describe why the change is important and the benefits to users. The
+motivation section can optionally provide links to [experience reports] to
+demonstrate the interest in a KEP within the wider Kubernetes community.
+
+[experience reports]: https://github.com/golang/go/wiki/ExperienceReports
+-->
+
+"Obtainability" of certain types of scarce resources is a primary concern of
+many AI/ML users. GPUs are in high demand, particularly the latest models. This
+means that workloads that use DRA to specify a need for particular types of GPUs
+may fail to schedule. In practice, a workload that needs a GPU can be written
+such that it can discover the GPUs available to it, and work with what it is
+given. A user may have a preference for the latest model, but would like to run
+the workload even if only an older model is available.
+
+Similarly, packaged workload authors may wish to configure a workload such that
+it will work well in the widest selection of available clusters. That is, a
+distributor of shared workload definitions would like to be able to specify
+alternative types of devices with which their workload will function, without
+requiring the user to modify the manifests.
+
+### Goals
+
+<!--
+List the specific goals of the KEP. What is it trying to achieve? How will we
+know that this has succeeded?
+-->
+
+* Allow workload authors, when specifying a `ResourceClaim`, to provide a list
+  of ways to satisfy the claim, with a preference ranking.
+* Enable the scheduler to evaluate those preferences and allocate devices for the
+  claim based on them.
+* Enable the cluster autoscaler to evaluate those preferences and make scaling
+  choices based on them.
+
+### Non-Goals
+
+* Enable cross-claim consistency of request choices. For example, guaranteeing
+  that all `ResourceClaim`s associated with a given `Deployment` are satisfied
+  using the same choice from the list of possible alternatives.
+
+<!--
+What is out of scope for this KEP? Listing non-goals helps to focus discussion
+and make progress.
+-->
+
+## Proposal
+
+<!--
+This is where we get down to the specifics of what the proposal actually is.
+This should have enough detail that reviewers can understand exactly what
+you're proposing, but should not include things like API designs or
+implementation. What is the desired outcome and how do we measure success?
+The "Design Details" section below is for the real
+nitty-gritty.
+-->
+
+### User Stories (Optional)
+
+<!--
+Detail the things that people will be able to do if this KEP is implemented.
+Include as much detail as possible so that people can understand the "how" of
+the system. The goal here is to make this feel real for users without getting
+bogged down.
+-->
+
+#### Story 1
+
+#### Story 2
+
+### Notes/Constraints/Caveats (Optional)
+
+<!--
+What are the caveats to the proposal?
+What are some important details that didn't come across above?
+Go in to as much detail as necessary here.
+This might be a good place to talk about core concepts and how they relate.
+-->
+
+### Risks and Mitigations
+
+<!--
+What are the risks of this proposal, and how do we mitigate? Think broadly.
+For example, consider both security and how this will impact the larger
+Kubernetes ecosystem.
+
+How will security be reviewed, and by whom?
+
+How will UX be reviewed, and by whom?
+
+Consider including folks who also work outside the SIG or subproject.
+-->
+
+## Design Details
+
+<!--
+This section should contain enough information that the specifics of your
+change are understandable. This may include API specs (though not always
+required) or even code snippets. If there's any ambiguity about HOW your
+proposal will be implemented, this is the place to discuss them.
+-->
+
+### Test Plan
+
+<!--
+**Note:** *Not required until targeted at a release.*
+The goal is to ensure that we don't accept enhancements with inadequate testing.
+
+All code is expected to have adequate tests (eventually with coverage
+expectations). Please adhere to the [Kubernetes testing guidelines][testing-guidelines]
+when drafting this test plan.
+
+[testing-guidelines]: https://git.k8s.io/community/contributors/devel/sig-testing/testing.md
+-->
+
+[ ] I/we understand the owners of the involved components may require updates to
+existing tests to make this code solid enough prior to committing the changes necessary
+to implement this enhancement.
+
+##### Prerequisite testing updates
+
+<!--
+Based on reviewers' feedback, describe what additional tests need to be added prior
+to implementing this enhancement to ensure the enhancement also has solid foundations.
+-->
+
+##### Unit tests
+
+<!--
+In principle, all added code should have complete unit test coverage, so providing
+the exact set of tests will not bring additional value.
+However, if complete unit test coverage is not possible, explain the reason for it
+together with an explanation of why this is acceptable.
+-->
+
+<!--
+Additionally, for Alpha try to enumerate the core package you will be touching
+to implement this enhancement and provide the current unit coverage for those
+in the form of:
+- <package>: <date> - <current test coverage>
+The data can be easily read from:
+https://testgrid.k8s.io/sig-testing-canaries#ci-kubernetes-coverage-unit
+
+This can inform certain test coverage improvements that we want to do before
+extending the production code to implement this enhancement.
+-->
+
+- `<package>`: `<date>` - `<test coverage>`
+
+##### Integration tests
+
+<!--
+Integration tests are contained in k8s.io/kubernetes/test/integration.
+Integration tests allow control of the configuration parameters used to start the binaries under test.
+This is different from e2e tests which do not allow configuration of parameters.
+Doing this allows testing non-default options and multiple different and potentially conflicting command line options.
+-->
+
+<!--
+This question should be filled when targeting a release.
+For Alpha, describe what tests will be added to ensure proper quality of the enhancement.
+
+For Beta and GA, add links to added tests together with links to k8s-triage for those tests:
+https://storage.googleapis.com/k8s-triage/index.html
+-->
+
+- <test>: <link to test coverage>
+
+##### e2e tests
+
+<!--
+This question should be filled when targeting a release.
+For Alpha, describe what tests will be added to ensure proper quality of the enhancement.
+
+For Beta and GA, add links to added tests together with links to k8s-triage for those tests:
+https://storage.googleapis.com/k8s-triage/index.html
+
+We expect no non-infra related flakes in the last month as a GA graduation criteria.
+-->
+
+- <test>: <link to test coverage>
+
+### Graduation Criteria
+
+<!--
+**Note:** *Not required until targeted at a release.*
+
+Define graduation milestones.
+
+These may be defined in terms of API maturity, [feature gate] graduations, or as
+something else. The KEP should keep this high-level with a focus on what
+signals will be looked at to determine graduation.
+
+Consider the following in developing the graduation criteria for this enhancement:
+- [Maturity levels (`alpha`, `beta`, `stable`)][maturity-levels]
+- [Feature gate][feature gate] lifecycle
+- [Deprecation policy][deprecation-policy]
+
+Clearly define what graduation means by either linking to the [API doc
+definition](https://kubernetes.io/docs/concepts/overview/kubernetes-api/#api-versioning)
+or by redefining what graduation means.
+
+In general we try to use the same stages (alpha, beta, GA), regardless of how the
+functionality is accessed.
+
+[feature gate]: https://git.k8s.io/community/contributors/devel/sig-architecture/feature-gates.md
+[maturity-levels]: https://git.k8s.io/community/contributors/devel/sig-architecture/api_changes.md#alpha-beta-and-stable-versions
+[deprecation-policy]: https://kubernetes.io/docs/reference/using-api/deprecation-policy/
+
+Below are some examples to consider, in addition to the aforementioned [maturity levels][maturity-levels].
+
+#### Alpha
+
+- Feature implemented behind a feature flag
+- Initial e2e tests completed and enabled
+
+#### Beta
+
+- Gather feedback from developers and surveys
+- Complete features A, B, C
+- Additional tests are in Testgrid and linked in KEP
+
+#### GA
+
+- N examples of real-world usage
+- N installs
+- More rigorous forms of testing—e.g., downgrade tests and scalability tests
+- Allowing time for feedback
+
+**Note:** Generally we also wait at least two releases between beta and
+GA/stable, because there's no opportunity for user feedback, or even bug reports,
+in back-to-back releases.
+
+**For non-optional features moving to GA, the graduation criteria must include
+[conformance tests].**
+
+[conformance tests]: https://git.k8s.io/community/contributors/devel/sig-architecture/conformance-tests.md
+
+#### Deprecation
+
+- Announce deprecation and support policy of the existing flag
+- Two versions passed since introducing the functionality that deprecates the flag (to address version skew)
+- Address feedback on usage/changed behavior, provided on GitHub issues
+- Deprecate the flag
+-->
+
+### Upgrade / Downgrade Strategy
+
+<!--
+If applicable, how will the component be upgraded and downgraded? Make sure
+this is in the test plan.
+
+Consider the following in developing an upgrade/downgrade strategy for this
+enhancement:
+- What changes (in invocations, configurations, API use, etc.) is an existing
+  cluster required to make on upgrade, in order to maintain previous behavior?
+- What changes (in invocations, configurations, API use, etc.) is an existing
+  cluster required to make on upgrade, in order to make use of the enhancement?
+-->
+
+### Version Skew Strategy
+
+<!--
+If applicable, how will the component handle version skew with other
+components? What are the guarantees? Make sure this is in the test plan.
+
+Consider the following in developing a version skew strategy for this
+enhancement:
+- Does this enhancement involve coordinating behavior in the control plane and nodes?
+- How does an n-3 kubelet or kube-proxy without this feature available behave when this feature is used?
+- How does an n-1 kube-controller-manager or kube-scheduler without this feature available behave when this feature is used?
+- Will any other components on the node change? For example, changes to CSI,
+  CRI or CNI may require updating that component before the kubelet.
+-->
+
+## Production Readiness Review Questionnaire
+
+<!--
+
+Production readiness reviews are intended to ensure that features merging into
+Kubernetes are observable, scalable and supportable; can be safely operated in
+production environments, and can be disabled or rolled back in the event they
+cause increased failures in production. See more in the PRR KEP at
+https://git.k8s.io/enhancements/keps/sig-architecture/1194-prod-readiness.
+
+The production readiness review questionnaire must be completed and approved
+for the KEP to move to `implementable` status and be included in the release.
+
+In some cases, the questions below should also have answers in `kep.yaml`. This
+is to enable automation to verify the presence of the review, and to reduce review
+burden and latency.
+
+The KEP must have an approver from the
+[`prod-readiness-approvers`](http://git.k8s.io/enhancements/OWNERS_ALIASES)
+team. Please reach out on the
+[#prod-readiness](https://kubernetes.slack.com/archives/CPNHUMN74) channel if
+you need any help or guidance.
+-->
+
+### Feature Enablement and Rollback
+
+<!--
+This section must be completed when targeting alpha to a release.
+-->
+
+###### How can this feature be enabled / disabled in a live cluster?
+
+<!--
+Pick one of these and delete the rest.
+
+Documentation is available on [feature gate lifecycle] and expectations, as
+well as the [existing list] of feature gates.
+
+[feature gate lifecycle]: https://git.k8s.io/community/contributors/devel/sig-architecture/feature-gates.md
+[existing list]: https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/
+-->
+
+- [ ] Feature gate (also fill in values in `kep.yaml`)
+  - Feature gate name:
+  - Components depending on the feature gate:
+- [ ] Other
+  - Describe the mechanism:
+  - Will enabling / disabling the feature require downtime of the control
+    plane?
+  - Will enabling / disabling the feature require downtime or reprovisioning
+    of a node?
+
+###### Does enabling the feature change any default behavior?
+
+<!--
+Any change of default behavior may be surprising to users or break existing
+automations, so be extremely careful here.
+-->
+
+###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
+
+<!--
+Describe the consequences on existing workloads (e.g., if this is a runtime
+feature, can it break the existing applications?).
+
+Feature gates are typically disabled by setting the flag to `false` and
+restarting the component. No other changes should be necessary to disable the
+feature.
+
+NOTE: Also set `disable-supported` to `true` or `false` in `kep.yaml`.
+-->
+
+###### What happens if we reenable the feature if it was previously rolled back?
+
+###### Are there any tests for feature enablement/disablement?
+
+<!--
+The e2e framework does not currently support enabling or disabling feature
+gates. However, unit tests in each component dealing with managing data, created
+with and without the feature, are necessary. At the very least, think about
+conversion tests if API types are being modified.
+
+Additionally, for features that are introducing a new API field, unit tests that
+are exercising the `switch` of feature gate itself (what happens if I disable a
+feature gate after having objects written with the new field) are also critical.
+You can take a look at one potential example of such test in:
+https://github.com/kubernetes/kubernetes/pull/97058/files#diff-7826f7adbc1996a05ab52e3f5f02429e94b68ce6bce0dc534d1be636154fded3R246-R282
+-->
+
+### Rollout, Upgrade and Rollback Planning
+
+<!--
+This section must be completed when targeting beta to a release.
+-->
+
+###### How can a rollout or rollback fail? Can it impact already running workloads?
+
+<!--
+Try to be as paranoid as possible - e.g., what if some components will restart
+mid-rollout?
+
+Be sure to consider highly-available clusters, where, for example,
+feature flags will be enabled on some API servers and not others during the
+rollout. Similarly, consider large clusters and how enablement/disablement
+will rollout across nodes.
+-->
+
+###### What specific metrics should inform a rollback?
+
+<!--
+What signals should users be paying attention to when the feature is young
+that might indicate a serious problem?
+-->
+
+###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
+
+<!--
+Describe manual testing that was done and the outcomes.
+Longer term, we may want to require automated upgrade/rollback tests, but we
+are missing a bunch of machinery and tooling and can't do that now.
+-->
+
+###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
+
+<!--
+Even if applying deprecation policies, they may still surprise some users.
+-->
+
+### Monitoring Requirements
+
+<!--
+This section must be completed when targeting beta to a release.
+
+For GA, this section is required: approvers should be able to confirm the
+previous answers based on experience in the field.
+-->
+
+###### How can an operator determine if the feature is in use by workloads?
+
+<!--
+Ideally, this should be a metric. Operations against the Kubernetes API (e.g.,
+checking if there are objects with field X set) may be a last resort. Avoid
+logs or events for this purpose.
+-->
+
+###### How can someone using this feature know that it is working for their instance?
+
+<!--
+For instance, if this is a pod-related feature, it should be possible to determine if the feature is functioning properly
+for each individual pod.
+Pick one or more of these and delete the rest.
+Please describe all items visible to end users below with sufficient detail so that they can verify correct enablement
+and operation of this feature.
+Recall that end users cannot usually observe component logs or access metrics.
+-->
+
+- [ ] Events
+  - Event Reason: 
+- [ ] API .status
+  - Condition name: 
+  - Other field: 
+- [ ] Other (treat as last resort)
+  - Details:
+
+###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
+
+<!--
+This is your opportunity to define what "normal" quality of service looks like
+for a feature.
+
+It's impossible to provide comprehensive guidance, but at the very
+high level (needs more precise definitions) those may be things like:
+  - per-day percentage of API calls finishing with 5XX errors <= 1%
+  - 99% percentile over day of absolute value from (job creation time minus expected
+    job creation time) for cron job <= 10%
+  - 99.9% of /health requests per day finish with 200 code
+
+These goals will help you determine what you need to measure (SLIs) in the next
+question.
+-->
+
+###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
+
+<!--
+Pick one or more of these and delete the rest.
+-->
+
+- [ ] Metrics
+  - Metric name:
+  - [Optional] Aggregation method:
+  - Components exposing the metric:
+- [ ] Other (treat as last resort)
+  - Details:
+
+###### Are there any missing metrics that would be useful to have to improve observability of this feature?
+
+<!--
+Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
+implementation difficulties, etc.).
+-->
+
+### Dependencies
+
+<!--
+This section must be completed when targeting beta to a release.
+-->
+
+###### Does this feature depend on any specific services running in the cluster?
+
+<!--
+Think about both cluster-level services (e.g. metrics-server) as well
+as node-level agents (e.g. specific version of CRI). Focus on external or
+optional services that are needed. For example, if this feature depends on
+a cloud provider API, or upon an external software-defined storage or network
+control plane.
+
+For each of these, fill in the following—thinking about running existing user workloads
+and creating new ones, as well as about cluster-level services (e.g. DNS):
+  - [Dependency name]
+    - Usage description:
+      - Impact of its outage on the feature:
+      - Impact of its degraded performance or high-error rates on the feature:
+-->
+
+### Scalability
+
+<!--
+For alpha, this section is encouraged: reviewers should consider these questions
+and attempt to answer them.
+
+For beta, this section is required: reviewers must answer these questions.
+
+For GA, this section is required: approvers should be able to confirm the
+previous answers based on experience in the field.
+-->
+
+###### Will enabling / using this feature result in any new API calls?
+
+<!--
+Describe them, providing:
+  - API call type (e.g. PATCH pods)
+  - estimated throughput
+  - originating component(s) (e.g. Kubelet, Feature-X-controller)
+Focusing mostly on:
+  - components listing and/or watching resources they didn't before
+  - API calls that may be triggered by changes of some Kubernetes resources
+    (e.g. update of object X triggers new updates of object Y)
+  - periodic API calls to reconcile state (e.g. periodic fetching state,
+    heartbeats, leader election, etc.)
+-->
+
+###### Will enabling / using this feature result in introducing new API types?
+
+<!--
+Describe them, providing:
+  - API type
+  - Supported number of objects per cluster
+  - Supported number of objects per namespace (for namespace-scoped objects)
+-->
+
+###### Will enabling / using this feature result in any new calls to the cloud provider?
+
+<!--
+Describe them, providing:
+  - Which API(s):
+  - Estimated increase:
+-->
+
+###### Will enabling / using this feature result in increasing size or count of the existing API objects?
+
+<!--
+Describe them, providing:
+  - API type(s):
+  - Estimated increase in size: (e.g., new annotation of size 32B)
+  - Estimated amount of new objects: (e.g., new Object X for every existing Pod)
+-->
+
+###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
+
+<!--
+Look at the [existing SLIs/SLOs].
+
+Think about adding additional work or introducing new steps in between
+(e.g. need to do X to start a container), etc. Please describe the details.
+
+[existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos
+-->
+
+###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
+
+<!--
+Things to keep in mind include: additional in-memory state, additional
+non-trivial computations, excessive access to disks (including increased log
+volume), significant amount of data sent and/or received over network, etc.
+Think through this both in small and large cases, again with respect to the
+[supported limits].
+
+[supported limits]: https://git.k8s.io/community/sig-scalability/configs-and-limits/thresholds.md
+-->
+
+###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
+
+<!--
+Focus not just on happy cases, but primarily on more pathological cases
+(e.g. probes taking a minute instead of milliseconds, failed pods consuming resources, etc.).
+If any of the resources can be exhausted, how this is mitigated with the existing limits
+(e.g. pods per node) or new limits added by this KEP?
+
+Are there any tests that were run/should be run to understand performance characteristics better
+and validate the declared limits?
+-->
+
+### Troubleshooting
+
+<!--
+This section must be completed when targeting beta to a release.
+
+For GA, this section is required: approvers should be able to confirm the
+previous answers based on experience in the field.
+
+The Troubleshooting section currently serves the `Playbook` role. We may consider
+splitting it into a dedicated `Playbook` document (potentially with some monitoring
+details). For now, we leave it here.
+-->
+
+###### How does this feature react if the API server and/or etcd is unavailable?
+
+###### What are other known failure modes?
+
+<!--
+For each of them, fill in the following information by copying the below template:
+  - [Failure mode brief description]
+    - Detection: How can it be detected via metrics? Stated another way:
+      how can an operator troubleshoot without logging into a master or worker node?
+    - Mitigations: What can be done to stop the bleeding, especially for already
+      running user workloads?
+    - Diagnostics: What are the useful log messages and their required logging
+      levels that could help debug the issue?
+      Not required until feature graduated to beta.
+    - Testing: Are there any tests for failure mode? If not, describe why.
+-->
+
+###### What steps should be taken if SLOs are not being met to determine the problem?
+
+## Implementation History
+
+<!--
+Major milestones in the lifecycle of a KEP should be tracked in this section.
+Major milestones might include:
+- the `Summary` and `Motivation` sections being merged, signaling SIG acceptance
+- the `Proposal` section being merged, signaling agreement on a proposed design
+- the date implementation started
+- the first Kubernetes release where an initial version of the KEP was available
+- the version of Kubernetes where the KEP graduated to general availability
+- when the KEP was retired or superseded
+-->
+
+## Drawbacks
+
+<!--
+Why should this KEP _not_ be implemented?
+-->
+
+## Alternatives
+
+<!--
+What other approaches did you consider, and why did you rule them out? These do
+not need to be as detailed as the proposal, but should include enough
+information to express the idea and why it was not acceptable.
+-->
+
+## Infrastructure Needed (Optional)
+
+<!--
+Use this section if you need things from the project/SIG. Examples include a
+new subproject, repos requested, or GitHub details. Listing these here allows a
+SIG to get the process for these resources started right away.
+-->
diff --git a/keps/sig-node/4816-dra-prioritized-list/kep.yaml b/keps/sig-node/4816-dra-prioritized-list/kep.yaml
new file mode 100644
index 00000000000..cead6a0c478
--- /dev/null
+++ b/keps/sig-node/4816-dra-prioritized-list/kep.yaml
@@ -0,0 +1,51 @@
+title: DRA Prioritized List
+kep-number: 4816
+authors:
+  - "@johnbelamaric"
+owning-sig: sig-node
+participating-sigs:
+  - sig-scheduling
+  - sig-autoscaling
+status: provisional
+creation-date: 2024-09-24
+reviewers:
+  - "@pohly"
+  - "@klueska"
+  - "@thockin"
+approvers:
+  - "@mrunalp" # SIG-Node
+  - "@alculquicondor" # SIG-Scheduling
+  - "@MaciekPytel" # SIG-Autoscaling
+  - "@thockin" # API Review
+
+see-also:
+  - "/keps/sig-node/4381-dra-structured-parameters"
+
+# The target maturity stage in the current dev cycle for this KEP.
+stage: alpha
+
+# The most recent milestone for which work toward delivery of this KEP has been
+# done. This can be the current (upcoming) milestone, if it is being actively
+# worked on.
+latest-milestone: "v1.32"
+
+# The milestone at which this feature was, or is targeted to be, at each stage.
+milestone:
+  alpha: "v1.32"
+  beta: "v1.33"
+  stable: "v1.34"
+
+# The following PRR answers are required at alpha release
+# List the feature gate name and the components for which it must be enabled
+feature-gates:
+  - name: DRAPrioritizedList
+    components:
+      - kube-apiserver
+      - kube-controller-manager
+      - kube-scheduler
+      - kubelet
+disable-supported: true
+
+# The following PRR answers are required at beta release
+metrics:
+  #- my_feature_metric

From 76a1f298b7943c0d41b2a43cd3deedc0909165b1 Mon Sep 17 00:00:00 2001
From: John Belamaric <jbelamaric@google.com>
Date: Thu, 26 Sep 2024 11:08:01 -0700
Subject: [PATCH 02/20] Review feedback, possible API

---
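Note for reviewers (not part of the commit): the ResourceQuota paragraph added
to the Proposal in this patch charges quota for every alternative under a
`RankedDeviceRequest`, not only for the one that ends up allocated. The sketch
below illustrates that accounting. The `request` and `rankedRequest` types and
the `quotaCharge` function are hypothetical simplifications, not the real API
types. Keying the charge by device class and defaulting the count to one are
assumptions made for illustration.

```go
package main

import "fmt"

// request is a hypothetical, simplified stand-in for DeviceRequest.
type request struct {
	Name        string
	DeviceClass string
	Count       int64
}

// rankedRequest is a hypothetical stand-in for RankedDeviceRequest.
type rankedRequest struct {
	Name     string
	Requests []request
}

// quotaCharge returns the number of devices charged per device class.
// Every alternative of every ranked request is charged, so the "pick one"
// behavior cannot be used to get around quota.
func quotaCharge(plain []request, ranked []rankedRequest) map[string]int64 {
	charge := map[string]int64{}
	add := func(r request) {
		n := r.Count
		if n == 0 {
			n = 1 // assumption: an unset count means one device
		}
		charge[r.DeviceClass] += n
	}
	for _, r := range plain {
		add(r)
	}
	for _, rr := range ranked {
		for _, r := range rr.Requests {
			add(r)
		}
	}
	return charge
}

func main() {
	plain := []request{{Name: "nic", DeviceClass: "rdma-nic"}}
	ranked := []rankedRequest{{Name: "gpu", Requests: []request{
		{Name: "big-gpu", DeviceClass: "big-gpu"},
		{Name: "small-gpu", DeviceClass: "small-gpu", Count: 2},
	}}}
	fmt.Println(quotaCharge(plain, ranked))
}
```

Under this reading, a claim offering one big GPU or two small GPUs needs quota
for all three devices, even though at most two will ever be allocated.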
 .../4816-dra-prioritized-list/README.md       | 109 +++++++++++++++++-
 1 file changed, 103 insertions(+), 6 deletions(-)

diff --git a/keps/sig-node/4816-dra-prioritized-list/README.md b/keps/sig-node/4816-dra-prioritized-list/README.md
index 088a07f9ff0..02d14c72171 100644
--- a/keps/sig-node/4816-dra-prioritized-list/README.md
+++ b/keps/sig-node/4816-dra-prioritized-list/README.md
@@ -107,6 +107,7 @@ tags, and then generate with `hack/update-toc.sh`.
 - [Implementation History](#implementation-history)
 - [Drawbacks](#drawbacks)
 - [Alternatives](#alternatives)
+  - [Resource Claim Indirection](#resource-claim-indirection)
 - [Infrastructure Needed (Optional)](#infrastructure-needed-optional)
 <!-- /toc -->
 
@@ -217,22 +218,24 @@ know that this has succeeded?
 
 * Allow workload authors, when specifying a `ResourceClaim`, to provide a list
   of ways to satisfy the claim, with a preference ranking.
-* Enable the scheduler to evaluate those preferences and allocate devices for the
+* Enable schedulers to evaluate those preferences and allocate devices for the
   claim based on them.
-* Enable the cluster autoscaler to evaluate those preferences and make scaling
+* Enable cluster autoscalers to evaluate those preferences and make scaling
   choices based on them.
+* Provide some measure of ResourceQuota controls when users utilize claims with
+  these types of requests.
 
 ### Non-Goals
 
-* Enable cross-claim consistency of request choices. For example, guaranteeing
-  that all `ResourceClaim`s associated with a given `Deployment` are satisfied
-  using the same choice from the list of possible alternatives.
-
 <!--
 What is out of scope for this KEP? Listing non-goals helps to focus discussion
 and make progress.
 -->
 
+* Enable cross-claim consistency of request choices. For example, guaranteeing
+  that all `ResourceClaim`s associated with a given `Deployment` are satisfied
+  using the same choice from the list of possible alternatives.
+
 ## Proposal
 
 <!--
@@ -244,6 +247,95 @@ The "Design Details" section below is for the real
 nitty-gritty.
 -->
 
+The proposal adds a new type, called `RankedDeviceRequest`, which allows the
+user to list `DeviceRequest`s, exactly one of which must be satisfied. The
+`DeviceClaim` then gets a new field listing all of these such requests that must
+be satisfied. There is no change to the existing `DeviceRequest` type.
+
+```go
+// DeviceClaim defines how to request devices with a ResourceClaim.
+type DeviceClaim struct {
+	// Requests represent individual requests for distinct devices which
+	// must all be satisfied. If empty, nothing needs to be allocated.
+	//
+	// +optional
+	// +listType=atomic
+	Requests []DeviceRequest
+
+	// RankedRequests represents groups of requests, where exactly one
+	// request in each group must be satisfied.
+	//
+	// +optional
+	// +listType=atomic
+	RankedRequests []RankedDeviceRequest
+
+	// These constraints must be satisfied by the set of devices that get
+	// allocated for the claim.
+	//
+	// +optional
+	// +listType=atomic
+	Constraints []DeviceConstraint
+
+	// This field holds configuration for multiple potential drivers which
+	// could satisfy requests in this claim. It is ignored while allocating
+	// the claim.
+	//
+	// +optional
+	// +listType=atomic
+	Config []DeviceClaimConfiguration
+
+	// Potential future extension, ignored by older schedulers. This is
+	// fine because scoring allows users to define a preference, without
+	// making it a hard requirement.
+	//
+	// Score *SomeScoringStruct
+}
+
+const (
+	DeviceRequestsMaxSize    = AllocationResultsMaxSize
+	DeviceConstraintsMaxSize = 32
+	DeviceConfigMaxSize      = 32
+)
+
+// RankedDeviceRequest is a list of DeviceRequests, in the user's order of
+// preference for allocation.
+//
+type RankedDeviceRequest struct {
+	// Name can be used to reference this request in a pod.spec.containers[].resources.claims
+	// entry, or in Constraints or Config.
+	//
+	// In the container spec, this is the name that must be used, rather
+	// the names of the underlying requests.
+	//
+	// In the Contraints or Config, this name may be used, or the underlying request
+	// names may be used to provide additional specificity.
+	//
+	// Must be a DNS label.
+	//
+	// +required
+	Name string
+
+
+	// Requests represent individual requests for distinct devices, exactly
+	// one of which must be satisfied. If empty, nothing needs to be allocated.
+	//
+	// +optional
+	// +listType=atomic
+	Requests []DeviceRequest
+}
+
+const (
+	RankedDeviceRequestsMaxSize    = 8
+)
+```
+
+ResourceQuota will be enforced such that the user must have quota for each
+`DeviceRequest` under every `RankedDeviceRequest`. Thus, this "pick one"
+behavior cannot be used to circumvent quota. This reduces the usefuleness of the
+feature, as it no longer services as a quota-management feature. However, the
+primary goal of the feature is about flexibility across clusters and
+obtainability of underlying devices, not quota management.
+
 ### User Stories (Optional)
 
 <!--
@@ -833,6 +925,11 @@ not need to be as detailed as the proposal, but should include enough
 information to express the idea and why it was not acceptable.
 -->
 
+### Resource Claim Indirection
+
+Rather than embedding a list of alternative request objects, we could use an
+umbrella `ResourceClaim` that instead references other `ResourceClaim`s.
+
 ## Infrastructure Needed (Optional)
 
 <!--

From 809b90c0c6336b25a0db5226819686269b358b42 Mon Sep 17 00:00:00 2001
From: John Belamaric <jbelamaric@google.com>
Date: Thu, 26 Sep 2024 12:28:48 -0700
Subject: [PATCH 03/20] Add an example and more explanation

---
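Note for reviewers (not part of the commit): the explanation added in this
patch says that a constraint or config entry naming the ranked request applies
whichever alternative is chosen, while an entry naming a subrequest applies
only if that subrequest is chosen. A minimal sketch of that rule follows. The
`entryApplies` helper is hypothetical, and treating an empty request list as
"applies to every request" is an assumption.

```go
package main

import "fmt"

// entryApplies reports whether a constraint or config entry listing
// entryRequests is in effect for the ranked request named mainName when the
// alternative named chosen was picked to satisfy it.
func entryApplies(entryRequests []string, mainName, chosen string) bool {
	if len(entryRequests) == 0 {
		return true // assumption: an empty list targets every request
	}
	for _, name := range entryRequests {
		if name == mainName || name == chosen {
			return true
		}
	}
	return false
}

func main() {
	smallGPUConfig := []string{"small-gpu"}
	pcieConstraint := []string{"nic", "gpu"}

	fmt.Println(entryApplies(smallGPUConfig, "gpu", "small-gpu")) // true
	fmt.Println(entryApplies(smallGPUConfig, "gpu", "big-gpu"))   // false
	fmt.Println(entryApplies(pcieConstraint, "gpu", "big-gpu"))   // true
}
```

This matches the example in the patch: the `GPUConfig` parameters are attached
only when "small-gpu" is chosen, while the `pcieRoot` constraint holds for any
of the three GPU alternatives.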
 .../4816-dra-prioritized-list/README.md       | 85 ++++++++++++++++++-
 1 file changed, 84 insertions(+), 1 deletion(-)

diff --git a/keps/sig-node/4816-dra-prioritized-list/README.md b/keps/sig-node/4816-dra-prioritized-list/README.md
index 02d14c72171..f3b0695c575 100644
--- a/keps/sig-node/4816-dra-prioritized-list/README.md
+++ b/keps/sig-node/4816-dra-prioritized-list/README.md
@@ -263,7 +263,9 @@ type DeviceClaim struct {
 	Requests []DeviceRequest
 
 	// RankedRequests represents groups of requests, where exactly one
-	// request in each group must be satisfied.
+	// request in each group must be satisfied. All entries in this list
+    // must be satisfied, using exactly one of the DeviceRequests listed
+    // in each RankedDeviceRequest.
 	//
 	// +optional
 	// +listType=atomic
@@ -329,6 +331,87 @@ const (
 )
 ```
 
+Let's take a look at an example.
+
+```yaml
+apiVersion: resource.k8s.io/v1alpha4
+kind: ResourceClaim
+metadata:
+  name: device-consumer-claim
+spec:
+  devices:
+    requests:
+    - name: nic
+      deviceClassName: rdma-nic
+    rankedRequests:
+    - name: gpu
+      requests:
+      - name: big-gpu
+    deviceClassName: big-gpu
+      - name: mid-gpu
+    deviceClassName: mid-gpu
+      - name: small-gpu
+        deviceClassName: small-gpu
+        count: 2
+    constraints:
+    - requests: ["nic", "gpu"]
+      matchAttribute:
+      - dra.k8s.io/pcieRoot
+    config:
+    - requests: ["small-gpu"]
+      opaque:
+        driver: gpu.acme.example.com
+        parameters:
+          apiVersion: gpu.acme.example.com/v1
+          kind: GPUConfig
+          mode: multipleGPUs
+---
+apiVersion: v1
+kind: Pod
+metadata:
+  name: device-consumer
+spec:
+  resourceClaims:
+  - name: "gpu-and-nic"
+    resourceClaimName: device-consumer-claim
+  containers:
+  - name: workload
+    image: my-app
+    command: ["/bin/program"]
+    resources:
+      requests:
+        memory: "64Mi"
+        cpu: "250m"
+      limits:
+        memory: "128Mi"
+        cpu: "500m"
+      claims:
+      - name: "gpu-and-nic"
+        request: "gpu" # the 'nic' request is pod-level, no need to attach to container
+```
+
+There are a few things to note here. First, the "nic" request is listed in
+`requests`, because it has no alternative request types. The "gpu" request could
+be met by serveral different types of GPU, in an order of preference. Each of
+those is a separate `DeviceRequest`, and thus also has its own name. This allow
+us to apply constraints or configuration to specific, individual requests, in
+the event even that it is the chosen alternative. In this example, the
+"small-gpu" choice requires a configuration option that the other two choices do
+not need. Thus, if the resolution of the "gpu" request is made using the
+"small-gpu" subrequest, then that configuration will be attached to the
+allocation. Otherwise, it will not.
+
+Similarly, for `Constraints`, the list of requests can include the ranked
+request name ("gpu" in this case), in which case the constraint applies
+regardless of which alternative is chosen. Or, it can include the subrequest
+name, in which case that constraint only applies if that particular subrequest
+is chosen.
+
+In the PodSpec, however, the subrequest names are not valid. Only the main
+request name may be used.
+
+### Resource Quota
+
 ResourceQuota will be enforced such that the user must have quota for each
 `DeviceRequest` under every `RankedDeviceRequest`. Thus, this "pick one"
 behavior cannot be used to circumvent quota. This reduces the usefuleness of the

From e874051e7305efdafc670190163b914db2a8c28a Mon Sep 17 00:00:00 2001
From: John Belamaric <jbelamaric@google.com>
Date: Thu, 26 Sep 2024 12:37:00 -0700
Subject: [PATCH 04/20] Fix some broken indentation

---
 .../4816-dra-prioritized-list/README.md       | 122 +++++++++---------
 1 file changed, 61 insertions(+), 61 deletions(-)

diff --git a/keps/sig-node/4816-dra-prioritized-list/README.md b/keps/sig-node/4816-dra-prioritized-list/README.md
index f3b0695c575..73a386f825e 100644
--- a/keps/sig-node/4816-dra-prioritized-list/README.md
+++ b/keps/sig-node/4816-dra-prioritized-list/README.md
@@ -255,79 +255,79 @@ be satisfied. There is no change to the existing `DeviceRequest` type.
 ```go
 // DeviceClaim defines how to request devices with a ResourceClaim.
 type DeviceClaim struct {
-	// Requests represent individual requests for distinct devices which
-	// must all be satisfied. If empty, nothing needs to be allocated.
-	//
-	// +optional
-	// +listType=atomic
-	Requests []DeviceRequest
-
-	// RankedRequests represents groups of requests, where exactly one
-	// request in each group must be satisfied. All entries in this list
+    // Requests represent individual requests for distinct devices which
+    // must all be satisfied. If empty, nothing needs to be allocated.
+    //
+    // +optional
+    // +listType=atomic
+    Requests []DeviceRequest
+
+    // RankedRequests represents groups of requests, where exactly one
+    // request in each group must be satisfied. All entries in this list
     // must be satisfied, using exactly one of the DeviceRequests listed
     // in each RankedDeviceRequest.
-	//
-	// +optional
-	// +listType=atomic
-	RankedRequests []RankedDeviceRequest
-
-	// These constraints must be satisfied by the set of devices that get
-	// allocated for the claim.
-	//
-	// +optional
-	// +listType=atomic
-	Constraints []DeviceConstraint
-
-	// This field holds configuration for multiple potential drivers which
-	// could satisfy requests in this claim. It is ignored while allocating
-	// the claim.
-	//
-	// +optional
-	// +listType=atomic
-	Config []DeviceClaimConfiguration
-
-	// Potential future extension, ignored by older schedulers. This is
-	// fine because scoring allows users to define a preference, without
-	// making it a hard requirement.
-	//
-	// Score *SomeScoringStruct
+    //
+    // +optional
+    // +listType=atomic
+    RankedRequests []RankedDeviceRequest
+
+    // These constraints must be satisfied by the set of devices that get
+    // allocated for the claim.
+    //
+    // +optional
+    // +listType=atomic
+    Constraints []DeviceConstraint
+
+    // This field holds configuration for multiple potential drivers which
+    // could satisfy requests in this claim. It is ignored while allocating
+    // the claim.
+    //
+    // +optional
+    // +listType=atomic
+    Config []DeviceClaimConfiguration
+
+    // Potential future extension, ignored by older schedulers. This is
+    // fine because scoring allows users to define a preference, without
+    // making it a hard requirement.
+    //
+    // Score *SomeScoringStruct
 }
 
 const (
-	DeviceRequestsMaxSize    = AllocationResultsMaxSize
-	DeviceConstraintsMaxSize = 32
-	DeviceConfigMaxSize      = 32
+    DeviceRequestsMaxSize    = AllocationResultsMaxSize
+    DeviceConstraintsMaxSize = 32
+    DeviceConfigMaxSize      = 32
 )
 
 // RankedDeviceRequest is a list of DeviceRequests, in the user's order of
 // preference for allocation.
 //
 type RankedDeviceRequest struct {
-	// Name can be used to reference this request in a pod.spec.containers[].resources.claims
-	// entry, or in Constraints or Config.
-	//
-	// In the container spec, this is the name that must be used, rather
-	// the names of the underlying requests.
-	//
-	// In the Contraints or Config, this name may be used, or the underlying request
-	// names may be used to provide additional specificity.
-	//
-	// Must be a DNS label.
-	//
-	// +required
-	Name string
-
-
-	// Requests represent individual requests for distinct devices, exactly
-	// one of which must be satisfied. If empty, nothing needs to be allocated.
-	//
-	// +optional
-	// +listType=atomic
-	Requests []DeviceRequest
+    // Name can be used to reference this request in a pod.spec.containers[].resources.claims
+    // entry, or in Constraints or Config.
+    //
+    // In the pod spec, this is the name that must be used, rather
+    // the names of the underlying requests.
+    //
+    // In the Contraints or Config, this name may be used, or the underlying request
+    // names may be used to provide additional specificity.
+    //
+    // Must be a DNS label.
+    //
+    // +required
+    Name string
+
+
+    // Requests represent individual requests for distinct devices, exactly
+    // one of which must be satisfied. If empty, nothing needs to be allocated.
+    //
+    // +optional
+    // +listType=atomic
+    Requests []DeviceRequest
 }
 
 const (
-	RankedDeviceRequestsMaxSize    = 8
+    RankedDeviceRequestsMaxSize    = 8
 )
 ```
 
@@ -347,9 +347,9 @@ spec:
     - name: gpu
       requests:
       - name: big-gpu
-    deviceClassName: big-gpu
+        deviceClassName: big-gpu
       - name: mid-gpu
-    deviceClassName: mid-gpu
+        deviceClassName: mid-gpu
       - name: small-gpu
         deviceClassName: small-gpu
         count: 2

From 4011f94fa8597cb79da8bb3864ec3b6c86cc547e Mon Sep 17 00:00:00 2001
From: John Belamaric <jbelamaric@google.com>
Date: Fri, 27 Sep 2024 11:45:12 -0700
Subject: [PATCH 05/20] Review feedback

---
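Note for reviewers (not part of the commit): this patch replaces the separate
ranked type with a `FirstOf` field on `DeviceRequest`. The rules in the new
text are: a top-level request sets either `DeviceClassName` or `FirstOf` but
not both, each subrequest must set a class, subrequests cannot nest `FirstOf`,
and the list is tried in order. The sketch below illustrates those rules under
simplifying assumptions. `deviceRequest`, `validate`, and `pick` are
hypothetical names, and `canSatisfy` stands in for whatever allocation check
the scheduler actually performs.

```go
package main

import (
	"errors"
	"fmt"
)

// deviceRequest is a hypothetical, trimmed-down stand-in for DeviceRequest.
type deviceRequest struct {
	Name            string
	DeviceClassName string
	FirstOf         []deviceRequest
}

// firstOfMaxSize mirrors FirstOfDeviceRequestMaxSize from this patch.
const firstOfMaxSize = 8

// validate checks a top-level request against the rules listed above.
func validate(r deviceRequest) error {
	hasClass, hasFirstOf := r.DeviceClassName != "", len(r.FirstOf) > 0
	if hasClass == hasFirstOf {
		return errors.New("exactly one of deviceClassName or firstOf must be set")
	}
	if len(r.FirstOf) > firstOfMaxSize {
		return errors.New("too many firstOf subrequests")
	}
	for _, sub := range r.FirstOf {
		if sub.DeviceClassName == "" {
			return fmt.Errorf("subrequest %q needs a deviceClassName", sub.Name)
		}
		if len(sub.FirstOf) > 0 {
			return fmt.Errorf("subrequest %q must not nest firstOf", sub.Name)
		}
	}
	return nil
}

// pick returns the first subrequest, in list order, that canSatisfy accepts.
// Later entries are considered only when every earlier one was rejected.
func pick(r deviceRequest, canSatisfy func(deviceRequest) bool) (deviceRequest, bool) {
	for _, sub := range r.FirstOf {
		if canSatisfy(sub) {
			return sub, true
		}
	}
	return deviceRequest{}, false
}

func main() {
	gpu := deviceRequest{Name: "gpu", FirstOf: []deviceRequest{
		{Name: "big-gpu", DeviceClassName: "big-gpu"},
		{Name: "mid-gpu", DeviceClassName: "mid-gpu"},
		{Name: "small-gpu", DeviceClassName: "small-gpu"},
	}}
	if err := validate(gpu); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	// Pretend that no big GPUs are currently available.
	available := func(s deviceRequest) bool { return s.Name != "big-gpu" }
	if chosen, ok := pick(gpu, available); ok {
		fmt.Println("chosen:", chosen.Name)
	}
}
```

A client written before this change would see a request with no
`DeviceClassName` and can reject it, which is the compatibility signal the
patch text relies on.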
 .../4816-dra-prioritized-list/README.md       | 184 ++++++++++--------
 1 file changed, 105 insertions(+), 79 deletions(-)

diff --git a/keps/sig-node/4816-dra-prioritized-list/README.md b/keps/sig-node/4816-dra-prioritized-list/README.md
index 73a386f825e..3c27d1fe016 100644
--- a/keps/sig-node/4816-dra-prioritized-list/README.md
+++ b/keps/sig-node/4816-dra-prioritized-list/README.md
@@ -247,87 +247,114 @@ The "Design Details" section below is for the real
 nitty-gritty.
 -->
 
-The proposal adds a new type, called `RankedDeviceRequest`, which allows the
-user to list `DeviceRequest`s, exactly one of which must be satisfied. The
-`DeviceClaim` then gets a new field listing all of these such requests that must
-be satisfied. There is no change to the existing `DeviceRequest` type.
+The proposal adds a new field to the `DeviceRequest`, called `FirstOf` which
+will contain an ordered list of `DeviceRequest` objects. In order to satisfy the
+main (containing) request, exactly one of the requests listed in `FirstOf` must
+be satisfied. They order listed is considered a priority order, such that the
+scheduler will only try to use the second item in the list if it is unable to
+satsify the first item, and so on.
+
+A `DeviceRequest` that populates the `FirstOf` field must *not* populate the
+`DeviceClassName` field. The `required` validation on this field will be
+relaxed. This allows existing clients to differentiate between claims they
+understand (with `DeviceClassName`) and those they do not (without
+`DeviceClassName` but with the new field). Clients written for 1.31, when
+`DeviceClassName` was required, were requested to include this logic, and the
+in-tree components have been built in this way.
 
 ```go
-// DeviceClaim defines how to request devices with a ResourceClaim.
-type DeviceClaim struct {
-    // Requests represent individual requests for distinct devices which
-    // must all be satisfied. If empty, nothing needs to be allocated.
+// DeviceRequest is a request for devices required for a claim.
+// This is typically a request for a single resource like a device, but can
+// also ask for several identical devices.
+type DeviceRequest struct {
+    // Name can be used to reference this request in a pod.spec.containers[].resources.claims
+    // entry and in a constraint of the claim.
     //
-    // +optional
-    // +listType=atomic
-    Requests []DeviceRequest
+    // Must be a DNS label.
+    //
+    // +required
+    Name string
 
-    // RankedRequests represents groups of requests, where exactly one
-    // request in each group must be satisfied. All entries in this list
-    // must be satisfied, using exactly one of the DeviceRequests listed
-    // in each RankedDeviceRequest.
+    // DeviceClassName references a specific DeviceClass, which can define
+    // additional configuration and selectors to be inherited by this
+    // request.
+    //
+    // Either a class or FirstOf requests are required in DeviceClaim.Requests.
+    // When this request is part of the FirstOf list, a class is required. Nested
+    // FirstOf requests are not allowed.
+    //
+    // Which classes are available depends on the cluster.
+    //
+    // Administrators may use this to restrict which devices may get
+    // requested by only installing classes with selectors for permitted
+    // devices. If users are free to request anything without restrictions,
+    // then administrators can create an empty DeviceClass for users
+    // to reference.
     //
     // +optional
-    // +listType=atomic
-    RankedRequests []RankedDeviceRequest
+    // +oneOf=deviceRequestType
+    DeviceClassName string
 
-    // These constraints must be satisfied by the set of devices that get
-    // allocated for the claim.
+    // FirstOf contains subrequests, exactly one of which must be satisfied
+    // in order to satisfy this request. This field may only be set in the
+    // entries of DeviceClaim.Requests. It must not be set in DeviceRequest
+    // instances that themselves are part of a FirstOf.
     //
     // +optional
-    // +listType=atomic
-    Constraints []DeviceConstraint
+    // +oneOf=deviceRequestType
+    FirstOf []DeviceRequest
 
-    // This field holds configuration for multiple potential drivers which
-    // could satisfy requests in this claim. It is ignored while allocating
-    // the claim.
+    // Selectors define criteria which must be satisfied by a specific
+    // device in order for that device to be considered for this
+    // request. All selectors must be satisfied for a device to be
+    // considered.
     //
     // +optional
     // +listType=atomic
-    Config []DeviceClaimConfiguration
+    Selectors []DeviceSelector
 
-    // Potential future extension, ignored by older schedulers. This is
-    // fine because scoring allows users to define a preference, without
-    // making it a hard requirement.
+    // AllocationMode and its related fields define how devices are allocated
+    // to satisfy this request. Supported values are:
     //
-    // Score *SomeScoringStruct
-}
-
-const (
-    DeviceRequestsMaxSize    = AllocationResultsMaxSize
-    DeviceConstraintsMaxSize = 32
-    DeviceConfigMaxSize      = 32
-)
-
-// RankedDeviceRequest is a list of DeviceRequests, in the user's order of
-// preference for allocation.
-//
-type RankedDeviceRequest struct {
-    // Name can be used to reference this request in a pod.spec.containers[].resources.claims
-    // entry, or in Constraints or Config.
+    // - ExactCount: This request is for a specific number of devices.
+    //   This is the default. The exact number is provided in the
+    //   count field.
     //
-    // In the pod spec, this is the name that must be used, rather
-    // the names of the underlying requests.
+    // - All: This request is for all of the matching devices in a pool.
+    //   Allocation will fail if some devices are already allocated,
+    //   unless adminAccess is requested.
     //
-    // In the Contraints or Config, this name may be used, or the underlying request
-    // names may be used to provide additional specificity.
+    // If AllocationMode is not specified, the default mode is ExactCount. If
+    // the mode is ExactCount and count is not specified, the default count is
+    // one. Any other requests must specify this field.
     //
-    // Must be a DNS label.
+    // More modes may get added in the future. Clients must refuse to handle
+    // requests with unknown modes.
     //
-    // +required
-    Name string
-
+    // +optional
+    AllocationMode DeviceAllocationMode
 
-    // Requests represent individual requests for distinct devices, exactly
-    // one of which must be satisfied. If empty, nothing needs to be allocated.
+    // Count is used only when the count mode is "ExactCount". Must be greater than zero.
+    // If AllocationMode is ExactCount and this field is not specified, the default is one.
     //
     // +optional
-    // +listType=atomic
-    Requests []DeviceRequest
+    // +oneOf=AllocationMode
+    Count int64
+
+    // AdminAccess indicates that this is a claim for administrative access
+    // to the device(s). Claims with AdminAccess are expected to be used for
+    // monitoring or other management services for a device.  They ignore
+    // all ordinary claims to the device with respect to access modes and
+    // any resource allocations.
+    //
+    // +optional
+    // +default=false
+    AdminAccess bool
 }
 
 const (
-    RankedDeviceRequestsMaxSize    = 8
+    DeviceSelectorsMaxSize      = 32
+    FirstOfDeviceRequestMaxSize = 8
 )
 ```
 
@@ -343,9 +370,8 @@ spec:
     requests:
     - name: nic
       deviceClassName: rdma-nic
-    rankedRequests:
     - name: gpu
-      requests:
+      firstOf:
       - name: big-gpu
         deviceClassName: big-gpu
       - name: mid-gpu
@@ -390,22 +416,22 @@ spec:
         request: "gpu" # the 'nic' request is pod-level, no need to attach to container
 ```
 
-There are a few things to note here. First, the "nic" request is listed in
-`requests`, because it has no alternative request types. The "gpu" request could
-be met by serveral different types of GPU, in an order of preference. Each of
-those is a separate `DeviceRequest`, and thus also has its own name. This allow
-us to apply constraints or configuration to specific, individual requests, in
-the event even that it is the chosen alternative. In this example, the
-"small-gpu" choice requires a configuration option that the other two choices do
-not need. Thus, if the resolution of the "gpu" request is made using the
-"small-gpu" subrequest, then that configuration will be attached to the
-allocation. Otherwise, it will not.
-
-Similarly, for `Constraints`, the list of requests can include the ranked
-request name ("gpu" in this case), in which case the constraint applies
-regardless of which alternative is chosen. Or, it can include the subrequest
-name, in which case that constraint only applies if that particular subrequest
-is chosen.
+There are a few things to note here. First, the "nic" request is listed with a
+`deviceClassName`, because it has no alternative request types. The "gpu"
+request could be met by several different types of GPU, in the listed order of
+preference. Each of those is a separate `DeviceRequest`, with both a
+`deviceClassName` and also its own name. The fact that these subrequests also
+have their own names allows us to apply constraints or configuration to
+specific, individual subrequests, in the event that one of them is the chosen
+alternative. In this example, the "small-gpu" choice requires a configuration
+option that the other two choices do not need. Thus, if the resolution of the
+"gpu" request is made using the "small-gpu" subrequest, then that configuration
+will be attached to the allocation. Otherwise, it will not.
+
+Similarly, for `Constraints`, the list of requests can include the main request
+name ("gpu" in this case), in which case the constraint applies regardless of
+which alternative is chosen. Or, it can include the subrequest name, in which
+case that constraint only applies if that particular subrequest is chosen.
 
 In the PodSpec, however, the subrequest names are not valid. Only the main
 request name may be used.
@@ -413,11 +439,11 @@ request name may be used.
 ### Resource Quota
 
 ResourceQuota will be enforced such that the user must have quota for each
-`DeviceRequest` under every `RankedDeviceRequest`. Thus, this "pick one"
-behavior cannot be used to circumvent quota. This reduces the usefuleness of the
-feature, as it no longer services as a quota-management feature. However, the
-primary goal of the feature is about flexibility across clusters and
-obtainability of underlying devices, not quota management.
+`DeviceRequest` under every `FirstOf`. Thus, this "pick one" behavior cannot be
+used to circumvent quota. This reduces the usefulness of the feature, as it
+means it will not serve as a quota management feature. However, the primary goal
+of the feature is about flexibility across clusters and obtainability of
+underlying devices, not quota management.
 
 ### User Stories (Optional)
 

From 338dde9842ed1a200bc96ca186d4d91a1440eb34 Mon Sep 17 00:00:00 2001
From: John Belamaric <jbelamaric@google.com>
Date: Fri, 27 Sep 2024 11:49:14 -0700
Subject: [PATCH 06/20] typo

---
 keps/sig-node/4816-dra-prioritized-list/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/keps/sig-node/4816-dra-prioritized-list/README.md b/keps/sig-node/4816-dra-prioritized-list/README.md
index 3c27d1fe016..62c40e1adac 100644
--- a/keps/sig-node/4816-dra-prioritized-list/README.md
+++ b/keps/sig-node/4816-dra-prioritized-list/README.md
@@ -250,7 +250,7 @@ nitty-gritty.
 The proposal adds a new field to the `DeviceRequest`, called `FirstOf` which
 will contain an ordered list of `DeviceRequest` objects. In order to satisfy the
 main (containing) request, exactly one of the requests listed in `FirstOf` must
-be satisfied. They order listed is considered a priority order, such that the
+be satisfied. The order listed is considered a priority order, such that the
 scheduler will only try to use the second item in the list if it is unable to
 satsify the first item, and so on.
 

From 010e7e193477f26f77c27ccba05f79d0d016b2fe Mon Sep 17 00:00:00 2001
From: John Belamaric <jbelamaric@google.com>
Date: Mon, 7 Oct 2024 12:52:56 -0700
Subject: [PATCH 07/20] Fill in the rest of the KEP

---
 .../4816-dra-prioritized-list/README.md       | 290 +++++++++++-------
 1 file changed, 179 insertions(+), 111 deletions(-)

diff --git a/keps/sig-node/4816-dra-prioritized-list/README.md b/keps/sig-node/4816-dra-prioritized-list/README.md
index 62c40e1adac..ecb24de9081 100644
--- a/keps/sig-node/4816-dra-prioritized-list/README.md
+++ b/keps/sig-node/4816-dra-prioritized-list/README.md
@@ -18,11 +18,11 @@ To get started with this template:
 - [x] **Fill out as much of the kep.yaml file as you can.**
   At minimum, you should fill in the "Title", "Authors", "Owning-sig",
   "Status", and date-related fields.
-- [ ] **Fill out this file as best you can.**
+- [x] **Fill out this file as best you can.**
   At minimum, you should fill in the "Summary" and "Motivation" sections.
   These should be easy if you've preflighted the idea of the KEP with the
   appropriate SIG(s).
-- [ ] **Create a PR for this KEP.**
+- [x] **Create a PR for this KEP.**
   Assign it to people in the SIG who are sponsoring this process.
 - [ ] **Merge early and iterate.**
   Avoid getting hung up on specific details and instead aim to get the goals of
@@ -129,16 +129,16 @@ checklist items _must_ be updated for the enhancement to be released.
 
 Items marked with (R) are required *prior to targeting to a milestone / release*.
 
-- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
+- [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
 - [ ] (R) KEP approvers have approved the KEP status as `implementable`
-- [ ] (R) Design details are appropriately documented
-- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
+- [x] (R) Design details are appropriately documented
+- [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
   - [ ] e2e Tests for all Beta API Operations (endpoints)
   - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) 
   - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
-- [ ] (R) Graduation criteria is in place
+- [x] (R) Graduation criteria is in place
   - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) 
-- [ ] (R) Production readiness review completed
+- [x] (R) Production readiness review completed
 - [ ] (R) Production readiness review approved
 - [ ] "Implementation History" section is up-to-date for milestone
 - [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
@@ -247,17 +247,17 @@ The "Design Details" section below is for the real
 nitty-gritty.
 -->
 
-The proposal adds a new field to the `DeviceRequest`, called `FirstOf` which
-will contain an ordered list of `DeviceRequest` objects. In order to satisfy the
-main (containing) request, exactly one of the requests listed in `FirstOf` must
-be satisfied. The order listed is considered a priority order, such that the
-scheduler will only try to use the second item in the list if it is unable to
-satsify the first item, and so on.
-
-A `DeviceRequest` that populates the `FirstOf` field must *not* populate the
-`DeviceClassName` field. The `required` validation on this field will be
-relaxed. This allows existing clients to differentiate between claims they
-understand (with `DeviceClassName`) and those they do not (without
+The proposal adds a new field to the `DeviceRequest`, called `FirstAvailableOf`
+which will contain an ordered list of `DeviceRequest` objects. In order to
+satisfy the main (containing) request, exactly one of the requests listed in
+`FirstAvailableOf` must be satisfied. The order listed is considered a priority
+order, such that the scheduler will only try to use the second item in the list
+if it is unable to satsify the first item, and so on.
+
+A `DeviceRequest` that populates the `FirstAvailableOf` field must *not*
+populate the `DeviceClassName` field. The `required` validation on this field
+will be relaxed. This allows existing clients to differentiate between claims
+they understand (with `DeviceClassName`) and those they do not (without
 `DeviceClassName` but with the new field). Clients written for 1.31, when
 `DeviceClassName` was required, were requested to include this logic, and the
 in-tree components have been built in this way.
@@ -279,9 +279,9 @@ type DeviceRequest struct {
     // additional configuration and selectors to be inherited by this
     // request.
     //
-    // Either a class or FirstOf requests are required in DeviceClaim.Requests.
-    // When this request is part of the FirstOf list, a class is required. Nested
-    // FirstOf requests are not allowed
+    // Either a class or FirstAvailableOf requests are required in DeviceClaim.Requests.
+    // When this request is part of the FirstAvailableOf list, a class is required. Nested
+    // FirstAvailableOf requests are not allowed
     //
     // Which classes are available depends on the cluster.
     //
@@ -295,14 +295,14 @@ type DeviceRequest struct {
     // +oneOf=deviceRequestType
     DeviceClassName string
 
-    // FirstOf contains subrequests, exactly one of which must be satisfied
+    // FirstAvailableOf contains subrequests, exactly one of which must be satisfied
     // in order to satisfy this request. This field may only be set in the
     // entries of DeviceClaim.Requests. It must not be set in DeviceRequest
-    // instances that themselves are part of a FirstOf.
+    // instances that themselves are part of a FirstAvailableOf.
     //
     // +optional
     // +oneOf=deviceRequestType
-    FirstOf []DeviceRequest
+    FirstAvailableOf []DeviceRequest
 
     // Selectors define criteria which must be satisfied by a specific
     // device in order for that device to be considered for this
@@ -353,8 +353,8 @@ type DeviceRequest struct {
 }
 
 const (
-    DeviceSelectorsMaxSize    = 32
-    FirstOfDeviceRequestMaxSize = 8
+    DeviceSelectorsMaxSize               = 32
+    FirstAvailableOfDeviceRequestMaxSize = 8
 )
 ```
 
@@ -439,11 +439,11 @@ request name may be used.
 ### Resource Quota
 
 ResourceQuota will be enforced such that the user must have quota for each
-`DeviceRequest` under every `FirstOf`. Thus, this "pick one" behavior cannot be
-used to circumvent quota. This reduces the usefulness of the feature, as it
-means it will not serve as a quota management feature. However, the primary goal
-of the feature is about flexibility across clusters and obtainability of
-underlying devices, not quota management.
+`DeviceRequest` under every `FirstAvailableOf`. Thus, this "pick one" behavior
+cannot be used to circumvent quota. This reduces the usefulness of the feature,
+as it means it will not serve as a quota management feature. However, the
+primary goal of the feature is about flexibility across clusters and
+obtainability of underlying devices, not quota management.
 
 ### User Stories (Optional)
 
@@ -503,7 +503,7 @@ when drafting this test plan.
 [testing-guidelines]: https://git.k8s.io/community/contributors/devel/sig-testing/testing.md
 -->
 
-[ ] I/we understand the owners of the involved components may require updates to
+[x] I/we understand the owners of the involved components may require updates to
 existing tests to make this code solid enough prior to committing the changes necessary
 to implement this enhancement.
 
@@ -535,7 +535,9 @@ This can inform certain test coverage improvements that we want to do before
 extending the production code to implement this enhancement.
 -->
 
-- `<package>`: `<date>` - `<test coverage>`
+- `k8s.io/kubernetes/pkg/scheduler`: TBD
+- `k8s.io/kubernetes/pkg/scheduler/framework`: TBD
+- `k8s.io/kubernetes/pkg/controller`: TBD
 
 ##### Integration tests
 
@@ -554,7 +556,14 @@ For Beta and GA, add links to added tests together with links to k8s-triage for
 https://storage.googleapis.com/k8s-triage/index.html
 -->
 
-- <test>: <link to test coverage>
+The existing [integration tests for kube-scheduler which measure
+performance](https://github.com/kubernetes/kubernetes/tree/master/test/integration/scheduler_perf#readme)
+will be extended to cover the overhead of running the additional logic to
+support the features in this KEP. These also serve as [correctness
+tests](https://github.com/kubernetes/kubernetes/commit/cecebe8ea2feee856bc7a62f4c16711ee8a5f5d9)
+as part of the normal Kubernetes "integration" jobs which cover [the dynamic
+resource
+controller](https://github.com/kubernetes/kubernetes/blob/294bde0079a0d56099cf8b8cf558e3ae7230de12/test/integration/scheduler_perf/util.go#L135-L139).
 
 ##### e2e tests
 
@@ -568,37 +577,15 @@ https://storage.googleapis.com/k8s-triage/index.html
 We expect no non-infra related flakes in the last month as a GA graduation criteria.
 -->
 
-- <test>: <link to test coverage>
+End-to-end testing depends on a working resource driver and a container runtime
+with CDI support. A [test
+driver](https://github.com/kubernetes/kubernetes/tree/master/test/e2e/dra/test-driver)
+was developed as part of the overall DRA development effort. We will extend this
+test driver to enable support for alternative device requests and add tests to
+ensure they are handled by the scheduler as described in this KEP.
 
 ### Graduation Criteria
 
-<!--
-**Note:** *Not required until targeted at a release.*
-
-Define graduation milestones.
-
-These may be defined in terms of API maturity, [feature gate] graduations, or as
-something else. The KEP should keep this high-level with a focus on what
-signals will be looked at to determine graduation.
-
-Consider the following in developing the graduation criteria for this enhancement:
-- [Maturity levels (`alpha`, `beta`, `stable`)][maturity-levels]
-- [Feature gate][feature gate] lifecycle
-- [Deprecation policy][deprecation-policy]
-
-Clearly define what graduation means by either linking to the [API doc
-definition](https://kubernetes.io/docs/concepts/overview/kubernetes-api/#api-versioning)
-or by redefining what graduation means.
-
-In general we try to use the same stages (alpha, beta, GA), regardless of how the
-functionality is accessed.
-
-[feature gate]: https://git.k8s.io/community/contributors/devel/sig-architecture/feature-gates.md
-[maturity-levels]: https://git.k8s.io/community/contributors/devel/sig-architecture/api_changes.md#alpha-beta-and-stable-versions
-[deprecation-policy]: https://kubernetes.io/docs/reference/using-api/deprecation-policy/
-
-Below are some examples to consider, in addition to the aforementioned [maturity levels][maturity-levels].
-
 #### Alpha
 
 - Feature implemented behind a feature flag
@@ -606,62 +593,40 @@ Below are some examples to consider, in addition to the aforementioned [maturity
 
 #### Beta
 
-- Gather feedback from developers and surveys
-- Complete features A, B, C
+- Gather feedback
 - Additional tests are in Testgrid and linked in KEP
 
 #### GA
 
-- N examples of real-world usage
-- N installs
-- More rigorous forms of testing—e.g., downgrade tests and scalability tests
+- 3 examples of real-world usage
 - Allowing time for feedback
 
-**Note:** Generally we also wait at least two releases between beta and
-GA/stable, because there's no opportunity for user feedback, or even bug reports,
-in back-to-back releases.
-
-**For non-optional features moving to GA, the graduation criteria must include
-[conformance tests].**
-
-[conformance tests]: https://git.k8s.io/community/contributors/devel/sig-architecture/conformance-tests.md
-
-#### Deprecation
-
-- Announce deprecation and support policy of the existing flag
-- Two versions passed since introducing the functionality that deprecates the flag (to address version skew)
-- Address feedback on usage/changed behavior, provided on GitHub issues
-- Deprecate the flag
--->
-
 ### Upgrade / Downgrade Strategy
 
-<!--
-If applicable, how will the component be upgraded and downgraded? Make sure
-this is in the test plan.
-
-Consider the following in developing an upgrade/downgrade strategy for this
-enhancement:
-- What changes (in invocations, configurations, API use, etc.) is an existing
-  cluster required to make on upgrade, in order to maintain previous behavior?
-- What changes (in invocations, configurations, API use, etc.) is an existing
-  cluster required to make on upgrade, in order to make use of the enhancement?
--->
+Standard upgrade/downgrade strategies may be used; no special configuration
+changes are needed. There are no kubelet or DRA-driver changes for this feature;
+all changes are local to the control plane.
 
 ### Version Skew Strategy
 
-<!--
-If applicable, how will the component handle version skew with other
-components? What are the guarantees? Make sure this is in the test plan.
-
-Consider the following in developing a version skew strategy for this
-enhancement:
-- Does this enhancement involve coordinating behavior in the control plane and nodes?
-- How does an n-3 kubelet or kube-proxy without this feature available behave when this feature is used?
-- How does an n-1 kube-controller-manager or kube-scheduler without this feature available behave when this feature is used?
-- Will any other components on the node change? For example, changes to CSI,
-  CRI or CNI may require updating that component before the kubelet.
--->
+The proposed API change relaxes a `required` constraint on the
+`DeviceRequest.DeviceClassName` field. The `DeviceRequest` thus becomes a one-of
+that must have either the `DeviceClassName` or the `FirstAvailableOf` field
+populated.
+
+Older clients have been advised in the current implementation to check this
+field, even though it is required, and fail to allocate a claim that does not
+have the field set. This means that during rollout, if the API server has this
+feature, but the scheduler does not, the scheduler will fail to schedule pods
+that utilize the feature. The pod will be scheduled later according to the new
+functionality after the scheduler is upgraded.
+
+This feature affects the specific allocations that get made by the scheduler.
+Those allocations are stored in the `ResourceClaim` status, and will be acted
+upon by the kubelet and DRA-driver just as if the user had made the request
+without this feature. Thus, there is no impact on the data plane version skew;
+if the selected request could be satisfied by the data plane without this
+feature, it will work exactly the same with this feature.
 
 ## Production Readiness Review Questionnaire
 
@@ -705,9 +670,15 @@ well as the [existing list] of feature gates.
 [existing list]: https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/
 -->
 
-- [ ] Feature gate (also fill in values in `kep.yaml`)
-  - Feature gate name:
+This is an add-on on top of the `DynamicResourceAllocation` feature gate, which
+also must be enabled for this feature to work.
+
+- [x] Feature gate (also fill in values in `kep.yaml`)
+  - Feature gate name: DRAFirstAvailableOf
   - Components depending on the feature gate:
+    - kube-apiserver
+    - kube-scheduler
+    - kube-controller-manager
 - [ ] Other
   - Describe the mechanism:
   - Will enabling / disabling the feature require downtime of the control
@@ -722,6 +693,8 @@ Any change of default behavior may be surprising to users or break existing
 automations, so be extremely careful here.
 -->
 
+No.
+
 ###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
 
 <!--
@@ -735,8 +708,21 @@ feature.
 NOTE: Also set `disable-supported` to `true` or `false` in `kep.yaml`.
 -->
 
+Yes. No existing claims or running pods will be affected. This feature affects
+only the allocation of devices during scheduling.
+
+If a workload controller or Pod uses a `ResourceClaimTemplate` that includes
+this feature, a new Pod may be created and need to be scheduled while the
+feature is disabled. In this case, the new Pod will fail to schedule, because
+the corresponding `ResourceClaim` cannot be created with the feature gate
+turned off.
+
 ###### What happens if we reenable the feature if it was previously rolled back?
 
+The feature will begin working again for future scheduling choices that make use
+of it. For `Deployments` or other users of `ResourceClaimTemplate`, previously
+failing Pod creations or scheduling may begin to succeed.
+
 ###### Are there any tests for feature enablement/disablement?
 
 <!--
@@ -752,6 +738,9 @@ You can take a look at one potential example of such test in:
 https://github.com/kubernetes/kubernetes/pull/97058/files#diff-7826f7adbc1996a05ab52e3f5f02429e94b68ce6bce0dc534d1be636154fded3R246-R282
 -->
 
+Unit tests will be written to validate the enablement and disablement behavior,
+as well as type conversions for the new field and relaxed validation.
+
 ### Rollout, Upgrade and Rollback Planning
 
 <!--
@@ -770,6 +759,8 @@ rollout. Similarly, consider large clusters and how enablement/disablement
 will rollout across nodes.
 -->
 
+Will consider in the beta timeframe.
+
 ###### What specific metrics should inform a rollback?
 
 <!--
@@ -777,6 +768,8 @@ What signals should users be paying attention to when the feature is young
 that might indicate a serious problem?
 -->
 
+Will consider in the beta timeframe.
+
 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
 
 <!--
@@ -785,12 +778,17 @@ Longer term, we may want to require automated upgrade/rollback tests, but we
 are missing a bunch of machinery and tooling and can't do that now.
 -->
 
+Will consider in the beta timeframe.
+
 ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
 
 <!--
 Even if applying deprecation policies, they may still surprise some users.
 -->
 
+No, though we do relax validation on one field to make it no longer a required
+field.
+
 ### Monitoring Requirements
 
 <!--
@@ -808,6 +806,8 @@ checking if there are objects with field X set) may be a last resort. Avoid
 logs or events for this purpose.
 -->
 
+Will consider in the beta timeframe.
+
 ###### How can someone using this feature know that it is working for their instance?
 
 <!--
@@ -819,6 +819,8 @@ and operation of this feature.
 Recall that end users cannot usually observe component logs or access metrics.
 -->
 
+Will consider in the beta timeframe.
+
 - [ ] Events
   - Event Reason: 
 - [ ] API .status
@@ -844,12 +846,16 @@ These goals will help you determine what you need to measure (SLIs) in the next
 question.
 -->
 
+Existing DRA and related SLOs continue to apply.
+
 ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
 
 <!--
 Pick one more of these and delete the rest.
 -->
 
+Will consider in the beta timeframe.
+
 - [ ] Metrics
   - Metric name:
   - [Optional] Aggregation method:
@@ -864,6 +870,9 @@ Describe the metrics themselves and the reasons why they weren't added (e.g., co
 implementation difficulties, etc.).
 -->
 
+We can consider a histogram metric showing how many allocations are made from
+indices 0-7 of ResourceClaims that utilize this feature.
+
 ### Dependencies
 
 <!--
@@ -887,6 +896,10 @@ and creating new ones, as well as about cluster-level services (e.g. DNS):
       - Impact of its degraded performance or high-error rates on the feature:
 -->
 
+This feature depends on the DRA structured parameters feature being enabled, and
+on DRA drivers being deployed. There are no requirements beyond those already
+needed for DRA structured parameters.
+
 ### Scalability
 
 <!--
@@ -914,6 +927,8 @@ Focusing mostly on:
     heartbeats, leader election, etc.)
 -->
 
+No.
+
 ###### Will enabling / using this feature result in introducing new API types?
 
 <!--
@@ -923,6 +938,8 @@ Describe them, providing:
   - Supported number of objects per namespace (for namespace-scoped objects)
 -->
 
+No, just a new field on the `ResourceClaim.DeviceRequest` struct.
+
 ###### Will enabling / using this feature result in any new calls to the cloud provider?
 
 <!--
@@ -931,6 +948,8 @@ Describe them, providing:
   - Estimated increase:
 -->
 
+No.
+
 ###### Will enabling / using this feature result in increasing size or count of the existing API objects?
 
 <!--
@@ -940,6 +959,11 @@ Describe them, providing:
   - Estimated amount of new objects: (e.g., new Object X for every existing Pod)
 -->
 
+Yes, when using this field, the user will add additional data in their
+`ResourceClaim` and `ResourceClaimTemplate` objects. This is an incremental
+increase on top of the existing structures. The number of alternate requests is
+limited to 8 in order to minimize the potential object size.
+
 ###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
 
 <!--
@@ -951,6 +975,10 @@ Think about adding additional work or introducing new steps in between
 [existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos
 -->
 
+Scheduling a claim that uses this feature may take a bit longer, if it is
+necessary to go deeper into the list of alternative options before finding a
+suitable device. We can measure this impact in alpha.
+
 ###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
 
 <!--
@@ -963,6 +991,8 @@ This through this both in small and large cases, again with respect to the
 [supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
 -->
 
+No.
+
 ###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
 
 <!--
@@ -975,6 +1005,8 @@ Are there any tests that were run/should be run to understand performance charac
 and validate the declared limits?
 -->
 
+No.
+
 ### Troubleshooting
 
 <!--
@@ -1034,10 +1066,46 @@ not need to be as detailed as the proposal, but should include enough
 information to express the idea and why it was not acceptable.
 -->
 
-### Resource Claim Indirection
+### Higher Level Indirection
 
 Rather than embedding a list of alternative request objects, we could use an
-umbrella `ResourceClaim` that instead references other `ResourceClaim`s.
+indirection at either the `ResourceClaim` level, or the `DeviceClaim` level.
+For example, we could create a new resource claim type by adding a
+`FirstOfDevices` list to the `ResourceClaimSpec`, and making it a one-of with
+`Devices`.
+
+Something like this:
+
+```go
+// ResourceClaimSpec defines what is being requested in a ResourceClaim and how to configure it.
+type ResourceClaimSpec struct {
+        // Devices defines how to request devices.
+        //
+        // oneOf: claimType
+        // +optional
+        Devices DeviceClaim
+
+        // FirstOfDevices defines devices to claim in a
+        //
+        // oneOf: claimType
+        // +optional
+        FirstOfDevices []DeviceClaim
+
+        //
+        // Must be a DNS subdomain and should end with a DNS domain owned by the
+        // vendor of the driver.
+        //
+        // This is an alpha field and requires enabling the DRAControlPlaneController
+        // feature gate.
+        //
+        // +optional
+        // +featureGate=DRAControlPlaneController
+        Controller string
+}
+```
+
+This is arguably simpler and allows each alternative to be an essentially
+complete claim.
 
 ## Infrastructure Needed (Optional)
 

From 8471149917357557b6082e5ca0a004385b4f3c0f Mon Sep 17 00:00:00 2001
From: John Belamaric <jbelamaric@google.com>
Date: Mon, 7 Oct 2024 12:56:04 -0700
Subject: [PATCH 08/20] update field name in example

---
 keps/sig-node/4816-dra-prioritized-list/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/keps/sig-node/4816-dra-prioritized-list/README.md b/keps/sig-node/4816-dra-prioritized-list/README.md
index ecb24de9081..0b44bbcd8b6 100644
--- a/keps/sig-node/4816-dra-prioritized-list/README.md
+++ b/keps/sig-node/4816-dra-prioritized-list/README.md
@@ -371,7 +371,7 @@ spec:
     - name: nic
       deviceClassName: rdma-nic
     - name: gpu
-      firstOf:
+      firstAvailableOf:
       - name: big-gpu
         deviceClassName: big-gpu
       - name: mid-gpu

From 33cdc03b073ac1376e524cbb0d568204bbba5ec9 Mon Sep 17 00:00:00 2001
From: John Belamaric <jbelamaric@google.com>
Date: Mon, 7 Oct 2024 13:37:56 -0700
Subject: [PATCH 09/20] Re-organize and add a bit

---
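Note: the extra loop described in the "Scheduler Implementation" section added
by this patch could look roughly like the sketch below. It is a simplified
illustration, not the allocator code: the real allocator performs a depth-first
search with backtracking across all claims, while this sketch only counts free
devices per class. The `Count` field, the `free` map, the `small-gpu` class,
and the function name `pickSubrequest` are assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// DeviceRequest is a trimmed-down stand-in for the real API type (assumption).
type DeviceRequest struct {
	Name             string
	DeviceClassName  string
	Count            int
	FirstAvailableOf []DeviceRequest
}

var errUnsatisfiable = errors.New("no request could be satisfied")

// pickSubrequest returns the request that would be allocated. free maps a
// device class name to the number of unallocated devices (hypothetical model).
func pickSubrequest(req DeviceRequest, free map[string]int) (DeviceRequest, error) {
	if req.DeviceClassName != "" {
		// Existing behavior: a plain request with a device class.
		if free[req.DeviceClassName] >= req.Count {
			return req, nil
		}
		return DeviceRequest{}, errUnsatisfiable
	}
	// New behavior: try each alternative in priority order, first match wins.
	for _, sub := range req.FirstAvailableOf {
		if free[sub.DeviceClassName] >= sub.Count {
			return sub, nil
		}
	}
	return DeviceRequest{}, errUnsatisfiable
}

func main() {
	// Mirrors User Story 1: prefer one newer GPU, fall back to two older ones.
	req := DeviceRequest{Name: "gpu", FirstAvailableOf: []DeviceRequest{
		{Name: "big-gpu", DeviceClassName: "big-gpu", Count: 1},
		{Name: "mid-gpu", DeviceClassName: "mid-gpu", Count: 1},
		{Name: "small-gpu", DeviceClassName: "small-gpu", Count: 2},
	}}
	chosen, err := pickSubrequest(req, map[string]int{"small-gpu": 2})
	fmt.Println(chosen.Name, err)
}
```

A later alternative is chosen only when every earlier one is unavailable, which
is the priority-order behavior the KEP describes.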
 .../4816-dra-prioritized-list/README.md       | 150 +++++++++++-------
 1 file changed, 97 insertions(+), 53 deletions(-)

diff --git a/keps/sig-node/4816-dra-prioritized-list/README.md b/keps/sig-node/4816-dra-prioritized-list/README.md
index 0b44bbcd8b6..370186deae1 100644
--- a/keps/sig-node/4816-dra-prioritized-list/README.md
+++ b/keps/sig-node/4816-dra-prioritized-list/README.md
@@ -247,6 +247,11 @@ The "Design Details" section below is for the real
 nitty-gritty.
 -->
 
+The `ResourceClaim` object contains a `DeviceClaim`, which in turn contains a
+list of `DeviceRequest` objects. This allows the user to allocate different
+types of devices for the same claim, and apply constraints and configuration
+across those different requests.
+
 The proposal adds a new field to the `DeviceRequest`, called `FirstAvailableOf`
 which will contain an ordered list of `DeviceRequest` objects. In order to
 satisfy the main (containing) request, exactly one of the requests listed in
@@ -254,6 +259,79 @@ satisfy the main (containing) request, exactly one of the requests listed in
 order, such that the scheduler will only try to use the second item in the list
 if it is unable to satsify the first item, and so on.
 
+This allows some flexibility for the user to create, say, a "gpu" request, but
+allow it to be satisfied by one of several models of GPU.
+
+### User Stories (Optional)
+
+<!--
+Detail the things that people will be able to do if this KEP is implemented.
+Include as much detail as possible so that people can understand the "how" of
+the system. The goal here is to make this feel real for users without getting
+bogged down.
+-->
+
+#### Story 1
+
+As a workload author, I want to run a workload that needs a GPU. The workload
+itself can work with a few different models of GPU, but may need different
+numbers of them depending on the model chosen. If the latest model is available
+in my cluster, I would like to use that, but if it is not I am willing to take
+a model one generation older. If none of those are available, I am willing to
+take two GPUs of an even older model.
+
+#### Story 2
+
+As a workload author, I want to distribute the manifests of my workloads online.
+However, there are many different models of device out there, and so I do not
+want to be too prescriptive in how I define my manifest. If I make it too
+detailed, then I will either need multiple versions or the users will have to
+edit the manifest. Instead, I would like to provide some optionality in the
+types of devices that can meet my workload's needs. For best performance though,
+I do have a preferred ordering of devices.
+
+### Notes/Constraints/Caveats (Optional)
+
+<!--
+What are the caveats to the proposal?
+What are some important details that didn't come across above?
+Go in to as much detail as necessary here.
+This might be a good place to talk about core concepts and how they relate.
+-->
+
+#### Resource Quota
+
+ResourceQuota will be enforced such that the user must have quota for each
+`DeviceRequest` under every `FirstAvailableOf`. Thus, this "pick one" behavior
+cannot be used to circumvent quota. This reduces the usefulness of the feature,
+as it means it will not serve as a quota management feature. However, the
+primary goal of the feature is about flexibility across clusters and
+obtainability of underlying devices, not quota management.
+
+
+### Risks and Mitigations
+
+<!--
+What are the risks of this proposal, and how do we mitigate? Think broadly.
+For example, consider both security and how this will impact the larger
+Kubernetes ecosystem.
+
+How will security be reviewed, and by whom?
+
+How will UX be reviewed, and by whom?
+
+Consider including folks who also work outside the SIG or subproject.
+-->
+
+## Design Details
+
+<!--
+This section should contain enough information that the specifics of your
+change are understandable. This may include API specs (though not always
+required) or even code snippets. If there's any ambiguity about HOW your
+proposal will be implemented, this is the place to discuss them.
+-->
+
 A `DeviceRequest` that populates the `FirstAvailableOf` field must *not*
 populate the `DeviceClassName` field. The `required` validation on this field
 will be relaxed. This allows existing clients to differentiate between claims
@@ -436,59 +514,25 @@ case that constraint only applies if that particular subrequest is chosen.
 In the PodSpec, however, the subrequest names are not valid. Only the main
 request name may be used.
 
-### Resource Quota
-
-ResourceQuota will be enforced such that the user must have quota for each
-`DeviceRequest` under every `FirstAvailableOf`. Thus, this "pick one" behavior
-cannot be used to circumvent quota. This reduces the usefulness of the feature,
-as it means it will not serve as a quota management feature. However, the
-primary goal of the feature is about flexibility across clusters and
-obtainability of underlying devices, not quota management.
-
-### User Stories (Optional)
-
-<!--
-Detail the things that people will be able to do if this KEP is implemented.
-Include as much detail as possible so that people can understand the "how" of
-the system. The goal here is to make this feel real for users without getting
-bogged down.
--->
-
-#### Story 1
-
-#### Story 2
-
-### Notes/Constraints/Caveats (Optional)
-
-<!--
-What are the caveats to the proposal?
-What are some important details that didn't come across above?
-Go in to as much detail as necessary here.
-This might be a good place to talk about core concepts and how they relate.
--->
-
-### Risks and Mitigations
-
-<!--
-What are the risks of this proposal, and how do we mitigate? Think broadly.
-For example, consider both security and how this will impact the larger
-Kubernetes ecosystem.
-
-How will security be reviewed, and by whom?
-
-How will UX be reviewed, and by whom?
-
-Consider including folks who also work outside the SIG or subproject.
--->
-
-## Design Details
-
-<!--
-This section should contain enough information that the specifics of your
-change are understandable. This may include API specs (though not always
-required) or even code snippets. If there's any ambiguity about HOW your
-proposal will be implemented, this is the place to discuss them.
--->
+### Scheduler Implementation
+
+Currently, the scheduler loops through each entry in `DeviceClaim.Requests` and
+tries to satisfy each one. This would work essentially the same, except that
+today, it [throws an
+error](https://github.com/kubernetes/kubernetes/blob/03f134461462f86239067ec20ec17a0ba892db52/staging/src/k8s.io/dynamic-resource-allocation/structured/allocator.go#L164)
+when it encounters a claim with a missing `DeviceClassName`. Instead, here we
+would check for entries in `FirstAvailableOf`, and add an additional loop,
+trying each of these requests in order.
+
+The current implementation will navigate a depth-first search of the devices,
+trying to satisfy all requests and constraints of all claims. The optionality
+offered at the `DeviceRequest` level provides another index state to track in
+the
+[`requestIndices`](https://github.com/kubernetes/kubernetes/blob/03f134461462f86239067ec20ec17a0ba892db52/staging/src/k8s.io/dynamic-resource-allocation/structured/allocator.go#L362) and [`deviceIndices`](https://github.com/kubernetes/kubernetes/blob/03f134461462f86239067ec20ec17a0ba892db52/staging/src/k8s.io/dynamic-resource-allocation/structured/allocator.go#L368). When the feature gate is
+disabled, this new index will always be 0.
+
+Alternatively, we can refactor to make this code more defensible via a feature
+gate.
 
 ### Test Plan
 
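The additional loop over `FirstAvailableOf` described in the Scheduler
Implementation section above can be sketched roughly as follows. This is a
minimal illustration, not the allocator's actual code: the `DeviceRequest` type
is a simplified stand-in, and `tryAllocate`, `satisfy`, and the `free` map of
class name to free device count are hypothetical names. The real allocator
backtracks across requests and claims, which this sketch omits.

```go
package main

import "fmt"

// DeviceRequest is a simplified, hypothetical stand-in for the API type
// discussed in the KEP; only the fields needed for the sketch are present.
type DeviceRequest struct {
	Name             string
	DeviceClassName  string
	FirstAvailableOf []DeviceRequest
}

// tryAllocate is a hypothetical helper: it takes one free device of the
// given class from the node's pool, if any is left.
func tryAllocate(free map[string]int, class string) bool {
	if free[class] > 0 {
		free[class]--
		return true
	}
	return false
}

// satisfy returns the name of the request (or subrequest) that was satisfied.
// A request without DeviceClassName is no longer an immediate error; its
// FirstAvailableOf entries are tried in priority order instead.
func satisfy(free map[string]int, req DeviceRequest) (string, error) {
	if req.DeviceClassName != "" {
		if tryAllocate(free, req.DeviceClassName) {
			return req.Name, nil
		}
		return "", fmt.Errorf("request %q: no free device of class %q", req.Name, req.DeviceClassName)
	}
	if len(req.FirstAvailableOf) == 0 {
		return "", fmt.Errorf("request %q: neither DeviceClassName nor FirstAvailableOf is set", req.Name)
	}
	for _, sub := range req.FirstAvailableOf {
		if sub.DeviceClassName != "" && tryAllocate(free, sub.DeviceClassName) {
			return sub.Name, nil
		}
	}
	return "", fmt.Errorf("request %q: no alternative could be satisfied", req.Name)
}

func main() {
	// Hypothetical node with only the less preferred device class available.
	free := map[string]int{"small-gpu": 1}
	req := DeviceRequest{
		Name: "gpu",
		FirstAvailableOf: []DeviceRequest{
			{Name: "big", DeviceClassName: "big-gpu"},
			{Name: "small", DeviceClassName: "small-gpu"},
		},
	}
	chosen, err := satisfy(free, req)
	fmt.Println(chosen, err)
}
```

The sketch keeps the priority order explicit: a later alternative is only
attempted after every earlier one has failed on the same node.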

From be36daa3831fc9019e7b39dbae063108ffb1235a Mon Sep 17 00:00:00 2001
From: John Belamaric <jbelamaric@google.com>
Date: Mon, 7 Oct 2024 13:40:57 -0700
Subject: [PATCH 10/20] update kep.yaml for latest changes

---
 keps/sig-node/4816-dra-prioritized-list/kep.yaml | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/keps/sig-node/4816-dra-prioritized-list/kep.yaml b/keps/sig-node/4816-dra-prioritized-list/kep.yaml
index cead6a0c478..76dd2cbb9f5 100644
--- a/keps/sig-node/4816-dra-prioritized-list/kep.yaml
+++ b/keps/sig-node/4816-dra-prioritized-list/kep.yaml
@@ -38,12 +38,11 @@ milestone:
 # The following PRR answers are required at alpha release
 # List the feature gate name and the components for which it must be enabled
 feature-gates:
-  - name: DRAPrioritizedList
+  - name: DRAFirstAvailableOf
     components:
       - kube-apiserver
       - kube-controller-manager
       - kube-scheduler
-      - kubelet
 disable-supported: true
 
 # The following PRR answers are required at beta release

From 3845eaffd5d9c0886713e30885206e717a0d901a Mon Sep 17 00:00:00 2001
From: John Belamaric <jbelamaric@google.com>
Date: Mon, 7 Oct 2024 18:24:36 -0700
Subject: [PATCH 11/20] Remove some unchanged API fields to make it easier to
 read

---
 .../4816-dra-prioritized-list/README.md       | 67 +++----------------
 1 file changed, 10 insertions(+), 57 deletions(-)

diff --git a/keps/sig-node/4816-dra-prioritized-list/README.md b/keps/sig-node/4816-dra-prioritized-list/README.md
index 370186deae1..60613bea0db 100644
--- a/keps/sig-node/4816-dra-prioritized-list/README.md
+++ b/keps/sig-node/4816-dra-prioritized-list/README.md
@@ -382,52 +382,7 @@ type DeviceRequest struct {
     // +oneOf=deviceRequestType
     FirstAvailableOf []DeviceRequest
 
-    // Selectors define criteria which must be satisfied by a specific
-    // device in order for that device to be considered for this
-    // request. All selectors must be satisfied for a device to be
-    // considered.
-    //
-    // +optional
-    // +listType=atomic
-    Selectors []DeviceSelector
-
-    // AllocationMode and its related fields define how devices are allocated
-    // to satisfy this request. Supported values are:
-    //
-    // - ExactCount: This request is for a specific number of devices.
-    //   This is the default. The exact number is provided in the
-    //   count field.
-    //
-    // - All: This request is for all of the matching devices in a pool.
-    //   Allocation will fail if some devices are already allocated,
-    //   unless adminAccess is requested.
-    //
-    // If AlloctionMode is not specified, the default mode is ExactCount. If
-    // the mode is ExactCount and count is not specified, the default count is
-    // one. Any other requests must specify this field.
-    //
-    // More modes may get added in the future. Clients must refuse to handle
-    // requests with unknown modes.
-    //
-    // +optional
-    AllocationMode DeviceAllocationMode
-
-    // Count is used only when the count mode is "ExactCount". Must be greater than zero.
-    // If AllocationMode is ExactCount and this field is not specified, the default is one.
-    //
-    // +optional
-    // +oneOf=AllocationMode
-    Count int64
-
-    // AdminAccess indicates that this is a claim for administrative access
-    // to the device(s). Claims with AdminAccess are expected to be used for
-    // monitoring or other management services for a device.  They ignore
-    // all ordinary claims to the device with respect to access modes and
-    // any resource allocations.
-    //
-    // +optional
-    // +default=false
-    AdminAccess bool
+    ...
 }
 
 const (
@@ -1102,6 +1057,10 @@ Major milestones might include:
 Why should this KEP _not_ be implemented?
 -->
 
+This adds complexity to the scheduler and to the cluster autoscaler, which will
+simulate the satisfaction of claims with different node shapes.
+
+
 ## Alternatives
 
 <!--
@@ -1135,21 +1094,15 @@ type ResourceClaimSpec struct {
         // +optional
         FirstOfDevices []DeviceClaim
 
-        //
-        // Must be a DNS subdomain and should end with a DNS domain owned by the
-        // vendor of the driver.
-        //
-        // This is an alpha field and requires enabling the DRAControlPlaneController
-        // feature gate.
-        //
-        // +optional
-        // +featureGate=DRAControlPlaneController
-        Controller string
+        ...
 }
 ```
 
 This is arguably simpler and allows them to be essentially complete, alternate
-claims.
+claims. It would be more difficult for the user, though, as it would require
+duplication of other device requests. Additionally, if there were multiple
+separate `FirstAvailableOf` requests in a claim, the user would have to specify
+all the combinations of those in order to get the same flexibility.
 
 ## Infrastructure Needed (Optional)
 

From a1be8d3c1106699fb59d40e4488c9d3baf6974ba Mon Sep 17 00:00:00 2001
From: John Belamaric <jbelamaric@google.com>
Date: Mon, 7 Oct 2024 19:04:00 -0700
Subject: [PATCH 12/20] Linter

---
 keps/sig-node/4816-dra-prioritized-list/README.md | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/keps/sig-node/4816-dra-prioritized-list/README.md b/keps/sig-node/4816-dra-prioritized-list/README.md
index 60613bea0db..90fd34832ee 100644
--- a/keps/sig-node/4816-dra-prioritized-list/README.md
+++ b/keps/sig-node/4816-dra-prioritized-list/README.md
@@ -87,14 +87,19 @@ tags, and then generate with `hack/update-toc.sh`.
     - [Story 1](#story-1)
     - [Story 2](#story-2)
   - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
+    - [Resource Quota](#resource-quota)
   - [Risks and Mitigations](#risks-and-mitigations)
 - [Design Details](#design-details)
+  - [Scheduler Implementation](#scheduler-implementation)
   - [Test Plan](#test-plan)
       - [Prerequisite testing updates](#prerequisite-testing-updates)
       - [Unit tests](#unit-tests)
       - [Integration tests](#integration-tests)
       - [e2e tests](#e2e-tests)
   - [Graduation Criteria](#graduation-criteria)
+    - [Alpha](#alpha)
+    - [Beta](#beta)
+    - [GA](#ga)
   - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
   - [Version Skew Strategy](#version-skew-strategy)
 - [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
@@ -107,7 +112,7 @@ tags, and then generate with `hack/update-toc.sh`.
 - [Implementation History](#implementation-history)
 - [Drawbacks](#drawbacks)
 - [Alternatives](#alternatives)
-  - [Resource Claim Indirection](#resource-claim-indirection)
+  - [Higher Level Indirection](#higher-level-indirection)
 - [Infrastructure Needed (Optional)](#infrastructure-needed-optional)
 <!-- /toc -->
 
@@ -257,7 +262,7 @@ which will contain an ordered list of `DeviceRequest` objects. In order to
 satisfy the main (containing) request, exactly one of the requests listed in
 `FirstAvailableOf` must be satisfied. The order listed is considered a priority
 order, such that the scheduler will only try to use the second item in the list
-if it is unable to satsify the first item, and so on.
+if it is unable to satisfy the first item, and so on.
 
 This allows some flexibility for the user to create, say, a "gpu" request, but
 allow it to be satisfied by one of several models of GPU.

From a27146928507058f243d6bec0b1f398f658968ea Mon Sep 17 00:00:00 2001
From: John Belamaric <jbelamaric@google.com>
Date: Tue, 8 Oct 2024 09:54:49 -0700
Subject: [PATCH 13/20] Review feedback, implementable

---
 keps/sig-node/4816-dra-prioritized-list/README.md | 6 ++++++
 keps/sig-node/4816-dra-prioritized-list/kep.yaml  | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/keps/sig-node/4816-dra-prioritized-list/README.md b/keps/sig-node/4816-dra-prioritized-list/README.md
index 90fd34832ee..22451f89896 100644
--- a/keps/sig-node/4816-dra-prioritized-list/README.md
+++ b/keps/sig-node/4816-dra-prioritized-list/README.md
@@ -721,6 +721,12 @@ scheduled, even though the feature is disabled. In this case, the new Pod will
 fail to schedule, as the corresponding `ResourceClaim` will not be able to be
 created.
 
+The recommendation is to remove any usage of this feature in both
+`ResourceClaim`s and `ResourceClaimTemplate`s when disabling the feature, and
+force the workloads to use a specific device request instead. This will ensure
+that there are no unexpected failures later, if a Pod gets rescheduled to
+another node or recreated for some reason.
+
 ###### What happens if we reenable the feature if it was previously rolled back?
 
 The feature will begin working again for future scheduling choices that make use
diff --git a/keps/sig-node/4816-dra-prioritized-list/kep.yaml b/keps/sig-node/4816-dra-prioritized-list/kep.yaml
index 76dd2cbb9f5..5fb4b7a3e10 100644
--- a/keps/sig-node/4816-dra-prioritized-list/kep.yaml
+++ b/keps/sig-node/4816-dra-prioritized-list/kep.yaml
@@ -6,7 +6,7 @@ owning-sig: sig-node
 participating-sigs:
   - sig-scheduling
   - sig-autoscaling
-status: provisional
+status: implementable
 creation-date: 2024-09-24
 reviewers:
   - "@pohly"

From 6cf172ec46fb0c765acb86005e91fd9138c74c4b Mon Sep 17 00:00:00 2001
From: John Belamaric <jbelamaric@google.com>
Date: Tue, 8 Oct 2024 17:11:00 -0700
Subject: [PATCH 14/20] Review feedback

---
 keps/sig-node/4816-dra-prioritized-list/README.md | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/keps/sig-node/4816-dra-prioritized-list/README.md b/keps/sig-node/4816-dra-prioritized-list/README.md
index 22451f89896..467db81001b 100644
--- a/keps/sig-node/4816-dra-prioritized-list/README.md
+++ b/keps/sig-node/4816-dra-prioritized-list/README.md
@@ -494,6 +494,14 @@ disabled, this new index will always be 0.
 Alternatively, we can refactor to make this code more defensible via a feature
 gate.
 
+DRA today works on a "first match" basis for a given node. That would not change
+with this KEP. However, in order for the scheduler to prefer a node that has the
+initial prioritized device request, those requests would need a higher score.
+This will be implemented for beta. For alpha, the scheduler may still pick a
+node with a less preferred device, if there are nodes with each type of device
+available.
+
+
 ### Test Plan
 
 <!--
@@ -593,11 +601,14 @@ ensure they are handled by the scheduler as described in this KEP.
 #### Alpha
 
 - Feature implemented behind a feature flag
+- Implemented in the scheduler but not necessarily the cluster autoscaler
 - Initial e2e tests completed and enabled
 
 #### Beta
 
 - Gather feedback
+- Implement node scoring
+- Cluster auto scaler implementation
 - Additional tests are in Testgrid and linked in KEP
 
 #### GA

From 5f5ec1faf7946df6ed5a515442b41f5dbd973014 Mon Sep 17 00:00:00 2001
From: John Belamaric <jbelamaric@google.com>
Date: Tue, 8 Oct 2024 17:42:48 -0700
Subject: [PATCH 15/20] Move to sig-scheduling

---
 .../4816-dra-prioritized-list/README.md                     | 0
 .../4816-dra-prioritized-list/kep.yaml                      | 6 +++---
 2 files changed, 3 insertions(+), 3 deletions(-)
 rename keps/{sig-node => sig-scheduling}/4816-dra-prioritized-list/README.md (100%)
 rename keps/{sig-node => sig-scheduling}/4816-dra-prioritized-list/kep.yaml (96%)

diff --git a/keps/sig-node/4816-dra-prioritized-list/README.md b/keps/sig-scheduling/4816-dra-prioritized-list/README.md
similarity index 100%
rename from keps/sig-node/4816-dra-prioritized-list/README.md
rename to keps/sig-scheduling/4816-dra-prioritized-list/README.md
diff --git a/keps/sig-node/4816-dra-prioritized-list/kep.yaml b/keps/sig-scheduling/4816-dra-prioritized-list/kep.yaml
similarity index 96%
rename from keps/sig-node/4816-dra-prioritized-list/kep.yaml
rename to keps/sig-scheduling/4816-dra-prioritized-list/kep.yaml
index 5fb4b7a3e10..c85009bb0cb 100644
--- a/keps/sig-node/4816-dra-prioritized-list/kep.yaml
+++ b/keps/sig-scheduling/4816-dra-prioritized-list/kep.yaml
@@ -2,9 +2,9 @@ title: DRA Prioritized List
 kep-number: 4816
 authors:
   - "@johnbelamaric"
-owning-sig: sig-node
+owning-sig: sig-scheduling
 participating-sigs:
-  - sig-scheduling
+  - sig-node
   - sig-autoscaling
 status: implementable
 creation-date: 2024-09-24
@@ -13,8 +13,8 @@ reviewers:
   - "@klueska"
   - "@thockin"
 approvers:
-  - "@mrunalp" # SIG-Node
   - "@alculquicondor" # SIG-Scheduling
+  - "@mrunalp" # SIG-Node
   - "@MaciekPytel" # SIG-Autoscaling
   - "@thockin" # API Review
 

From d669ac191bb7d4de8327f51a86d50a822666d81c Mon Sep 17 00:00:00 2001
From: John Belamaric <jbelamaric@google.com>
Date: Tue, 8 Oct 2024 17:47:08 -0700
Subject: [PATCH 16/20] Forgot to move PRR file

---
 keps/prod-readiness/{sig-node => sig-scheduling}/4816.yaml | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename keps/prod-readiness/{sig-node => sig-scheduling}/4816.yaml (100%)

diff --git a/keps/prod-readiness/sig-node/4816.yaml b/keps/prod-readiness/sig-scheduling/4816.yaml
similarity index 100%
rename from keps/prod-readiness/sig-node/4816.yaml
rename to keps/prod-readiness/sig-scheduling/4816.yaml

From 16987448b88a51e5fe3425431dbcae924601bc60 Mon Sep 17 00:00:00 2001
From: John Belamaric <jbelamaric@google.com>
Date: Wed, 9 Oct 2024 09:12:27 -0700
Subject: [PATCH 17/20] Add coverage numbers

---
 .../4816-dra-prioritized-list/README.md       | 37 ++++++++-----------
 1 file changed, 15 insertions(+), 22 deletions(-)

diff --git a/keps/sig-scheduling/4816-dra-prioritized-list/README.md b/keps/sig-scheduling/4816-dra-prioritized-list/README.md
index 467db81001b..5df673b0cdc 100644
--- a/keps/sig-scheduling/4816-dra-prioritized-list/README.md
+++ b/keps/sig-scheduling/4816-dra-prioritized-list/README.md
@@ -83,10 +83,10 @@ tags, and then generate with `hack/update-toc.sh`.
   - [Goals](#goals)
   - [Non-Goals](#non-goals)
 - [Proposal](#proposal)
-  - [User Stories (Optional)](#user-stories-optional)
+  - [User Stories](#user-stories)
     - [Story 1](#story-1)
     - [Story 2](#story-2)
-  - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
+  - [Notes/Constraints/Caveats](#notesconstraintscaveats)
     - [Resource Quota](#resource-quota)
   - [Risks and Mitigations](#risks-and-mitigations)
 - [Design Details](#design-details)
@@ -267,14 +267,7 @@ if it is unable to satisfy the first item, and so on.
 This allows some flexibility for the user to create, say, a "gpu" request, but
 allow it to be satisfied by one of several models of GPU.
 
-### User Stories (Optional)
-
-<!--
-Detail the things that people will be able to do if this KEP is implemented.
-Include as much detail as possible so that people can understand the "how" of
-the system. The goal here is to make this feel real for users without getting
-bogged down.
--->
+### User Stories
 
 #### Story 1
 
@@ -295,14 +288,7 @@ edit the manifest. Instead, I would like to provide some optionality in the
 types of devices that can meet my workload's needs. For best performance though,
 I do have a preferred ordering of devices.
 
-### Notes/Constraints/Caveats (Optional)
-
-<!--
-What are the caveats to the proposal?
-What are some important details that didn't come across above?
-Go in to as much detail as necessary here.
-This might be a good place to talk about core concepts and how they relate.
--->
+### Notes/Constraints/Caveats
 
 #### Resource Quota
 
@@ -501,7 +487,6 @@ This will be implemented for beta. For alpha, the scheduler may still pick a
 node with a less preferred device, if there are nodes with each type of device
 available.
 
-
 ### Test Plan
 
 <!--
@@ -547,9 +532,17 @@ This can inform certain test coverage improvements that we want to do before
 extending the production code to implement this enhancement.
 -->
 
-- `k8s.io/kubernetes/pkg/scheduler`: TBD
-- `k8s.io/kubernetes/pkg/scheduler/framework`: TBD
-- `k8s.io/kubernetes/pkg/controller`: TBD
+<!--
+Generated with:
+go test -cover ./pkg/scheduler/framework/plugins/dynamicresources/... ./pkg/controller/resourceclaim ./pkg/kubelet/cm/dra/... ./staging/src/k8s.io/dynamic-resource-allocation/cel ./staging/src/k8s.io/dynamic-resource-allocation/structured | sed -e 's/.*\(k8s.io[a-z/-]*\).*coverage: \(.*\) of statements/- `\1`: \2/' | sort
+-->
+
+Start of v1.32 development cycle (v1.32.0-alpha.1-178-gd9c46d8ecb1):
+
+- `k8s.io/dynamic-resource-allocation/cel`: 88.8%
+- `k8s.io/dynamic-resource-allocation/structured`: 82.7%
+- `k8s.io/kubernetes/pkg/controller/resourceclaim`: 70.0%
+- `k8s.io/kubernetes/pkg/scheduler/framework/plugins/dynamicresources`: 72.9%
 
 ##### Integration tests
 

From 51e4db71bcc8cfe1207a5ee80100cc426c709c26 Mon Sep 17 00:00:00 2001
From: John Belamaric <jbelamaric@google.com>
Date: Wed, 9 Oct 2024 11:08:56 -0700
Subject: [PATCH 18/20] Review feedback

---
 .../sig-scheduling/4816-dra-prioritized-list/README.md | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/keps/sig-scheduling/4816-dra-prioritized-list/README.md b/keps/sig-scheduling/4816-dra-prioritized-list/README.md
index 5df673b0cdc..d6a694c02e9 100644
--- a/keps/sig-scheduling/4816-dra-prioritized-list/README.md
+++ b/keps/sig-scheduling/4816-dra-prioritized-list/README.md
@@ -135,7 +135,7 @@ checklist items _must_ be updated for the enhancement to be released.
 Items marked with (R) are required *prior to targeting to a milestone / release*.
 
 - [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
-- [ ] (R) KEP approvers have approved the KEP status as `implementable`
+- [x] (R) KEP approvers have approved the KEP status as `implementable`
 - [x] (R) Design details are appropriately documented
 - [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
   - [ ] e2e Tests for all Beta API Operations (endpoints)
@@ -144,8 +144,8 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
 - [x] (R) Graduation criteria is in place
   - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) 
 - [x] (R) Production readiness review completed
-- [ ] (R) Production readiness review approved
-- [ ] "Implementation History" section is up-to-date for milestone
+- [x] (R) Production readiness review approved
+- [x] "Implementation History" section is up-to-date for milestone
 - [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
 - [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
 
@@ -601,7 +601,7 @@ ensure they are handled by the scheduler as described in this KEP.
 
 - Gather feedback
 - Implement node scoring
-- Cluster auto scaler implementation
+- Evaluate feasibility of cluster autoscaler implementation
 - Additional tests are in Testgrid and linked in KEP
 
 #### GA
@@ -1066,6 +1066,8 @@ Major milestones might include:
 - when the KEP was retired or superseded
 -->
 
+1.32 Enhancements Freeze - KEP merged, alpha implementation initiated
+
 ## Drawbacks
 
 <!--

From 02d762447a89f4a1bac7014d0e52c0dd26c5c61b Mon Sep 17 00:00:00 2001
From: John Belamaric <jbelamaric@google.com>
Date: Wed, 9 Oct 2024 14:27:15 -0700
Subject: [PATCH 19/20] Clarify implications of no scoring

---
 .../4816-dra-prioritized-list/README.md            | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/keps/sig-scheduling/4816-dra-prioritized-list/README.md b/keps/sig-scheduling/4816-dra-prioritized-list/README.md
index d6a694c02e9..fe6a3ceb2a8 100644
--- a/keps/sig-scheduling/4816-dra-prioritized-list/README.md
+++ b/keps/sig-scheduling/4816-dra-prioritized-list/README.md
@@ -481,11 +481,15 @@ Alternatively, we can refactor to make this code more defensible via a feature
 gate.
 
 DRA today works on a "first match" basis for a given node. That would not change
-with this KEP. However, in order for the scheduler to prefer a node that has the
-initial prioritized device request, those requests would need a higher score.
-This will be implemented for beta. For alpha, the scheduler may still pick a
-node with a less preferred device, if there are nodes with each type of device
-available.
+with this KEP; on any given node, devices will be tried in the priority order
+listed in the main request, and the first fit will be returned. However, in
+practice, nodes typically only have one type of device that would satisfy any of
+the three requests. That means that individual nodes with any of the listed
+devices will show as valid nodes for the workload. In order for the for the
+scheduler to prefer a node that has the initial prioritized device request,
+those requests would need a higher score, which currently is planned for beta of
+this feature. For alpha, the scheduler may still pick a node with a less
+preferred device, if there are nodes with each type of device available.
 
 ### Test Plan
 
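The per-node "first match" behavior and the need for scoring, as described in
the paragraph changed above, can be illustrated with a small sketch. All names
here (`firstMatch`, `hypotheticalScore`, the device class names, and the node
layout) are assumptions made for illustration. In particular, the KEP does not
define a scoring formula; the one below is only one possible shape.

```go
package main

import "fmt"

// firstMatch returns the index of the first alternative, in priority order,
// that the node can satisfy. nodeClasses is a hypothetical model of which
// device classes have a free device on the node.
func firstMatch(nodeClasses map[string]bool, alternatives []string) (int, bool) {
	for i, class := range alternatives {
		if nodeClasses[class] {
			return i, true
		}
	}
	return 0, false
}

// hypotheticalScore is NOT defined by the KEP. It only illustrates that a
// node matching an earlier alternative would need a higher score for the
// scheduler to prefer it.
func hypotheticalScore(index, total int) int {
	return total - index
}

func main() {
	alternatives := []string{"large-gpu", "medium-gpu", "small-gpu"}
	nodes := []struct {
		name    string
		classes map[string]bool
	}{
		{"node-a", map[string]bool{"small-gpu": true}},
		{"node-b", map[string]bool{"large-gpu": true}},
		{"node-c", map[string]bool{}},
	}
	for _, n := range nodes {
		idx, ok := firstMatch(n.classes, alternatives)
		if !ok {
			fmt.Printf("%s: not feasible\n", n.name)
			continue
		}
		// Without scoring, every feasible node is an equally valid choice.
		fmt.Printf("%s: feasible via %s, hypothetical score %d\n",
			n.name, alternatives[idx], hypotheticalScore(idx, len(alternatives)))
	}
}
```

In this model both `node-a` and `node-b` pass filtering, so without a score
that reflects the matched alternative the scheduler has no reason to prefer
the node holding the more preferred device.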

From 8faa89b155358cc1d4ecfee3fc27e16e632adac9 Mon Sep 17 00:00:00 2001
From: John Belamaric <jbelamaric@google.com>
Date: Wed, 9 Oct 2024 14:53:36 -0700
Subject: [PATCH 20/20] Fix typo

---
 .../sig-scheduling/4816-dra-prioritized-list/README.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/keps/sig-scheduling/4816-dra-prioritized-list/README.md b/keps/sig-scheduling/4816-dra-prioritized-list/README.md
index fe6a3ceb2a8..c6a13d2106c 100644
--- a/keps/sig-scheduling/4816-dra-prioritized-list/README.md
+++ b/keps/sig-scheduling/4816-dra-prioritized-list/README.md
@@ -485,11 +485,11 @@ with this KEP; on any given node, devices will be tried in the priority order
 listed in the main request, and the first fit will be returned. However, in
 practice, nodes typically only have one type of device that would satisfy any of
 the three requests. That means that individual nodes with any of the listed
-devices will show as valid nodes for the workload. In order for the for the
-scheduler to prefer a node that has the initial prioritized device request,
-those requests would need a higher score, which currently is planned for beta of
-this feature. For alpha, the scheduler may still pick a node with a less
-preferred device, if there are nodes with each type of device available.
+devices will show as valid nodes for the workload. In order for the scheduler to
+prefer a node that can satisfy the highest-priority device request, that node
+would need a higher score; such scoring is currently planned for beta of this
+feature. For alpha, the scheduler may still pick a node with a less preferred
+device, if there are nodes with each type of device available.
 
 ### Test Plan