Expose the information about the order of pending workloads
Showing 2 changed files with 369 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,349 @@ | ||
# KEP-168: Pending workloads visibility | ||
|
||
<!-- | ||
This is the title of your KEP. Keep it short, simple, and descriptive. A good | ||
title can help communicate what the KEP is and should be considered as part of | ||
any review. | ||
--> | ||
|
||
<!-- | ||
A table of contents is helpful for quickly jumping to sections of a KEP and for | ||
highlighting any additional information provided beyond the standard KEP | ||
template. | ||
Ensure the TOC is wrapped with | ||
<code>&lt;!-- toc --&gt;&lt;!-- /toc --&gt;</code>
tags, and then generate with `hack/update-toc.sh`. | ||
--> | ||
|
||
<!-- toc --> | ||
- [Summary](#summary) | ||
- [Motivation](#motivation) | ||
- [Goals](#goals) | ||
- [Non-Goals](#non-goals) | ||
- [Proposal](#proposal) | ||
- [User Stories (Optional)](#user-stories-optional) | ||
- [Story 1](#story-1) | ||
- [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional) | ||
- [Risks and Mitigations](#risks-and-mitigations) | ||
- [Time and memory consuming computation of top pending workloads](#time-and-memory-consuming-computation-of-top-pending-workloads) | ||
- [Too large local queue object](#too-large-local-queue-object) | ||
- [Increased number of localqueue updates](#increased-number-of-localqueue-updates) | ||
- [Design Details](#design-details) | ||
- [Local Queue API](#local-queue-api) | ||
- [Test Plan](#test-plan) | ||
- [Prerequisite testing updates](#prerequisite-testing-updates) | ||
- [Unit Tests](#unit-tests) | ||
- [Integration tests](#integration-tests) | ||
- [Graduation Criteria](#graduation-criteria) | ||
- [Implementation History](#implementation-history) | ||
- [Drawbacks](#drawbacks) | ||
- [Alternatives](#alternatives) | ||
<!-- /toc --> | ||
|
||
## Summary | ||
|
||
The enhancement extends the API of LocalQueue and ClusterQueue to expose the | ||
information about the order of their pending workloads. | ||
|
||
## Motivation | ||
|
||
Currently, users have no visibility into the contents of the queue.
|
||
<!-- | ||
This section is for explicitly listing the motivation, goals, and non-goals of | ||
this KEP. Describe why the change is important and the benefits to users. The | ||
motivation section can optionally provide links to [experience reports] to | ||
demonstrate the interest in a KEP within the wider Kubernetes community. | ||
[experience reports]: https://github.com/golang/go/wiki/ExperienceReports | ||
--> | ||
|
||
### Goals | ||
|
||
- expose the order of workloads in the LocalQueue and ClusterQueue | ||
|
||
<!-- | ||
List the specific goals of the KEP. What is it trying to achieve? How will we | ||
know that this has succeeded? | ||
--> | ||
|
||
### Non-Goals | ||
|
||
- expose the information about workload position for each workload individually | ||
|
||
<!-- | ||
What is out of scope for this KEP? Listing non-goals helps to focus discussion | ||
and make progress. | ||
--> | ||
|
||
## Proposal | ||
|
||
The proposal is to extend the status APIs of LocalQueue and ClusterQueue
to expose the order of their pending workloads. The order will only be exposed
up to a configurable depth.
|
||
Keeping the information in the status of the queues allows exposing it without
the extra cost of new API requests, as the statuses are already updated whenever
the number of pending workloads changes.
|
||
In order to keep the size of the information constrained, only the head of the
queue of pending workloads will be exposed.
|
||
<!-- | ||
This is where we get down to the specifics of what the proposal actually is. | ||
This should have enough detail that reviewers can understand exactly what | ||
you're proposing, but should not include things like API designs or | ||
implementation. What is the desired outcome and how do we measure success?. | ||
The "Design Details" section below is for the real | ||
nitty-gritty. | ||
--> | ||
|
||
### User Stories (Optional) | ||
|
||
<!-- | ||
Detail the things that people will be able to do if this KEP is implemented. | ||
Include as much detail as possible so that people can understand the "how" of | ||
the system. The goal here is to make this feel real for users without getting | ||
bogged down. | ||
--> | ||
|
||
#### Story 1 | ||
|
||
As a user of Kueue with LocalQueue visibility, I would like to know the
position of my workload in the local queue. Knowing the position would allow me
to compute the estimated time of arrival (ETA) of my workload.
|
||
I would like to be able to get this information by inspecting the local queue | ||
status. | ||
|
||
<!-- | ||
#### Story 2 | ||
As a user of Kueue with ClusterQueue visibility I would like to know | ||
the position of my workload in the cluster queue, so that I can estimate the | ||
ETA of my workload. | ||
I would like to be able to get this information by inspecting the cluster queue | ||
status. | ||
--> | ||
|
||
### Notes/Constraints/Caveats (Optional) | ||
|
||
<!-- | ||
What are the caveats to the proposal? | ||
What are some important details that didn't come across above? | ||
Go in to as much detail as necessary here. | ||
This might be a good place to talk about core concepts and how they relate. | ||
--> | ||
|
||
### Risks and Mitigations | ||
|
||
#### Time and memory consuming computation of top pending workloads | ||
|
||
Currently, we organize the cluster queue as a heap, thus only the top of the
heap is readily available. If we want to know the top N pending workloads,
we may need to copy the heap, sort the copy and select the first N entries,
which is time and memory consuming.
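
A minimal sketch of the copy, sort and select approach, assuming the heap is backed by a slice and that workloads are ordered by priority only. The names `pendingWorkload` and `topN` are hypothetical and used for illustration; they are not Kueue's actual types.

```go
package main

import (
	"fmt"
	"sort"
)

// pendingWorkload is a hypothetical stand-in for an entry of the heap.
type pendingWorkload struct {
	Name     string
	Priority int32
}

// topN copies the heap's backing slice, sorts the copy by descending priority
// and returns at most n entries. The heap itself is left untouched, at the
// price of O(len) extra memory and O(len log len) time per call.
func topN(heapItems []pendingWorkload, n int) []pendingWorkload {
	snapshot := make([]pendingWorkload, len(heapItems))
	copy(snapshot, heapItems)
	sort.SliceStable(snapshot, func(i, j int) bool {
		return snapshot[i].Priority > snapshot[j].Priority
	})
	if n > len(snapshot) {
		n = len(snapshot)
	}
	return snapshot[:n]
}

func main() {
	items := []pendingWorkload{{"job-a", 1}, {"job-b", 3}, {"job-c", 2}}
	for _, w := range topN(items, 2) {
		fmt.Println(w.Name, w.Priority)
	}
}
```

The cost is paid on every status refresh, which is why an ordered structure that supports in-order iteration may be preferable for large queues.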
|
||
In order to mitigate this risk we may need to migrate from using a heap to | ||
red-black trees. | ||
|
||
#### Too large local queue object | ||
|
||
As the number of pending workloads can be arbitrarily large, there is a risk that
the status information about the workloads may exceed the etcd limit of 1.5Mi on
object size. In order to allow future extensions of the structure, we
should assume that no more than 500Ki is used.
|
||
In order to mitigate this risk we put a constraint on the number of exposed | ||
pending workloads. We limit the number to 1000. | ||
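
As a rough sanity check of this limit, the sketch below estimates the serialized size of a worst-case list. It assumes the status is stored as JSON, that workload names use the maximum Kubernetes object name length of 253 characters, and that the fields carry the JSON tags shown; the tags and the helper `worstCaseListBytes` are assumptions made for illustration only.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// pendingWorkload mirrors the proposed entry; the JSON tags are assumed.
type pendingWorkload struct {
	Name     string `json:"name"`
	Position *int32 `json:"position,omitempty"`
}

// worstCaseListBytes returns the JSON size of a list with the given number of
// entries, each using a name of nameLen characters.
func worstCaseListBytes(entries, nameLen int) (int, error) {
	list := make([]pendingWorkload, entries)
	for i := range list {
		pos := int32(i)
		list[i] = pendingWorkload{Name: strings.Repeat("x", nameLen), Position: &pos}
	}
	raw, err := json.Marshal(list)
	if err != nil {
		return 0, err
	}
	return len(raw), nil
}

func main() {
	size, err := worstCaseListBytes(1000, 253)
	if err != nil {
		panic(err)
	}
	fmt.Printf("worst-case list size: %d bytes (budget: %d bytes)\n", size, 500*1024)
}
```

Under these assumptions, 1000 entries stay within the 500Ki budget, with room left for additional per-workload fields.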
|
||
#### Increased number of localqueue updates | ||
|
||
As we put the global cluster queue position into the list of top pending workloads,
changes to one local queue may trigger updates in another local queue connected
to the same cluster queue, due to status changes in the structure.
|
||
In order to mitigate this risk we need to consider:
1. batching updates to local queues over a batch period, similarly to how Job
status updates are batched in Kubernetes.
2. allowing control over the number N of workloads which expose their position
in the cluster queue. Setting a low value for the limit would reduce the chance
that changes in one local queue trigger an update in another local queue.
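
The batching idea in item 1 could be sketched as follows. This is an illustrative coalescing buffer, not Kueue's implementation: `statusUpdateBatcher`, `MarkDirty` and `Flush` are hypothetical names, and the caller is assumed to invoke `Flush` once per batch period (for example from a ticker).

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// statusUpdateBatcher remembers which local queues need a status update so
// that repeated changes within one batch period result in a single write.
type statusUpdateBatcher struct {
	mu    sync.Mutex
	dirty map[string]struct{}
}

func newStatusUpdateBatcher() *statusUpdateBatcher {
	return &statusUpdateBatcher{dirty: make(map[string]struct{})}
}

// MarkDirty records that the status of the given local queue is out of date.
func (b *statusUpdateBatcher) MarkDirty(localQueue string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.dirty[localQueue] = struct{}{}
}

// Flush returns the sorted set of queues marked since the previous flush and
// resets the buffer.
func (b *statusUpdateBatcher) Flush() []string {
	b.mu.Lock()
	defer b.mu.Unlock()
	queues := make([]string, 0, len(b.dirty))
	for q := range b.dirty {
		queues = append(queues, q)
	}
	sort.Strings(queues)
	b.dirty = make(map[string]struct{})
	return queues
}

func main() {
	b := newStatusUpdateBatcher()
	b.MarkDirty("team-a/lq")
	b.MarkDirty("team-b/lq")
	b.MarkDirty("team-a/lq") // coalesced with the first mark
	fmt.Println(b.Flush())
}
```

A longer batch period trades freshness of the exposed positions for fewer status writes.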
|
||
|
||
<!-- | ||
What are the risks of this proposal, and how do we mitigate? Think broadly. | ||
For example, consider both security and how this will impact the larger | ||
Kubernetes ecosystem. | ||
How will security be reviewed, and by whom? | ||
How will UX be reviewed, and by whom? | ||
Consider including folks who also work outside the SIG or subproject. | ||
--> | ||
|
||
## Design Details | ||
|
||
### Local Queue API | ||
|
||
```golang
// PendingWorkload contains the information identifying a pending workload in
// the local queue.
type PendingWorkload struct {
	// Name indicates the name of the pending workload.
	Name string

	// Position indicates the position of the workload among all pending
	// workloads in the cluster queue.
	Position *int32
}

// PendingWorkloadsStatus contains the list of top pending workloads.
type PendingWorkloadsStatus struct {
	// TopList contains the list of top pending workloads.
	// +listType=map
	// +listMapKey=name
	// +optional
	TopList []PendingWorkload
}

// LocalQueueStatus defines the observed state of LocalQueue
type LocalQueueStatus struct {
	...
	// PendingWorkloadsStatus contains the information exposed about the current
	// status of the queue of pending workloads.
	// +optional
	PendingWorkloadsStatus *PendingWorkloadsStatus
	...
}
```
|
||
The returned information is controlled by the global Kueue configuration: | ||
|
||
```golang
// Configuration is the Schema for the kueueconfigurations API
type Configuration struct {
	...
	// PendingWorkloadsVisibility is configuration to expose the information
	// about the top pending workloads in the local queue.
	PendingWorkloadsVisibility *PendingWorkloadsVisibility `json:"pendingWorkloadsVisibility,omitempty"`
}

// PendingWorkloadsVisibility configures how many pending workloads are exposed.
type PendingWorkloadsVisibility struct {
	// MaxTopPendingWorkloads indicates the maximal number of pending
	// workloads for which their local queue order is exposed.
	// Defaults to 100.
	MaxTopPendingWorkloads *int32
}
```
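
To illustrate how the two structures could interact, the sketch below fills a `TopList` for one local queue from an already ordered view of its cluster queue. It is a sketch under assumptions: `queuedWorkload` and `buildTopList` are hypothetical, positions are assumed to be zero-based, and the default of 100 is applied when the setting is unset.

```go
package main

import "fmt"

// PendingWorkload follows the proposed API type.
type PendingWorkload struct {
	Name     string
	Position *int32
}

// queuedWorkload is a hypothetical entry of the ordered cluster queue.
type queuedWorkload struct {
	Name       string
	LocalQueue string
}

const defaultMaxTopPendingWorkloads int32 = 100

// buildTopList walks the cluster queue in order and collects the workloads of
// the given local queue, recording their cluster-queue position, until the
// configured maximum is reached.
func buildTopList(ordered []queuedWorkload, localQueue string, maxTop *int32) []PendingWorkload {
	limit := defaultMaxTopPendingWorkloads
	if maxTop != nil {
		limit = *maxTop
	}
	var top []PendingWorkload
	for i, w := range ordered {
		if int32(len(top)) >= limit {
			break
		}
		if w.LocalQueue != localQueue {
			continue
		}
		pos := int32(i)
		top = append(top, PendingWorkload{Name: w.Name, Position: &pos})
	}
	return top
}

func main() {
	ordered := []queuedWorkload{
		{"w1", "lq-a"}, {"w2", "lq-b"}, {"w3", "lq-a"}, {"w4", "lq-a"},
	}
	limit := int32(2)
	for _, w := range buildTopList(ordered, "lq-a", &limit) {
		fmt.Println(w.Name, *w.Position)
	}
}
```

Because each position refers to the shared cluster queue, an insertion by another local queue shifts the positions reported here, which is the source of the cross-queue updates discussed under Risks and Mitigations.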
|
||
<!-- | ||
This section should contain enough information that the specifics of your | ||
change are understandable. This may include API specs (though not always | ||
required) or even code snippets. If there's any ambiguity about HOW your | ||
proposal will be implemented, this is the place to discuss them. | ||
--> | ||
|
||
### Test Plan | ||
|
||
<!-- | ||
**Note:** *Not required until targeted at a release.* | ||
The goal is to ensure that we don't accept enhancements with inadequate testing. | ||
All code is expected to have adequate tests (eventually with coverage | ||
expectations). Please adhere to the [Kubernetes testing guidelines][testing-guidelines] | ||
when drafting this test plan. | ||
[testing-guidelines]: https://git.k8s.io/community/contributors/devel/sig-testing/testing.md | ||
--> | ||
|
||
[x] I/we understand the owners of the involved components may require updates to | ||
existing tests to make this code solid enough prior to committing the changes necessary | ||
to implement this enhancement. | ||
|
||
#### Prerequisite testing updates
|
||
<!-- | ||
Based on reviewers feedback describe what additional tests need to be added prior | ||
implementing this enhancement to ensure the enhancements have also solid foundations. | ||
--> | ||
|
||
#### Unit Tests | ||
|
||
<!-- | ||
In principle every added code should have complete unit test coverage, so providing | ||
the exact set of tests will not bring additional value. | ||
However, if complete unit test coverage is not possible, explain the reason of it | ||
together with explanation why this is acceptable. | ||
--> | ||
|
||
<!-- | ||
Additionally, try to enumerate the core package you will be touching | ||
to implement this enhancement and provide the current unit coverage for those | ||
in the form of: | ||
- <package>: <date> - <current test coverage> | ||
This can inform certain test coverage improvements that we want to do before | ||
extending the production code to implement this enhancement. | ||
--> | ||
|
||
- `<package>`: `<date>` - `<test coverage>` | ||
|
||
#### Integration tests | ||
|
||
<!-- | ||
Describe what tests will be added to ensure proper quality of the enhancement. | ||
After the implementation PR is merged, add the names of the tests here. | ||
--> | ||
|
||
### Graduation Criteria | ||
|
||
<!-- | ||
Clearly define what it means for the feature to be implemented and | ||
considered stable. | ||
If the feature you are introducing has high complexity, consider adding graduation | ||
milestones with these graduation criteria: | ||
- [Maturity levels (`alpha`, `beta`, `stable`)][maturity-levels] | ||
- [Feature gate][feature gate] lifecycle | ||
- [Deprecation policy][deprecation-policy] | ||
[feature gate]: https://git.k8s.io/community/contributors/devel/sig-architecture/feature-gates.md | ||
[maturity-levels]: https://git.k8s.io/community/contributors/devel/sig-architecture/api_changes.md#alpha-beta-and-stable-versions | ||
[deprecation-policy]: https://kubernetes.io/docs/reference/using-api/deprecation-policy/ | ||
--> | ||
|
||
## Implementation History | ||
|
||
<!-- | ||
Major milestones in the lifecycle of a KEP should be tracked in this section. | ||
Major milestones might include: | ||
- the `Summary` and `Motivation` sections being merged, signaling SIG acceptance | ||
- the `Proposal` section being merged, signaling agreement on a proposed design | ||
- the date implementation started | ||
- the first Kubernetes release where an initial version of the KEP was available | ||
- the version of Kubernetes where the KEP graduated to general availability | ||
- when the KEP was retired or superseded | ||
--> | ||
|
||
## Drawbacks | ||
|
||
<!-- | ||
Why should this KEP _not_ be implemented? | ||
--> | ||
|
||
## Alternatives | ||
|
||
<!-- | ||
What other approaches did you consider, and why did you rule them out? These do | ||
not need to be as detailed as the proposal, but should include enough | ||
information to express the idea and why it was not acceptable. | ||
--> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
title: Pending workloads visibility | ||
kep-number: 168 | ||
authors: | ||
- "@mimowo" | ||
status: provisional | ||
creation-date: 2023-07-14 | ||
reviewers: | ||
- "@ahg-g" | ||
- "@alculquicondor" | ||
- "@mwielgus" | ||
approvers: | ||
- "@ahg-g" | ||
- "@alculquicondor" | ||
|
||
stage: stable | ||
|
||
latest-milestone: "v0.5" | ||
milestone: | ||
stable: "v0.5" | ||
disable-supported: false |