Add a blog post about QueueingHint #43686
Closed
99 changes: 99 additions & 0 deletions
content/en/blog/_posts/2023-11-xx-scheduler-queueinghint/index.md
@@ -0,0 +1,99 @@
---
layout: blog
title: "Kubernetes v1.28: QueueingHint brings a new possibility to optimize our scheduling"
date: 2023-10-25T10:00:00-08:00
slug: scheduler-queueinghint
---

**Author:** [Kensei Nakada](https://github.com/sanposhiho) (Mercari)

The scheduler is the core component that decides which Node each Pod runs on.
It schedules Pods **one by one**,
so the larger your cluster is, the more crucial the scheduler's throughput becomes.

Scheduler throughput is a perennial challenge.
Over the years, SIG Scheduling has put a lot of effort into improving it through many enhancements.

In this blog post, I'll introduce a recent major improvement to the scheduler, named QueueingHint.

We'll first go through the background knowledge of the scheduler that you need,
and then see how QueueingHint improves scheduling throughput.
## Scheduling Queue

The scheduler has a Scheduling Queue, which holds all unscheduled Pods.

The Scheduling Queue is composed of three parts: ActiveQ, BackoffQ, and the Unschedulable Pod Pool.
- ActiveQ: Pods that are ready to be scheduled.
- BackoffQ: Pods that are waiting for their backoff period to end, after which they are moved to ActiveQ.
- Unschedulable Pod Pool: Pods that should not be scheduled for now.
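
To make the three parts concrete, here is a minimal toy model in Python. It is only a sketch and not the scheduler's real code. The class name `ToySchedulingQueue`, its method names, and the use of a plain FIFO list for ActiveQ are all assumptions made for illustration.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class ToySchedulingQueue:
    """Toy model of the three parts listed above (hypothetical, illustration only)."""

    active_q: list = field(default_factory=list)           # Pods ready to be scheduled (FIFO here)
    backoff_q: list = field(default_factory=list)          # min-heap of (ready_at, pod_name)
    unschedulable_pods: set = field(default_factory=set)   # Pods parked for now

    def add(self, pod):
        """Put a Pod straight into ActiveQ."""
        self.active_q.append(pod)

    def add_with_backoff(self, pod, now, backoff_seconds):
        """Put a Pod into BackoffQ until now + backoff_seconds."""
        heapq.heappush(self.backoff_q, (now + backoff_seconds, pod))

    def park(self, pod):
        """Put a Pod into the Unschedulable Pod Pool."""
        self.unschedulable_pods.add(pod)

    def flush_backoff(self, now):
        """Move every Pod whose backoff has ended from BackoffQ to ActiveQ."""
        while self.backoff_q and self.backoff_q[0][0] <= now:
            _, pod = heapq.heappop(self.backoff_q)
            self.active_q.append(pod)

    def pop(self):
        """Take the next Pod to schedule, or None if ActiveQ is empty."""
        return self.active_q.pop(0) if self.active_q else None
```

In this sketch, a Pod added with `add_with_backoff("pod-b", now=0, backoff_seconds=5)` only becomes visible to `pop()` after `flush_backoff` is called with `now` at 5 or later, while a parked Pod stays out of ActiveQ until something explicitly moves it.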
## Scheduling Framework and Plugins

The scheduler is implemented on top of the [Scheduling Framework](/docs/concepts/scheduling-eviction/scheduling-framework/).
Each scheduling requirement or preference is implemented as a plugin
(for example, PodAffinity is implemented in the PodAffinity plugin).

Scheduling a Pod happens in two phases.
The first phase, called the Scheduling Cycle, takes Pods from ActiveQ **one by one**, gathers the verdicts of all plugins,
and finally either decides on a Node to run the Pod or concludes that the Pod cannot go anywhere for now.

If scheduling is successful, the second phase, called the Binding Cycle, binds the Pod to the Node.
If the Scheduling Cycle concludes that the Pod cannot go anywhere,
the Binding Cycle isn't executed; instead, the Pod is moved back to the Scheduling Queue.
With some exceptions, such an unscheduled Pod is put into the Unschedulable Pod Pool.

Pods in the Unschedulable Pod Pool are moved to ActiveQ/BackoffQ
only when the Scheduling Queue determines that they might be schedulable if scheduling is retried.

This is a crucial step because the Scheduling Cycle handles Pods one by one:
if we didn't have the Unschedulable Pod Pool and kept retrying every Pod,
Scheduling Cycles would be wasted on Pods that have no hope of being scheduled.

So how does the Scheduling Queue decide when to move a Pod? How does it notice that a Pod might be schedulable now?
This is where QueueingHint comes in.
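
The flow described in this section (Scheduling Cycle, then either the Binding Cycle or a trip to the Unschedulable Pod Pool) can be sketched as a toy model. Everything here is an assumption made for illustration: the function names `schedule_one` and `run_once`, the dictionary-based Pods and Nodes, and the idea of a plugin as a simple `fits(pod, node)` predicate. None of it is the Scheduling Framework's real API.

```python
def schedule_one(pod, nodes, plugins):
    """Toy Scheduling Cycle: return the first Node every plugin accepts, else None."""
    for node in nodes:
        if all(fits(pod, node) for fits in plugins.values()):
            return node
    return None


def run_once(active_q, unschedulable, bindings, nodes, plugins):
    """Take one Pod from ActiveQ, then bind it or park it."""
    pod = active_q.pop(0)
    node = schedule_one(pod, nodes, plugins)
    if node is not None:
        # Toy Binding Cycle: record which Node the Pod was bound to.
        bindings[pod["name"]] = node["name"]
    else:
        # No Node fits: skip binding and park the Pod in the pool.
        unschedulable.append(pod)
```

Because `run_once` handles a single Pod per call, every call spent on a Pod that cannot fit anywhere is a call not spent on a Pod that could, which is why parking hopeless Pods matters for throughput.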
## QueueingHint

A QueueingHint is a per-plugin callback function that is notified of object additions, updates, and deletions in the cluster (we call these cluster events)
that may make Pods schedulable.

Let's say PodA has a required PodAffinity and was rejected in the Scheduling Cycle by the PodAffinity plugin
because no Node has any Pod matching PodA's PodAffinity.

![PodA got rejected by PodAffinity](./queueinghint1.png)

When an unscheduled Pod is put into the Unschedulable Pod Pool, the Scheduling Queue records which plugins caused the Pod's scheduling failure.
In this example, the Scheduling Queue notes that PodA was rejected by PodAffinity.

PodA will never be schedulable until the PodAffinity failure is resolved somehow.
The Scheduling Queue therefore uses the QueueingHint of each plugin that rejected the Pod, which is PodAffinity in this example.

A QueueingHint subscribes to a particular kind of cluster event and decides whether an incoming event could make the Pod schedulable.
Thinking about when the PodAffinity failure could be resolved,
one possible scenario is that an existing Pod gets a new label that matches PodA's PodAffinity.

The PodAffinity QueueingHint checks all Pod updates happening in the cluster,
and when it catches such an update, the Scheduling Queue moves PodA to ActiveQ/BackoffQ.

![PodA is moved by PodAffinity QueueingHint](./queueinghint2.png)
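
The PodA example can be sketched as a toy model. This is an illustration under stated assumptions, not the scheduler's implementation: the event dictionaries, the `QUEUE`/`SKIP` return values, the `required_affinity_labels` field, and the function names `pod_affinity_hint` and `on_cluster_event` are all hypothetical. For simplicity, the toy always moves a Pod to ActiveQ and ignores BackoffQ.

```python
QUEUE, SKIP = "queue", "skip"  # hypothetical hint results for this toy model


def pod_affinity_hint(pod, event):
    """Toy hint: requeue only if an updated Pod's labels now match the affinity."""
    if event["kind"] != "PodUpdate":
        return SKIP
    wanted = pod["required_affinity_labels"]
    new_labels = event["new_labels"]
    matches = all(new_labels.get(key) == value for key, value in wanted.items())
    return QUEUE if matches else SKIP


# Hypothetical registry: plugin name -> its hint callback.
HINTS = {"PodAffinity": pod_affinity_hint}


def on_cluster_event(event, unschedulable, active_q, hints=HINTS):
    """Consult only the hints of the plugins that rejected each parked Pod."""
    for name in list(unschedulable):
        pod, rejected_by = unschedulable[name]
        if any(hints[p](pod, event) == QUEUE for p in rejected_by if p in hints):
            active_q.append(pod)      # toy: always ActiveQ, never BackoffQ
            del unschedulable[name]
```

With PodA parked as rejected by `"PodAffinity"` and requiring the label `app=db`, an unrelated event or a Pod update with `app=web` leaves PodA in the pool, while a Pod update whose new labels include `app=db` moves it back to ActiveQ.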
## What's new in v1.28

We have been working on the development of QueueingHint since v1.27.
In v1.27, only one alpha plugin (DRA) supported QueueingHint,
and in v1.28, some stable plugins started to work with QueueingHint.

QueueingHint is not user-facing, but we have a feature gate () as a safety net
because QueueingHint substantially changes a critical path of the scheduler.

## Getting involved

These features are managed by Kubernetes [SIG Scheduling](https://github.com/kubernetes/community/tree/master/sig-scheduling).

Please join us and share your feedback.

## How can I learn more?

- [KEP-4247: Per-plugin callback functions for efficient requeueing in the scheduling queue](https://github.com/kubernetes/enhancements/blob/master/keps/sig-scheduling/4247-queueinghint/README.md)
Binary file added: content/en/blog/_posts/2023-11-xx-scheduler-queueinghint/queueinghint1.png (+683 KB)
Binary file added: content/en/blog/_posts/2023-11-xx-scheduler-queueinghint/queueinghint2.png (+583 KB)
1.29 now, please. Also - until we change the style guide: Title Case.
Good to know.
I've marked this conversation as “not resolved” because the feedback is still applicable.