Set "Multi Scheduling Profiles" to "implementable" #1483

Merged
83 changes: 63 additions & 20 deletions keps/sig-scheduling/20200114-multi-scheduling-profiles.md
@@ -12,7 +12,7 @@ approvers:
editor: TBD
creation-date: 2020-01-14
last-updated: 2020-01-14
status: provisional
status: implementable
see-also:
- "/keps/sig-scheduling/20180409-scheduling-framework.md"
- "/keps/sig-scheduling/20190226-default-even-pod-spreading.md"
@@ -42,17 +42,16 @@ see-also:
- [Design Details](#design-details)
- [Test Plan](#test-plan)
- [Graduation Criteria](#graduation-criteria)
- [Alpha -> Beta Graduation](#alpha---beta-graduation)
- [Beta -> GA Graduation](#beta---ga-graduation)
- [Alpha (v1.18):](#alpha-v118)
- [Implementation History](#implementation-history)
<!-- /toc -->

## Release Signoff Checklist

- [x] kubernetes/enhancements issue in release milestone, which links to KEP (this should be a link to the KEP location in kubernetes/enhancements, not the initial KEP PR)
- [ ] KEP approvers have set the KEP status to `implementable`
- [ ] Design details are appropriately documented
- [ ] Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
- [x] KEP approvers have set the KEP status to `implementable`
- [x] Design details are appropriately documented
- [x] Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
- [ ] Graduation criteria is in place
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
@@ -136,6 +135,7 @@ looks like the following:
type KubeSchedulerConfiguration struct {
...
SchedulerName string
AlgorithmSource SchedulerAlgorithmSource
HardPodAffinitySymmetricWeight int32
Plugins *Plugins
PluginConfig []PluginConfig
@@ -158,14 +158,21 @@ type KubeSchedulerProfile struct {
}
```

Note that we remove `AlgorithmSource` from the new API. Its functionality becomes redundant to
what can be configured with `Plugins` and `PluginConfig`.

##### Conversion between API versions

During conversion from `v1alpha1` to `v1alpha2`, we will copy all the necessary
parameters from KubeSchedulerConfiguration into one item in the `Profiles` list.
During conversion of `kubescheduler.config.k8s.io` from `v1alpha1` to `v1alpha2`, we will copy all
the necessary parameters from KubeSchedulerConfiguration into one item in the `Profiles` list.

In particular, configurations done by using `AlgorithmSource` will produce different values for
`Plugins` and `PluginConfig`.
This is similar to what we already do internally in
[`legacy_registry.go`](https://github.com/kubernetes/kubernetes/blob/fb66e807cd317254e5c7bf134186ddbfba757ef4/pkg/scheduler/framework/plugins/legacy_registry.go#L149).

`HardPodAffinitySymmetricWeight` would be moved to be a `PluginConfig.Arg` in
the `PluginConfig` slice for the plugin `InterPodAffinity` as
`HardPodAffinityWeight`.
the `PluginConfig` slice for the plugin `InterPodAffinity` as `HardPodAffinityWeight`.
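
The conversion described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the types are simplified, hypothetical stand-ins for the real `kubescheduler.config.k8s.io` types, the function name `convertToV1alpha2` is invented, the `Args` map and its key are an assumed encoding, and the translation of `AlgorithmSource` into `Plugins` and `PluginConfig` is left out entirely.

```go
package main

// Simplified, hypothetical stand-ins for the component config types.
// Only the fields named in this KEP are modeled.

// Plugins is a placeholder for the enabled/disabled plugin sets.
type Plugins struct{}

// PluginConfig carries the arguments for a single plugin.
type PluginConfig struct {
	Name string
	Args map[string]interface{} // assumed encoding; the real type differs
}

// V1alpha1Config models the flat v1alpha1 layout.
type V1alpha1Config struct {
	SchedulerName                  string
	HardPodAffinitySymmetricWeight int32
	Plugins                        *Plugins
	PluginConfig                   []PluginConfig
}

// Profile models KubeSchedulerProfile.
type Profile struct {
	SchedulerName string
	Plugins       *Plugins
	PluginConfig  []PluginConfig
}

// V1alpha2Config models the v1alpha2 layout with a list of profiles.
type V1alpha2Config struct {
	Profiles []Profile
}

// convertToV1alpha2 copies the per-scheduler parameters into a single
// profile and moves HardPodAffinitySymmetricWeight into the arguments
// of the InterPodAffinity plugin as HardPodAffinityWeight.
func convertToV1alpha2(in V1alpha1Config) V1alpha2Config {
	// Copy the slice so the input is not modified by the append below.
	pluginConfig := make([]PluginConfig, 0, len(in.PluginConfig)+1)
	pluginConfig = append(pluginConfig, in.PluginConfig...)
	pluginConfig = append(pluginConfig, PluginConfig{
		Name: "InterPodAffinity",
		Args: map[string]interface{}{
			"HardPodAffinityWeight": in.HardPodAffinitySymmetricWeight,
		},
	})
	return V1alpha2Config{
		Profiles: []Profile{{
			SchedulerName: in.SchedulerName,
			Plugins:       in.Plugins,
			PluginConfig:  pluginConfig,
		}},
	}
}
```

The sketch always appends a new `InterPodAffinity` entry; a real conversion would also have to decide what to do if the input already carries arguments for that plugin.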

##### Defaults

@@ -192,16 +199,20 @@ similar result as the binary is starting.
`SchedulerName` values will be validated to not repeat among the items of
`Profiles`.

Since kube-scheduler has only one queue, we will validate that all `Plugins.QueueSort`
configurations are strictly the same.
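
A minimal sketch of the two validation rules above (unique `SchedulerName` values and identical `Plugins.QueueSort` configurations) might look like the following. The types are simplified and hypothetical, `validateProfiles` and `queueSortOf` are invented names, and using `reflect.DeepEqual` as the meaning of "strictly the same" is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"reflect"
)

// Simplified, hypothetical stand-ins for the component config types.
type Plugin struct {
	Name string
}

type PluginSet struct {
	Enabled  []Plugin
	Disabled []Plugin
}

type Plugins struct {
	QueueSort *PluginSet
}

type Profile struct {
	SchedulerName string
	Plugins       *Plugins
}

// queueSortOf returns the QueueSort plugin set of a profile, or nil
// if the profile does not configure plugins at all.
func queueSortOf(p Profile) *PluginSet {
	if p.Plugins == nil {
		return nil
	}
	return p.Plugins.QueueSort
}

// validateProfiles rejects repeated scheduler names and any profile
// whose QueueSort configuration differs from the first profile's.
func validateProfiles(profiles []Profile) error {
	seen := make(map[string]bool, len(profiles))
	for i, p := range profiles {
		if seen[p.SchedulerName] {
			return fmt.Errorf("profiles[%d]: duplicate schedulerName %q", i, p.SchedulerName)
		}
		seen[p.SchedulerName] = true
		// Assumption: "strictly the same" is modeled as deep equality.
		if !reflect.DeepEqual(queueSortOf(p), queueSortOf(profiles[0])) {
			return fmt.Errorf("profiles[%d]: queueSort differs from profiles[0]", i)
		}
	}
	return nil
}
```

Note that `reflect.DeepEqual` treats a nil slice and an empty slice as different, so a stricter or looser notion of equality may be preferable in practice.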

##### CLI flags binding

Note that, if component config is used, deprecated flags are currently ignored,
which includes `scheduler-name` and `hard-pod-affinity-symmetric-weight`. This
implies that we only have to worry about these flags in relationship with the
default profile.
Note that, if component config is used, deprecated flags are currently ignored. These include
`scheduler-name`, `algorithm-provider` and `hard-pod-affinity-symmetric-weight`. This implies
that we only have to worry about these flags in relation to the default profile.

Thus, if component config is not used, we will preserve the behavior of the
flags as follows:
- `scheduler-name` will be bound to its counterpart in the default profile.
- `algorithm-provider` will produce different `Plugins` configurations. For example, it will
produce an empty configuration for `default-scheduler`.
- `hard-pod-affinity-symmetric-weight` will be bound to a new deprecated option
that will be processed into a `pluginConfig` slice of the default profile,
as follows:
@@ -230,6 +241,10 @@ the scheduler queue.
framework instance from the registry corresponding to the specified scheduler
name.

Note that all framework instances will make use of the same shared cache
(for nodes and pods), from which a snapshot is taken for each scheduling cycle.
This is the main advantage over running multiple schedulers in a cluster.
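
The arrangement described above, one framework instance per profile on top of a single shared cache, can be illustrated with a small sketch. All names here (`Scheduler`, `Cache`, `Framework`, `NewScheduler`, `frameworkForPod`) are hypothetical and heavily simplified; they are not the kube-scheduler's actual types.

```go
package main

import "fmt"

// Pod is a minimal stand-in carrying only what the lookup needs.
type Pod struct {
	Name          string
	SchedulerName string
}

// Cache is a stand-in for the shared scheduler cache of nodes and pods.
type Cache struct {
	Nodes []string
}

// Snapshot returns a copy of the cached nodes, so that a scheduling
// cycle works on a stable view while the cache keeps changing.
func (c *Cache) Snapshot() []string {
	return append([]string(nil), c.Nodes...)
}

// Framework is a stand-in for one configured framework instance.
type Framework struct {
	ProfileName string
}

// Scheduler holds one framework per profile and a single shared cache.
type Scheduler struct {
	cache      *Cache
	frameworks map[string]*Framework
}

// NewScheduler builds one framework per scheduler name; every
// framework is served by the same cache.
func NewScheduler(cache *Cache, schedulerNames ...string) *Scheduler {
	s := &Scheduler{cache: cache, frameworks: make(map[string]*Framework)}
	for _, name := range schedulerNames {
		s.frameworks[name] = &Framework{ProfileName: name}
	}
	return s
}

// frameworkForPod picks the framework matching the scheduler name
// specified by the pod.
func (s *Scheduler) frameworkForPod(pod Pod) (*Framework, error) {
	fwk, ok := s.frameworks[pod.SchedulerName]
	if !ok {
		return nil, fmt.Errorf("no profile for scheduler name %q", pod.SchedulerName)
	}
	return fwk, nil
}
```

In this sketch, a scheduling cycle would call `s.cache.Snapshot()` once and then run the framework returned by `frameworkForPod` against that snapshot, so adding a profile adds a map entry rather than a second cache.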

### Risks and Mitigations

Operators could introduce profiles that disable scheduling features exposed in
@@ -244,19 +259,47 @@ the scheduler documentation.

### Test Plan

TODO
The following tests need to be in place:

- **Unit Tests**:
- Component Config API conversion, validation and defaults
- Core scheduler implementation. Current tests that use a default scheduler
(or default framework) should continue passing with no configuration changes.
- **Integration tests**: Current tests with a default scheduler should continue passing with no
configuration changes. We need new tests in `test/integration/scheduler` exercising more than one
profile, in which:
- Each profile would favor specific nodes, so that we can verify assignment.
- Pods get binding events for the selected scheduler name.
- Pods that don't specify a scheduler name continue to be scheduled by the default profile.

*Note on E2E tests*

Due to the proposed architecture, where a single kube-scheduler binary runs all the profiles, E2E
tests wouldn't increase the coverage of this feature over unit and integration tests.
Additionally, profiles can only be provided statically during cluster creation with our current
test infra. This implies that an independent job would be needed for each scheduler configuration.
But, as stated in our goals, this KEP doesn't introduce new default profiles.

### Graduation Criteria

##### Alpha -> Beta Graduation
#### Alpha (v1.18):

TODO
These are the required changes:

##### Beta -> GA Graduation
- [ ] New `kubescheduler.config.k8s.io/v1alpha2` API.
- [ ] Conversion from `kubescheduler.config.k8s.io/v1alpha1`
- [ ] Validation.
- [ ] Defaults.
- [ ] Scheduler can run more than one framework:
- [ ] Scheduler adds unscheduled pods to the pending queue for more than one name.
- [ ] Scheduler uses a framework using the scheduler name specified by the pod.
- [ ] Tests from [Test Plan](#test-plan).

TODO
Note that we don't require a feature gate, as users already have to opt in by using
`kubescheduler.config.k8s.io/v1alpha2` instead of the previous version.

## Implementation History

- 2020-01-14: Initial KEP sent out for review, including Summary, Motivation
and Proposal
and Proposal.
- 2020-01-21: Test Plan and Alpha Graduation criteria in KEP.