More general scheduling constraints #367

thockin · 2014-07-07T20:24:52Z

There have been a few folks who have asked about machine constraints for scheduling. Let's use this issue as a place to gather ideas and requirements.

@timothysc

verdverm · 2014-07-07T20:32:51Z

I have noticed the FirstFit (default?) scheduler co-locates pods when there are open machines available. Each of these machines has a single cpu.

It would be nice to use information about available cpu and a pod's expected cpu requirements.

sed 's/cpu/other_machine_stat/'
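The contrast described here can be sketched in Go. This is a hypothetical illustration, not the Kubernetes scheduler's code: the `Node` and `Pod` types, their CPU fields, and both functions are assumptions. `firstFit` mimics the observed co-locating behavior, while `mostFree` uses each machine's available CPU and the pod's expected CPU to prefer idle machines.

```go
package main

import "fmt"

// Node is a hypothetical, simplified machine: total CPU cores and the cores
// already claimed by pods bound to it.
type Node struct {
	Name     string
	CPUTotal float64
	CPUUsed  float64
}

// Pod carries the pod's expected CPU requirement (hypothetical field).
type Pod struct {
	Name       string
	CPURequest float64
}

// fits reports whether the node has enough unclaimed CPU for the pod.
func fits(n Node, p Pod) bool {
	return n.CPUTotal-n.CPUUsed >= p.CPURequest
}

// firstFit returns the index of the first node the pod fits on, or -1.
// It ignores how loaded a node already is, so it tends to stack pods.
func firstFit(nodes []Node, p Pod) int {
	for i, n := range nodes {
		if fits(n, p) {
			return i
		}
	}
	return -1
}

// mostFree returns the index of the feasible node with the most unclaimed
// CPU, or -1, so idle machines are used before busy ones.
func mostFree(nodes []Node, p Pod) int {
	best := -1
	for i, n := range nodes {
		if !fits(n, p) {
			continue
		}
		if best == -1 || n.CPUTotal-n.CPUUsed > nodes[best].CPUTotal-nodes[best].CPUUsed {
			best = i
		}
	}
	return best
}

func main() {
	// Three single-CPU machines; the first already runs a small pod.
	nodes := []Node{
		{"m1", 1, 0.25},
		{"m2", 1, 0},
		{"m3", 1, 0},
	}
	p := Pod{"web", 0.25}
	fmt.Println("firstFit:", nodes[firstFit(nodes, p)].Name) // firstFit: m1
	fmt.Println("mostFree:", nodes[mostFree(nodes, p)].Name) // mostFree: m2
}
```

Per the `sed` remark, the same shape works for any other machine stat by swapping the field used in `fits` and in the comparison.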

monnand · 2014-07-07T20:34:47Z

Currently, kubelet could get stats from cAdvisor which would be useful for scheduler. It could provide different percentiles of CPU and memory usage of a container (including the root container, i.e. the machine).

thockin · 2014-07-07T20:35:04Z

That's just "scheduling", as opposed to machine constraints, though very coarsely they feel similar :)


timothysc · 2014-07-07T20:42:42Z

Labels "seem" like the ideal place to enable a rank & requirements to define constraints. However, labels would need to be published regularly by minions.

e.g.
rank = memory
requirements = gpu & clusterXYZ

I have a couple of concerns here:
- This treads into the full-scale scheduling world.
- Config syntax = ?, DSL? ...

thockin · 2014-07-07T20:47:54Z

Let's worry about semantics before syntax. We have a similar issue open for label selectors in general - we can discuss syntax there.


timothysc · 2014-07-07T20:56:12Z

FWIW I often view constraints as a SQL query on an NVP (name-value pair) store.

SELECT Resources
FROM Pool
WHERE Requirements
ORDER BY Rank
...

The hardest part is the 'fields' in an NVP store.
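The SQL framing can be made concrete with a small Go sketch that filters a pool of name-value records (WHERE Requirements) and orders the survivors (ORDER BY Rank). `Attrs`, `selectRanked`, and the attribute names are hypothetical; the requirements and rank mirror the earlier `rank = memory`, `requirements = gpu & clusterXYZ` example. Sorting highest rank first is an assumption (SQL's `ORDER BY` defaults to ascending).

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strconv"
)

// Attrs is a hypothetical name-value pair (NVP) record published by a machine.
type Attrs map[string]string

// selectRanked mirrors
//
//	SELECT Resources FROM Pool WHERE Requirements ORDER BY Rank
//
// keeping machines for which req returns true, then sorting them by the
// numeric value of rankKey, highest first.
func selectRanked(pool []Attrs, req func(Attrs) bool, rankKey string) []Attrs {
	var out []Attrs
	for _, m := range pool {
		if req(m) {
			out = append(out, m)
		}
	}
	rank := func(m Attrs) float64 {
		v, err := strconv.ParseFloat(m[rankKey], 64)
		if err != nil {
			return math.Inf(-1) // missing or non-numeric fields sort last
		}
		return v
	}
	sort.SliceStable(out, func(i, j int) bool { return rank(out[i]) > rank(out[j]) })
	return out
}

func main() {
	pool := []Attrs{
		{"name": "a", "gpu": "true", "cluster": "clusterXYZ", "memory": "16"},
		{"name": "b", "gpu": "false", "cluster": "clusterXYZ", "memory": "64"},
		{"name": "c", "gpu": "true", "cluster": "clusterXYZ", "memory": "32"},
		{"name": "d", "gpu": "true", "cluster": "other", "memory": "128"},
	}
	// requirements = gpu & clusterXYZ
	req := func(m Attrs) bool { return m["gpu"] == "true" && m["cluster"] == "clusterXYZ" }
	for _, m := range selectRanked(pool, req, "memory") {
		fmt.Println(m["name"], m["memory"])
	}
	// Prints "c 32" then "a 16".
}
```

The sketch also shows why the fields are the hard part: the keys are free-form strings, so a machine that publishes `GPU` instead of `gpu` silently fails the requirement.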

bgrant0607 · 2014-07-07T21:34:12Z

Scheduling based on resources and constraints are 2 significantly different issues.

We have several issues open about resource (and QoS) awareness: #147, #160, #168, #274, #317.

Constraint syntax/semantics: We should start with the proposed label selector mechanism, #341.

timothysc · 2014-07-09T02:45:44Z

I'm ok with doing the selection from a set of offers/resources from the scheduler, provided the offers have enough NVP information to enable discrimination.

thockin · 2014-07-09T02:51:28Z

I don't know about NVP - where can I read more on it?

bgrant0607 · 2014-07-09T03:16:40Z

Searching for "NVP SQL" or "name value pair SQL" or "key value pair SQL" comes up with lots of hits. Common arguments against are performance and loss of control over DB schema. But I'm getting the feeling that we're barking up the wrong forest.

@timothysc What are you trying to do? Right now, k8s has essentially no intelligent scheduling. However, that's not a desirable end state. If what you want is a scheduler, we should figure out how to support scheduling plugins and/or layers on top of k8s.

thockin · 2014-07-09T03:32:31Z

Name Value Pairs? Now I feel dumb :)


bgrant0607 · 2014-07-09T04:08:57Z

Something somewhat different than label selectors is per-attribute limits for spreading. Aurora is one system that supports this model:
https://aurora.incubator.apache.org/documentation/latest/configuration-reference/#specifying-scheduling-constraints

This is more relevant to physical rather than virtual deployments. I'd consider it a distinct mechanism from constraints. @timothysc If you'd like this, we should file a separate issue. However, I'd prefer a new failure tolerance scheduling policy object that specifies a label selector to identify the set of instances to be spread. We could debate about how to describe what kind and/or how much spreading to apply, but I'd initially just leave it entirely up to the infrastructure.
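A per-attribute spreading limit can be sketched as a feasibility check: a machine is allowed only if placing one more instance there keeps the count of instances sharing that machine's attribute value within the limit. This is a hypothetical illustration, not Aurora's implementation; `Machine`, `allowed`, and the `rack` attribute are assumptions, and rejecting machines that lack the attribute is an arbitrary choice made for the sketch.

```go
package main

import "fmt"

// Machine is a hypothetical node with attributes such as "host" or "rack".
type Machine struct {
	Name  string
	Attrs map[string]string
}

// allowed reports whether placing one more instance on candidate keeps the
// number of instances sharing candidate's value of attr at or below limit.
// placed lists the machines already running instances of the same job.
func allowed(candidate Machine, placed []Machine, attr string, limit int) bool {
	v, ok := candidate.Attrs[attr]
	if !ok {
		return false // assumed policy: machines lacking the attribute are rejected
	}
	count := 0
	for _, m := range placed {
		if m.Attrs[attr] == v {
			count++
		}
	}
	return count+1 <= limit
}

func main() {
	r1a := Machine{"h1", map[string]string{"rack": "r1"}}
	r1b := Machine{"h2", map[string]string{"rack": "r1"}}
	r2 := Machine{"h3", map[string]string{"rack": "r2"}}
	placed := []Machine{r1a}
	// At most one instance per rack.
	fmt.Println(allowed(r1b, placed, "rack", 1)) // false: rack r1 already has one
	fmt.Println(allowed(r2, placed, "rack", 1))  // true
}
```

With `attr` set to a host attribute and `limit` set to 1, the same check expresses "never more than one instance per host".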

timothysc · 2014-07-09T14:58:12Z

I completely agree it's more relevant to physical rather than virtual deployments.

I was somewhat testing the possibility of enabling the capabilities for more general-purpose scheduling, on par with a mini-Condor approach, but it's not a requirement.

Aurora- or Marathon-esque capabilities will fill the gap.
https://github.com/mesosphere/marathon/wiki/Constraints

bgrant0607 · 2014-10-17T23:33:38Z

Note that in order to add constraints, we'd need a way to attach labels to minions/nodes.

timothysc · 2014-10-20T15:26:01Z

That is what I had alluded to earlier, but it received lukewarm attention. In fact, I believe Wilkes had chimed in on a different thread regarding this topic.

brendandburns · 2014-10-20T15:52:30Z

I think we should have labels for worker nodes, but they need to be dynamic, and that's tough without a re-scheduler.

For now, I think we should use resources on nodes, since they are already there, and they are known to be static.

You can add resource requests to pods to achieve appropriate scheduling.


erictune · 2014-10-20T16:27:45Z

Replication controllers reschedule pods when the machines they are on are no longer available. Seems like replication controller could do the same if the machine becomes infeasible for scheduling. A fairly simple loop can recheck predicates as a background task in the scheduler, and move pods to terminated state if they no longer fit.
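One pass of such a background loop can be sketched in Go: re-evaluate each running pod's label requirements against its node's current labels and move mismatched pods to a terminated state so a replication controller can recreate them. `BoundPod`, `PodState`, `matches`, and `recheck` are hypothetical names, requirements are modeled as simple equality selectors, and a real loop would run this periodically rather than once.

```go
package main

import "fmt"

// PodState is a hypothetical lifecycle state.
type PodState string

const (
	Running    PodState = "Running"
	Terminated PodState = "Terminated"
)

// BoundPod is a hypothetical pod already bound to a node, carrying the label
// requirements it was scheduled with.
type BoundPod struct {
	Name     string
	Node     string
	Selector map[string]string
	State    PodState
}

// matches reports whether labels contain every key/value pair in selector.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// recheck runs one pass of the background loop: every running pod whose node
// no longer satisfies its selector is moved to Terminated. It returns the
// names of the pods it terminated.
func recheck(pods []*BoundPod, nodeLabels map[string]map[string]string) []string {
	var evicted []string
	for _, p := range pods {
		if p.State != Running {
			continue
		}
		if !matches(p.Selector, nodeLabels[p.Node]) {
			p.State = Terminated
			evicted = append(evicted, p.Name)
		}
	}
	return evicted
}

func main() {
	nodeLabels := map[string]map[string]string{
		"n1": {"gpu": "true"},
		"n2": {"gpu": "false"}, // relabeled after pod b was bound
	}
	pods := []*BoundPod{
		{Name: "a", Node: "n1", Selector: map[string]string{"gpu": "true"}, State: Running},
		{Name: "b", Node: "n2", Selector: map[string]string{"gpu": "true"}, State: Running},
	}
	fmt.Println(recheck(pods, nodeLabels)) // [b]
}
```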

Questions:

If a running pod is updated so that its requirements no longer match the machine it is bound to, what happens?
- pod moves to terminated state
- refuse update
- both, and let users control the behavior (yay)

bgrant0607 · 2014-10-20T16:39:01Z

The minion/node controller (#1366) should be responsible for killing pods with mismatched label selectors, and then, yes, replication controllers would recreate them.

Re. @erictune's question: Yes, we could support both, for instance, using a URL parameter to select the desired behavior.

brendandburns · 2014-10-20T17:23:22Z

Yeah, having the kubelet kill pods that don't match makes the most sense.


timothysc · 2015-06-18T02:17:33Z

@davidopp

Constraints evaluate to simple true or false expressions, so the operators should really be && and ||; this way you can connect them, including simple predicate matching.

GROUPBY (cluster) || rack==funkytown

The more I think about it, the less I want to tread into config-language space, given the cattle idiom on services. Whether they are soft or hard could be denoted via a keyword or some other semantics.
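Connecting simple predicates with && and || can be sketched as a small expression tree evaluated against node labels. `Expr`, `Eq`, `And`, `Or`, and the `gpu`/`zone` labels are hypothetical; only `rack==funkytown` comes from the comment. The sketch covers hard predicates only; the grouping part (`GROUPBY (cluster)`) and soft-vs-hard marking are not modeled.

```go
package main

import "fmt"

// Expr is a hypothetical constraint expression evaluated against node labels.
type Expr interface {
	Eval(labels map[string]string) bool
}

// Eq matches when labels[Key] == Value, e.g. rack==funkytown.
type Eq struct{ Key, Value string }

// And is true only if both sides are true (&&).
type And struct{ L, R Expr }

// Or is true if either side is true (||).
type Or struct{ L, R Expr }

func (e Eq) Eval(l map[string]string) bool  { return l[e.Key] == e.Value }
func (e And) Eval(l map[string]string) bool { return e.L.Eval(l) && e.R.Eval(l) }
func (e Or) Eval(l map[string]string) bool  { return e.L.Eval(l) || e.R.Eval(l) }

func main() {
	// (rack==funkytown && gpu==true) || zone==east
	c := Or{And{Eq{"rack", "funkytown"}, Eq{"gpu", "true"}}, Eq{"zone", "east"}}
	fmt.Println(c.Eval(map[string]string{"rack": "funkytown", "gpu": "true"})) // true
	fmt.Println(c.Eval(map[string]string{"rack": "funkytown"}))                // false
	fmt.Println(c.Eval(map[string]string{"zone": "east"}))                     // true
}
```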

davidopp · 2015-06-18T06:10:22Z

So IIUC, the thing you're proposing would work like this?

GROUPBY expr1 || expr2 => put a virtual label X on all the machines that match expr1 or expr2, and then try to co-locate all the pods of the service on machines with label X
GROUPBY expr1 && expr2 => put a virtual label X on all the machines that match expr1 and expr2, and then try to co-locate all the pods of the service on machines with label X
SPREADBY expr1 || expr2 => put a virtual label X on all the machines that match expr1 or expr2, and then try to spread all the pods of the service across machines with label X
SPREADBY expr1 && expr2 => put a virtual label X on all the machines that match expr1 and expr2, and then try to spread all the pods of the service across machines with label X

It would be good to flesh out some use cases...
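This reading of GROUPBY/SPREADBY can be sketched as a greedy placement choice for a single pod over the virtual label set X. It is a minimal sketch under assumptions: `Machine`, `pick`, the per-machine count of the service's pods, and the example expressions are hypothetical, and "try to co-locate / spread" is approximated by picking the member of X with the most / fewest of the service's pods.

```go
package main

import "fmt"

// Machine is a hypothetical node: its labels and how many pods of the
// service it already runs.
type Machine struct {
	Name        string
	Labels      map[string]string
	ServicePods int
}

// pick treats machines matching expr as the virtual-label set X. GROUPBY
// (groupBy=true) prefers the member of X with the most pods of the service;
// SPREADBY (groupBy=false) prefers the one with the fewest. It returns ""
// when no machine matches expr.
func pick(ms []Machine, expr func(map[string]string) bool, groupBy bool) string {
	best := -1
	for i, m := range ms {
		if !expr(m.Labels) {
			continue // not in X
		}
		if best == -1 ||
			(groupBy && m.ServicePods > ms[best].ServicePods) ||
			(!groupBy && m.ServicePods < ms[best].ServicePods) {
			best = i
		}
	}
	if best == -1 {
		return ""
	}
	return ms[best].Name
}

func main() {
	ms := []Machine{
		{"m1", map[string]string{"rack": "r1"}, 2},
		{"m2", map[string]string{"rack": "r1"}, 0},
		{"m3", map[string]string{"rack": "r2"}, 5},
	}
	// expr1 || expr2 with hypothetical expr1 = rack==r1, expr2 = rack==r3
	inX := func(l map[string]string) bool { return l["rack"] == "r1" || l["rack"] == "r3" }
	fmt.Println(pick(ms, inX, true))  // m1: co-locate with existing pods
	fmt.Println(pick(ms, inX, false)) // m2: spread onto the emptier machine
}
```

Note that m3 is never chosen despite running the most pods of the service, because it is outside X.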

bgrant0607 · 2015-06-20T03:55:34Z

I agree that flavors of affinity and anti-affinity are the basic 2 features that would satisfy most use cases.

With respect to As Simple As Possible, specifying just whether to group or spread seems like the simplest possible API. That needs to be associated with some set of pods, via label selector (in which object TBD). Node groups to concentrate in or spread across could be configured in the scheduler in most cases.

jcderr · 2015-09-17T22:08:54Z

+1

I deploy some fairly hefty celery tasks in our cluster, and definitively do not ever want more than one running on the same host at the same time. I'd rather some get left unscheduled and run a monitoring task to pick them up by scaling my cluster up.

dchen1107 · 2015-09-17T23:03:09Z

cc/ @pwittrock

dchen1107 · 2015-09-17T23:03:47Z

#13524

davidopp · 2015-12-07T05:51:26Z

This is part of #18261

timothysc · 2015-12-08T16:03:12Z

@davidopp I think it's reasonable to close this issue in favor of the associated proposal.

davidopp · 2015-12-14T04:16:13Z

@timothysc Let's wait until we merge the proposal.


bgrant0607 · 2016-05-17T06:09:13Z

Affinity/anti-affinity proposals merged and implementations are underway.


bgrant0607 mentioned this issue Jul 9, 2014

Generalize label selectors #341

Closed

monnand mentioned this issue Jul 9, 2014

Use godep to manage dependencies #378

Merged

bgrant0607 mentioned this issue Jul 16, 2014

Allow cluster resources to be subdivided #442

Closed

smarterclayton added the enhancement label Jul 17, 2014

brendandburns added this to the 0.7 milestone Sep 24, 2014

bgrant0607 mentioned this issue Sep 24, 2014

Initial cut of a spreading and generic scheduler. #1420

Merged

bgrant0607 modified the milestones: v0.8, v0.7 Sep 26, 2014

bgrant0607 added the sig/scheduling label Sep 30, 2014

bgrant0607 mentioned this issue Oct 1, 2014

Daemon (was Feature: run-on-every-node scheduling/replication (aka per-node controller or daemon controller)) #1518

Closed

bgrant0607 assigned davidopp Oct 4, 2014

bgrant0607 mentioned this issue Jun 25, 2015

Anti affinity predicate for pods #9560

Closed

bgrant0607 mentioned this issue Jul 15, 2015

Update examples to create service before RC (so that spreading works) #11144

Closed

davidopp mentioned this issue Jul 22, 2015

Can nodeselector schedule based on in / not in option selection enable affinity / anti - affinity #11707

Closed

davidopp added team/control-plane and removed team/master labels Aug 23, 2015

This was referenced Sep 24, 2015

Proposal: Affinity Priority for pods of different RC/services #14484

Closed

Write a doc describing our status and plans for hard/soft affinity/anti-affinity (scheduling) #14816

Closed

davidopp mentioned this issue Dec 6, 2015

Inter-pod topological affinity/anti-affinity proposal #18265

Merged

vishh pushed a commit to vishh/kubernetes that referenced this issue Apr 6, 2016

Merge pull request kubernetes#367 from kateknister/master

0a9f963

Caches container data for 5 seconds before updating it

keontang pushed a commit to keontang/kubernetes that referenced this issue May 14, 2016

Merge pull request kubernetes#367 from ddysher/baremetal-clean

928e3b2

Baremetal cleanup

bgrant0607 closed this as completed May 17, 2016

keontang pushed a commit to keontang/kubernetes that referenced this issue Jul 1, 2016

Merge pull request kubernetes#367 from ddysher/baremetal-clean

f042e13

Baremetal cleanup

harryge00 pushed a commit to harryge00/kubernetes that referenced this issue Aug 11, 2016

Merge pull request kubernetes#367 from ddysher/baremetal-clean

b9fcdb2

Baremetal cleanup

mqliang pushed a commit to mqliang/kubernetes that referenced this issue Dec 8, 2016

Merge pull request kubernetes#367 from ddysher/baremetal-clean

9224fd0

Baremetal cleanup

mqliang pushed a commit to mqliang/kubernetes that referenced this issue Mar 3, 2017

Merge pull request kubernetes#367 from ddysher/baremetal-clean

50b060b

Baremetal cleanup

wking pushed a commit to wking/kubernetes that referenced this issue Jul 21, 2020

Merge pull request kubernetes#367 from Liujingfang1/bases

9c2320c

change packages to bases in Manifest

smarterclayton pushed a commit to smarterclayton/kubernetes that referenced this issue Sep 22, 2020

Merge pull request kubernetes#367 from tnozicka/fix-ks-release-on-cancel

8a39924

Bug 1877793: Force releasing the lock on exit for KS

linxiulei pushed a commit to linxiulei/kubernetes that referenced this issue Jan 18, 2024

Merge pull request kubernetes#367 from grosser/grosser/unwrap

a999207

untangle plugin runner a bit
