This repository has been archived by the owner on Feb 1, 2021. It is now read-only.

add 'balanced' scheduler strategy #227

Closed
wants to merge 3 commits into from

Conversation

@phemmer

phemmer commented Jan 7, 2015

This adds a 'balanced' strategy. This is a very basic strategy which will evenly distribute containers among the docker hosts.

@MrGossett

+1

@vieux
Contributor

vieux commented Jan 7, 2015

@phemmer don't you need the overcommit parameter here as well?

Docker doesn't report the exact amount of memory available; it reports a little less (because the system itself returns a little less).

@phemmer
Author

phemmer commented Jan 7, 2015

Not sure. I opted to leave it out as I thought the strategy should be basic.

The idea behind the binpacking algorithm is to fill a single 'bin' to its max and then continue to the next one, so overcommit is necessary there, as you need to know the maximum capacity (I'm sure you're aware of this, just adding it here for completeness).

This algorithm is meant to be a lot simpler in that it evenly distributes everything, so a maximum isn't as important, and it won't refuse to start a container due to lack of available resources. Because it won't refuse to start a container, overcommit doesn't make much sense.

My thoughts on why it shouldn't be limited is that I think it would be unexpected for swarm to refuse to start containers, as this is not how docker behaves. There's also currently nothing that warns the user that they are approaching the maximum capacity. So a sudden refusal to start containers can be very bad.

I can easily add this in if desired. Perhaps with an insanely high overcommit default value so the same effect is reached.
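
A minimal sketch of the contrast described above, not the code in this PR. The `Node` type, both picker functions, and the use of container count as the balancing metric are assumptions made for illustration; they are not swarm's actual API.

```go
package main

import "fmt"

// Node is a hypothetical, simplified view of a Docker host.
type Node struct {
	Name       string
	TotalMB    int64 // memory the host reports
	ReservedMB int64 // memory already reserved by placed containers
	Containers int   // number of containers placed so far
}

// pickBinpack returns the index of the fullest node that can still hold
// needMB, or -1 when no node has room. It must know each node's capacity.
func pickBinpack(nodes []Node, needMB int64) int {
	best := -1
	for i, n := range nodes {
		if n.ReservedMB+needMB > n.TotalMB {
			continue // placing here would exceed capacity
		}
		if best == -1 || n.ReservedMB > nodes[best].ReservedMB {
			best = i
		}
	}
	return best
}

// pickBalanced returns the index of the node with the fewest containers.
// It never consults capacity, so it only returns -1 for an empty cluster.
func pickBalanced(nodes []Node) int {
	best := -1
	for i, n := range nodes {
		if best == -1 || n.Containers < nodes[best].Containers {
			best = i
		}
	}
	return best
}

// place records a container of sizeMB on nodes[i].
func place(nodes []Node, i int, sizeMB int64) {
	nodes[i].ReservedMB += sizeMB
	nodes[i].Containers++
}

func main() {
	packed := []Node{{Name: "a", TotalMB: 1024}, {Name: "b", TotalMB: 1024}}
	spread := []Node{{Name: "a", TotalMB: 1024}, {Name: "b", TotalMB: 1024}}
	for i := 0; i < 4; i++ {
		if idx := pickBinpack(packed, 256); idx >= 0 {
			place(packed, idx, 256)
		}
		place(spread, pickBalanced(spread), 256)
	}
	fmt.Println("binpack: ", packed)
	fmt.Println("balanced:", spread)
}
```

With four 256 MB containers, the binpack picker fills node a completely before touching b, while the balanced picker alternates between the two and would keep doing so past either node's capacity.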

@vieux
Contributor

vieux commented Jan 7, 2015

The thing is, if all the machines in your cluster have 2 GB of RAM and you want to run a container with -m 2g, it won't work with your strategy.

@phemmer
Author

phemmer commented Jan 7, 2015

I think you have an incomplete thought: "if all the machines in your cluster" what?

@vieux
Contributor

vieux commented Jan 7, 2015

@phemmer sorry, updated

@vieux
Contributor

vieux commented Jan 7, 2015

But we are thinking about moving overcommit out of the strategies; it would become a top-level setting.

@phemmer
Author

phemmer commented Jan 7, 2015

Ah, yes, because of the pre-check node filter. Valid point, I'll add it in.

@phemmer
Author

phemmer commented Jan 7, 2015

Ok, added in.

However, one comment on the overcommit setting used by swarm: it has a critical difference from Linux's overcommit. In Linux, an overcommit value of 100 means no overcommit, i.e. use exactly as much memory as is available. Swarm treats 0 as no overcommit. The advantage of using 100 as no overcommit is that if you want to prevent the system from launching something that uses nearly all of the available resource, you can, by setting the overcommit value to 90 or so.
I think this would be a good idea for swarm to adopt, but I kept this strategy on the same behavior as binpack for consistency.

I'll start on some tests for the strategy if there are no further changes requested.

@vieux
Contributor

vieux commented Jan 7, 2015

If we use this, does it mean our default would be 105?

@phemmer
Author

phemmer commented Jan 7, 2015

You could keep the scale if you wanted, and use 1.00 as no overcommit, or you can use 100.
So either 1.05 or 105 would be the equivalent of the current 0.05. 1.05 is probably more intuitive as then it becomes standard multiplication (100mb * 1.05 = 105mb)
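
To make the two conventions concrete, here is a rough sketch. Both function names are made up for this example, and neither is taken from swarm; the first mirrors the current behavior as described in this thread (0 means no overcommit), the second the multiplier form suggested above (1.00 means no overcommit).

```go
package main

import (
	"fmt"
	"math"
)

// limitAdditive treats overcommit as an extra fraction on top of the total:
// 0 means no overcommit, 0.05 allows 5% more than the host reports.
func limitAdditive(totalMB int64, overcommit float64) int64 {
	return totalMB + int64(math.Round(float64(totalMB)*overcommit))
}

// limitMultiplier treats the setting as a plain factor: 1.00 means no
// overcommit, 1.05 allows 5% more, and 0.90 keeps 10% of the host in reserve.
func limitMultiplier(totalMB int64, factor float64) int64 {
	return int64(math.Round(float64(totalMB) * factor))
}

func main() {
	fmt.Println(limitAdditive(100, 0.05))   // current style
	fmt.Println(limitMultiplier(100, 1.05)) // same limit, multiplier style
	fmt.Println(limitMultiplier(100, 0.90)) // headroom, not expressible with a non-negative additive value
}
```

The multiplier form can express a limit below the reported total, which the additive form can only do with a negative value.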

@chanwit
Contributor

chanwit commented Jan 8, 2015

+1
@phemmer I'm about to propose the "least running containers" strategy to balance my cluster. Hopefully I can use yours instead of inventing a new one. Cheers!

@phemmer
Author

phemmer commented Jan 8, 2015

Well there is one thing that might be unexpected. This strategy doesn't consider whether the container is running or not. If you have 10 stopped containers on one node, but 0 running, and 5 running containers on another node, it will place the new container on the one with 5 containers.
This was done as it's how the binpack strategy behaves. Though it might be a good idea to add a flag to control the behavior.
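
A small sketch of what such a flag could look like, using the example above. The `runningOnly` parameter and all type and function names here are hypothetical and do not exist in this PR.

```go
package main

import "fmt"

// Container is a hypothetical, minimal container record.
type Container struct {
	Running bool
}

// Node is a hypothetical host holding a list of containers.
type Node struct {
	Name       string
	Containers []Container
}

// count returns how many containers weigh on the node. With runningOnly set,
// stopped containers are ignored.
func count(n Node, runningOnly bool) int {
	if !runningOnly {
		return len(n.Containers)
	}
	running := 0
	for _, c := range n.Containers {
		if c.Running {
			running++
		}
	}
	return running
}

// pickLeastLoaded returns the name of the node with the lowest count, or ""
// for an empty cluster. Ties go to the earlier node.
func pickLeastLoaded(nodes []Node, runningOnly bool) string {
	best := -1
	for i := range nodes {
		if best == -1 || count(nodes[i], runningOnly) < count(nodes[best], runningOnly) {
			best = i
		}
	}
	if best == -1 {
		return ""
	}
	return nodes[best].Name
}

func main() {
	stopped := make([]Container, 10) // zero value: Running == false
	running := make([]Container, 5)
	for i := range running {
		running[i].Running = true
	}
	nodes := []Node{
		{Name: "ten-stopped", Containers: stopped},
		{Name: "five-running", Containers: running},
	}
	fmt.Println(pickLeastLoaded(nodes, false)) // counts every container
	fmt.Println(pickLeastLoaded(nodes, true))  // counts running containers only
}
```

Counting every container sends the new one to the node with 5 running containers, as described above; counting only running containers sends it to the node with 10 stopped ones.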

@phemmer
Author

phemmer commented Jan 14, 2015

Will rebase for #228 and work on adding some tests.

@vieux
Contributor

vieux commented Jan 14, 2015

thanks @phemmer, sorry about that.

@vieux
Contributor

vieux commented Jan 17, 2015

@phemmer could you please add some tests similar to binpacking_test.go?

@@ -57,7 +57,7 @@ var (
 	}
 	flStrategy = cli.StringFlag{
 		Name:  "strategy",
-		Usage: "placement strategy to use [binpacking, random]",
+		Usage: "PlacementStrategy to use [balanced, binpacking, random]",
Contributor

While you add the tests, can you switch this back to "placement strategy" (lowercase, with a space)?

@phemmer
Author

phemmer commented Jan 17, 2015

Sorry for the delay. Rebased, node.go fixed, and tests added.

@phemmer
Author

phemmer commented Jan 17, 2015

I'm also still uncertain about the overcommit thing. As mentioned earlier, this scheduler will allow you to add 2x 2gb containers to a 3gb node, and this isn't how binpack behaves.
I don't much like the idea of having it refuse to launch containers if they exceed the limits, but the reasons for doing so are pretty strong, and consistency is another big factor.

I see a few possible solutions:

  • Leave as is (perhaps open a ticket to continue discussion on the matter)
  • Enforce the limit, and add an option to toggle it.
  • Have the binpack scheduler fall back to the balanced scheduler if all nodes are full, so that it gets the same behavior.

@vieux vieux added this to the Swarm Beta 0.1.0 milestone Jan 19, 2015
@aluzzardi
Contributor

@phemmer Can you please clarify?

@phemmer
Author

phemmer commented Jan 19, 2015

@aluzzardi Beyond the explanation already provided, I'm not sure how. Perhaps an example:

Let's say you have 2 nodes, each with 1.5gb memory.
You start one container using 1gb, it goes to node A.
You start another container using 1gb, it goes to node B.
You start a third container using 1gb, it goes to node A.
Node A's total reserved memory is now 2gb.
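
The same sequence as a runnable sketch, with an `enforce` switch added to show both behaviors being discussed. Everything here (`Node`, `schedule`, `errNoResources`, the limit formula) is a hypothetical simplification rather than the code in this PR, and it balances on container count with ties going to the first node.

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a hypothetical, simplified view of a Docker host.
type Node struct {
	Name       string
	TotalMB    int64
	ReservedMB int64
	Containers int
}

// errNoResources is a stand-in error for a refused placement.
var errNoResources = errors.New("no resources available")

// schedule places a container of needMB on the node with the fewest
// containers. When enforce is true, nodes whose reservation would exceed
// TotalMB * (1 + overcommit) are skipped.
func schedule(nodes []Node, needMB int64, enforce bool, overcommit float64) (string, error) {
	best := -1
	for i, n := range nodes {
		limit := float64(n.TotalMB) * (1 + overcommit)
		if enforce && float64(n.ReservedMB+needMB) > limit {
			continue
		}
		if best == -1 || n.Containers < nodes[best].Containers {
			best = i
		}
	}
	if best == -1 {
		return "", errNoResources
	}
	nodes[best].ReservedMB += needMB
	nodes[best].Containers++
	return nodes[best].Name, nil
}

func main() {
	for _, enforce := range []bool{false, true} {
		// Two nodes with 1.5gb each, three containers of 1gb each.
		nodes := []Node{{Name: "A", TotalMB: 1536}, {Name: "B", TotalMB: 1536}}
		for i := 1; i <= 3; i++ {
			name, err := schedule(nodes, 1024, enforce, 0.05)
			fmt.Printf("enforce=%v container %d -> %q err=%v\n", enforce, i, name, err)
		}
	}
}
```

Without enforcement the third container lands on node A, which then has 2gb reserved against 1.5gb of memory. With enforcement and the 0.05 default, the third placement is refused.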

@aluzzardi
Contributor

@phemmer With how much overcommit?

@phemmer
Author

phemmer commented Jan 19, 2015

none, or even the default 0.05. But less than 150%.

@vieux
Contributor

vieux commented Jan 19, 2015

OK, in my opinion it should work the same way as binpacking: the 3rd container shouldn't start because there are not enough resources.

// what do you think of the name "spread" instead of balanced?

@phemmer
Author

phemmer commented Jan 19, 2015

But then why doesn't docker behave that way? Docker will happily let you launch a container even if it exceeds the node's capacity.

As for the name, I'm not too fond of "spread", as I think the term is rather ambiguous. The "random" strategy spreads containers mostly evenly based on count, so the name should differentiate how the strategy behaves.

@vieux
Contributor

vieux commented Jan 21, 2015

In any case, we are going to postpone merging this PR until after the RC, so we can try to find a solution that works for everybody.

@vieux vieux removed this from the Swarm Beta 0.1.0 milestone Jan 21, 2015
@rgbkrk

rgbkrk commented Jan 21, 2015

Looking forward to this one so long as it doesn't overcommit. For JupyterHub's dockerspawner and tmpnb we expect to fill up some expected amount of memory, pooling them in advance. We'd rather not see it overcommit. If it's configurable though, that's a different thing.

@jhamrick jhamrick mentioned this pull request Jan 21, 2015
@vieux
Contributor

vieux commented Jan 27, 2015

@jhamrick are you using this PR as is, or tweaked in some way?

@jhamrick
Contributor

@vieux I'm not actually using this PR; I just made a small modification to the binpacking strategy (just reverses the sort order, so it does more of a round robin thing). I may switch to this strategy once the PR is merged, though.

@dustbyte
Contributor

If I may suggest, why not consider composability through a pipeline of applicable strategies?

That is, each strategy returns a set of potential candidates that is passed to the next one. In the end, the first element of the resulting set (whose members are considered equivalent) is chosen.

It would add a little more complexity within each strategy but would remove the need to repeat code.

From the CLI point of view, this could be expressed as:

swarm manage --strategies=binpacking,balanced ...

in which case the binpacking strategy would be applied before the balanced strategy.
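
As a rough sketch of that idea, assuming each strategy is just a function that narrows a candidate list. The `Strategy` type, `keepMin`, `choose`, and the simplified `binpacking` and `balanced` functions are all hypothetical; none of this exists in swarm.

```go
package main

import "fmt"

// Node is a hypothetical, simplified view of a Docker host.
type Node struct {
	Name       string
	ReservedMB int64
	Containers int64
}

// Strategy narrows a candidate set to the nodes it considers equally good.
type Strategy func([]Node) []Node

// keepMin keeps every node that ties for the lowest key.
func keepMin(nodes []Node, key func(Node) int64) []Node {
	var out []Node
	for _, n := range nodes {
		switch {
		case len(out) == 0 || key(n) < key(out[0]):
			out = []Node{n}
		case key(n) == key(out[0]):
			out = append(out, n)
		}
	}
	return out
}

// binpacking keeps the nodes with the most reserved memory.
func binpacking(nodes []Node) []Node {
	return keepMin(nodes, func(n Node) int64 { return -n.ReservedMB })
}

// balanced keeps the nodes with the fewest containers.
func balanced(nodes []Node) []Node {
	return keepMin(nodes, func(n Node) int64 { return n.Containers })
}

// choose runs the strategies in order and returns the first survivor.
func choose(nodes []Node, pipeline ...Strategy) (Node, bool) {
	for _, s := range pipeline {
		nodes = s(nodes)
	}
	if len(nodes) == 0 {
		return Node{}, false
	}
	return nodes[0], true
}

func main() {
	nodes := []Node{
		{Name: "a", ReservedMB: 512, Containers: 3},
		{Name: "b", ReservedMB: 512, Containers: 1},
		{Name: "c", ReservedMB: 0, Containers: 0},
	}
	first, _ := choose(nodes, binpacking, balanced)
	second, _ := choose(nodes, balanced, binpacking)
	fmt.Println(first.Name, second.Name)
}
```

In this form the order matters: binpacking then balanced picks b (the less crowded of the two fullest nodes), while balanced then binpacking picks c, because the first strategy has already reduced the set to one node.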

@tnachen
Contributor

tnachen commented Jan 27, 2015

@mota I'm not sure strategies really compose, since they often have competing priorities, so at least I don't see a good use case for it yet.
And IMO it becomes harder to implement a strategy, since each strategy can't simply take the list of candidates from the previous one: many of them still have to look at the global state as a whole (i.e. balanced needs to balance across the cluster) and then see whether any of their choices match the passed-in candidates.
I'm more in favor of a single strategy and, if really needed, supporting per-strategy configuration that favors different scenarios.

@dustbyte
Contributor

@tnachen You're right, strategies interfere with each other.

Nevertheless, I don't agree with your idea of one monolithic strategy that is applied statefully.

In the first place because filters are applied before strategies. They cannot operate on a cluster as a whole, but only on a subset of it.

Secondly, because I think it is best to let the user choose which priority he or she values the most.

However, I'd like to revise my proposal and add a little subtlety regarding the implementation I propose.

The idea would now be to associate a pipeline score with each node at each pipeline execution.
Each strategy would still be free to remove unsatisfactory candidates, but would add a value to the score of each remaining node.

At the end of the pipeline process, the node with the best score is chosen.

That way, the balanced strategy would be implemented so that it only adds a value of its own to each node's score.
Therefore, the current implementation of balanced could be matched by either balanced,binpacking or binpacking,balanced.
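
A minimal sketch of the score-based variant, under my own assumptions: the `Scorer` type, the two scoring formulas, and `choose` are invented for this example and are not part of swarm or of this PR.

```go
package main

import "fmt"

// Node is a hypothetical, simplified view of a Docker host.
type Node struct {
	Name       string
	TotalMB    int64
	ReservedMB int64
	Containers int
}

// Candidate pairs a node with the score accumulated along the pipeline.
type Candidate struct {
	Node  Node
	Score float64
}

// Scorer may drop candidates and adds its own value to the rest.
type Scorer func([]Candidate) []Candidate

// binpackScore drops full nodes and rewards fuller ones.
func binpackScore(in []Candidate) []Candidate {
	var out []Candidate
	for _, c := range in {
		if c.Node.ReservedMB >= c.Node.TotalMB {
			continue // unsatisfactory candidate: no room left
		}
		c.Score += float64(c.Node.ReservedMB) / float64(c.Node.TotalMB)
		out = append(out, c)
	}
	return out
}

// balancedScore rewards nodes holding fewer containers.
func balancedScore(in []Candidate) []Candidate {
	out := make([]Candidate, 0, len(in))
	for _, c := range in {
		c.Score += 1 / float64(1+c.Node.Containers)
		out = append(out, c)
	}
	return out
}

// choose runs every scorer, then returns the highest-scoring survivor.
func choose(nodes []Node, pipeline ...Scorer) (Node, bool) {
	cands := make([]Candidate, len(nodes))
	for i, n := range nodes {
		cands[i] = Candidate{Node: n}
	}
	for _, s := range pipeline {
		cands = s(cands)
	}
	if len(cands) == 0 {
		return Node{}, false
	}
	best := cands[0]
	for _, c := range cands[1:] {
		if c.Score > best.Score {
			best = c
		}
	}
	return best.Node, true
}

func main() {
	nodes := []Node{
		{Name: "a", TotalMB: 1024, ReservedMB: 512, Containers: 3},
		{Name: "b", TotalMB: 1024, ReservedMB: 256, Containers: 0},
		{Name: "c", TotalMB: 1024, ReservedMB: 768, Containers: 5},
		{Name: "d", TotalMB: 1024, ReservedMB: 1024, Containers: 0},
	}
	x, _ := choose(nodes, binpackScore, balancedScore)
	y, _ := choose(nodes, balancedScore, binpackScore)
	fmt.Println(x.Name, y.Name)
}
```

Because each scorer only adds to a running total, both orderings select the same node here, and node d is dropped whichever position the binpack scorer occupies.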

@tnachen
Contributor

tnachen commented Jan 28, 2015

Since a filter is user-defined, it supports cases where users explicitly prune the selection to what they want, and that subset is what I refer to as global state.

I think what's missing in your proposal is a concrete use case that makes this necessary. I'm not against the idea, but I'm hoping to keep the scheduler simple to begin with, as what you described can become very hard to reason about.

@tnachen
Contributor

tnachen commented Jan 28, 2015

And balanced composed together with binpack doesn't make much sense to me either.

@phemmer
Author

phemmer commented Jan 28, 2015

@mota Perhaps you can provide some examples of how you expect a combination of schedulers to work. Because binpacking is pretty much the complete opposite of balanced. I don't see how they can co-exist.

@dustbyte
Contributor

@phemmer sure thing.

Maybe my view is flawed but as things evolve, I don't see strategies as a code of conduct but more as a best-effort behavior.

First thing, I think the elimination of incapable nodes in terms of resource usage should be made in the filtering phase, not in the strategy phase as it is done currently.

Thus, what I propose is simply to tune the behavior of the scheduler. Let's say you want to favor spreading over stacking: you should use the balanced behavior. If, on the contrary, you favor stacking, then you use the binpacking behavior. And finally, if you want to reach a best-effort candidate, you apply both.

Tell me if I'm wrong but as I see it, the implementation of balanced you provide is pretty much the binpacking strategy with one added dimension.

@bacongobbler
Contributor

The strategy README should be updated to reflect this new strategy as well

return nil, ErrNoResourcesAvailable
}

sort.Sort(scores)
Contributor

Instead of re-implementing the wheel here, you can just call sort.Sort(sort.Reverse(scores)) on a scores structure. That should clean up a lot of the boilerplate.
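
For illustration, a small self-contained sketch of that pattern. The `score` and `scoreList` types are placeholders invented for this example and are not the types in this PR; `sort.Sort`, `sort.Reverse`, and `sort.Interface` are the standard library pieces being suggested.

```go
package main

import (
	"fmt"
	"sort"
)

// score is a hypothetical per-node score record.
type score struct {
	node  string
	value int
}

// scoreList implements sort.Interface in ascending order of value.
type scoreList []score

func (s scoreList) Len() int           { return len(s) }
func (s scoreList) Swap(i, j int)      { s[i], s[j] = s[j], s[i] }
func (s scoreList) Less(i, j int) bool { return s[i].value < s[j].value }

// sortAscending sorts in place, lowest value first.
func sortAscending(s scoreList) { sort.Sort(s) }

// sortDescending reuses the same Less through sort.Reverse, so no second
// sort.Interface implementation is needed.
func sortDescending(s scoreList) { sort.Sort(sort.Reverse(s)) }

func main() {
	scores := scoreList{
		{node: "a", value: 30},
		{node: "b", value: 10},
		{node: "c", value: 20},
	}
	sortAscending(scores)
	fmt.Println("lowest first: ", scores)
	sortDescending(scores)
	fmt.Println("highest first:", scores)
}
```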

@aluzzardi
Contributor

@phemmer I just merged #458 since it's more recent and merges properly (there have been many changes to the scheduler since this PR was opened).

Does it fit your needs?

@vieux
Contributor

vieux commented Mar 17, 2015

ping @phemmer ?

@vieux vieux added current and removed next labels Mar 17, 2015
@phemmer
Author

phemmer commented Mar 17, 2015

Sorry, haven't had a chance to actually build and use it. But from looking at the implementation, this appears to have the same effect as the strategy in this PR, so I think it's good.

@vieux
Contributor

vieux commented Mar 25, 2015

cool, @phemmer I'm closing this, please comment if you have any issue.

@vieux vieux closed this Mar 25, 2015