Conversation
+1
@phemmer don't you need the overcommit parameter here as well? Docker doesn't return the exact amount of memory available; the system returns a little less.
Not sure. I opted to leave it out as I thought the strategy should be basic. The idea behind the binpacking algorithm is to fill a single 'bin' to its max and then continue to the next one, so overcommit is necessary there, as you need to know the maximum capacity (I'm sure you're aware of this, just adding it here for completeness). This algorithm is meant to be a lot simpler in that it evenly distributes everything, so a maximum isn't as important, and it won't refuse to start a container due to lack of available resources. Because it won't refuse to start a container, overcommit doesn't make much sense. My thought on why it shouldn't be limited is that I think it would be unexpected for swarm to refuse to start containers, as this is not how docker behaves. There's also currently nothing that warns the user that they are approaching the maximum capacity, so a sudden refusal to start containers can be very bad. I can easily add this in if desired, perhaps with an insanely high overcommit default value so the same effect is reached.
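To make the contrast concrete, here is a minimal sketch of the even-distribution idea. The types and names are simplified assumptions for illustration only and do not reflect swarm's actual scheduler interface.

```go
package strategy

import "errors"

// node is a hypothetical, simplified view of a host, used only to
// illustrate the balancing idea; it is not swarm's node type.
type node struct {
	ID          string
	UsedMemory  int64
	TotalMemory int64
}

// memoryUsage returns the fraction of memory already reserved on n.
func memoryUsage(n node) float64 {
	if n.TotalMemory == 0 {
		return 1.0
	}
	return float64(n.UsedMemory) / float64(n.TotalMemory)
}

// pickBalanced returns the least-loaded node, spreading containers
// evenly instead of filling one host to capacity first.
func pickBalanced(nodes []node) (node, error) {
	if len(nodes) == 0 {
		return node{}, errors.New("no nodes available")
	}
	best := nodes[0]
	for _, n := range nodes[1:] {
		if memoryUsage(n) < memoryUsage(best) {
			best = n
		}
	}
	return best, nil
}
```

A binpacking strategy, by contrast, would keep returning the same node until it ran out of capacity, which is why it needs a hard maximum (and overcommit) to decide when a bin is full.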
The thing is, if all the machines in your cluster have 2 gigs of RAM and you want to run a container with
I think you have an incomplete thought: "if all the machines in your cluster" what?
@phemmer sorry, updated.
But we are thinking about moving the overcommit outside of the strategies; it would be a top-level thing.
Ah, yes, because of the pre-check node filter. Valid point, I'll add it in.
Ok, added in. However, one comment on the overcommit used by swarm: it has a critical difference from Linux's overcommit. In Linux,

I'll start on some tests for the strategy if there are no further changes requested.
If we use this, it means our default would be
You could keep the scale if you wanted, and use
+1
Well, there is one thing that might be unexpected. This strategy doesn't consider whether the container is running or not. If you have 10 stopped containers and 0 running on one node, and 5 running containers on another node, it will place the new container on the one with 5 containers.
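If the strategy were to take run state into account, one possible tweak would be to score nodes by running containers only. The sketch below uses a hypothetical Container type, not swarm's.

```go
// Container is a hypothetical, minimal container record for this sketch.
type Container struct {
	ID      string
	Running bool
}

// countRunning returns how many containers on a node are actually
// running, so stopped containers would not skew the placement score.
func countRunning(containers []Container) int {
	running := 0
	for _, c := range containers {
		if c.Running {
			running++
		}
	}
	return running
}
```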
Will rebase for #228 and work on adding some tests.
Thanks @phemmer, sorry about that.
@phemmer could you please add some tests similar to binpacking_test.go?
```diff
@@ -57,7 +57,7 @@ var (
 	}
 	flStrategy = cli.StringFlag{
 		Name: "strategy",
-		Usage: "placement strategy to use [binpacking, random]",
+		Usage: "PlacementStrategy to use [balanced, binpacking, random]",
```
While you add the tests, can you switch this back to "placement strategy" (lowercase + space)?
Sorry for the delay. Rebased, node.go fixed, and tests added.
I'm also still uncertain about the overcommit thing. As mentioned earlier, this scheduler will allow you to add two 2GB containers to a 3GB node, and this isn't how binpack behaves. I see a few possible solutions:
@phemmer Can you please clarify?
@aluzzardi Beyond the explanation already provided, I'm not sure how. Perhaps an example: let's say you have 2 nodes, each with 1.5GB memory.
@phemmer With how much overcommit?
None, or even the default.
Ok. In my opinion, it should work the same way as binpacking: the 3rd container shouldn't start because there are not enough resources. Also, what do you think of the name "spread" instead of "balanced"?
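For concreteness, the kind of capacity check being argued for might look like the sketch below. The function, its parameters, and the overcommit semantics (a fractional ratio added on top of the node's total memory) are assumptions for illustration, not swarm's actual implementation.

```go
package strategy

import "errors"

// ErrNotEnoughResources is a hypothetical error used only in this sketch.
var ErrNotEnoughResources = errors.New("not enough resources to start container")

// checkCapacity refuses placement when the requested memory would push the
// node past its total memory scaled by an overcommit ratio (for example,
// overcommit = 0.05 allows 5% beyond physical memory).
func checkCapacity(usedMemory, totalMemory, requestedMemory int64, overcommit float64) error {
	capacity := int64(float64(totalMemory) * (1 + overcommit))
	if usedMemory+requestedMemory > capacity {
		return ErrNotEnoughResources
	}
	return nil
}
```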
But then why doesn't docker behave that way? Docker will happily let you launch a container even if it exceeds the node's capacity. As for the name, I'm not too fond of "spread", as I think the term is rather ambiguous. The "random" strategy spreads containers mostly evenly based on count, so the name should differentiate how the strategy behaves.
In any case, we're going to postpone the merge of this PR until after the RC, so we can try to find a solution that works for everybody.
Looking forward to this one, so long as it doesn't overcommit. For JupyterHub's dockerspawner and tmpnb we expect to fill up an expected amount of memory, pooling containers in advance. We'd rather not see it overcommit. If it's configurable, though, that's a different thing.
@jhamrick are you using this PR as-is, or tweaked in some way?
@vieux I'm not actually using this PR; I just made a small modification to the binpacking strategy (it just reverses the sort order, so it does more of a round-robin thing). I may switch to this strategy once the PR is merged, though.
If I may suggest, why not consider composability through a pipeline of applicable strategies? That is, each strategy returns a set of potential candidates that is passed to the next one. In the end, the first element of the resulting set (whose members are considered equivalent) is chosen. It would add a little more complexity within each strategy but would remove the need to repeat code. From the CLI point of view, this could be expressed as such:

in which case the binpacking strategy would be applied before the balanced strategy.
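A hedged sketch of what such a pipeline could look like in Go, reusing the hypothetical node type (and the errors import) from the earlier sketch; none of these names come from swarm itself.

```go
// Strategy, in this sketch, is one pipeline stage: it narrows a set of
// candidate nodes and hands the survivors to the next stage.
type Strategy interface {
	Filter(candidates []node) []node
}

// schedule runs each stage in order and returns the first remaining
// candidate, the survivors being treated as equivalent at that point.
func schedule(stages []Strategy, nodes []node) (node, error) {
	candidates := nodes
	for _, s := range stages {
		candidates = s.Filter(candidates)
		if len(candidates) == 0 {
			return node{}, errors.New("no node satisfies the strategy pipeline")
		}
	}
	return candidates[0], nil
}
```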
@mota I'm not sure strategies really compose since they often have competing priorities, so at least I don't see a good use case for it yet.
@tnachen You're right, strategies interfere with each other. Nevertheless, I don't agree with your idea of one monolithic strategy that is applied statefully. In the first place, because filters are applied before strategies, they cannot operate on the cluster as a whole, but only on a subset of it. Secondly, because I think it is best to let the user choose which priority he or she values the most. However, I'd like to revise my proposal and add a little subtlety regarding the implementation: the idea would be to associate a score to each node as the pipeline executes. At the end of the pipeline process, the node with the best score is chosen. That way, the
Since a filter is user-defined, it supports cases where users can explicitly prune the selection to what they want, and that subset is what I refer to as global state. I think what's missing in your proposal is a concrete use case that deems this necessary. I'm not against the idea, but I'm hoping to keep the scheduler simple to begin with, as it can become very hard to reason about what you described.
And balanced with binpack composed |
@mota Perhaps you can provide some examples of how you expect a combination of schedulers to work, because binpacking is pretty much the complete opposite of balanced. I don't see how they can co-exist.
@phemmer sure thing. Maybe my view is flawed, but as things evolve, I don't see strategies as a code of conduct but more as a best-effort behavior. First, I think the elimination of nodes that lack the required resources should happen in the filtering phase, not in the strategy phase as it is done currently. Thus, what I propose is simply a way to tune the behavior of the scheduler. Let's say you want to favor spreading instead of stacking: you should use the

Tell me if I'm wrong, but as I see it, the implementation of
The strategy README should be updated to reflect this new strategy as well |
```go
		return nil, ErrNoResourcesAvailable
	}

	sort.Sort(scores)
```
Instead of reinventing the wheel here, you can just call sort.Sort(sort.Reverse(scores)) on a scores structure. That should clean up a lot of the boilerplate.
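For reference, sort.Reverse wraps any sort.Interface and flips its Less method, so a descending sort needs no hand-written reverse comparisons. A small self-contained example with an illustrative scores type (the actual type in this PR may differ):

```go
package main

import (
	"fmt"
	"sort"
)

// scoredNode pairs a node ID with a score; the names are illustrative.
type scoredNode struct {
	id    string
	score int64
}

// scores implements sort.Interface in ascending score order.
type scores []scoredNode

func (s scores) Len() int           { return len(s) }
func (s scores) Swap(i, j int)      { s[i], s[j] = s[j], s[i] }
func (s scores) Less(i, j int) bool { return s[i].score < s[j].score }

func main() {
	s := scores{{"node-1", 10}, {"node-2", 42}, {"node-3", 7}}
	// sort.Reverse flips Less, yielding descending score order without
	// hand-rolling a reversed comparison.
	sort.Sort(sort.Reverse(s))
	fmt.Println(s) // [{node-2 42} {node-1 10} {node-3 7}]
}
```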
ping @phemmer?
Sorry, haven't had a chance to actually build and use it. But from looking at the implementation, this appears to have the same effect as the strategy in this PR, so I think it's good.
Cool. @phemmer I'm closing this; please comment if you have any issues.
This adds a 'balanced' strategy. This is a very basic strategy which will evenly distribute containers among the docker hosts.