Skip to content
This repository has been archived by the owner on Feb 1, 2021. It is now read-only.

Bugs in binpacking strategy #253

Closed
jhamrick opened this issue Jan 19, 2015 · 8 comments
Closed

Bugs in binpacking strategy #253

jhamrick opened this issue Jan 19, 2015 · 8 comments
Assignees
Labels
Milestone

Comments

@jhamrick
Copy link
Contributor

Is the bin packing strategy supposed to choose the machine that has the most resources free, or the fewest resources free? Currently, it appears that it chooses the machine with the fewest free resources, I think due to the implementation of Less, which actually tests for greater: https://github.com/docker/swarm/blob/master/scheduler/strategy/binpacking.go#L81

If this is intentional, it might be good to add a comment clarifying that, or alternately just select the last element of the sorted list instead of the first (or use sort.Reverse).

Additionally, it seems that the total score is always zero. I think this is because dividing by 200 using integer division will always truncate to zero: https://github.com/docker/swarm/blob/master/scheduler/strategy/binpacking.go#L44 This line is equivalent to just dividing by 2; changing the code to do that seems to fix the problem.

@michealbenedict
Copy link

Is the bin packing strategy supposed to choose the machine that has the most resources free, or the fewest resources free?

In an ideal world, the bin packing scheduler should be capable of packing the appropriate processes that result in maximum utilization of the node's resources (implicitly reducing wastage of resources).

Currently, it appears that it chooses the machine with the fewest free resources, I think due to the implementation of Less, which actually tests for greater: https://github.com/docker/swarm/blob/master/scheduler/strategy/binpacking.go#L81

I think it is otherwise i.e it chooses machines with most free resources. But, I agree that the working of the Sort() method here is a bit counter intuitive given that the expectation for Sort() is elements ordered by small to big but here it the opposite.

Additionally, it seems that the total score is always zero. I think this is because dividing by 200 using integer division will always truncate to zero: https://github.com/docker/swarm/blob/master/scheduler/strategy/binpacking.go#L44 This line is equivalent to just dividing by 2; changing the code to do that seems to fix the problem.

I agree dividing by 2 makes more sense. That said dividing by 200 does not truncate to zero -- given that the score is multiplied by 100 first, so the value (unless 1) will always be greater than 200.

@jhamrick
Copy link
Contributor Author

I think it is otherwise i.e it chooses machines with most free resources.

Well, at least from testing this out and adding in debug statements to verify, it currently chooses the machine with the fewest free resources (i.e. the one with the most containers already allocated to it).

That said dividing by 200 does not truncate to zero -- given that the score is multiplied by 100 first, so the value (unless 1) will always be greater than 200.

Hmm, well I will admit I am not particularly familiar with Go, so perhaps this is not the actual reason (does Go do order of operations backwards? In every other language I know, this would divide by 200 first, then multiply by 100). Regardless, there is definitely something causing the score to truncate to zero, that is fixed by dividing by two instead.

For reference, these are the changes I made to test and fix these issues: jhamrick@0cd196d I am happy to submit a PR, but I unfortunately don't think I know enough Go to properly update the tests.

@vieux
Copy link
Contributor

vieux commented Jan 19, 2015

I'll let @aluzzardi take a look at this one.
I must say I'm surprised because we have tests and this appears to be the behaviour we want.

@vieux vieux modified the milestone: Swarm Beta 0.1.0 Jan 19, 2015
@phemmer
Copy link

phemmer commented Jan 19, 2015

The behavior of the strategy is correct. Information on the binpacking strategy can be found here: https://en.wikipedia.org/wiki/Bin_packing_problem
The idea is that you will fill a "bin" to it's maximum capacity before placing items in another "bin".

What you're probably after is a strategy which places containers on nodes with the least utilization. There is an open PR to add a strategy which does this: #227

@jhamrick
Copy link
Contributor Author

Ok, so that answers my question about how the strategy is supposed to operate -- though I think it would still be nice if this were documented and if the code were less ambiguous (and like I said, I am happy to submit a PR for this). @phemmer the strategy I'm looking for does indeed seem like the one you're working on in #227, thanks for the pointer.

There is still the other issue of the score always being zero, however. I think the tests don't catch this because (1) when the scores are all zero, the sort order stays the same so the same node is always chosen, and (2) a node is considered to be full based on cpuScore and memoryScore, not total, so nodes that are full (rightfully) don't get considered. So, for simple cases this isn't going to make a difference, but for more complex cases it certainly will.

@phemmer
Copy link

phemmer commented Jan 20, 2015

That 'divide by 200 multiply by 100' line looked like a problem to me as well. In fact I had created a commit on my personal repo to fix it. But I didn't submit it as a PR, and I thought that was because I determined it wasn't really an issue. But maybe it is, I'll go back and review it again.

@vieux
Copy link
Contributor

vieux commented Jan 20, 2015

@jhamrick feel free to add a test in scheduler/strategy/binpacking_test.go and make a PR so we understand what's wrong.

@jhamrick
Copy link
Contributor Author

See #258 for an example that causes the scheduler to fail.

jhamrick added a commit to jhamrick/swarm that referenced this issue Jan 20, 2015
@aluzzardi aluzzardi reopened this Jan 20, 2015
@vieux vieux closed this as completed in 11ebdb0 Jan 20, 2015
elferink pushed a commit to elferink/swarm that referenced this issue Mar 6, 2015
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

5 participants