fleetctl: optimize stop/unload commands #1344

iamruinous · 2015-09-04T16:42:45Z

Use the Unit API instead of Units API to speed up lookups on
environments with a large number of distinct services.

Fixes #1343

Use the Unit API instead of Units API to speed up lookups on environments with a large number of distinct services. Fixes coreos#1343

wuqixuan · 2015-09-06T09:10:22Z

Cool modification.

mischief · 2015-09-23T19:22:50Z

lgtm, @jonboulle ?

jonboulle · 2015-09-23T21:02:23Z

I'm slightly wary of this because presumably there's some threshold where calling Units() is more efficient than all the round trips of individual look ups - perhaps we could have a sanity limit?

wuqixuan · 2015-09-24T01:36:01Z

@jonboulle , yes cAPI.Units are very useful in some scenarios. So we should not remove cAPI.Units. We just remove findUnits. In this case, absolutely, no need to call cAPI.Units.
So, lgtm for this solution.

jonboulle · 2015-09-28T23:49:48Z

@wuqixuan I don't really understand; my point is that in the case of someone running, say, fleetctl stop my_service{1..200}, this will now involve 200 x RPCs to fleet instead of one.

wuqixuan · 2015-09-29T02:58:50Z

@jonboulle , I got what you mean. Maybe you guess, if the number of units is bigger than a certain one, the individual look ups method will be slower than the SINGLE Units() RPC. That's why you suggested to have a limitation check. Some case use Units(), some case use individual look ups.

I think some stress testing can be tested. If it's ok for the individual look ups method, that's fine. If the case is like what you said, maybe we can have a better algorithm.

jonboulle · 2015-10-01T19:36:28Z

Right, that is what I meant. I also wonder how this will be affected by #1260 - perhaps that will obviate it?

tixxdz · 2016-02-29T10:56:32Z

@jonboulle @wuqixuan @iamruinous @mischief yes this needs more thoughts, for me this doesn't look the best way to go. It seems that this is the same logic that was used in this patch that was merged: #1341 , that patch removed two extra loops that were iterating over all units! So was it really an optimization of multiple "Unit()" calls vs one "Units()" call or is it due to those extra loops ?

The right thing to do is to fetch all units at the beginning and filter them, with one RPC call "Units()" then have one loop do the real operation, state check or whatever and you have to deal with races in case a unit disappear or does not have the expected state and so on... that's part of fleet design so just check or ignore those errors and go on, and later if you want to wait for the desired state you will have another loop that waits for all previous units.

So in this case removing that Units() call and replacing it with hard coded cAPI.Unit() inside a loop does not seem a real optimization, unless I'm missing something ? could we please have some benchmark numbers ? In that case yes we have to change it and merge the patches.

Thank you!

kayrus · 2016-02-29T11:02:11Z

@jonboulle I agree with @tixxdz. There should be one loop for the real operations.

jonboulle · 2016-02-29T11:59:46Z

So in this case removing that Units() call and replacing it with hard coded cAPI.Unit() inside a loop does not seem a real optimization, unless I'm missing something ?

Clearly it's not this simple, or this patch would have never been submitted. Per #1344 (comment) there is presumably a point at which single calls become more efficient.

@tixxdz can you try some basic benchmarking to get some more information?

tixxdz · 2016-02-29T12:06:07Z

@jonboulle yes I'm on it, I've identified also some call sites, we clearly need to reduce of course the number of loops by hooking some callback functions to directly operate on single units that were retrieved with one single Units() call. Yes I'll do some benchmarking and try to write that as generic as possible. This needs to be done before the overwrite #1295 stuff since with that we have to operate on different units that are in different states, so for the moment I'm on this one. Thanks

tixxdz · 2016-02-29T12:08:24Z

One thing also if we do that single Unit() call this just moves the real problem from the client to the agent which in return add more processing and load on and so on... we clearly need a better way here. The trade off should be on the client side

antrik · 2016-02-29T18:59:03Z

@tixxdz although I haven't tried actually profiling this, I'm fairly confident the performance has got nothing to do with how many loops the client does -- the cost is most likely pretty much entirely in the RPCs to the server. So the actual question at hand is when it's cheaper to do a single Units() RPC that retrieves all units (including those we are not interested in), and when it's cheaper to do individual Unit() RPCs for each of the units we are interested in.

The problem this PR is trying to address is that if there is a huge amount of units present, the Units() call will always be very expensive, even if we are actually interested in only one of them. If on the other hand we indeed want to operate on a large percentage of all units at once, retrieving all say 1000 units in one call might still be cheaper than getting only the 500 we are interested in one by one.

However, creating a heuristic for that would be hairy; and quite frankly, the latter situation seems like a niche case -- I presume that in an overwhelming majority of cases, people will only operate on few units, in which case this patch provides a huge boost; while for the rare cases where a request really covers a large percentage of all units, it would be slow anyways, and the extra slowdown doesn't really change the overall picture all that much... So IMHO that should be fine to merge as is.

If we really want to provide for all eventualities, the right solution seems introducing a new RPC that takes a list of units, and retrieves just those in the list. However, until someone actually has a use case where this really matters, I'd tend to consider it overkill...

kayrus · 2016-03-01T09:20:33Z

Here is the script which submits several units into fleet.

tixxdz · 2016-04-06T14:33:40Z

@iamruinous would it be possible if you could provide some benchmark comparisons ? currently Units() performs only one RPC call to retrieve recursively the list of all units, same thing for job entries, so the pressure will be a little bit on etcd to do extra I/O operations... the difference is not that high only if etcd I/O operations take longer...
Having a comparison here may help, and ideally as noted by others something smart enough to chose which path to take based on the number of units to query will be the best solution.

tixxdz · 2016-04-08T09:31:34Z

@jonboulle

Clearly it's not this simple, or this patch would have never been submitted. Per #1344 (comment) there is presumably a point at which single calls become more efficient.
@tixxdz can you try some basic benchmarking to get some more information?

So for Unit() vs Units() basic benchmarking with 500 units from a client perspective fleetctl status (printing to output + ssh connection...) showed up:
real 0m1.621s
user 0m0.026s
sys 0m0.013s

real 0m0.721s
user 0m0.021s
sys 0m0.012s

Not sure if this is enough... but Units() API currently performs only one RPC for all units, it instructs etcd to perform a recursive call into /unit/ , same thing for /job/
https://github.com/coreos/fleet/blob/master/registry/job.go#L92
https://github.com/coreos/fleet/blob/master/registry/job.go#L104

Why I think just using Units() is ok for now: it allows us to have the same consistency through all code paths, same error checks, and it doesn't take that much unless etcd disk I/O operations are heavy and there is lot of information to transfer (data fragmentation... etc) which in that case using Unit() to query a small number of units would be fine.

For me and as all the comments here have suggested: the approach is to have concluding benchmark results, do the change in a high level function (code consistency) and make it smart enough so it can switch from Unit() to Units() based on the supplied units.

antrik · 2016-04-08T12:38:21Z

For the record, I stand by my earlier analysis: I think this is a good change, heavily optimising the common case in exchange for a smaller regression in the less common case. (And a case distinction seems problematic, as it depends on the number of existing units, not only the number of units affected by the current command...)

Regarding consistency, I do agree in a way: it should be changed to use Unit for all the commands where this consideration applies.

tixxdz · 2016-04-08T13:44:46Z

@antrik could you please confirm with some benchmark results etc... ?

kayrus · 2016-04-19T10:47:01Z

I suppose optimization was already implemented within these PRs: #1376, #1433, #1520

antrik · 2016-04-19T14:16:14Z

@kayrus well, from a quick glance, it seems #1376 somewhat mitigates the cost of doing a Units() call, by avoiding making individual calls for each unit in addition to doing the initial lookup of all units? This means that this PR here would have been an even clearer win before (not a tradeoff like it is now) -- but the discussions above apply to the code as it is now.

dongsupark · 2016-11-28T16:43:13Z

I did some benchmarks. (time duration, in seconds)

number of units == 100

1.1. stop

Without the patch : 3.46
With the patch : 2.97

1.2. unload

Without the patch : 2.78
With the patch : 2.83

With the patch it's better.

number of units == 300

2.1. stop

Without the patch : 11.00
With the patch : 9.89

2.2. unload

Without the patch : 7.04
With the patch : 6.91

It's nearly the same, with the patch it's slightly better.

number of units == 500

3.1. stop

Without the patch : 36.31
With the patch : 41.70

3.2. unload

Without the patch : 11.78
With the patch : 12.01

Without the patch it's better.

When the number of units is higher than 500, it's always better without the patch. So I'd assume, we need to get data only up to 500 units.
So the turning point would be around 300 units. Technically speaking, it's better to change to Unit() when it's less than 300, while it's better to keep Units() when it's more than 300.
However, most of scalability issues occur when there are huge number of units, more than 1k, or even 10k units. So in the long run, it's definitely better to keep Units() call.

Do we need to set a threshold to toggle a some kind of Units() mode, around 300 units? Doable, but I don't think it's worth doing that.

So I suggest to not merge this PR. Let's just close.

(I also tried to introduce an internal cache for units, to optimize performance in every case. That didn't bring anything, but other unexpected side-effects. Let's not go that way..)

dongsupark · 2016-11-30T10:03:02Z

I'll close it. If anyone has other ideas, please let me know.

jonboulle · 2016-11-30T10:10:19Z

Thanks for the detailed analysis!

fleetctl: optimize stop/unload commands

5a8d47b

Use the Unit API instead of Units API to speed up lookups on environments with a large number of distinct services. Fixes coreos#1343

jonboulle added the needs review label Feb 10, 2016

jonboulle added this to the v0.13.0 milestone Feb 10, 2016

This was referenced Mar 2, 2016

usability: significant performance/usability issues with fleetctl batch unit processing and slow etcd storage #1455

Closed

usability: garbage collection in /_coreos.com/fleet/job registry directory #1456

Open

antrik mentioned this pull request Mar 14, 2016

fleetctl: destroy using wildcards can search in the repository #1256

Closed

tixxdz added reviewed/needs more information area/performance labels Apr 6, 2016

tixxdz added the priority/Pmaybe label Apr 8, 2016

tixxdz added this to the v1.0.0 milestone Apr 25, 2016

dongsupark force-pushed the master branch from 0338168 to 56b539f Compare May 23, 2016 17:57

dongsupark force-pushed the master branch 2 times, most recently from 14580b0 to 63b20dc Compare June 22, 2016 10:22

dongsupark force-pushed the master branch 3 times, most recently from 23d4b13 to 6fb1256 Compare July 1, 2016 11:07

dongsupark force-pushed the master branch 2 times, most recently from 54409ab to a1a21b8 Compare July 8, 2016 11:38

dongsupark force-pushed the master branch 3 times, most recently from 150d30c to 4002bf5 Compare August 16, 2016 08:54

dongsupark force-pushed the master branch 7 times, most recently from 3b60e93 to 875d938 Compare August 23, 2016 11:00

dongsupark force-pushed the master branch 2 times, most recently from bdc94e8 to fa5aa3a Compare August 30, 2016 12:26

dongsupark force-pushed the master branch 3 times, most recently from 20a3e96 to 3aaa1ab Compare November 10, 2016 15:24

dongsupark force-pushed the master branch 2 times, most recently from eb6872f to 365565e Compare November 24, 2016 15:35

dongsupark closed this Nov 30, 2016

dongsupark mentioned this pull request Nov 30, 2016

fleetctl stop is slow with large number of units #1343

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fleetctl: optimize stop/unload commands #1344

fleetctl: optimize stop/unload commands #1344

iamruinous commented Sep 4, 2015

wuqixuan commented Sep 6, 2015

mischief commented Sep 23, 2015

jonboulle commented Sep 23, 2015

wuqixuan commented Sep 24, 2015

jonboulle commented Sep 28, 2015

wuqixuan commented Sep 29, 2015

jonboulle commented Oct 1, 2015

tixxdz commented Feb 29, 2016

kayrus commented Feb 29, 2016

jonboulle commented Feb 29, 2016

tixxdz commented Feb 29, 2016

tixxdz commented Feb 29, 2016

antrik commented Feb 29, 2016

kayrus commented Mar 1, 2016

tixxdz commented Apr 6, 2016

tixxdz commented Apr 8, 2016

antrik commented Apr 8, 2016

tixxdz commented Apr 8, 2016

kayrus commented Apr 19, 2016 •

edited

Loading

antrik commented Apr 19, 2016

dongsupark commented Nov 28, 2016

dongsupark commented Nov 30, 2016

jonboulle commented Nov 30, 2016

fleetctl: optimize stop/unload commands #1344

fleetctl: optimize stop/unload commands #1344

Conversation

iamruinous commented Sep 4, 2015

wuqixuan commented Sep 6, 2015

mischief commented Sep 23, 2015

jonboulle commented Sep 23, 2015

wuqixuan commented Sep 24, 2015

jonboulle commented Sep 28, 2015

wuqixuan commented Sep 29, 2015

jonboulle commented Oct 1, 2015

tixxdz commented Feb 29, 2016

kayrus commented Feb 29, 2016

jonboulle commented Feb 29, 2016

tixxdz commented Feb 29, 2016

tixxdz commented Feb 29, 2016

antrik commented Feb 29, 2016

kayrus commented Mar 1, 2016

tixxdz commented Apr 6, 2016

tixxdz commented Apr 8, 2016

antrik commented Apr 8, 2016

tixxdz commented Apr 8, 2016

kayrus commented Apr 19, 2016 • edited Loading

antrik commented Apr 19, 2016

dongsupark commented Nov 28, 2016

dongsupark commented Nov 30, 2016

jonboulle commented Nov 30, 2016

kayrus commented Apr 19, 2016 •

edited

Loading