-
Notifications
You must be signed in to change notification settings - Fork 302
fleetctl: optimize stop/unload commands #1344
Conversation
iamruinous
commented
Sep 4, 2015
Use the Unit API instead of Units API to speed up lookups on environments with a large number of distinct services. Fixes coreos#1343
Cool modification. |
lgtm, @jonboulle ? |
I'm slightly wary of this because presumably there's some threshold where calling |
@jonboulle , yes cAPI.Units are very useful in some scenarios. So we should not remove cAPI.Units. We just remove findUnits. In this case, absolutely, no need to call cAPI.Units. |
@wuqixuan I don't really understand; my point is that in the case of someone running, say, |
@jonboulle , I got what you mean. Maybe you guess, if the number of units is bigger than a certain one, the individual look ups method will be slower than the SINGLE Units() RPC. That's why you suggested to have a limitation check. Some case use Units(), some case use individual look ups. I think some stress testing can be tested. If it's ok for the individual look ups method, that's fine. If the case is like what you said, maybe we can have a better algorithm. |
Right, that is what I meant. I also wonder how this will be affected by #1260 - perhaps that will obviate it? |
@jonboulle @wuqixuan @iamruinous @mischief yes this needs more thoughts, for me this doesn't look the best way to go. It seems that this is the same logic that was used in this patch that was merged: #1341 , that patch removed two extra loops that were iterating over all units! So was it really an optimization of multiple "Unit()" calls vs one "Units()" call or is it due to those extra loops ? The right thing to do is to fetch all units at the beginning and filter them, with one RPC call "Units()" then have one loop do the real operation, state check or whatever and you have to deal with races in case a unit disappear or does not have the expected state and so on... that's part of fleet design so just check or ignore those errors and go on, and later if you want to wait for the desired state you will have another loop that waits for all previous units. So in this case removing that Units() call and replacing it with hard coded cAPI.Unit() inside a loop does not seem a real optimization, unless I'm missing something ? could we please have some benchmark numbers ? In that case yes we have to change it and merge the patches. Thank you! |
@jonboulle I agree with @tixxdz. There should be one loop for the real operations. |
Clearly it's not this simple, or this patch would have never been submitted. Per #1344 (comment) there is presumably a point at which single calls become more efficient. @tixxdz can you try some basic benchmarking to get some more information? |
@jonboulle yes I'm on it, I've identified also some call sites, we clearly need to reduce of course the number of loops by hooking some callback functions to directly operate on single units that were retrieved with one single Units() call. Yes I'll do some benchmarking and try to write that as generic as possible. This needs to be done before the overwrite #1295 stuff since with that we have to operate on different units that are in different states, so for the moment I'm on this one. Thanks |
One thing also if we do that single Unit() call this just moves the real problem from the client to the agent which in return add more processing and load on and so on... we clearly need a better way here. The trade off should be on the client side |
@tixxdz although I haven't tried actually profiling this, I'm fairly confident the performance has got nothing to do with how many loops the client does -- the cost is most likely pretty much entirely in the RPCs to the server. So the actual question at hand is when it's cheaper to do a single Units() RPC that retrieves all units (including those we are not interested in), and when it's cheaper to do individual Unit() RPCs for each of the units we are interested in. The problem this PR is trying to address is that if there is a huge amount of units present, the Units() call will always be very expensive, even if we are actually interested in only one of them. If on the other hand we indeed want to operate on a large percentage of all units at once, retrieving all say 1000 units in one call might still be cheaper than getting only the 500 we are interested in one by one. However, creating a heuristic for that would be hairy; and quite frankly, the latter situation seems like a niche case -- I presume that in an overwhelming majority of cases, people will only operate on few units, in which case this patch provides a huge boost; while for the rare cases where a request really covers a large percentage of all units, it would be slow anyways, and the extra slowdown doesn't really change the overall picture all that much... So IMHO that should be fine to merge as is. If we really want to provide for all eventualities, the right solution seems introducing a new RPC that takes a list of units, and retrieves just those in the list. However, until someone actually has a use case where this really matters, I'd tend to consider it overkill... |
Here is the script which submits several units into fleet. |
@iamruinous would it be possible if you could provide some benchmark comparisons ? currently Units() performs only one RPC call to retrieve recursively the list of all units, same thing for job entries, so the pressure will be a little bit on etcd to do extra I/O operations... the difference is not that high only if etcd I/O operations take longer... |
So for Unit() vs Units() basic benchmarking with 500 units from a client perspective real 0m0.721s Not sure if this is enough... but Why I think just using Units() is ok for now: it allows us to have the same consistency through all code paths, same error checks, and it doesn't take that much unless etcd disk I/O operations are heavy and there is lot of information to transfer (data fragmentation... etc) which in that case using For me and as all the comments here have suggested: the approach is to have concluding benchmark results, do the change in a high level function (code consistency) and make it smart enough so it can switch from |
For the record, I stand by my earlier analysis: I think this is a good change, heavily optimising the common case in exchange for a smaller regression in the less common case. (And a case distinction seems problematic, as it depends on the number of existing units, not only the number of units affected by the current command...) Regarding consistency, I do agree in a way: it should be changed to use |
@antrik could you please confirm with some benchmark results etc... ? |
@kayrus well, from a quick glance, it seems #1376 somewhat mitigates the cost of doing a |
14580b0
to
63b20dc
Compare
23d4b13
to
6fb1256
Compare
54409ab
to
a1a21b8
Compare
150d30c
to
4002bf5
Compare
3b60e93
to
875d938
Compare
bdc94e8
to
fa5aa3a
Compare
20a3e96
to
3aaa1ab
Compare
eb6872f
to
365565e
Compare
I did some benchmarks. (time duration, in seconds)
1.1. stop
1.2. unload
With the patch it's better.
2.1. stop
2.2. unload
It's nearly the same, with the patch it's slightly better.
3.1. stop
3.2. unload
Without the patch it's better. When the number of units is higher than 500, it's always better without the patch. So I'd assume, we need to get data only up to 500 units. Do we need to set a threshold to toggle a some kind of So I suggest to not merge this PR. Let's just close. (I also tried to introduce an internal cache for units, to optimize performance in every case. That didn't bring anything, but other unexpected side-effects. Let's not go that way..) |
I'll close it. If anyone has other ideas, please let me know. |
Thanks for the detailed analysis! |