Optimize RangeRequest Revision count #16510

mehvaibh · 2023-08-30T19:32:20Z

What would you like to be added?

Present Behavior

Currently ETCD iterates on all matching keys in MVCC for Range query to calculate count in RangeResponse
Although it returns limited number of revisions from key(included) to end(excluded) at the given rev but iterates on all revisions to calculate the total number of revisions.

Change Requested

New no-count mode in RangeRequest structure.
By default, it will be false to not break current ETCD RangeResponse.Count contract and keep backward compatibility.

When explicitly set in RangeRequest , then ETCD will respect limit and break without iterating over all remaining revisions and returns fake Count in RangeResponse which client/consumers should ignore.

This is motivated by some previous PR and discussions, in which limit enforcement has to be reverted because of breaking ETCD contract and thus breaking K8s remainingItemCount feature.
Previous PR references:

Why is this needed?

Performance Improvement for Paginated Range query.

As with no-count feature, ETCD does not need to count all revisions to calculate Count for RangeResponse.
This means retrieving X objects time is not dependent on total number of matching objects in ETCD.

Pasting some results from my experiment.
Here :

Non-Optimized server refer to ETCD build from main branch
Optimized server just have one change ,i.e. break revisions loop when limit is reached.

Etcd InstanceType: m5.4xlarge

Test cmd: Get first 2 game-config objects from large number of game-configs present in ETCD (Column 2 in below results)

time etcdctl get --prefix /registry/configmaps/default/game-config --keys-only --limit 2

SNO	# ConfigMap	Non optimized server(ms)	Optimized server(ms)	% gain (total time)	Time only for Revisions() Non-Optimized	Time only for Revisions() Optimized
1	10,000	20	18	10	600µs	6.348µs
2	50,000	23	18	21.73913	7.703931ms	6.348µs
3	100,000	32	18	43.75	16.22648ms	6.348µs
4	500,000	200	18	91	171.659658ms	6.348µs

Time only for Revisions() columns in above measures time only for this code block

Please let me know if this change request look good then I can prepare PR for code changes.

The text was updated successfully, but these errors were encountered:

mehvaibh · 2023-08-31T17:34:33Z

would like your opinion on this
@ahrtr @serathius @chaochn47 @geetasg

serathius · 2023-08-31T17:37:52Z

Kubernetes depends on count, so doesn't look like this feature will get much use.

ahrtr · 2023-08-31T18:49:43Z

Making the condition being false (e.g. no sorting, etc.) can get the same better performance? @mehvaibh

mehvaibh · 2023-08-31T19:15:32Z

Kubernetes depends on count, so doesn't look like this feature will get much use.

Yes, API Server uses count for calculating remainingItemCount.

If we implement this feature, API Server can also take advantage of this.
I have a separate proposal for that which I will start discussion with API Server.

But here, I wanted to check feasibility from ETCD POV.

mehvaibh · 2023-08-31T19:23:07Z

Making the condition being false (e.g. no sorting, etc.) can get the same better performance?

Not sure I understand this correctly but seems like revisions will loop on to calculate total even if limit is less than total. Pls correct me if I am wrong.

ahrtr · 2023-09-01T09:49:38Z

In your example, you set the limit to 2. If you change it to a larger limit (e.g. 1K), I think the performance difference may be negligible (can you double check that?). In large range query, the bottleneck should be reading data from the storage engine instead of counting the total number in memory.

Adding the no-count feature only makes sense for the extreme use case (e.g. get only 2 records from a large number of matched records e.g. 500K). Also note that it's only valid when no sort is needed.

serathius · 2023-09-01T12:45:07Z

I think Kubernetes would benefit much more from serving pagination from watch cache instead of trying to optimise patination on etcd side, but I'm biased by my own work https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/2340-Consistent-reads-from-cache.

cc @mborsz

mehvaibh · 2023-09-06T19:24:41Z

time ./etcdctl get --prefix /registry/configmaps/default  --limit 2000 > /dev/null

SNO	# ConfigMap	Non optimized server(ms)	Optimized server(ms)	% gain (total time)
1	50,000	30	21	30
2	100,000	40	21	47.5

@ahrtr I tried with 2K page limit. The optimized server(respects limit) latency remains the same while non-optimized server(current ETCD) latencies increases with number of objects.

Also note that it's only valid when no sort is needed.

Agree. Sort range queries overrides the limit and thus all objects needs to be retrieved regardless.

This optimization can benefit no sort paginated range request.

mehvaibh · 2023-09-06T19:33:27Z

I think Kubernetes would benefit much more from serving pagination from watch cache instead of trying to optimise patination on etcd side, but I'm biased by my own work

@serathius Thanks for the update!

I think this optimization can also help reduce paginated no sort range queries latencies specially when number of objects are high. As ETCD needs to spend relatively less time completing paginated no sort range transactions it will be helpful in large DB with high number of objects count and under significant load (i.e. varying mix of read/write transactions).
Please correct me if my understanding is wrong.

ahrtr · 2023-09-07T07:50:19Z

@ahrtr I tried with 2K page limit. The optimized server(respects limit) latency remains the same while non-optimized server(current ETCD) latencies increases with number of objects.

thx for the feedback. I am not against it as long as the corresponding K8s's proposal can be accepted [in other words, as long as there are valid use cases].

fuweid · 2023-09-07T08:31:40Z

This proposal looks like continue field in the kubernetes API or the more field in the Range response.
And the apiserver only uses the more field for pagination but the GA feature RemainingItemCount relies on this Count.

https://github.com/kubernetes/apiserver/blob/ea59eb34554704ce1e2336717c668f16f763bc65/pkg/storage/etcd3/store.go#L732

Most of times, the client should get result from apiserver's cache. But the client can still force query to etcd cluster since the kubernetes allows user to do that. I think it shoud be good if we can prevent burst cpu usage from comparing all keyspaces.

serathius · 2023-09-07T08:46:32Z

And the apiserver only uses the more field for pagination but the GA feature RemainingItemCount relies on this Count.

This ^, Kubernetes will not break GA feature. Implementing this on etcd is worthless until you make proposal that is accepted on Kubernetes side.

But the client can still force query to etcd cluster since the kubernetes allows user to do that

Not true, watch cache is just not used for all types of requests. KEP https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/2340-Consistent-reads-from-cache is about broadening types of requests that are served from apiserver. End goal is to serve all requests from cache.

siyuanfoundation · 2023-10-05T21:58:49Z

This feature can be useful to check the health of serializable read: use the first key to check reading from bolt db is ok. Existing check uses Range(non-exist key), which does not hit the db file. More context in #16007

mehvaibh added the type/feature label Aug 30, 2023

jmhbnz added the area/performance label Aug 31, 2023

chaochn47 mentioned this issue Sep 26, 2023

Hook up etcd to kubernetes' prow infrastructure to run CI jobs #16653

Closed

hakuna-matatah mentioned this issue Apr 3, 2024

Optimize retrieving remainingItemCount from etcd for APIServer List chunking calls for performance gain on etcd kubernetes/enhancements#4567

Closed

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize RangeRequest Revision count #16510

Optimize RangeRequest Revision count #16510

mehvaibh commented Aug 30, 2023 •

edited

Loading

mehvaibh commented Aug 31, 2023

serathius commented Aug 31, 2023

ahrtr commented Aug 31, 2023

mehvaibh commented Aug 31, 2023

mehvaibh commented Aug 31, 2023

ahrtr commented Sep 1, 2023

serathius commented Sep 1, 2023

mehvaibh commented Sep 6, 2023 •

edited

Loading

mehvaibh commented Sep 6, 2023

ahrtr commented Sep 7, 2023

fuweid commented Sep 7, 2023

serathius commented Sep 7, 2023

siyuanfoundation commented Oct 5, 2023

Optimize RangeRequest Revision count #16510

Optimize RangeRequest Revision count #16510

Comments

mehvaibh commented Aug 30, 2023 • edited Loading

What would you like to be added?

Present Behavior

Change Requested

Why is this needed?

mehvaibh commented Aug 31, 2023

serathius commented Aug 31, 2023

ahrtr commented Aug 31, 2023

mehvaibh commented Aug 31, 2023

mehvaibh commented Aug 31, 2023

ahrtr commented Sep 1, 2023

serathius commented Sep 1, 2023

mehvaibh commented Sep 6, 2023 • edited Loading

mehvaibh commented Sep 6, 2023

ahrtr commented Sep 7, 2023

fuweid commented Sep 7, 2023

serathius commented Sep 7, 2023

siyuanfoundation commented Oct 5, 2023

mehvaibh commented Aug 30, 2023 •

edited

Loading

mehvaibh commented Sep 6, 2023 •

edited

Loading