kvstreamer: account for the overhead of GetResponse and ScanResponse #97425

yuzefovich · 2023-02-21T22:51:14Z

The Streamer is careful to account for the requests (both the footprint
and the overhead) as well as to estimate the footprint of the responses.
However, it currently doesn't account for the overhead of the GetResponse
(currently 64 bytes) and ScanResponse (120 bytes) structs. We recently
saw a case where this overhead was the largest user of RAM which
contributed to the pod OOMing. This commit fixes this accounting oversight
in the following manner:

- prior to issuing the BatchRequest, we estimate the overhead of
  a response to each request in the batch. Notably, the BatchResponse will
  contain a ResponseUnion object as well as the GetResponse or ScanResponse
  object for each request
- once the BatchResponse is received, we reconcile the budget to track
  the precise memory usage of the responses (ignoring the ResponseUnion
  since we don't keep a reference to it). We already tracked the
  "footprint" and now we also include the "overhead", with both being
  released to the budget when Result.Release is called.
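
The estimate-then-reconcile flow described above can be sketched roughly as follows. This is a minimal illustration, not the Streamer's actual code: the `budget` type, the helper names, and the 8-byte union wrapper size are all hypothetical, and the 64 and 120 byte constants are simply the sizes quoted in this description.

```go
package main

import "fmt"

// Struct sizes quoted in the PR description, hard-coded for illustration.
const (
	getResponseOverhead  int64 = 64
	scanResponseOverhead int64 = 120
)

// budget is a hypothetical stand-in for the Streamer's memory budget.
type budget struct {
	limit int64
	used  int64
}

// consume reserves n bytes and fails if the limit would be exceeded.
func (b *budget) consume(n int64) error {
	if b.used+n > b.limit {
		return fmt.Errorf("budget exceeded: used=%d requested=%d limit=%d", b.used, n, b.limit)
	}
	b.used += n
	return nil
}

// release returns n bytes to the budget.
func (b *budget) release(n int64) { b.used -= n }

// estimateResponsesOverhead is computed before the batch is issued. Every
// request gets a response struct plus a union wrapper, even if it is empty.
func estimateResponsesOverhead(isScan []bool, unionOverhead int64) int64 {
	var total int64
	for _, scan := range isScan {
		total += unionOverhead
		if scan {
			total += scanResponseOverhead
		} else {
			total += getResponseOverhead
		}
	}
	return total
}

// actualResponsesOverhead is computed once the batch response arrives. The
// union wrappers are excluded because no reference to them is kept.
func actualResponsesOverhead(isScan []bool) int64 {
	return estimateResponsesOverhead(isScan, 0)
}

// reconcile adjusts the budget from the estimate to the precise usage.
func reconcile(b *budget, estimated, actual int64) error {
	if actual > estimated {
		return b.consume(actual - estimated)
	}
	b.release(estimated - actual)
	return nil
}

func main() {
	isScan := []bool{false, false, false, true, true} // 3 Gets, 2 Scans
	const unionOverhead = 8                           // assumed wrapper size
	b := &budget{limit: 1 << 10}

	estimated := estimateResponsesOverhead(isScan, unionOverhead)
	if err := b.consume(estimated); err != nil {
		fmt.Println(err)
		return
	}
	actual := actualResponsesOverhead(isScan)
	if err := reconcile(b, estimated, actual); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("estimated=%d actual=%d used=%d\n", estimated, actual, b.used)
}
```

In this sketch the remaining `used` bytes would be handed back to the budget when the corresponding results are released.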

We track this "responses overhead" usage separately from the target bytes
usage (the "footprint") since the KV server doesn't include the overhead
when determining how to handle the TargetBytes limit, and we must behave
in the same manner.

It's worth noting that the overhead of the response structs is
proportional to the number of requests included in the BatchRequest
since every request will get a corresponding (possibly empty) response.
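
To make the proportionality concrete, here is a small sketch that keeps the two kinds of usage apart. The `responsesUsage` type and `usageForBatch` helper are hypothetical names for illustration; only the 64 and 120 byte sizes come from this description.

```go
package main

import "fmt"

// Struct sizes quoted in the PR description, hard-coded for illustration.
const (
	getResponseOverhead  int64 = 64
	scanResponseOverhead int64 = 120
)

// responsesUsage tracks the two kinds of usage separately (hypothetical type).
type responsesUsage struct {
	footprint int64 // bytes of returned data; what the TargetBytes limit applies to
	overhead  int64 // bytes of response structs; not covered by TargetBytes
}

// usageForBatch computes the usage for a batch with the given request counts.
// The overhead grows with the number of requests even when footprintBytes is 0.
func usageForBatch(numGets, numScans, footprintBytes int64) responsesUsage {
	return responsesUsage{
		footprint: footprintBytes,
		overhead:  numGets*getResponseOverhead + numScans*scanResponseOverhead,
	}
}

func main() {
	// One million Gets that all come back empty still hold on to
	// 64,000,000 bytes of GetResponse structs.
	u := usageForBatch(1000000, 0, 0)
	fmt.Printf("footprint=%d overhead=%d\n", u.footprint, u.overhead)
}
```

With an empty footprint, the overhead is the only memory user, which matches the kind of case that motivated this change.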

Fixes: #97279.

Release note: None

blathers-crl · 2023-02-21T22:51:19Z

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

yuzefovich · 2023-02-22T03:55:14Z

This shows the expected minor improvement on the tpch_concurrency metric: averaged over 20 runs, it increased from 76.75 to 79.5.

DrewKimball

Reviewed 6 of 6 files at r1, 1 of 2 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)

pkg/kv/kvclient/kvstreamer/size.go line 42 at r1 (raw file):

		panic("GetRequest and ScanRequest have different overheads")
	}
	scanResponseUnionOverhead := int64(unsafe.Sizeof(kvpb.ResponseUnion_Get{}))

Should it be kvpb.ResponseUnion_Scan{} here?

pkg/kv/kvclient/kvstreamer/streamer.go line 1014 at r1 (raw file):

			minResponsesOverhead = scanResponseOverhead + responseUnionOverhead
		}
		minAcceptableBudget := minTargetBytes + minResponsesOverhead

[nit] I was confused by this code at first - it might be nice to clarify that this is an estimate for the amount of memory that will be required to return a single response, since we will make no progress without at least managing that.

yuzefovich

TFTR!

bors r+

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball)

pkg/kv/kvclient/kvstreamer/size.go line 42 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Should it be kvpb.ResponseUnion_Scan{} here?

Indeed, thanks.

pkg/kv/kvclient/kvstreamer/streamer.go line 1014 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] I was confused by this code at first - it might be nice to clarify that this is an estimate for the amount of memory that will be required to return a single response, since we will make no progress without at least managing that.

Done. I also adjusted the logic here to better represent reality. In particular, we want to account for the overhead of the responses for all requests in the BatchRequest (since each of them will get a corresponding response struct).
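
A rough sketch of the adjusted idea, assuming hypothetical helper names and an assumed 8-byte union wrapper (the 64 and 120 byte sizes are the ones quoted in the PR description; the actual code in streamer.go may differ):

```go
package main

import "fmt"

// Struct sizes quoted in the PR description, hard-coded for illustration.
const (
	getResponseOverhead  int64 = 64
	scanResponseOverhead int64 = 120
)

// minAcceptableBudget estimates the least memory needed to make progress on a
// batch: enough target bytes for a single response, plus the response struct
// and union wrapper overhead for every request in the batch, since each
// request gets a response struct even if it comes back empty.
func minAcceptableBudget(minTargetBytes, numGets, numScans, unionOverhead int64) int64 {
	responsesOverhead := numGets*(getResponseOverhead+unionOverhead) +
		numScans*(scanResponseOverhead+unionOverhead)
	return minTargetBytes + responsesOverhead
}

// canIssue reports whether the available budget covers that minimum.
func canIssue(available, minTargetBytes, numGets, numScans, unionOverhead int64) bool {
	return available >= minAcceptableBudget(minTargetBytes, numGets, numScans, unionOverhead)
}

func main() {
	const unionOverhead = 8 // assumed wrapper size
	need := minAcceptableBudget(1000, 2, 1, unionOverhead)
	fmt.Println("min acceptable budget:", need)
	fmt.Println("can issue with 1200 bytes:", canIssue(1200, 1000, 2, 1, unionOverhead))
}
```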

craig · 2023-02-22T05:41:03Z

Build succeeded:

Bazel Essential CI (Cockroach)

blathers-crl · 2023-02-22T05:41:13Z

Encountered an error creating backports. Some common things that can go wrong:

- The backport branch might have already existed.
- There was a merge conflict.
- The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.

error creating merge commit from 40d28a5 to blathers/backport-release-22.2-97425: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.2.x failed. See errors above.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

yuzefovich added the backport-22.2.x label Feb 21, 2023

yuzefovich force-pushed the streamer-fix branch 2 times, most recently from e06c981 to ea8cee5 Compare February 22, 2023 01:19

yuzefovich marked this pull request as ready for review February 22, 2023 01:20

yuzefovich requested a review from a team as a code owner February 22, 2023 01:20

yuzefovich requested review from rytaft and DrewKimball and removed request for rytaft February 22, 2023 01:20

yuzefovich force-pushed the streamer-fix branch 2 times, most recently from 8caf145 to 377f06e Compare February 22, 2023 02:46

DrewKimball approved these changes Feb 22, 2023

yuzefovich force-pushed the streamer-fix branch from 377f06e to 84b8e7d Compare February 22, 2023 04:17

yuzefovich force-pushed the streamer-fix branch from 84b8e7d to 40d28a5 Compare February 22, 2023 04:19

craig bot merged commit 286b3e2 into cockroachdb:master Feb 22, 2023

yuzefovich deleted the streamer-fix branch February 22, 2023 05:50

yuzefovich mentioned this pull request Feb 22, 2023

release-22.2: kvstreamer: account for the overhead of GetResponse and ScanResponse #97499

Merged
