
release-22.2: kvstreamer: account for the overhead of GetResponse and ScanResponse #97499

Merged 1 commit into cockroachdb:release-22.2 on Feb 23, 2023

Conversation

yuzefovich (Member)

Backport 1/1 commits from #97425.

/cc @cockroachdb/release


The Streamer is careful to account for the requests (both the footprint and the overhead) as well as to estimate the footprint of the responses. However, it currently doesn't account for the overhead of the GetResponse (currently 64 bytes) and ScanResponse (120 bytes) structs. We recently saw a case where this overhead was the largest consumer of RAM and contributed to a pod OOMing. This commit fixes the accounting oversight in the following manner:

  • Prior to issuing the BatchRequest, we estimate the overhead of a response
    to each request in the batch. Notably, the BatchResponse will contain
    a RequestUnion object as well as a GetResponse or ScanResponse object
    for each request.
  • Once the BatchResponse is received, we reconcile the budget to track
    the precise memory usage of the responses (ignoring the RequestUnion
    since we don't keep a reference to it). We already tracked the
    "footprint", and now we also include the "overhead", with both being
    released to the budget on the Result.Release call.

We track this "responses overhead" usage separately from the target bytes
usage (the "footprint") since the KV server doesn't include the overhead
when determining how to handle the TargetBytes limit, and we must behave
in the same manner.

It's worth noting that the overhead of the response structs is
proportional to the number of requests included in the BatchRequest
since every request will get a corresponding (possibly empty) response.
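
The per-request arithmetic described above can be sketched as follows. The struct definitions and the 16-byte RequestUnion size here are illustrative stand-ins, not the real protobuf-generated types from the CockroachDB codebase (whose sizes, 64 and 120 bytes per the PR, depend on their generated fields):

```go
package main

import (
	"fmt"
	"unsafe"
)

// Hypothetical stand-ins for the real response structs; the actual types
// are protobuf-generated in the CockroachDB repo.
type getResponse struct{ _ [64]byte }
type scanResponse struct{ _ [120]byte }
type requestUnion struct{ _ [16]byte }

// responsesOverhead estimates, before the BatchRequest is sent, the struct
// overhead of its responses: one RequestUnion plus one GetResponse or
// ScanResponse per request in the batch. The result grows linearly with
// the number of requests, since every request gets a corresponding
// (possibly empty) response.
func responsesOverhead(numGets, numScans int) int64 {
	perGet := int64(unsafe.Sizeof(requestUnion{})) + int64(unsafe.Sizeof(getResponse{}))
	perScan := int64(unsafe.Sizeof(requestUnion{})) + int64(unsafe.Sizeof(scanResponse{}))
	return int64(numGets)*perGet + int64(numScans)*perScan
}

func main() {
	// A batch of 2 Gets and 1 Scan: 2*(16+64) + 1*(16+120) = 296 bytes.
	fmt.Println(responsesOverhead(2, 1))
}
```

After the BatchResponse arrives, this pre-reservation would be reconciled against the actual struct sizes (dropping the RequestUnion share, since no reference to it is retained), and released back to the budget on Result.Release.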

Fixes: #97279.

Release note: None

Release justification: stability improvement.

@yuzefovich yuzefovich requested a review from a team as a code owner February 22, 2023 18:39
blathers-crl (bot) commented Feb 22, 2023

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Patches should only be created for serious issues or test-only changes.
  • Patches should not break backwards-compatibility.
  • Patches should change as little code as possible.
  • Patches should not change on-disk formats or node communication protocols.
  • Patches should not add new functionality.
  • Patches must not add, edit, or otherwise modify cluster versions; or add version gates.
If some of the basic criteria cannot be satisfied, ensure that the following exceptional criteria are satisfied:
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters.
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.

Add a brief release justification to the body of your PR to justify this backport.

Some other things to consider:

  • What did we do to ensure that a user that doesn’t know & care about this backport, has no idea that it happened?
  • Will this work in a cluster of mixed patch versions? Did we test that?
  • If a user upgrades a patch version, uses this feature, and then downgrades, what happens?

cockroach-teamcity (Member)

This change is Reviewable

yuzefovich (Member, Author)

I'll let this bake on master for a week or so in case any of the tests become flaky.

@DrewKimball (Collaborator) left a comment:

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

yuzefovich (Member, Author)

Looks like there might be an extraordinary 22.2.x release, and I want to get this fix in if possible. Nightlies on master didn't seem to reveal any flakes.

@yuzefovich yuzefovich merged commit 51bbca7 into cockroachdb:release-22.2 Feb 23, 2023
@yuzefovich yuzefovich deleted the backport22.2-97425 branch February 23, 2023 17:55