-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
release-22.2: kvstreamer: account for the overhead of GetResponse and ScanResponse #97499
release-22.2: kvstreamer: account for the overhead of GetResponse and ScanResponse #97499
Conversation
Thanks for opening a backport. Please check the backport criteria before merging:
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
Add a brief release justification to the body of your PR to justify this backport. Some other things to consider:
|
I'll let this bake on master for a week or so in case any of the tests become flaky. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained
The Streamer is careful to account for the requests (both the footprint and the overhead) as well as to estimate the footprint of the responses. However, it currently doesn't account for the overhead of the GetResponse (currently 64 bytes) and ScanResponse (120 bytes) structs. We recently saw a case where this overhead was the largest user of RAM which contributed to the pod OOMing. This commit fixes this accounting oversight in the following manner: - prior to issuing the BatchRequest, we estimate the overhead of a response to each request in the batch. Notably, the BatchResponse will contain a RequestUnion object as well as the GetResponse or ScanResponse object for each request - once the BatchResponse is received, we reconcile the budget to track the precise memory usage of the responses (ignoring the RequestUnion since we don't keep a reference to it). We already tracked the "footprint" and now we also include the "overhead" with both being released to the budget on `Result.Release` call. We track this "responses overhead" usage separately from the target bytes usage (the "footprint") since the KV server doesn't include the overhead when determining how to handle `TargetBytes` limit, and we must behave in the same manner. It's worth noting that the overhead of the response structs is proportional to the number of requests included in the BatchRequest since every request will get a corresponding (possibly empty) response. Release note: None
23e69b3
to
2ebe7b8
Compare
Looks like there might be an extraordinary 22.2.x release, and I want to get this fix in if possible. Nightlies on master didn't seem to reveal any flakes. |
Backport 1/1 commits from #97425.
/cc @cockroachdb/release
The Streamer is careful to account for the requests (both the footprint
and the overhead) as well as to estimate the footprint of the responses.
However, it currently doesn't account for the overhead of the GetResponse
(currently 64 bytes) and ScanResponse (120 bytes) structs. We recently
saw a case where this overhead was the largest user of RAM which
contributed to the pod OOMing. This commit fixes this accounting oversight
in the following manner:
a response to each request in the batch. Notably, the BatchResponse will
contain a RequestUnion object as well as the GetResponse or ScanResponse
object for each request
the precise memory usage of the responses (ignoring the RequestUnion
since we don't keep a reference to it). We already tracked the
"footprint" and now we also include the "overhead" with both being
released to the budget on
Result.Release
call.We track this "responses overhead" usage separate from the target bytes
usage (the "footprint") since the KV server doesn't include the overhead
when determining how to handle
TargetBytes
limit, and we must behavein the same manner.
It's worth noting that the overhead of the response structs is
proportional to the number of requests included in the BatchRequest
since every request will get a corresponding (possibly empty) response.
Fixes: #97279.
Release note: None
Release justification: stability improvement.