-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvstreamer: account for the overhead of GetResponse and ScanResponse #97425
Conversation
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
e06c981
to
ea8cee5
Compare
8caf145
to
377f06e
Compare
This shows an expected minor improvement on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 6 of 6 files at r1, 1 of 2 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)
pkg/kv/kvclient/kvstreamer/size.go
line 42 at r1 (raw file):
panic("GetRequest and ScanRequest have different overheads") } scanResponseUnionOverhead := int64(unsafe.Sizeof(kvpb.ResponseUnion_Get{}))
Should it be kvpb.ResponseUnion_Scan{}
here?
pkg/kv/kvclient/kvstreamer/streamer.go
line 1014 at r1 (raw file):
minResponsesOverhead = scanResponseOverhead + responseUnionOverhead } minAcceptableBudget := minTargetBytes + minResponsesOverhead
[nit] I was confused by this code at first - it might be nice to clarify that this is an estimate for the amount of memory that will be required to return a single response, since we will make no progress without at least managing that.
377f06e
to
84b8e7d
Compare
The Streamer is careful to account for the requests (both the footprint and the overhead) as well as to estimate the footprint of the responses. However, it currently doesn't account for the overhead of the GetResponse (currently 64 bytes) and ScanResponse (120 bytes) structs. We recently saw a case where this overhead was the largest user of RAM which contributed to the pod OOMing. This commit fixes this accounting oversight in the following manner: - prior to issuing the BatchRequest, we estimate the overhead of a response to each request in the batch. Notably, the BatchResponse will contain a RequestUnion object as well as the GetResponse or ScanResponse object for each request - once the BatchResponse is received, we reconcile the budget to track the precise memory usage of the responses (ignoring the RequestUnion since we don't keep a reference to it). We already tracked the "footprint" and now we also include the "overhead" with both being released to the budget on `Result.Release` call. We track this "responses overhead" usage separately from the target bytes usage (the "footprint") since the KV server doesn't include the overhead when determining how to handle `TargetBytes` limit, and we must behave in the same manner. It's worth noting that the overhead of the response structs is proportional to the number of requests included in the BatchRequest since every request will get a corresponding (possibly empty) response. Release note: None
84b8e7d
to
40d28a5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball)
pkg/kv/kvclient/kvstreamer/size.go
line 42 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Should it be
kvpb.ResponseUnion_Scan{}
here?
Indeed, thanks.
pkg/kv/kvclient/kvstreamer/streamer.go
line 1014 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
[nit] I was confused by this code at first - it might be nice to clarify that this is an estimate for the amount of memory that will be required to return a single response, since we will make no progress without at least managing that.
Done. I also adjusted the logic here to better represent reality. In particular, we want to account for the overhead of the responses for all requests in the BatchRequest (since each of them will get a corresponding response struct).
Build succeeded: |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from 40d28a5 to blathers/backport-release-22.2-97425: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 22.2.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
The Streamer is careful to account for the requests (both the footprint
and the overhead) as well as to estimate the footprint of the responses.
However, it currently doesn't account for the overhead of the GetResponse
(currently 64 bytes) and ScanResponse (120 bytes) structs. We recently
saw a case where this overhead was the largest user of RAM which
contributed to the pod OOMing. This commit fixes this accounting oversight
in the following manner:
a response to each request in the batch. Notably, the BatchResponse will
contain a RequestUnion object as well as the GetResponse or ScanResponse
object for each request
the precise memory usage of the responses (ignoring the RequestUnion
since we don't keep a reference to it). We already tracked the
"footprint" and now we also include the "overhead" with both being
released to the budget on
Result.Release
call.We track this "responses overhead" usage separate from the target bytes
usage (the "footprint") since the KV server doesn't include the overhead
when determining how to handle
TargetBytes
limit, and we must behavein the same manner.
It's worth noting that the overhead of the response structs is
proportional to the number of requests included in the BatchRequest
since every request will get a corresponding (possibly empty) response.
Fixes: #97279.
Release note: None