
kvstreamer: account for the overhead of GetResponse and ScanResponse #97425

Merged: 1 commit, Feb 22, 2023


@yuzefovich (Member) commented Feb 21, 2023

The Streamer is careful to account for the requests (both the footprint
and the overhead) as well as to estimate the footprint of the responses.
However, it currently doesn't account for the overhead of the GetResponse
(currently 64 bytes) and ScanResponse (120 bytes) structs. We recently
saw a case where this overhead was the largest user of RAM, which
contributed to the pod OOMing. This commit fixes this accounting oversight
in the following manner:

  • prior to issuing the BatchRequest, we estimate the overhead of
    a response to each request in the batch. Notably, the BatchResponse will
    contain a ResponseUnion object as well as the GetResponse or ScanResponse
    object for each request.
  • once the BatchResponse is received, we reconcile the budget to track
    the precise memory usage of the responses (ignoring the ResponseUnion
    since we don't keep a reference to it). We already tracked the
    "footprint", and now we also include the "overhead", with both being
    released to the budget on the `Result.Release` call.
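The estimate-then-reconcile flow in the list above can be sketched as a toy model. This is not the kvstreamer implementation: the `budget` type, its `consume`/`release` methods, the helper names, and the 16-byte `responseUnionOverhead` are all hypothetical; only the 64-byte GetResponse and 120-byte ScanResponse sizes come from the description.

```go
package main

import "fmt"

const (
	getResponseOverhead   = 64  // GetResponse struct size quoted in the PR
	scanResponseOverhead  = 120 // ScanResponse struct size quoted in the PR
	responseUnionOverhead = 16  // hypothetical ResponseUnion size
)

type reqKind int

const (
	getReq reqKind = iota
	scanReq
)

// budget is a toy memory account, not the real kvstreamer budget.
type budget struct{ used int64 }

func (b *budget) consume(n int64) { b.used += n }
func (b *budget) release(n int64) { b.used -= n }

// estimateResponsesOverhead is the reservation made before sending a batch:
// every request gets a ResponseUnion plus a GetResponse or ScanResponse.
func estimateResponsesOverhead(reqs []reqKind) int64 {
	var total int64
	for _, r := range reqs {
		total += responseUnionOverhead
		if r == getReq {
			total += getResponseOverhead
		} else {
			total += scanResponseOverhead
		}
	}
	return total
}

// preciseResponsesOverhead is what stays tracked once the response arrives:
// the unions are excluded because no reference to them is kept.
func preciseResponsesOverhead(reqs []reqKind) int64 {
	return estimateResponsesOverhead(reqs) - int64(len(reqs))*responseUnionOverhead
}

func main() {
	reqs := []reqKind{getReq, getReq, scanReq}
	b := &budget{}

	// Phase 1: reserve the estimate before issuing the BatchRequest.
	est := estimateResponsesOverhead(reqs)
	b.consume(est)
	fmt.Println("reserved:", b.used) // reserved: 296

	// Phase 2: reconcile to the precise usage once the BatchResponse arrives.
	precise := preciseResponsesOverhead(reqs)
	b.release(est - precise)
	fmt.Println("after reconcile:", b.used) // after reconcile: 248

	// Phase 3: releasing the Result returns the tracked overhead.
	b.release(precise)
	fmt.Println("after release:", b.used) // after release: 0
}
```

Reserving the union size up front and giving it back at reconciliation keeps the in-flight estimate conservative while only the structs that are actually retained stay accounted for.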

We track this "responses overhead" usage separately from the target bytes
usage (the "footprint") since the KV server doesn't include the overhead
when determining how to handle the `TargetBytes` limit, and we must behave
in the same manner.
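The split between the two quantities can be sketched with a toy type. Everything here (`usage`, `withinTargetBytes`, the field names) is hypothetical, and the TargetBytes check is deliberately simplified: the real server-side limit handling is more involved than a single comparison.

```go
package main

import "fmt"

// usage separates the two quantities tracked by the budget.
// The type and field names are hypothetical.
type usage struct {
	footprint int64 // bytes of returned data; what TargetBytes is about
	overhead  int64 // response struct overhead; ignored by TargetBytes
}

// total is what the memory budget must actually cover.
func (u usage) total() int64 { return u.footprint + u.overhead }

// withinTargetBytes is a simplified stand-in for the server's check:
// only the footprint counts toward the limit.
func withinTargetBytes(u usage, targetBytes int64) bool {
	return u.footprint <= targetBytes
}

func main() {
	u := usage{footprint: 1000, overhead: 184}
	fmt.Println(withinTargetBytes(u, 1000)) // true: overhead is not counted
	fmt.Println(u.total())                  // 1184: the budget still pays for it
}
```

Keeping the counters apart lets the client compare like with like against TargetBytes while still charging the full amount to the memory budget.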

It's worth noting that the overhead of the response structs is
proportional to the number of requests included in the BatchRequest
since every request will get a corresponding (possibly empty) response.
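As a worked formula (a sketch; $u$ stands for the size of one ResponseUnion, which the description does not give), a batch of $n_g$ Get requests and $n_s$ Scan requests reserves the first quantity before it is sent and keeps the second after reconciliation:

$$
\text{reserved} = n_g\,(64 + u) + n_s\,(120 + u), \qquad \text{tracked} = 64\,n_g + 120\,n_s
$$

For example, a batch of 1,000 Gets keeps 64,000 bytes of response structs alive even if none of the keys exist, because each Get still receives an (empty) GetResponse.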

Fixes: #97279.

Release note: None

blathers-crl bot commented Feb 21, 2023

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.


@yuzefovich force-pushed the streamer-fix branch 2 times, most recently from e06c981 to ea8cee5 (February 22, 2023 01:19)
@yuzefovich marked this pull request as ready for review February 22, 2023 01:20
@yuzefovich requested a review from a team as a code owner February 22, 2023 01:20
@yuzefovich requested review from rytaft and DrewKimball and removed request for rytaft February 22, 2023 01:20
@yuzefovich force-pushed the streamer-fix branch 2 times, most recently from 8caf145 to 377f06e (February 22, 2023 02:46)
@yuzefovich (Member Author) commented:

This shows the expected minor improvement on the tpch_concurrency metric: averaged over 20 runs, it increased from 76.75 to 79.5.

@DrewKimball (Collaborator) left a comment

:lgtm:

Reviewed 6 of 6 files at r1, 1 of 2 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/kv/kvclient/kvstreamer/size.go line 42 at r1 (raw file):

		panic("GetRequest and ScanRequest have different overheads")
	}
	scanResponseUnionOverhead := int64(unsafe.Sizeof(kvpb.ResponseUnion_Get{}))

Should it be kvpb.ResponseUnion_Scan{} here?


pkg/kv/kvclient/kvstreamer/streamer.go line 1014 at r1 (raw file):

			minResponsesOverhead = scanResponseOverhead + responseUnionOverhead
		}
		minAcceptableBudget := minTargetBytes + minResponsesOverhead

[nit] I was confused by this code at first; it might be nice to clarify that this is an estimate of the amount of memory required to return a single response, since we will make no progress without at least managing that.

@yuzefovich (Member Author) left a comment

TFTR!

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball)


pkg/kv/kvclient/kvstreamer/size.go line 42 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Should it be kvpb.ResponseUnion_Scan{} here?

Indeed, thanks.


pkg/kv/kvclient/kvstreamer/streamer.go line 1014 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] I was confused by this code at first; it might be nice to clarify that this is an estimate of the amount of memory required to return a single response, since we will make no progress without at least managing that.

Done. I also adjusted the logic here to better represent reality. In particular, we want to account for the overhead of the responses for all requests in the BatchRequest (since each of them will get a corresponding response struct).
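Following the adjusted logic described in this reply, the minimum acceptable budget can be sketched as the minimum target bytes plus the response overhead for every request in the batch, not just one. The function name, its parameters, and the 16-byte union size are hypothetical; the real computation in streamer.go may differ.

```go
package main

import "fmt"

const (
	getResponseOverhead   = 64  // from the PR description
	scanResponseOverhead  = 120 // from the PR description
	responseUnionOverhead = 16  // hypothetical ResponseUnion size
)

// minAcceptableBudget returns the smallest budget under which a batch can
// make progress: room for minTargetBytes of results plus the response
// struct overhead for every request, since each request gets a response.
func minAcceptableBudget(minTargetBytes int64, numGets, numScans int) int64 {
	overhead := int64(numGets)*(getResponseOverhead+responseUnionOverhead) +
		int64(numScans)*(scanResponseOverhead+responseUnionOverhead)
	return minTargetBytes + overhead
}

func main() {
	fmt.Println(minAcceptableBudget(1, 1, 0))  // 81
	fmt.Println(minAcceptableBudget(1, 10, 2)) // 1073
}
```

Sizing the minimum over all requests avoids admitting a batch whose empty responses alone would exceed the budget.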

craig bot (Contributor) commented Feb 22, 2023

Build succeeded:

craig bot merged commit 286b3e2 into cockroachdb:master Feb 22, 2023
blathers-crl bot commented Feb 22, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 40d28a5 to blathers/backport-release-22.2-97425: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

You may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.2.x failed. See errors above.



Issues this pull request may close: kvstreamer: account for the overhead of the responses
3 participants