
kvstreamer: improve the avg response size heuristic #83472

Merged
merged 1 commit into cockroachdb:master on Jun 29, 2022

Conversation

yuzefovich
Member

This commit improves the heuristic we use for estimating the average
response size. Previously, we used a simple average; now we multiply
that average by 1.5.

Having this multiple is good for a couple of reasons:

  • this allows us to fulfill requests that are slightly larger than the
    current average. For example, imagine that we're processing three
    requests of sizes 100B, 110B, and 120B sequentially, one at a time.
    Without the multiple, after the first request our estimate would be 100B,
    so the second request would come back empty (with ResumeNextBytes=110)
    and we'd have to re-issue it. At that point the average is 105B, so the
    third request would again come back empty and need to be re-issued with
    larger TargetBytes. Having the multiple allows us to handle such a
    scenario without any requests coming back empty (see the sketch after
    this list). In particular, TPCH Q17 has a similar setup.
  • this allows us to slowly grow the TargetBytes parameter over time when
    requests can be returned partially multiple times (i.e. Scan requests
    spanning multiple rows). Consider a case where a single Scan request has
    to return 1MB worth of data, but each row is only 100B. With the initial
    estimate of 1KB, every request would always come back with exactly 10
    rows, and the avg response size would always stay at 1KB, so we'd end up
    issuing about 1000 such requests. Having a multiple here allows us to
    grow the estimate over time, reducing the total number of requests needed.
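
To make the arithmetic concrete, here is a toy, self-contained Go sketch (not the actual kvstreamer code; the `avgResponseEstimator` type and the scenario functions are invented for this illustration) that replays both scenarios above with and without the multiple:

```go
package main

import "fmt"

// avgResponseEstimator is a toy stand-in for the streamer's estimator: it
// keeps a running mean of observed response sizes and inflates it by the
// multiple when choosing TargetBytes for the next request.
type avgResponseEstimator struct {
	totalBytes   int
	numResponses int
	multiple     float64 // 1.5 in this commit; 1.0 reproduces the old behavior
}

func (e *avgResponseEstimator) update(responseBytes int) {
	e.totalBytes += responseBytes
	e.numResponses++
}

func (e *avgResponseEstimator) targetBytes() int {
	const initialEstimate = 1024 // 1KB initial avg response size estimate
	if e.numResponses == 0 {
		return initialEstimate
	}
	avg := float64(e.totalBytes) / float64(e.numResponses)
	return int(avg * e.multiple)
}

// scenario1 replays the 100B/110B/120B example and returns how many requests
// come back empty because TargetBytes was below the response size.
func scenario1(multiple float64) int {
	e := avgResponseEstimator{multiple: multiple}
	empties := 0
	for _, size := range []int{100, 110, 120} {
		if e.numResponses > 0 && e.targetBytes() < size {
			empties++ // must be re-issued with larger TargetBytes
		}
		e.update(size)
	}
	return empties
}

// scenario2 replays the 1MB Scan with 100B rows and returns how many partial
// responses are needed before the Scan is complete.
func scenario2(multiple float64) int {
	const total, rowSize = 1 << 20, 100
	e := avgResponseEstimator{multiple: multiple}
	returned, requests := 0, 0
	for returned < total {
		requests++
		rows := e.targetBytes() / rowSize
		if rows < 1 {
			rows = 1 // always make progress
		}
		resp := rows * rowSize
		if resp > total-returned {
			resp = total - returned
		}
		e.update(resp)
		returned += resp
	}
	return requests
}

func main() {
	for _, m := range []float64{1.0, 1.5} {
		fmt.Printf("multiple=%.1f: %d empty responses, %d Scan requests\n",
			m, scenario1(m), scenario2(m))
	}
}
```

In this toy model, a multiple of 1.0 leaves two responses empty in the first scenario and needs roughly a thousand requests in the second, while 1.5 avoids the empty responses entirely and grows TargetBytes so that far fewer requests are needed.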

This multiple seems to fix the remaining perf regression on Q17 when
comparing against the streamer OFF config.

This commit also introduces a cluster setting that controls this
multiple. The value of 1.5 was chosen using tpchvec/bench and this
setting.
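
For reference, a float cluster setting like this is registered via pkg/settings; a rough, hypothetical sketch (the actual key, settings class, description, and location in the kvstreamer code may differ from what is shown here) could look like:

```go
// Hypothetical registration; the real setting name and wording in this PR may differ.
var avgResponseSizeMultiple = settings.RegisterFloatSetting(
	settings.TenantWritable,
	"sql.distsql.streamer.avg_response_size_multiple",
	"multiple applied to the average response size when deciding TargetBytes "+
		"for the next batch of requests issued by the streamer",
	1.5,
)
```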

Additionally, I introduced a similar cluster setting for the initial avg
response size estimate (currently hard-coded at 1KiB) and used
tpchvec/bench, which showed that the 1KiB value is pretty good. It was
also the value mentioned in the RFC, so I decided to remove the
corresponding setting.

Addresses: #82159.

Release note: None

@yuzefovich yuzefovich requested review from rharding6373, michae2 and a team June 27, 2022 22:00
@cockroach-teamcity
Member

This change is Reviewable

Collaborator

@rharding6373 rharding6373 left a comment

:lgtm: Nice work!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @michae2)

Collaborator

@rharding6373 rharding6373 left a comment

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @michae2)

@yuzefovich
Member Author

TFTR!

bors r+

@craig
Contributor

craig bot commented Jun 28, 2022

Build failed (retrying...):

@craig
Contributor

craig bot commented Jun 28, 2022

Build failed (retrying...):

@craig
Contributor

craig bot commented Jun 29, 2022

Build failed (retrying...):

@craig
Contributor

craig bot commented Jun 29, 2022

Build failed (retrying...):

@yuzefovich
Member Author

bors r+

@craig
Contributor

craig bot commented Jun 29, 2022

Already running a review

@craig
Contributor

craig bot commented Jun 29, 2022

Build succeeded:

@craig craig bot merged commit 6bcabcf into cockroachdb:master Jun 29, 2022
@yuzefovich yuzefovich deleted the streamer-multiple branch June 29, 2022 14:41