
k/p/gen: Use fragmented_vector for fetchable_partition_response #11181

Merged
merged 2 commits into redpanda-data:dev
Oct 11, 2023

Conversation

BenPope
Member

@BenPope BenPope commented Jun 3, 2023

Avoid oversize allocations by converting fetchable_partition_response to small_fragment_vector

Fixes #11017

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.2.x
  • v23.1.x
  • v22.3.x

Release Notes

Improvements

  • Kafka: Avoid very large contiguous allocations during fetch.

Member

@StephanDollberg StephanDollberg left a comment


It's possible that large_fragment_vector is too far in the wrong direction

I think the general concern is that we shouldn't just blindly dump the fragmented_vector everywhere, because it will create lots of memory wastage if we only put something like a couple of 8-byte elements into it (and this is even worse for the large_fragment_vector, I guess).

fetch_response::partition_response isn't too large but I guess we are expecting many of those for topics with higher partition counts?

/**
* Assign from a std::vector.
*/
fragmented_vector& operator=(const std::vector<T>& rhs) noexcept {
Member


Removing this because it is not needed anymore or for any other reason?

Member Author


It was added to satisfy a use case that is no longer needed. If we want something like this I'd rather be explicit, as it can hide unexpected copies.

template<typename T>
fragmented_vector<T> make_fragmented_vector(std::initializer_list<T> in) {
    fragmented_vector<T> ret;
    for (auto& e : in) {
        ret.push_back(e);
    }
    return ret;
}
Member


Can one move out of an initializer_list?

Member


My hunch would be yes, but since it's a test helper, I'm not sure how much it matters?

Member Author


See #13854

@BenPope
Member Author

BenPope commented Jun 5, 2023

fetch_response::partition_response isn't too large but I guess we are expecting many of those for topics with higher partition counts?

Yeah, the oversize allocation was in: rptest.scale_tests.many_partitions_test.ManyPartitionsTest.test_many_partitions

StephanDollberg
StephanDollberg previously approved these changes Jun 5, 2023
Member

@StephanDollberg StephanDollberg left a comment


Right, makes sense. I guess the question is how much we are pessimizing responses with few partitions.

If there is no new data on a partition do we return both a metadata response and a partition response?

I guess it's fine either way as long as there is only a single one for each fetch request.

@travisdowns
Member

The cover page mentions metadata response, but this change affects the fetch path, not the metadata path, right?

So it's also perf/memory sensitive.

@BenPope
Member Author

BenPope commented Jun 5, 2023

The cover page mentions metadata response, but this change affects the fetch path, not the metadata path, right?

The fetch response is the primary fix, but I reworked some stuff in the generator so that the metadata response builds into a fragmented_vector instead of a std::vector; a large allocation and copy is avoided there, too.

So it's also perf/memory sensitive.

Aye.

@BenPope
Member Author

BenPope commented Jun 5, 2023

I guess the question is how much we are pessimizing responses with few partitions.

Honestly, I think large_fragment_vector might not be the right choice for either operation. We've discussed changing fragmented_vector to have more std::vector-like behaviour for the first fragment (it doesn't make much sense for further fragments as a std::vector would roughly double anyway).

@travisdowns as the author of #8469, do you have any thoughts about the pessimisation here?

Towards the end of this PR it should be possible to avoid enable_fragmentation_resistance by using the regular path override mechanism for types.

If there is no new data on a partition do we return both a metadata response and a partition response?

They're separate responses for separate requests. Both occur quite often.

@michael-redpanda
Contributor

I noticed this has been sitting stale for a little bit. I think @BenPope has an outstanding question for @travisdowns regarding whether large_fragment_vector is the right "tool for the job".

dotnwat
dotnwat previously approved these changes Sep 27, 2023
Member

@dotnwat dotnwat left a comment


lgtm

Comment on lines 1254 to 1256
{%- if field.nullable() %}
{%- if flex %}
{{ fname }} = reader.read_nullable_flex_array([version](protocol::decoder& reader) {
{%- else %}
{{ fname }} = reader.read_nullable_array([version](protocol::decoder& reader) {
{%- endif %}
{%- else %}
{%- if flex %}
{{ fname }} = reader.read_flex_array([version](protocol::decoder& reader) {
{%- else %}
{{ fname }} = reader.read_array([version](protocol::decoder& reader) {
{%- endif %}
{%- endif %}
{%- set nullable = "nullable_" if field.nullable() else "" %}
{%- set flex = "flex_" if flex else "" %}
{{ fname }} = reader.read_{{nullable}}{{flex}}array([version](protocol::decoder& reader) {
Member


🔥


@dotnwat
Member

dotnwat commented Sep 27, 2023

Not sure if there are more outstanding questions. Maybe ping Travis?

@travisdowns
Member

@BenPope - yeah, the part where fragmented vector creates a large first segment as soon as the first entry is added is a problem.

Arguably it's worse here for fetch because this is a much hotter path in general, and it's also involved in the "poll" loop where we wake up periodically to do the whole fetch execution again but then go to sleep if we don't get enough bytes.

That said, there are 8 commits here and I think the first 7 seem non-controversial. So we could always go with those now and continue the discussion on the fetch problem? The fetch is a problem and I guess we have to solve it.

@dotnwat
Member

dotnwat commented Oct 1, 2023

@BenPope - yeah, the part where fragmented vector creates a large first segment as soon as the first entry is added is a problem.
Arguably it's worse here for fetch because this is a much hotter path in general, and it's also involved in the "poll" loop where we wake up periodically to do the whole fetch execution again but then go to sleep if we don't get enough bytes.

Is this just a matter of using a more modest segment size in the fragmented vector? Maybe not: it sounds like, between the "poll" case and large fetches, the variability is large enough that we need something adaptive. Or maybe we could ditch the vector entirely and use a boost intrusive list...

@BenPope
Member Author

BenPope commented Oct 2, 2023

I've rebased everything but the last commit into #13854 with some minor changes.

@BenPope
Member Author

BenPope commented Oct 2, 2023

I've rebased everything but the last commit into #13854 with some minor changes.

I also accidentally pushed a rebase of this branch, so I've rebased again on top of #13854

@mergify

mergify bot commented Oct 2, 2023

⚠️ The sha of the head commit of this PR conflicts with #13854. Mergify cannot evaluate rules on this PR. ⚠️

StephanDollberg
StephanDollberg previously approved these changes Oct 3, 2023
@BenPope BenPope dismissed StephanDollberg’s stale review October 3, 2023 08:14

The merge-base changed after approval.

@travisdowns
Member

Is this just a matter of using a more modest segment size in the fragmented vector? Maybe not: it sounds like, between the "poll" case and large fetches, the variability is large enough that we need something adaptive. Or maybe we could ditch the vector entirely and use a boost intrusive list...

Right, I think it's exactly a case of using a small segment size, like say 1K. 1K isn't going to bother anyone, and since these are small objects we can still fit plenty into a segment, so I don't see much of a downside here. Yes, there will be more segments, but none of the operations are really O(number_of_segments) anyway (arguably object destruction can be, for primitive types which otherwise avoid the O(n) deletion cost, but oh well).

@BenPope BenPope self-assigned this Oct 5, 2023
* Introduce `small_fragment_vector`
* Switch `fetchable_partition_response` to it

Signed-off-by: Ben Pope <[email protected]>
@BenPope
Member Author

BenPope commented Oct 10, 2023

Is this just a matter of using a more modest segment size in the fragmented vector? Maybe not: it sounds like, between the "poll" case and large fetches, the variability is large enough that we need something adaptive. Or maybe we could ditch the vector entirely and use a boost intrusive list...

Right, I think it's exactly a case of using a small segment size, like say 1K. 1K isn't going to bother anyone, and since these are small objects we can still fit plenty into a segment, so I don't see much of a downside here. Yes, there will be more segments, but none of the operations are really O(number_of_segments) anyway (arguably object destruction can be, for primitive types which otherwise avoid the O(n) deletion cost, but oh well).

Done

Member

@dotnwat dotnwat left a comment


lgtm

@piyushredpanda
Contributor

Known failure: #14053

@piyushredpanda piyushredpanda merged commit ee0e298 into redpanda-data:dev Oct 11, 2023
25 of 28 checks passed
@vbotbuildovich
Collaborator

/backport v23.2.x

@vbotbuildovich
Collaborator

/backport v23.1.x

@vbotbuildovich
Collaborator

Failed to create a backport PR to v23.1.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-11181-v23.1.x-362 remotes/upstream/v23.1.x
git cherry-pick -x 793d7af15c408ef2ee99cc0c48a95903c3fca19c f33973cd55f9ff1af86dd1e5f484ebfd958ff9b1

Workflow run logs.

@BenPope
Member Author

BenPope commented Oct 13, 2023

/backport v23.1.x

Successfully merging this pull request may close these issues.

Oversized allocation: 393216 bytes in kafka::op_context::create_response_placeholders
7 participants