New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Use fragmented vector for metadata response #8469

Merged

dotnwat merged 16 commits into redpanda-data:dev from travisdowns:td20-frag-vector-metadata

Feb 1, 2023

Member

travisdowns commented Jan 27, 2023 •

edited by ballard26

Loading

This series improves (I think) the fragmented vector class and uses it to
handle metadata responses which can be very large (multiple MB) when
there are many partitions and/or topics.

This helps prevent a fragmentation-implicated OOM, where we may fail to
allocate a large metadata response even though we have more than 1 GB
free memory.

The individual commits have additional details.

Fixes #8355, Fixes #8100

Backports Required

none - not confident this is important enough to backport

Release Notes

none

travisdowns requested review from dotnwat and ballard26

January 27, 2023 19:27

github-actions bot added the area/redpanda label

dotnwat previously approved these changes

View reviewed changes

Member

dotnwat left a comment

looks awesome. just added some thoughts along the way but i don't see any blockers

src/v/utils/fragmented_vector.h

-                  static_assert(
-                    (fragment_size & (fragment_size - 1)) == 0,
-                    "fragment size must be a power of 2");
-                  static_assert(fragment_size % sizeof(T) == 0);

Member

dotnwat Jan 27, 2023

static_assert(fragment_size % sizeof(T) == 0);

this one was i believe unintentional. there is a recent thread on team-core about it.

Member Author

travisdowns Jan 31, 2023

oh yeah, I replied over there: I think it may be intentional (see reply)

src/v/utils/fragmented_vector.h Show resolved Hide resolved

src/v/utils/fragmented_vector.h Outdated

@@ @@ -129,10 +129,17 @@ class fragmented_vector { @@
                           return *this;
                       }
-                      bool operator==(const const_iterator&) const = default;
+                      const iter operator++(int) {

Member

dotnwat Jan 27, 2023

oh i did this a while back but just changed the iterator concept in seastar that should have been prefix!

src/v/utils/fragmented_vector.h

Comment on lines -134 to -132

		friend ssize_t
		operator-(const const_iterator& a, const const_iterator& b) {

Member

dotnwat Jan 27, 2023

was operator== and operator- unnecessary now?

src/v/utils/fragmented_vector.h


		bool operator==(const iter&) const = default;

		friend ssize_t operator-(const iter& a, const iter& b) {

Member

dotnwat Jan 27, 2023

ahh see those operators are added back in a later commit.

Member Author

travisdowns Jan 31, 2023 •

edited

Loading

I am sorry about this, something got weird with the many commit rewrites I did to get this into an understandable state.

Member

dotnwat Jan 31, 2023

no worries

src/v/utils/fragmented_vector.h

+                  /**
+                   * Assign from a std::vector.
+                   */
+                  fragmented_vector& operator=(const std::vector<T>& rhs) noexcept {

Member

dotnwat Jan 27, 2023

nice!

src/v/utils/fragmented_vector.h

+                   */
+                  void clear() {
+                      // do the swap dance to actually clear the memory held by the vector
+                      std::vector<std::vector<T>>{}.swap(_frags);

Member

dotnwat Jan 27, 2023

is there a reason to prefer this over _frags.clear()?

Member Author

travisdowns Jan 31, 2023

Yes, as the comment mentions, this is to clear the memory used by the frags array: .clear() on a vector leaves the capacity unchanged. Here that's not useful (the the individual fragments themselves are set to 0 capacity) so we use this well-known (ha?) idiom to truly clear out the vector.

Another option is _frags.clear(); _frags.shrink_to_fit(); though apparently this is not guaranteed.

Member

dotnwat Jan 31, 2023

.clear() on a vector leaves the capacity unchanged

wow, sure enough. i missed that day in cppreference.com class

src/v/utils/fragmented_vector.h Outdated Show resolved Hide resolved

src/v/kafka/server/handlers/metadata.cc

Comment on lines -161 to +192

-                            return ss::make_ready_future<metadata_response::topic>(t);
+                            return ss::make_ready_future<metadata_response::topic>(
+                              std::move(t));

Member

dotnwat Jan 27, 2023

oh good catch! sorta surprised we couldn't do return t and get NRVO here.

Member Author

travisdowns Jan 31, 2023

Well we need to wrap it in the ready future and that will suppress NRVO, right?

Member

dotnwat Jan 31, 2023

i had some thought about how seastar had managed to optimize this case, but now i cannot find this reference. i agree it would be surprising.

Member Author

travisdowns Feb 1, 2023

@dotnwat - I think when you return a bare value T inside a continuation, it gets converted by seastar into a future<T> but can still use NVRO, becuase seastar jumps thru some hoops to ensure that works: maybe that's what you were thinking of?

Contributor

piyushredpanda commented Jan 31, 2023

This would be so awesome! Cannot wait for it to get in!

travisdowns dismissed dotnwat’s stale review via

15b6343

January 31, 2023 21:44

travisdowns force-pushed the td20-frag-vector-metadata branch from d581c3d to 15b6343 Compare

January 31, 2023 21:44

travisdowns requested a review from dotnwat

January 31, 2023 21:59

dotnwat previously approved these changes

View reviewed changes

travisdowns dismissed dotnwat’s stale review via

0323e74

January 31, 2023 23:11

travisdowns force-pushed the td20-frag-vector-metadata branch from 15b6343 to 0323e74 Compare

January 31, 2023 23:11

travisdowns added 14 commits

February 1, 2023 10:44


          Remove size restrictions on frag_vector elem type

9de617c

fragmented_vector is a vector-like class which limits the maximum
contiguous allocation required by breaking the elements up into chunks
of a specific maximum size call the "fragment size".

Currently, the vector places two restrictions on the fragment size:
 - It must be a power of two
 and one on the element size:
 - It must evenly divide the fragment size.

It seems to be that neither of these is necessary, and they severely
limit the element types that may be used: effectively, only elements
whose size is a power of 2 can meet these requirements.

The first restriction may be in place to ensure the fragment size lines
exactly with seastar span sizes, and the second may be so that there is
no internal waste within a fragment, but with many object sizes this
waste is very small.

We simply remove these restrictions for now.


          Add memory_size to fragmented_vector

c04b6c4

This returns the approximate in-memory size of the vector in bytes.

It is handy for methods which must estimate the in-memory size of
a given request.


          Add post-increment ops to frag vector iterator

b1033b1

These operations were not implemented previously, but are necessary
to satisfy the iterator concepts.


          Add non-const iterator support to frag vector

cda1a66

Previously, fragmented_vector only had const_iterator, but more
general use will require also a non-const iterator.

This change adds that capability, using a template iterator class
parameterized on whether the iterator is const or not.


          Add default constructor to frag vector iterator

60411d9

fragmented_vector<T>::iterator and const_iterator did not have a
default constructor, but without that it does not satisfy the
forward_iterator concept and some algorithms require that.


          Add operator + - and =- which take offsets

f7090c4

These operators move the iterators towards the random-access concept.


          Allow fragmented vector assigned from std::vector

94607f0

This change allows our fragmented vector to be assigned from std::vector
so that it can interop with vector in the request/response path.


          Add comparison operators to frag_vec::iterator

b7d53b6

fragmented_vector::iterator is in principle a random access iterator
so it should have less-than and other operators. Do it with spaceship.


          Add ostream insert operator for fragmented_vector

3fd07d7

So we can print it in tests.


          Restrict elems-per-frag to 2^n in frag vector

0293e5b

Restrict the number of elements in a full fragment to the highest power
of two that still satisfies the max_frag_bytes limits.

This keeps the indexing for operator[] considerably similar because
a shift is needed instead of a division-by-constant.

While we're in there, add an accessor for the number of elements held
in each fragment, allowing users to query this constexpr value.


          Add test coverage to fragmented vector

374fd3a

Add several new tests for fragmented vector, covering:

 - Assignment from std::vector
 - clear() method
 - const vs non-const iterators
 - Fragment sizing
 - Iterator arithmetic and comparison
 - Using vector in std::sort

Furthermore, a wrapper is created which checks the consistency of the
vector on every operator->() deference: by using this widely in the test
we fail fast when an inconsistency is discovered.

We use an internals-accessing friend class to enable some of this, e.g.,
the consistency checking, without needing to make internals public.


          Allow response writer to accept frag vectors

4893ffc

This change lets the write_array methods of response_writer
accept fragmented_vector in addition to std::vector.


          Add optional fragmentation resistance to schemata

de91d7c

Certain responses may be very large, such as a metadata response on
a cluster with many partitions. Since int schemata we map lists in
response types to std::vector, this may result in large contiguous
allocations, which is a no-no in redpanda: we should limit such
allocations to 128 K.

To fix this, we add an enable_fragmentation_resistance list in the
generator: array of types in this list will be implemented with a
large_fragment_vector instead of std::vector: this type limits the
contiguous memory allocated by making several smaller allocations if
necessary.


          Move topic metadata in two more places

b16ec52

Topic metadata is potentially large and slow to copy, so this change
moves it in a couple of places we weren't moving it before which
showed up in a large load test profile.

travisdowns force-pushed the td20-frag-vector-metadata branch from e956716 to c8a8f51 Compare

February 1, 2023 13:47

travisdowns added 2 commits

February 1, 2023 11:20


          Additional moves for partition metadata

5471ae8

Now that we are using fragmented_vector, which is move-only, we need
to use explicit moves in a few places and as a bonus this will be
faster.


          Remove .reserve() call on partition metadata

1eb014b

reserve() was being used on the std::vector of partition metadata to
size it correctly, but now that we are using fragmented_vector this
method no longer exists. It does not make sense for fragmented_vector
since that class does not copy data to a new contiguous area as it
grows, and always immediately allocates each fragment as its maximum
size, so reserve would not change the allocation or copy pattern at all.

travisdowns force-pushed the td20-frag-vector-metadata branch from c8a8f51 to 1eb014b Compare

February 1, 2023 14:24

Member Author

travisdowns commented Feb 1, 2023

@dotnwat everything is building now. There was a rebase to tip of dev and two more little commits (the two shown above):

5471ae8
1eb014b

To accommodate the move-only behavior of frag vector in a couple of places.

dotnwat approved these changes

View reviewed changes

dotnwat merged commit daf23c3 into redpanda-data:dev

This was referenced Feb 22, 2023

[v22.3.x] cluster/archival_metadata_stm: std::vector -> fragmented_vector #9046

Merged

[v22.2.x] cluster/archival_metadata_stm: std::vector -> fragmented_vector #9047

Merged

bharathv mentioned this pull request

rm_stm: replace std::vector usage with fragmented_vector #9484

Merged

7 tasks

BenPope added a commit to BenPope/redpanda that referenced this pull request


          utils/fragmented_vector: Remove assignment from std::vector

1a6173e

Assignment from `std::vector` was introduced in redpanda-data#8469.

Now that kafka/protocol/generator can call
kafka::protocol::read_array<fragmented_vector>, this is no
longer needed, avoiding a large copy.

Signed-off-by: Ben Pope <[email protected]>

BenPope mentioned this pull request

k/p/gen: Use fragmented_vector for fetchable_partition_response #11181

Merged

7 tasks

BenPope added a commit to BenPope/redpanda that referenced this pull request


          utils/fragmented_vector: Remove assignment from std::vector

28a38a3

Assignment from `std::vector` was introduced in redpanda-data#8469.

Now that kafka/protocol/generator can call
kafka::protocol::read_array<fragmented_vector>, this is no
longer needed, avoiding a large copy.

Signed-off-by: Ben Pope <[email protected]>

travisdowns mentioned this pull request

Every node OOMs during load test #5563

Closed

BenPope added a commit to BenPope/redpanda that referenced this pull request


          utils/fragmented_vector: Remove assignment from std::vector

a5c9ae1

Assignment from `std::vector` was introduced in redpanda-data#8469.

Now that kafka/protocol/generator can call
kafka::protocol::read_array<fragmented_vector>, this is no
longer needed, avoiding a large copy.

Signed-off-by: Ben Pope <[email protected]>

BenPope added a commit to BenPope/redpanda that referenced this pull request


          utils/fragmented_vector: Remove assignment from std::vector

be19040

Assignment from `std::vector` was introduced in redpanda-data#8469.

Now that kafka/protocol/generator can call
kafka::protocol::read_array<fragmented_vector>, this is no
longer needed, avoiding a large copy.

Signed-off-by: Ben Pope <[email protected]>

BenPope added a commit to BenPope/redpanda that referenced this pull request


          utils/fragmented_vector: Remove assignment from std::vector

9dd4c53

Assignment from `std::vector` was introduced in redpanda-data#8469.

Now that kafka/protocol/generator can call
kafka::protocol::read_array<fragmented_vector>, this is no
longer needed, avoiding a large copy.

Signed-off-by: Ben Pope <[email protected]>

BenPope added a commit to BenPope/redpanda that referenced this pull request


          utils/fragmented_vector: Remove assignment from std::vector

bc63e9f

Assignment from `std::vector` was introduced in redpanda-data#8469.

Now that kafka/protocol/generator can call
kafka::protocol::read_array<fragmented_vector>, this is no
longer needed, avoiding a large copy.

Signed-off-by: Ben Pope <[email protected]>

vbotbuildovich pushed a commit to vbotbuildovich/redpanda that referenced this pull request


          utils/fragmented_vector: Remove assignment from std::vector

9a383aa

Assignment from `std::vector` was introduced in redpanda-data#8469.

Now that kafka/protocol/generator can call
kafka::protocol::read_array<fragmented_vector>, this is no
longer needed, avoiding a large copy.

Signed-off-by: Ben Pope <[email protected]>
(cherry picked from commit bc63e9f)

BenPope added a commit to BenPope/redpanda that referenced this pull request


          utils/fragmented_vector: Remove assignment from std::vector

c2f183e

Assignment from `std::vector` was introduced in redpanda-data#8469.

Now that kafka/protocol/generator can call
kafka::protocol::read_array<fragmented_vector>, this is no
longer needed, avoiding a large copy.

Signed-off-by: Ben Pope <[email protected]>
(cherry picked from commit bc63e9f)

pgellert added a commit to pgellert/redpanda that referenced this pull request


          kafka: generalize response writer container

6a5ceb8

This is necessary to allow using chunked_vector with these encoder
methods.

This continues the work started in the following reference PR:
redpanda-data#8469

WillemKauf pushed a commit to WillemKauf/redpanda that referenced this pull request


          kafka: generalize response writer container

d64fd70

This is necessary to allow using chunked_vector with these encoder
methods.

This continues the work started in the following reference PR:
redpanda-data#8469

vbotbuildovich pushed a commit to vbotbuildovich/redpanda that referenced this pull request


          kafka: generalize response writer container

56a9918

This is necessary to allow using chunked_vector with these encoder
methods.

This continues the work started in the following reference PR:
redpanda-data#8469

(cherry picked from commit 6a5ceb8)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels