Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use fragmented vector for metadata response #8469

Merged
merged 16 commits into from
Feb 1, 2023

Conversation

travisdowns
Copy link
Member

@travisdowns travisdowns commented Jan 27, 2023

This series improves (I think) the fragmented vector class and uses it to
handle metadata responses which can be very large (multiple MB) when
there are many partitions and/or topics.

This helps prevent a fragmentation-implicated OOM, where we may fail to
allocate a large metadata response even though we have more than 1 GB
free memory.

The individual commits have additional details.

Fixes #8355, Fixes #8100

Backports Required

  • none - not confident this is important enough to backport

Release Notes

  • none

dotnwat
dotnwat previously approved these changes Jan 27, 2023
Copy link
Member

@dotnwat dotnwat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks awesome. just added some thoughts along the way but i don't see any blockers

static_assert(
(fragment_size & (fragment_size - 1)) == 0,
"fragment size must be a power of 2");
static_assert(fragment_size % sizeof(T) == 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static_assert(fragment_size % sizeof(T) == 0);

this one was i believe unintentional. there is a recent thread on team-core about it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah, I replied over there: I think it may be intentional (see reply)

src/v/utils/fragmented_vector.h Show resolved Hide resolved
@@ -129,10 +129,17 @@ class fragmented_vector {
return *this;
}

bool operator==(const const_iterator&) const = default;
const iter operator++(int) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh i did this a while back but just changed the iterator concept in seastar that should have been prefix!

Comment on lines -134 to -132
friend ssize_t
operator-(const const_iterator& a, const const_iterator& b) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was operator== and operator- unnecessary now?


bool operator==(const iter&) const = default;

friend ssize_t operator-(const iter& a, const iter& b) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh see those operators are added back in a later commit.

Copy link
Member Author

@travisdowns travisdowns Jan 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sorry about this, something got weird with the many commit rewrites I did to get this into an understandable state.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no worries

/**
* Assign from a std::vector.
*/
fragmented_vector& operator=(const std::vector<T>& rhs) noexcept {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

*/
void clear() {
// do the swap dance to actually clear the memory held by the vector
std::vector<std::vector<T>>{}.swap(_frags);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason to prefer this over _frags.clear()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, as the comment mentions, this is to clear the memory used by the frags array: .clear() on a vector leaves the capacity unchanged. Here that's not useful (the the individual fragments themselves are set to 0 capacity) so we use this well-known (ha?) idiom to truly clear out the vector.

Another option is _frags.clear(); _frags.shrink_to_fit(); though apparently this is not guaranteed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.clear() on a vector leaves the capacity unchanged

wow, sure enough. i missed that day in cppreference.com class

src/v/utils/fragmented_vector.h Outdated Show resolved Hide resolved
Comment on lines -161 to +192
return ss::make_ready_future<metadata_response::topic>(t);
return ss::make_ready_future<metadata_response::topic>(
std::move(t));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh good catch! sorta surprised we couldn't do return t and get NRVO here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well we need to wrap it in the ready future and that will suppress NRVO, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i had some thought about how seastar had managed to optimize this case, but now i cannot find this reference. i agree it would be surprising.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dotnwat - I think when you return a bare value T inside a continuation, it gets converted by seastar into a future<T> but can still use NVRO, becuase seastar jumps thru some hoops to ensure that works: maybe that's what you were thinking of?

@piyushredpanda
Copy link
Contributor

This would be so awesome! Cannot wait for it to get in!

dotnwat
dotnwat previously approved these changes Jan 31, 2023
fragmented_vector is a vector-like class which limits the maximum
contiguous allocation required by breaking the elements up into chunks
of a specific maximum size call the "fragment size".

Currently, the vector places two restrictions on the fragment size:
 - It must be a power of two
 and one on the element size:
 - It must evenly divide the fragment size.

It seems to be that neither of these is necessary, and they severely
limit the element types that may be used: effectively, only elements
whose size is a power of 2 can meet these requirements.

The first restriction may be in place to ensure the fragment size lines
exactly with seastar span sizes, and the second may be so that there is
no internal waste within a fragment, but with many object sizes this
waste is very small.

We simply remove these restrictions for now.
This returns the approximate in-memory size of the vector in bytes.

It is handy for methods which must estimate the in-memory size of
a given request.
These operations were not implemented previously, but are necessary
to satisfy the iterator concepts.
Previously, fragmented_vector only had const_iterator, but more
general use will require also a non-const iterator.

This change adds that capability, using a template iterator class
parameterized on whether the iterator is const or not.
fragmented_vector<T>::iterator and const_iterator did not have a
default constructor, but without that it does not satisfy the
forward_iterator concept and some algorithms require that.
These operators move the iterators towards the random-access concept.
This change allows our fragmented vector to be assigned from std::vector
so that it can interop with vector in the request/response path.
fragmented_vector::iterator is in principle a random access iterator
so it should have less-than and other operators. Do it with spaceship.
Restrict the number of elements in a full fragment to the highest power
of two that still satisfies the max_frag_bytes limits.

This keeps the indexing for operator[] considerably similar because
a shift is needed instead of a division-by-constant.

While we're in there, add an accessor for the number of elements held
in each fragment, allowing users to query this constexpr value.
Add several new tests for fragmented vector, covering:

 - Assignment from std::vector
 - clear() method
 - const vs non-const iterators
 - Fragment sizing
 - Iterator arithmetic and comparison
 - Using vector in std::sort

Furthermore, a wrapper is created which checks the consistency of the
vector on every operator->() deference: by using this widely in the test
we fail fast when an inconsistency is discovered.

We use an internals-accessing friend class to enable some of this, e.g.,
the consistency checking, without needing to make internals public.
This change lets the write_array methods of response_writer
accept fragmented_vector in addition to std::vector.
Certain responses may be very large, such as a metadata response on
a cluster with many partitions. Since int schemata we map lists in
response types to std::vector, this may result in large contiguous
allocations, which is a no-no in redpanda: we should limit such
allocations to 128 K.

To fix this, we add an enable_fragmentation_resistance list in the
generator: array of types in this list will be implemented with a
large_fragment_vector instead of std::vector: this type limits the
contiguous memory allocated by making several smaller allocations if
necessary.
Topic metadata is potentially large and slow to copy, so this change
moves it in a couple of places we weren't moving it before which
showed up in a large load test profile.
Now that we are using fragmented_vector, which is move-only, we need
to use explicit moves in a few places and as a bonus this will be
faster.
reserve() was being used on the std::vector of partition metadata to
size it correctly, but now that we are using fragmented_vector this
method no longer exists. It does not make sense for fragmented_vector
since that class does not copy data to a new contiguous area as it
grows, and always immediately allocates each fragment as its maximum
size, so reserve would not change the allocation or copy pattern at all.
@travisdowns
Copy link
Member Author

@dotnwat everything is building now. There was a rebase to tip of dev and two more little commits (the two shown above):

5471ae8
1eb014b

To accommodate the move-only behavior of frag vector in a couple of places.

@dotnwat dotnwat merged commit daf23c3 into redpanda-data:dev Feb 1, 2023
BenPope added a commit to BenPope/redpanda that referenced this pull request Jun 3, 2023
Assignment from `std::vector` was introduced in redpanda-data#8469.

Now that kafka/protocol/generator can call
kafka::protocol::read_array<fragmented_vector>, this is no
longer needed, avoiding a large copy.

Signed-off-by: Ben Pope <[email protected]>
BenPope added a commit to BenPope/redpanda that referenced this pull request Jun 3, 2023
Assignment from `std::vector` was introduced in redpanda-data#8469.

Now that kafka/protocol/generator can call
kafka::protocol::read_array<fragmented_vector>, this is no
longer needed, avoiding a large copy.

Signed-off-by: Ben Pope <[email protected]>
BenPope added a commit to BenPope/redpanda that referenced this pull request Oct 2, 2023
Assignment from `std::vector` was introduced in redpanda-data#8469.

Now that kafka/protocol/generator can call
kafka::protocol::read_array<fragmented_vector>, this is no
longer needed, avoiding a large copy.

Signed-off-by: Ben Pope <[email protected]>
BenPope added a commit to BenPope/redpanda that referenced this pull request Oct 2, 2023
Assignment from `std::vector` was introduced in redpanda-data#8469.

Now that kafka/protocol/generator can call
kafka::protocol::read_array<fragmented_vector>, this is no
longer needed, avoiding a large copy.

Signed-off-by: Ben Pope <[email protected]>
BenPope added a commit to BenPope/redpanda that referenced this pull request Oct 2, 2023
Assignment from `std::vector` was introduced in redpanda-data#8469.

Now that kafka/protocol/generator can call
kafka::protocol::read_array<fragmented_vector>, this is no
longer needed, avoiding a large copy.

Signed-off-by: Ben Pope <[email protected]>
BenPope added a commit to BenPope/redpanda that referenced this pull request Oct 2, 2023
Assignment from `std::vector` was introduced in redpanda-data#8469.

Now that kafka/protocol/generator can call
kafka::protocol::read_array<fragmented_vector>, this is no
longer needed, avoiding a large copy.

Signed-off-by: Ben Pope <[email protected]>
vbotbuildovich pushed a commit to vbotbuildovich/redpanda that referenced this pull request Oct 3, 2023
Assignment from `std::vector` was introduced in redpanda-data#8469.

Now that kafka/protocol/generator can call
kafka::protocol::read_array<fragmented_vector>, this is no
longer needed, avoiding a large copy.

Signed-off-by: Ben Pope <[email protected]>
(cherry picked from commit bc63e9f)
BenPope added a commit to BenPope/redpanda that referenced this pull request Oct 3, 2023
Assignment from `std::vector` was introduced in redpanda-data#8469.

Now that kafka/protocol/generator can call
kafka::protocol::read_array<fragmented_vector>, this is no
longer needed, avoiding a large copy.

Signed-off-by: Ben Pope <[email protected]>
(cherry picked from commit bc63e9f)
pgellert added a commit to pgellert/redpanda that referenced this pull request Mar 21, 2024
This is necessary to allow using chunked_vector with these encoder
methods.

This continues the work started in the following reference PR:
redpanda-data#8469
WillemKauf pushed a commit to WillemKauf/redpanda that referenced this pull request Mar 25, 2024
This is necessary to allow using chunked_vector with these encoder
methods.

This continues the work started in the following reference PR:
redpanda-data#8469
vbotbuildovich pushed a commit to vbotbuildovich/redpanda that referenced this pull request Apr 10, 2024
This is necessary to allow using chunked_vector with these encoder
methods.

This continues the work started in the following reference PR:
redpanda-data#8469

(cherry picked from commit 6a5ceb8)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants