-
Notifications
You must be signed in to change notification settings - Fork 589
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use fragmented vector for metadata response #8469
Use fragmented vector for metadata response #8469
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks awesome. just added some thoughts along the way but i don't see any blockers
static_assert( | ||
(fragment_size & (fragment_size - 1)) == 0, | ||
"fragment size must be a power of 2"); | ||
static_assert(fragment_size % sizeof(T) == 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static_assert(fragment_size % sizeof(T) == 0);
this one was i believe unintentional. there is a recent thread on team-core about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh yeah, I replied over there: I think it may be intentional (see reply)
src/v/utils/fragmented_vector.h
Outdated
@@ -129,10 +129,17 @@ class fragmented_vector { | |||
return *this; | |||
} | |||
|
|||
bool operator==(const const_iterator&) const = default; | |||
const iter operator++(int) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh i did this a while back but just changed the iterator concept in seastar that should have been prefix!
friend ssize_t | ||
operator-(const const_iterator& a, const const_iterator& b) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was operator==
and operator-
unnecessary now?
|
||
bool operator==(const iter&) const = default; | ||
|
||
friend ssize_t operator-(const iter& a, const iter& b) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh see those operators are added back in a later commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am sorry about this, something got weird with the many commit rewrites I did to get this into an understandable state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no worries
/** | ||
* Assign from a std::vector. | ||
*/ | ||
fragmented_vector& operator=(const std::vector<T>& rhs) noexcept { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
*/ | ||
void clear() { | ||
// do the swap dance to actually clear the memory held by the vector | ||
std::vector<std::vector<T>>{}.swap(_frags); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason to prefer this over _frags.clear()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, as the comment mentions, this is to clear the memory used by the frags
array: .clear()
on a vector leaves the capacity unchanged. Here that's not useful (the the individual fragments themselves are set to 0 capacity) so we use this well-known (ha?) idiom to truly clear out the vector.
Another option is _frags.clear(); _frags.shrink_to_fit();
though apparently this is not guaranteed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.clear() on a vector leaves the capacity unchanged
wow, sure enough. i missed that day in cppreference.com class
return ss::make_ready_future<metadata_response::topic>(t); | ||
return ss::make_ready_future<metadata_response::topic>( | ||
std::move(t)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh good catch! sorta surprised we couldn't do return t
and get NRVO here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well we need to wrap it in the ready future and that will suppress NRVO, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i had some thought about how seastar had managed to optimize this case, but now i cannot find this reference. i agree it would be surprising.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dotnwat - I think when you return a bare value T
inside a continuation, it gets converted by seastar into a future<T>
but can still use NVRO, becuase seastar jumps thru some hoops to ensure that works: maybe that's what you were thinking of?
This would be so awesome! Cannot wait for it to get in! |
d581c3d
to
15b6343
Compare
15b6343
to
0323e74
Compare
fragmented_vector is a vector-like class which limits the maximum contiguous allocation required by breaking the elements up into chunks of a specific maximum size call the "fragment size". Currently, the vector places two restrictions on the fragment size: - It must be a power of two and one on the element size: - It must evenly divide the fragment size. It seems to be that neither of these is necessary, and they severely limit the element types that may be used: effectively, only elements whose size is a power of 2 can meet these requirements. The first restriction may be in place to ensure the fragment size lines exactly with seastar span sizes, and the second may be so that there is no internal waste within a fragment, but with many object sizes this waste is very small. We simply remove these restrictions for now.
This returns the approximate in-memory size of the vector in bytes. It is handy for methods which must estimate the in-memory size of a given request.
These operations were not implemented previously, but are necessary to satisfy the iterator concepts.
Previously, fragmented_vector only had const_iterator, but more general use will require also a non-const iterator. This change adds that capability, using a template iterator class parameterized on whether the iterator is const or not.
fragmented_vector<T>::iterator and const_iterator did not have a default constructor, but without that it does not satisfy the forward_iterator concept and some algorithms require that.
These operators move the iterators towards the random-access concept.
This change allows our fragmented vector to be assigned from std::vector so that it can interop with vector in the request/response path.
fragmented_vector::iterator is in principle a random access iterator so it should have less-than and other operators. Do it with spaceship.
So we can print it in tests.
Restrict the number of elements in a full fragment to the highest power of two that still satisfies the max_frag_bytes limits. This keeps the indexing for operator[] considerably similar because a shift is needed instead of a division-by-constant. While we're in there, add an accessor for the number of elements held in each fragment, allowing users to query this constexpr value.
Add several new tests for fragmented vector, covering: - Assignment from std::vector - clear() method - const vs non-const iterators - Fragment sizing - Iterator arithmetic and comparison - Using vector in std::sort Furthermore, a wrapper is created which checks the consistency of the vector on every operator->() deference: by using this widely in the test we fail fast when an inconsistency is discovered. We use an internals-accessing friend class to enable some of this, e.g., the consistency checking, without needing to make internals public.
This change lets the write_array methods of response_writer accept fragmented_vector in addition to std::vector.
Certain responses may be very large, such as a metadata response on a cluster with many partitions. Since int schemata we map lists in response types to std::vector, this may result in large contiguous allocations, which is a no-no in redpanda: we should limit such allocations to 128 K. To fix this, we add an enable_fragmentation_resistance list in the generator: array of types in this list will be implemented with a large_fragment_vector instead of std::vector: this type limits the contiguous memory allocated by making several smaller allocations if necessary.
Topic metadata is potentially large and slow to copy, so this change moves it in a couple of places we weren't moving it before which showed up in a large load test profile.
e956716
to
c8a8f51
Compare
Now that we are using fragmented_vector, which is move-only, we need to use explicit moves in a few places and as a bonus this will be faster.
reserve() was being used on the std::vector of partition metadata to size it correctly, but now that we are using fragmented_vector this method no longer exists. It does not make sense for fragmented_vector since that class does not copy data to a new contiguous area as it grows, and always immediately allocates each fragment as its maximum size, so reserve would not change the allocation or copy pattern at all.
c8a8f51
to
1eb014b
Compare
Assignment from `std::vector` was introduced in redpanda-data#8469. Now that kafka/protocol/generator can call kafka::protocol::read_array<fragmented_vector>, this is no longer needed, avoiding a large copy. Signed-off-by: Ben Pope <[email protected]>
Assignment from `std::vector` was introduced in redpanda-data#8469. Now that kafka/protocol/generator can call kafka::protocol::read_array<fragmented_vector>, this is no longer needed, avoiding a large copy. Signed-off-by: Ben Pope <[email protected]>
Assignment from `std::vector` was introduced in redpanda-data#8469. Now that kafka/protocol/generator can call kafka::protocol::read_array<fragmented_vector>, this is no longer needed, avoiding a large copy. Signed-off-by: Ben Pope <[email protected]>
Assignment from `std::vector` was introduced in redpanda-data#8469. Now that kafka/protocol/generator can call kafka::protocol::read_array<fragmented_vector>, this is no longer needed, avoiding a large copy. Signed-off-by: Ben Pope <[email protected]>
Assignment from `std::vector` was introduced in redpanda-data#8469. Now that kafka/protocol/generator can call kafka::protocol::read_array<fragmented_vector>, this is no longer needed, avoiding a large copy. Signed-off-by: Ben Pope <[email protected]>
Assignment from `std::vector` was introduced in redpanda-data#8469. Now that kafka/protocol/generator can call kafka::protocol::read_array<fragmented_vector>, this is no longer needed, avoiding a large copy. Signed-off-by: Ben Pope <[email protected]>
Assignment from `std::vector` was introduced in redpanda-data#8469. Now that kafka/protocol/generator can call kafka::protocol::read_array<fragmented_vector>, this is no longer needed, avoiding a large copy. Signed-off-by: Ben Pope <[email protected]> (cherry picked from commit bc63e9f)
Assignment from `std::vector` was introduced in redpanda-data#8469. Now that kafka/protocol/generator can call kafka::protocol::read_array<fragmented_vector>, this is no longer needed, avoiding a large copy. Signed-off-by: Ben Pope <[email protected]> (cherry picked from commit bc63e9f)
This is necessary to allow using chunked_vector with these encoder methods. This continues the work started in the following reference PR: redpanda-data#8469
This is necessary to allow using chunked_vector with these encoder methods. This continues the work started in the following reference PR: redpanda-data#8469
This is necessary to allow using chunked_vector with these encoder methods. This continues the work started in the following reference PR: redpanda-data#8469 (cherry picked from commit 6a5ceb8)
This series improves (I think) the fragmented vector class and uses it to
handle metadata responses which can be very large (multiple MB) when
there are many partitions and/or topics.
This helps prevent a fragmentation-implicated OOM, where we may fail to
allocate a large metadata response even though we have more than 1 GB
free memory.
The individual commits have additional details.
Fixes #8355, Fixes #8100
Backports Required
Release Notes