Oversized 300K allocation during topic deletion #15610
The top 4 stacks:
#1: Adding a delete op to the topic_table_delta table. Key stack:
#2: Very similar, I think the same table with a slightly different command:
#3: Related to the above, but weird.
#4: The chunk cache. Nothing to see here.
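For context on why the delta-table stacks (#1 and #2) show such large single allocations: appending one record per partition to a contiguous vector means the vector's doubling reallocations eventually exceed the 128K limit in one shot. A minimal sketch of that pattern, assuming a small per-partition record (the `delta` struct below is illustrative, not Redpanda's actual type):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Illustrative stand-in for a per-partition delta entry; the real
// topic_table_delta record is larger and more complex.
struct delta {
    int64_t topic_rev;
    int32_t partition_id;
    int32_t command; // e.g. a "delete" command
};

int main() {
    std::vector<delta> deltas;
    std::size_t last_cap = 0;
    // Deleting a topic with ~22,900 partitions pushes one delta each.
    for (int32_t p = 0; p < 22'900; ++p) {
        deltas.push_back({1, p, /*command=*/2});
        if (deltas.capacity() != last_cap) {
            last_cap = deltas.capacity();
            // Each doubling is one contiguous allocation of this size.
            std::printf("reallocated: %zu bytes\n",
                        last_cap * sizeof(delta));
        }
    }
    // With 16-byte entries, the last few reallocations are already
    // well past the 128K oversized-allocation threshold.
}
```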
mmaslankaprv added a commit to mmaslankaprv/redpanda that referenced this issue on Mar 6, 2024:
A set of partition ids in each topic is a monotonically increasing sequence of numbers without gaps. This allows us to use a vector instead of an associative container to keep the partition metadata in the topics table. Using `chunked_vector` prevents large memory allocations. Fixes: redpanda-data#15610 Signed-off-by: Michal Maslanka <[email protected]>
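The gist of the fix, as a minimal standalone sketch: storage is split into fixed-size chunks, so no single allocation grows with the partition count. This toy container only mirrors the allocation pattern; Redpanda's real `chunked_vector` lives in its utils and has a much richer interface.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy chunked vector: contiguous indexing API, but storage is split
// into fixed-size chunks so each allocation stays at the configured
// bound instead of growing with the element count.
template <typename T, std::size_t max_chunk_bytes = 32 * 1024>
class toy_chunked_vector {
    static_assert(sizeof(T) <= max_chunk_bytes);
    static constexpr std::size_t chunk_elems = max_chunk_bytes / sizeof(T);
    std::vector<std::unique_ptr<T[]>> _chunks;
    std::size_t _size = 0;

public:
    void push_back(T v) {
        if (_size % chunk_elems == 0) {
            // Each chunk is a separate, bounded allocation.
            _chunks.push_back(std::make_unique<T[]>(chunk_elems));
        }
        _chunks[_size / chunk_elems][_size % chunk_elems] = std::move(v);
        ++_size;
    }
    T& operator[](std::size_t i) {
        return _chunks[i / chunk_elems][i % chunk_elems];
    }
    std::size_t size() const { return _size; }
};
```

Since partition ids within a topic form a dense, monotonically increasing sequence, the id itself can serve as the vector index, which is what lets a vector replace the associative container.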
Version & Environment
Redpanda version: 23.2.17
What went wrong?
In the OOM described in #15607, the final allocation is almost 300K:
This exceeds the memory allocation limit of 128K. It wasn't the underlying problem in this OOM (more than 99% of memory was in use), but regardless, we should fix any allocation larger than 128K.
The allocation occurs in the `topic_metadata_item::partitions` map while adding a partition to the map in `topic_table`. The size makes sense: we have ~22,900 partitions, and assuming the node hash map's backing array holds pointer-sized entries (highly likely), that comes to ~183K for that many elements; with a 2x growth strategy it could allocate up to ~360K, plus some headroom for load factor. So the 300K is completely explained. The decoded stack is:
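As a quick sanity check on that estimate, here is the arithmetic spelled out; the 8-byte slot size and the 2x growth policy are assumptions about a 64-bit build and the hash map's resize strategy, not measured values:

```cpp
#include <cstddef>
#include <cstdio>

int main() {
    // Assumptions (not measured): ~22,900 partitions; each slot of the
    // node-based hash map's backing array holds one pointer.
    constexpr std::size_t partitions = 22'900;
    constexpr std::size_t slot = sizeof(void*);     // 8 bytes on 64-bit
    constexpr std::size_t live = partitions * slot; // 183,200 B, ~183K
    constexpr std::size_t grown = 2 * live;         // ~366K with 2x growth
    std::printf("live slots: ~%zuK, growth bound: ~%zuK (decimal KB)\n",
                live / 1000, grown / 1000);
    // The observed ~300K allocation sits between these two bounds.
}
```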
What should have happened instead?
Allocations under the bound of 128K. I guess we need a different structure.
How to reproduce the issue?