Every node OOMs during load test #5563

Closed

travisdowns opened this issue Jul 22, 2022 · 3 comments
Labels
area/redpanda DW kind/bug Something isn't working

Comments

travisdowns commented Jul 22, 2022

Version & Environment

Redpanda version: 0346aa1

What went wrong?

Out-of-memory conditions occur when making moderately sized allocations (>500 KB, <1 MB) during a load test, after the cluster controller is killed. Some failures indicate significant heap fragmentation.

What should have happened instead?

No out of memory condition, but instead a graceful and speedy recovery.

Additional information

Typical OOM:

DEBUG 2022-07-21 23:34:56,111907 [shard 0 seq 3] seastar_memory - Dumping seastar memory diagnostics
Used memory:  2435M
Free memory:  445M
Total memory: 3G

Small pools:
objsz	spansz	usedobj	memory	unused	wst%
8	4K	2k	28K	16K	58
10	4K	2	8K	8K	99
12	4K	5886k	67M	2K	0
14	4K	2	8K	8K	99
16	4K	5k	100K	17K	16
32	4K	41k	1M	14K	1
32	4K	3853k	121M	3849K	3
32	4K	9k	284K	4K	1
32	4K	19k	1M	501K	45
48	4K	13k	756K	136K	17
48	4K	142k	7M	618K	8
64	4K	36k	2M	73K	3
64	4K	323k	20M	144K	0
80	4K	146k	11M	127K	1
96	4K	2k	544K	366K	67
112	4K	4k	860K	465K	54
128	4K	262	112K	79K	70
160	4K	25k	8M	3849K	49
192	4K	3k	508K	33K	6
224	4K	3k	688K	26K	3
256	4K	30k	13M	5959K	44
320	8K	6k	3M	1033K	35
384	8K	3k	1M	13K	1
448	4K	1k	984K	330K	33
512	4K	2k	2M	640K	37
640	16K	30	240K	221K	92
768	16K	5k	4M	44K	1
896	8K	1k	1M	113K	10
1024	4K	117	180K	63K	35
1280	32K	1k	3M	703K	27
1536	32K	418	896K	269K	29
1792	16K	339	800K	207K	25
2048	8K	247	624K	130K	20
2560	64K	375	1M	213K	18
3072	64K	305	1M	237K	20
3584	32K	10	288K	252K	87
4096	16K	789	3M	412K	11
5120	128K	21	384K	275K	71
6144	128K	296	2M	270K	13
7168	64K	1	576K	567K	98
8192	32K	5k	36M	104K	0
10240	64K	300	3M	320K	9
12288	64K	8	1M	1M	92
14336	128K	285	6M	2M	29
16384	64K	18k	289M	1M	0
Page spans:
index	size	free	used	spans
0	4K	4K	259M	66k
1	8K	3M	6M	1k
2	16K	4M	9M	834
3	32K	56M	89M	5k
4	64K	77M	316M	6k
5	128K	82M	1181M	10k
6	256K	92M	1M	372
7	512K	131M	0B	262
8	1M	0B	491M	491
9	2M	0B	0B	0
10	4M	0B	4M	1
11	8M	0B	0B	0
12	16M	0B	16M	1
13	32M	0B	64M	2
14	64M	0B	0B	0
15	128M	0B	0B	0
16	256M	0B	0B	0
17	512M	0B	0B	0
18	1G	0B	0B	0
19	2G	0B	0B	0
20	4G	0B	0B	0
21	8G	0B	0B	0
22	16G	0B	0B	0
23	32G	0B	0B	0
24	64G	0B	0B	0
25	128G	0B	0B	0
26	256G	0B	0B	0
27	512G	0B	0B	0
28	1T	0B	0B	0
29	2T	0B	0B	0
30	4T	0B	0B	0
31	8T	0B	0B	0

ERROR 2022-07-21 23:34:56,115591 [shard 0 seq 4] seastar - Failed to allocate 528000 bytes
Aborting on shard 0.
Backtrace:
  0x46b2f46
  0x4715dd2
  0x2a2d90a9841f
  /opt/redpanda/lib/libc.so.6+0x4300a
  /opt/redpanda/lib/libc.so.6+0x22858
  0x46242a3
  0x46329d1
  0x23d72c9
  0x23dcb2a
  0x222339a
  0x46d034f
  0x46d4027
  0x46d13f9
  0x45f1851
  0x45ef96f
  0x1803dc6
  0x49d12c0
  /opt/redpanda/lib/libc.so.6+0x24082
  0x1800a0d

Relevant decoded backtrace:

 (inlined by) std::__1::__libcpp_allocate(unsigned long, unsigned long) at /home/ubuntu/redpanda/vbuild/llvm/install/bin/../include/c++/v1/new:271
 (inlined by) std::__1::allocator<kafka::metadata_response_partition>::allocate(unsigned long) at /home/ubuntu/redpanda/vbuild/llvm/install/bin/../include/c++/v1/__memory/allocator.h:105
 (inlined by) std::__1::allocator_traits<std::__1::allocator<kafka::metadata_response_partition> >::allocate(std::__1::allocator<kafka::metadata_response_partition>&, unsigned long) at /home/ubuntu/redpanda/vbuild/llvm/install/bin/../include/c++/v1/__memory/allocator_traits.h:262
 (inlined by) std::__1::vector<kafka::metadata_response_partition, std::__1::allocator<kafka::metadata_response_partition> >::__vallocate(unsigned long) at /home/ubuntu/redpanda/vbuild/llvm/install/bin/../include/c++/v1/vector:922
 (inlined by) std::__1::vector<kafka::metadata_response_partition, std::__1::allocator<kafka::metadata_response_partition> >::vector(std::__1::vector<kafka::metadata_response_partition, std::__1::allocator<kafka::metadata_response_partition> > const&) at /home/ubuntu/redpanda/vbuild/llvm/install/bin/../include/c++/v1/vector:1157
 (inlined by) kafka::metadata_response_topic::metadata_response_topic(kafka::metadata_response_topic const&) at /home/ubuntu/redpanda/vbuild/release/clang/src/v/kafka/protocol/schemata/metadata_response.h:95
 (inlined by) kafka::metadata_response_topic* std::__1::construct_at<kafka::metadata_response_topic, kafka::metadata_response_topic&, kafka::metadata_response_topic*>(kafka::metadata_response_topic*, kafka::metadata_response_topic&) at /home/ubuntu/redpanda/vbuild/llvm/install/bin/../include/c++/v1/__memory/construct_at.h:38
 (inlined by) void std::__1::allocator_traits<std::__1::allocator<kafka::metadata_response_topic> >::construct<kafka::metadata_response_topic, kafka::metadata_response_topic&, void, void>(std::__1::allocator<kafka::metadata_response_topic>&, kafka::metadata_response_topic*, kafka::metadata_response_topic&) at /home/ubuntu/redpanda/vbuild/llvm/install/bin/../include/c++/v1/__memory/allocator_traits.h:298
 (inlined by) void std::__1::__construct_range_forward<std::__1::allocator<kafka::metadata_response_topic>, kafka::metadata_response_topic*, kafka::metadata_response_topic*>(std::__1::allocator<kafka::metadata_response_topic>&, kafka::metadata_response_topic*, kafka::metadata_response_topic*, kafka::metadata_response_topic*&) at /home/ubuntu/redpanda/vbuild/llvm/install/bin/../include/c++/v1/memory:885
 (inlined by) std::__1::enable_if<__is_cpp17_forward_iterator<kafka::metadata_response_topic*>::value, void>::type std::__1::vector<kafka::metadata_response_topic, std::__1::allocator<kafka::metadata_response_topic> >::__construct_at_end<kafka::metadata_response_topic*>(kafka::metadata_response_topic*, kafka::metadata_response_topic*, unsigned long) at /home/ubuntu/redpanda/vbuild/llvm/install/bin/../include/c++/v1/vector:1006
 (inlined by) std::__1::vector<kafka::metadata_response_topic, std::__1::allocator<kafka::metadata_response_topic> >::vector(std::__1::vector<kafka::metadata_response_topic, std::__1::allocator<kafka::metadata_response_topic> > const&) at /home/ubuntu/redpanda/vbuild/llvm/install/bin/../include/c++/v1/vector:1158
 (inlined by) kafka::get_topic_metadata(kafka::request_context&, kafka::metadata_request&)::$_2::operator()(std::__1::vector<kafka::metadata_response_topic, std::__1::allocator<kafka::metadata_response_topic> >) at /home/ubuntu/redpanda/vbuild/release/clang/../../../src/v/kafka/server/handlers/metadata.cc:287
 (inlined by) seastar::future<std::__1::vector<kafka::metadata_response_topic, std::__1::allocator<kafka::metadata_response_topic> > > seastar::futurize<std::__1::vector<kafka::metadata_response_topic, std::__1::allocator<kafka::metadata_response_topic> > >::invoke<kafka::get_topic_metadata(kafka::request_context&, kafka::metadata_request&)::$_2, std::__1::vector<kafka::metadata_response_topic, std::__1::allocator<kafka::metadata_response_topic> > >(kafka::get_topic_metadata(kafka::request_context&, kafka::metadata_request&)::$_2&&, std::__1::vector<kafka::metadata_response_topic, std::__1::allocator<kafka::metadata_response_topic> >&&) at /home/ubuntu/redpanda/vbuild/release/clang/rp_deps_install/include/seastar/core/future.hh:2144
 (inlined by) seastar::future<std::__1::vector<kafka::metadata_response_topic, std::__1::allocator<kafka::metadata_response_topic> > > seastar::future<std::__1::vector<kafka::metadata_response_topic, std::__1::allocator<kafka::metadata_response_topic> > >::then_impl<kafka::get_topic_metadata(kafka::request_context&, kafka::metadata_request&)::$_2, seastar::future<std::__1::vector<kafka::metadata_response_topic, std::__1::allocator<kafka::metadata_response_topic> > > >(kafka::get_topic_metadata(kafka::request_context&, kafka::metadata_request&)::$_2&&) at /home/ubuntu/redpanda/vbuild/release/clang/rp_deps_install/include/seastar/core/future.hh:1608

The large allocation is for the std::vector of metadata_response_partition objects inside metadata_response_topic. The 528,000-byte allocation corresponds to 6,000 such objects (the struct is 88 bytes each), which is expected since the topics involved in the load test have 6,000 partitions.

This specific OOM occurs during an unnecessary copy, which we can eliminate with a std::move.
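For illustration, here is a minimal sketch of the copy-versus-move difference on this path. The type and function names are hypothetical stand-ins, not the actual code in metadata.cc; the point is only that copying a 6,000-element vector of 88-byte structs forces another ~528 KB contiguous allocation, while moving reuses the existing buffer.

```cpp
// Hypothetical sketch, not redpanda code: building a topic response from an
// already-populated vector of per-partition metadata.
#include <utility>
#include <vector>

struct partition_md { char payload[88]; };      // stand-in for kafka::metadata_response_partition
struct topic_response { std::vector<partition_md> partitions; };

topic_response make_response_copying(std::vector<partition_md> parts) {
    topic_response r;
    r.partitions = parts;            // copy: a second ~528 KB allocation must succeed
    return r;
}

topic_response make_response_moving(std::vector<partition_md> parts) {
    topic_response r;
    r.partitions = std::move(parts); // move: steals the existing buffer, no new allocation
    return r;
}
```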

@travisdowns travisdowns added kind/bug Something isn't working DW labels Jul 22, 2022
@travisdowns
Member Author

Here is another OOM backtrace from the same test; the failed allocation was 2,376,192 bytes:

 (inlined by) std::__1::allocator_traits<std::__1::allocator<cluster::ntp_leader_revision> >::allocate(std::__1::allocator<cluster::ntp_leader_revision>&, unsigned long) at /home/ubuntu/redpanda/vbuild/llvm/install/bin/../include/c++/v1/__memory/allocator_traits.h:262
 (inlined by) std::__1::vector<cluster::ntp_leader_revision, std::__1::allocator<cluster::ntp_leader_revision> >::__vallocate(unsigned long) at /home/ubuntu/redpanda/vbuild/llvm/install/bin/../include/c++/v1/vector:922
 (inlined by) std::__1::vector<cluster::ntp_leader_revision, std::__1::allocator<cluster::ntp_leader_revision> >::vector(std::__1::vector<cluster::ntp_leader_revision, std::__1::allocator<cluster::ntp_leader_revision> > const&) at /home/ubuntu/redpanda/vbuild/llvm/install/bin/../include/c++/v1/vector:1157
 (inlined by) cluster::metadata_dissemination_handler::do_update_leadership(std::__1::vector<cluster::ntp_leader_revision, std::__1::allocator<cluster::ntp_leader_revision> >)::$_3::$_3($_3 const&) at /home/ubuntu/redpanda/vbuild/release/clang/../../../src/v/cluster/metadata_dissemination_handler.cc:76
 (inlined by) seastar::future<void> seastar::sharded<cluster::partition_leaders_table>::invoke_on_all<cluster::metadata_dissemination_handler::do_update_leadership(std::__1::vector<cluster::ntp_leader_revision, std::__1::allocator<cluster::ntp_leader_revision> >)::$_3>(seastar::smp_submit_to_options, cluster::metadata_dissemination_handler::do_update_leadership(std::__1::vector<cluster::ntp_leader_revision, std::__1::allocator<cluster::ntp_leader_revision> >)::$_3)::'lambda'(cluster::partition_leaders_table&)::('lambda'(cluster::partition_leaders_table&) const&) at /home/ubuntu/redpanda/vbuild/release/clang/rp_deps_install/include/seastar/core/sharded.hh:745
 (inlined by) std::__1::__compressed_pair_elem<seastar::future<void> seastar::sharded<cluster::partition_leaders_table>::invoke_on_all<cluster::metadata_dissemination_handler::do_update_leadership(std::__1::vector<cluster::ntp_leader_revision, std::__1::allocator<cluster::ntp_leader_revision> >)::$_3>(seastar::smp_submit_to_options, cluster::metadata_dissemination_handler::do_update_leadership(std::__1::vector<cluster::ntp_leader_revision, std::__1::allocator<cluster::ntp_leader_revision> >)::$_3)::'lambda'(cluster::partition_leaders_table&), 0, false>::__compressed_pair_elem<seastar::future<void> seastar::sharded<cluster::partition_leaders_table>::invoke_on_all<cluster::metadata_dissemination_handler::do_update_leadership(std::__1::vector<cluster::ntp_leader_revision, std::__1::allocator<cluster::ntp_leader_revision> >)::$_3>(seastar::smp_submit_to_options, cluster::metadata_dissemination_handler::do_update_leadership(std::__1::vector<cluster::ntp_leader_revision, std::__1::allocator<cluster::ntp_leader_revision> >)::$_3)::'lambda'(cluster::partition_leaders_table&) const&, 0ul>(std::__1::piecewise_construct_t, std::__1::tuple<seastar::future<void> seastar::sharded<cluster::partition_leaders_table>::invoke_on_all<cluster::metadata_dissemination_handler::do_update_leadership(std::__1::vector<cluster::ntp_leader_revision, std::__1::allocator<cluster::ntp_leader_revision> >)::$_3>(seastar::smp_submit_to_options, cluster::metadata_dissemination_handler::do_update_leadership(std::__1::vector<cluster::ntp_leader_revision, std::__1::allocator<cluster::ntp_leader_revision> >)::$_3)::'lambda'(cluster::partition_leaders_table&) const&>, std::__1::__tuple_indices<0ul>) at /home/ubuntu/redpanda/vbuild/llvm/install/bin/../include/c++/v1/__memory/compressed_pair.h:58

The copy specifically implicated here will be fixed when this commit makes it into our seastar fork.

However, even without that copy, the trace still indicates a large vector of ntp_leader_revision objects: 37,128 objects of that type (about 64 bytes each, given the 2,376,192-byte allocation).
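To illustrate why this pattern hurts at high partition counts, here is a hedged, Seastar-free sketch: a functor that captures a large vector by value is duplicated once for every shard it is dispatched to, whereas sharing one immutable buffer keeps the peak at a single copy. All names below are hypothetical; this is not the metadata_dissemination_handler code.

```cpp
// Hypothetical sketch: fanning work out to N shards with a ~2.3 MB payload
// (37,128 objects of ~64 bytes each).
#include <functional>
#include <memory>
#include <vector>

struct leader_rev { char payload[64]; }; // stand-in for cluster::ntp_leader_revision

// Capturing by value: each shard's task holds its own copy of the vector,
// so N shards imply N extra ~2.3 MB allocations live at the same time.
std::vector<std::function<void()>> fan_out_copying(std::vector<leader_rev> leaders, int shards) {
    std::vector<std::function<void()>> tasks;
    for (int i = 0; i < shards; ++i) {
        tasks.emplace_back([leaders] { /* apply leadership updates */ });
    }
    return tasks;
}

// Sharing one immutable buffer: the tasks only copy a pointer.
std::vector<std::function<void()>> fan_out_sharing(std::vector<leader_rev> leaders, int shards) {
    auto shared = std::make_shared<const std::vector<leader_rev>>(std::move(leaders));
    std::vector<std::function<void()>> tasks;
    for (int i = 0; i < shards; ++i) {
        tasks.emplace_back([shared] { /* read *shared */ });
    }
    return tasks;
}
```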

travisdowns added a commit to travisdowns/redpanda that referenced this issue Jul 22, 2022
We see out of memory errors on the metadata path for large
partition counts. One problematic place would have 3 or 4 copies of the
partition list in flight at once.

This change avoids this code entirely in the usual case that the
metadata request isn't having a side effect of creating new topics
and reduces copies even if it is.

Issue redpanda-data#5563.
@travisdowns
Member Author

Summary of OOM types.

In this load test there were 32 nodes, all of which failed with an OOM. Here is the breakdown:

22 OOMs: the 528,000-byte allocation described above, creating a vector of metadata_response_partition objects.
8 OOMs: ~2,300,000 bytes in the metadata dissemination handler, as described above.
1 OOM: 720,896 bytes in make_topic_response_from_topic_metadata while building the metadata_response_partition objects in the outer std::transform call.
1 OOM: 13,623,500 bytes in rpc::send_reply -> compression::stream_zstd::compress, due to allocating a large temporary buffer to receive the compression result (see the sketch below); see #5566.
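As a rough illustration of the last item (an assumed shape of the code, not the actual stream_zstd implementation): one-shot zstd compression wants a contiguous output buffer sized for the worst case up front, so compressing a multi-megabyte reply itself requires a multi-megabyte temporary allocation.

```cpp
// Hypothetical sketch using the public zstd one-shot API (link with -lzstd).
#include <vector>
#include <zstd.h>

std::vector<char> compress_reply(const char* src, size_t src_size) {
    // ZSTD_compressBound() is the worst-case compressed size; for a ~13 MB
    // reply this is itself a ~13 MB contiguous allocation.
    std::vector<char> out(ZSTD_compressBound(src_size));
    size_t n = ZSTD_compress(out.data(), out.size(), src, src_size, /*level=*/3);
    if (ZSTD_isError(n)) {
        return {};       // real code would propagate the error
    }
    out.resize(n);       // shrink to the actual compressed size
    return out;
}
```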

travisdowns changed the title from "OOM during load test" to "Every node OOMs during load test" on Jul 22, 2022
@travisdowns
Member Author

The primary remaining issues were the large copies in the metadata response handler, which were fixed in:

#8469

travisdowns added a commit to travisdowns/redpanda that referenced this issue Jul 4, 2023
Add max_frag_bytes to the metadata memory estimate to account for the
worst-case overshoot during vector re-allocation.

Issue redpanda-data#5563.
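As context for the commit above, here is a minimal sketch (not redpanda code) of the overshoot it accounts for: while a std::vector grows, the old and the new buffer briefly coexist, so transient peak memory is roughly old capacity plus new capacity rather than just the final size, and an estimate without that headroom can under-reserve.

```cpp
// Prints each capacity growth step of a std::vector and the transient peak
// (old buffer + new buffer) that exists while it reallocates.
#include <cstdio>
#include <vector>

int main() {
    std::vector<char> buf;
    std::size_t prev_cap = 0;
    for (int i = 0; i < 1000000; ++i) {
        buf.push_back('x');
        if (buf.capacity() != prev_cap) {
            std::printf("grew %zu -> %zu (transient peak ~%zu bytes)\n",
                        prev_cap, buf.capacity(), prev_cap + buf.capacity());
            prev_cap = buf.capacity();
        }
    }
}
```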