Node OOM crash while decommissioning while under ingress pressure #9408

Closed
dlex opened this issue Mar 13, 2023 · 12 comments
Labels
kind/bug Something isn't working

Comments

@dlex
Contributor

dlex commented Mar 13, 2023

Version & Environment

Redpanda version: v23.2.0-dev-278-g9ad94e101 - 9ad94e1019aef43dda64ef79c1286b72790e9972-dirty

Manual CDT environment using duck.py, Ubuntu 22.04.1 LTS on is4gen.xlarge (6 GiB/core)

What went wrong?

A Redpanda node crashed on a failed memory allocation.

The test scenario was this:

  • 4-node cluster running since 22:15 and performing another test

  • Decommission of the node started at 22:35

  • By 00:05 decommission was still in progress

  • The node crashed with seastar_memory - Failed to allocate 2359296 bytes

  • Partitions were moving out very slowly:

    • log segments active on the node: 7550 at 22:22, 4630 by the time of the crash

  • KgoVerifierProducer populated a topic with 315000 4K messages

  • The producer then went on emitting 128K messages at ~0.53 GiB/s

  • One of the nodes was stopped for 30 minutes and then started again

  • The underreplicated partitions were being replicated into the node for ~1 hour

  • The node crashed with seastar - Failed to allocate 6291456 bytes

What should have happened instead?

The node should not have crashed, and the decommission should have completed successfully.

How to reproduce the issue?

The test for this case is still in a feature branch. The test is TieredStorageWithLoadTest.test_restarts

Additional information

Some insights into decommission progress

Metric                                       At 22:35 (decommission start)   At 00:05 (node crash)
vectorized_storage_log_log_segments_active   7550                            4630
vectorized_raft_group_count                  238                             93

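To put the two metrics above in perspective, here is a rough back-of-the-envelope sketch of the drain rate between decommission start (22:35) and the crash (00:05), i.e. over 90 minutes. The helper names are hypothetical, and the "time remaining" figure assumes the rate would have stayed constant, which the report does not confirm.

```python
# Rough drain-rate estimate from the two samples in the table above.
# Assumption: progress is linear between the samples (not confirmed by the report).

ELAPSED_MINUTES = 90  # 22:35 -> 00:05


def drain_rate(before, after, minutes=ELAPSED_MINUTES):
    """Units moved off the node per minute between the two samples."""
    return (before - after) / minutes


def minutes_remaining(after, rate):
    """Minutes still needed to reach zero at a constant drain rate."""
    return after / rate


# Values copied from the table above.
raft_rate = drain_rate(238, 93)
segment_rate = drain_rate(7550, 4630)

raft_eta = minutes_remaining(93, raft_rate)
segment_eta = minutes_remaining(4630, segment_rate)

if __name__ == "__main__":
    print(f"raft groups: {raft_rate:.2f}/min, ~{raft_eta:.0f} min left")
    print(f"log segments: {segment_rate:.2f}/min, ~{segment_eta:.0f} min left")
```

Under that linear assumption the raft groups would have needed roughly another hour and the log segments well over two hours, which is consistent with the observation that partitions were moving out very slowly.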
DEBUG 2023-03-11 00:05:42,760 [shard 0] seastar_memory - Failed to allocate 2359296 bytes at 0x536a19b 0x50beb17 0x50cd477 0x42287f3 0x41661ab 0x416cc9b 0x4167447 0x416688f 0x483170f 0x486ba33 0x515bd07 0x515eb6f 0x515cc53 0x50934a3 0x5091c9f 0x1d23eaf 0x53fb0db /opt/redpanda/lib/libc.so.6+0x2b1c7 /opt/redpanda/lib/libc.so.6+0x2b29f 0x1d1ee6f
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<seastar::bool_class<seastar::stop_iteration_tag> >, storage::disk_log_appender::append_batch_to_segment(model::record_batch const&)::$_1, seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > seastar::future<storage::append_result>::then_impl_nrvo<storage::disk_log_appender::append_batch_to_segment(model::record_batch const&)::$_1, seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > >(storage::disk_log_appender::append_batch_to_segment(model::record_batch const&)::$_1&&)::'lambda'(seastar::internal::promise_base_with_type<seastar::bool_class<seastar::stop_iteration_tag> >&&, storage::disk_log_appender::append_batch_to_segment(model::record_batch const&)::$_1&, seastar::future_state<storage::append_result>&&), storage::append_result>
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<seastar::bool_class<seastar::stop_iteration_tag> >, seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > seastar::future<seastar::bool_class<seastar::stop_iteration_tag> >::handle_exception<storage::disk_log_appender::operator()(model::record_batch&)::$_5>(storage::disk_log_appender::operator()(model::record_batch&)::$_5&&)::'lambda'(storage::disk_log_appender::operator()(model::record_batch&)::$_5&&), seastar::futurize<storage::disk_log_appender::operator()(model::record_batch&)::$_5>::type seastar::future<seastar::bool_class<seastar::stop_iteration_tag> >::then_wrapped_nrvo<seastar::future<seastar::bool_class<seastar::stop_iteration_tag> >, seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > seastar::future<seastar::bool_class<seastar::stop_iteration_tag> >::handle_exception<storage::disk_log_appender::operator()(model::record_batch&)::$_5>(storage::disk_log_appender::operator()(model::record_batch&)::$_5&&)::'lambda'(storage::disk_log_appender::operator()(model::record_batch&)::$_5&&)>(seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > seastar::future<seastar::bool_class<seastar::stop_iteration_tag> >::handle_exception<storage::disk_log_appender::operator()(model::record_batch&)::$_5>(storage::disk_log_appender::operator()(model::record_batch&)::$_5&&)::'lambda'(storage::disk_log_appender::operator()(model::record_batch&)::$_5&&)&&)::'lambda'(seastar::internal::promise_base_with_type<seastar::bool_class<seastar::stop_iteration_tag> >&&, seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > seastar::future<seastar::bool_class<seastar::stop_iteration_tag> >::handle_exception<storage::disk_log_appender::operator()(model::record_batch&)::$_5>(storage::disk_log_appender::operator()(model::record_batch&)::$_5&&)::'lambda'(storage::disk_log_appender::operator()(model::record_batch&)::$_5&&)&, 
seastar::future_state<seastar::bool_class<seastar::stop_iteration_tag> >&&), seastar::bool_class<seastar::stop_iteration_tag> >
   --------
   seastar::internal::coroutine_traits_base<seastar::bool_class<seastar::stop_iteration_tag> >::promise_type
   --------
   seastar::internal::repeater<auto model::record_batch_reader::impl::do_action<raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer, auto model::record_batch_reader::impl::do_for_each_ref<raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer>(raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer&, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::'lambda'(raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer&)>(raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer&, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >, auto model::record_batch_reader::impl::do_for_each_ref<raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer>(raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer&, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::'lambda'(raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer&)&&)::'lambda'()>
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<storage::append_result>, auto model::record_batch_reader::impl::do_action<raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer, auto model::record_batch_reader::impl::do_for_each_ref<raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer>(raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer&, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::'lambda'(raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer&)>(raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer&, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >, auto model::record_batch_reader::impl::do_for_each_ref<raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer>(raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer&, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::'lambda'(raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer&)&&)::'lambda0'(), auto model::record_batch_reader::impl::do_for_each_ref<raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer>(raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer&, 
std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::'lambda'(raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer&) seastar::future<void>::then_impl_nrvo<auto model::record_batch_reader::impl::do_action<raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer, auto model::record_batch_reader::impl::do_for_each_ref<raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer>(raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer&, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::'lambda'(raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer&)>(raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer&, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >, auto model::record_batch_reader::impl::do_for_each_ref<raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer>(raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer&, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::'lambda'(raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer&)&&)::'lambda0'(), seastar::future<storage::append_result> >(raft::consensus::disk_append(model::record_batch_reader&&, 
seastar::bool_class<raft::update_last_quorum_index>)::consumer&&)::'lambda'(seastar::internal::promise_base_with_type<storage::append_result>&&, auto model::record_batch_reader::impl::do_action<raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer, auto model::record_batch_reader::impl::do_for_each_ref<raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer>(raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer&, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::'lambda'(raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer&)>(raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer&, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >, auto model::record_batch_reader::impl::do_for_each_ref<raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer>(raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer&, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::'lambda'(raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer&)&&)::'lambda0'()&, seastar::future_state<seastar::internal::monostate>&&), void>
   --------
   seastar::internal::do_with_state<std::__1::tuple<raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer>, seastar::future<storage::append_result> >
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<storage::append_result>, seastar::future<storage::append_result>::finally_body<auto model::record_batch_reader::for_each_ref<raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer>(raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >) &&::'lambda'(), true>, seastar::futurize<raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer>::type seastar::future<storage::append_result>::then_wrapped_nrvo<seastar::future<storage::append_result>, seastar::future<storage::append_result>::finally_body<auto model::record_batch_reader::for_each_ref<raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer>(raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >) &&::'lambda'(), true> >(seastar::future<storage::append_result>::finally_body<auto model::record_batch_reader::for_each_ref<raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer>(raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >) &&::'lambda'(), true>&&)::'lambda'(seastar::internal::promise_base_with_type<storage::append_result>&&, seastar::future<storage::append_result>::finally_body<auto 
model::record_batch_reader::for_each_ref<raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer>(raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >) &&::'lambda'(), true>&, seastar::future_state<storage::append_result>&&), storage::append_result>
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<std::__1::tuple<storage::append_result, std::__1::vector<raft::offset_configuration, std::__1::allocator<raft::offset_configuration> > > >, auto raft::details::for_each_ref_extract_configuration<raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer>(detail::base_named_type<long, model::model_offset_type, std::__1::integral_constant<bool, true> >, model::record_batch_reader&&, raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::'lambda'(std::__1::vector<raft::offset_configuration, std::__1::allocator<raft::offset_configuration> >&)::operator()(std::__1::vector<raft::offset_configuration, std::__1::allocator<raft::offset_configuration> >&)::'lambda'(raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer), seastar::future<std::__1::tuple<storage::append_result, std::__1::vector<raft::offset_configuration, std::__1::allocator<raft::offset_configuration> > > > seastar::future<storage::append_result>::then_impl_nrvo<auto raft::details::for_each_ref_extract_configuration<raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer>(detail::base_named_type<long, model::model_offset_type, std::__1::integral_constant<bool, true> >, model::record_batch_reader&&, raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::'lambda'(std::__1::vector<raft::offset_configuration, std::__1::allocator<raft::offset_configuration> 
>&)::operator()(std::__1::vector<raft::offset_configuration, std::__1::allocator<raft::offset_configuration> >&)::'lambda'(raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer), seastar::future<std::__1::tuple<storage::append_result, std::__1::vector<raft::offset_configuration, std::__1::allocator<raft::offset_configuration> > > > >(raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer&&)::'lambda'(seastar::internal::promise_base_with_type<std::__1::tuple<storage::append_result, std::__1::vector<raft::offset_configuration, std::__1::allocator<raft::offset_configuration> > > >&&, auto raft::details::for_each_ref_extract_configuration<raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer>(detail::base_named_type<long, model::model_offset_type, std::__1::integral_constant<bool, true> >, model::record_batch_reader&&, raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer, std::__1::chrono::time_point<seastar::lowres_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >)::'lambda'(std::__1::vector<raft::offset_configuration, std::__1::allocator<raft::offset_configuration> >&)::operator()(std::__1::vector<raft::offset_configuration, std::__1::allocator<raft::offset_configuration> >&)::'lambda'(raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::consumer)&, seastar::future_state<storage::append_result>&&), storage::append_result>
   --------
   seastar::internal::do_with_state<std::__1::tuple<std::__1::vector<raft::offset_configuration, std::__1::allocator<raft::offset_configuration> > >, seastar::future<std::__1::tuple<storage::append_result, std::__1::vector<raft::offset_configuration, std::__1::allocator<raft::offset_configuration> > > > >
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<storage::append_result>, raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::$_32, seastar::future<storage::append_result> seastar::future<std::__1::tuple<storage::append_result, std::__1::vector<raft::offset_configuration, std::__1::allocator<raft::offset_configuration> > > >::then_impl_nrvo<raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::$_32, seastar::future<storage::append_result> >(raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::$_32&&)::'lambda'(seastar::internal::promise_base_with_type<storage::append_result>&&, raft::consensus::disk_append(model::record_batch_reader&&, seastar::bool_class<raft::update_last_quorum_index>)::$_32&, seastar::future_state<std::__1::tuple<storage::append_result, std::__1::vector<raft::offset_configuration, std::__1::allocator<raft::offset_configuration> > > >&&), std::__1::tuple<storage::append_result, std::__1::vector<raft::offset_configuration, std::__1::allocator<raft::offset_configuration> > > >
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<boost::outcome_v2::basic_result<storage::append_result, std::__1::error_code, boost::outcome_v2::policy::error_code_throw_as_system_error<storage::append_result, std::__1::error_code, void> > >, raft::replicate_entries_stm::append_to_self()::$_6, seastar::future<boost::outcome_v2::basic_result<storage::append_result, std::__1::error_code, boost::outcome_v2::policy::error_code_throw_as_system_error<storage::append_result, std::__1::error_code, void> > > seastar::future<storage::append_result>::then_impl_nrvo<raft::replicate_entries_stm::append_to_self()::$_6, seastar::future<boost::outcome_v2::basic_result<storage::append_result, std::__1::error_code, boost::outcome_v2::policy::error_code_throw_as_system_error<storage::append_result, std::__1::error_code, void> > > >(raft::replicate_entries_stm::append_to_self()::$_6&&)::'lambda'(seastar::internal::promise_base_with_type<boost::outcome_v2::basic_result<storage::append_result, std::__1::error_code, boost::outcome_v2::policy::error_code_throw_as_system_error<storage::append_result, std::__1::error_code, void> > >&&, raft::replicate_entries_stm::append_to_self()::$_6&, seastar::future_state<storage::append_result>&&), storage::append_result>
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<boost::outcome_v2::basic_result<storage::append_result, std::__1::error_code, boost::outcome_v2::policy::error_code_throw_as_system_error<storage::append_result, std::__1::error_code, void> > >, seastar::future<boost::outcome_v2::basic_result<storage::append_result, std::__1::error_code, boost::outcome_v2::policy::error_code_throw_as_system_error<storage::append_result, std::__1::error_code, void> > > seastar::future<boost::outcome_v2::basic_result<storage::append_result, std::__1::error_code, boost::outcome_v2::policy::error_code_throw_as_system_error<storage::append_result, std::__1::error_code, void> > >::handle_exception<raft::replicate_entries_stm::append_to_self()::$_12>(raft::replicate_entries_stm::append_to_self()::$_12&&)::'lambda'(raft::replicate_entries_stm::append_to_self()::$_12&&), seastar::futurize<raft::replicate_entries_stm::append_to_self()::$_12>::type seastar::future<boost::outcome_v2::basic_result<storage::append_result, std::__1::error_code, boost::outcome_v2::policy::error_code_throw_as_system_error<storage::append_result, std::__1::error_code, void> > >::then_wrapped_nrvo<seastar::future<boost::outcome_v2::basic_result<storage::append_result, std::__1::error_code, boost::outcome_v2::policy::error_code_throw_as_system_error<storage::append_result, std::__1::error_code, void> > >, seastar::future<boost::outcome_v2::basic_result<storage::append_result, std::__1::error_code, boost::outcome_v2::policy::error_code_throw_as_system_error<storage::append_result, std::__1::error_code, void> > > seastar::future<boost::outcome_v2::basic_result<storage::append_result, std::__1::error_code, boost::outcome_v2::policy::error_code_throw_as_system_error<storage::append_result, std::__1::error_code, void> > 
>::handle_exception<raft::replicate_entries_stm::append_to_self()::$_12>(raft::replicate_entries_stm::append_to_self()::$_12&&)::'lambda'(raft::replicate_entries_stm::append_to_self()::$_12&&)>(seastar::future<boost::outcome_v2::basic_result<storage::append_result, std::__1::error_code, boost::outcome_v2::policy::error_code_throw_as_system_error<storage::append_result, std::__1::error_code, void> > > seastar::future<boost::outcome_v2::basic_result<storage::append_result, std::__1::error_code, boost::outcome_v2::policy::error_code_throw_as_system_error<storage::append_result, std::__1::error_code, void> > >::handle_exception<raft::replicate_entries_stm::append_to_self()::$_12>(raft::replicate_entries_stm::append_to_self()::$_12&&)::'lambda'(raft::replicate_entries_stm::append_to_self()::$_12&&)&&)::'lambda'(seastar::internal::promise_base_with_type<boost::outcome_v2::basic_result<storage::append_result, std::__1::error_code, boost::outcome_v2::policy::error_code_throw_as_system_error<storage::append_result, std::__1::error_code, void> > >&&, seastar::future<boost::outcome_v2::basic_result<storage::append_result, std::__1::error_code, boost::outcome_v2::policy::error_code_throw_as_system_error<storage::append_result, std::__1::error_code, void> > > seastar::future<boost::outcome_v2::basic_result<storage::append_result, std::__1::error_code, boost::outcome_v2::policy::error_code_throw_as_system_error<storage::append_result, std::__1::error_code, void> > >::handle_exception<raft::replicate_entries_stm::append_to_self()::$_12>(raft::replicate_entries_stm::append_to_self()::$_12&&)::'lambda'(raft::replicate_entries_stm::append_to_self()::$_12&&)&, seastar::future_state<boost::outcome_v2::basic_result<storage::append_result, std::__1::error_code, boost::outcome_v2::policy::error_code_throw_as_system_error<storage::append_result, std::__1::error_code, void> > >&&), boost::outcome_v2::basic_result<storage::append_result, std::__1::error_code, 
boost::outcome_v2::policy::error_code_throw_as_system_error<storage::append_result, std::__1::error_code, void> > >
   --------
   seastar::internal::coroutine_traits_base<boost::outcome_v2::basic_result<raft::replicate_result, std::__1::error_code, boost::outcome_v2::policy::error_code_throw_as_system_error<raft::replicate_result, std::__1::error_code, void> > >::promise_type
   --------
   seastar::internal::coroutine_traits_base<void>::promise_type
   --------
   seastar::internal::coroutine_traits_base<void>::promise_type
   --------
   seastar::continuation<seastar::internal::promise_base_with_type<void>, seastar::future<void>::finally_body<auto seastar::internal::invoke_func_with_gate<raft::replicate_batcher::cache_and_wait_for_result(seastar::promise<void>, std::__1::optional<detail::base_named_type<long, model::model_raft_term_id_type, std::__1::integral_constant<bool, true> > >, model::record_batch_reader, raft::consistency_level, std::__1::optional<std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> > >)::$_0>(seastar::gate&, raft::replicate_batcher::cache_and_wait_for_result(seastar::promise<void>, std::__1::optional<detail::base_named_type<long, model::model_raft_term_id_type, std::__1::integral_constant<bool, true> > >, model::record_batch_reader, raft::consistency_level, std::__1::optional<std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> > >)::$_0&&)::'lambda'(), false>, seastar::futurize<raft::replicate_batcher::cache_and_wait_for_result(seastar::promise<void>, std::__1::optional<detail::base_named_type<long, model::model_raft_term_id_type, std::__1::integral_constant<bool, true> > >, model::record_batch_reader, raft::consistency_level, std::__1::optional<std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> > >)::$_0>::type seastar::future<void>::then_wrapped_nrvo<seastar::future<void>, seastar::future<void>::finally_body<auto seastar::internal::invoke_func_with_gate<raft::replicate_batcher::cache_and_wait_for_result(seastar::promise<void>, std::__1::optional<detail::base_named_type<long, model::model_raft_term_id_type, std::__1::integral_constant<bool, true> > >, model::record_batch_reader, raft::consistency_level, std::__1::optional<std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> > >)::$_0>(seastar::gate&, raft::replicate_batcher::cache_and_wait_for_result(seastar::promise<void>, std::__1::optional<detail::base_named_type<long, model::model_raft_term_id_type, std::__1::integral_constant<bool, true> > >, 
model::record_batch_reader, raft::consistency_level, std::__1::optional<std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> > >)::$_0&&)::'lambda'(), false> >(seastar::future<void>::finally_body<auto seastar::internal::invoke_func_with_gate<raft::replicate_batcher::cache_and_wait_for_result(seastar::promise<void>, std::__1::optional<detail::base_named_type<long, model::model_raft_term_id_type, std::__1::integral_constant<bool, true> > >, model::record_batch_reader, raft::consistency_level, std::__1::optional<std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> > >)::$_0>(seastar::gate&, raft::replicate_batcher::cache_and_wait_for_result(seastar::promise<void>, std::__1::optional<detail::base_named_type<long, model::model_raft_term_id_type, std::__1::integral_constant<bool, true> > >, model::record_batch_reader, raft::consistency_level, std::__1::optional<std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> > >)::$_0&&)::'lambda'(), false>&&)::'lambda'(seastar::internal::promise_base_with_type<void>&&, seastar::future<void>::finally_body<auto seastar::internal::invoke_func_with_gate<raft::replicate_batcher::cache_and_wait_for_result(seastar::promise<void>, std::__1::optional<detail::base_named_type<long, model::model_raft_term_id_type, std::__1::integral_constant<bool, true> > >, model::record_batch_reader, raft::consistency_level, std::__1::optional<std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> > >)::$_0>(seastar::gate&, raft::replicate_batcher::cache_and_wait_for_result(seastar::promise<void>, std::__1::optional<detail::base_named_type<long, model::model_raft_term_id_type, std::__1::integral_constant<bool, true> > >, model::record_batch_reader, raft::consistency_level, std::__1::optional<std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> > >)::$_0&&)::'lambda'(), false>&, seastar::future_state<seastar::internal::monostate>&&), void>
ERROR 2023-03-11 00:05:42,760 [shard 0] seastar_memory - Dumping seastar memory diagnostics
Used memory:  1495M
Free memory:  3987M
Total memory: 5G

Small pools:
objsz	spansz	usedobj	memory	unused	wst%
8	4K	2k	112K	94K	83
10	4K	57	8K	7K	93
12	4K	56	16K	15K	95
14	4K	83	8K	7K	85
16	4K	207k	3M	184K	5
32	4K	5k	304K	146K	47
32	4K	16k	1020K	526K	51
32	4K	2k	80K	28K	35
32	4K	6k	2M	1580K	89
48	4K	2k	468K	378K	80
48	4K	2k	2M	1961K	94
64	4K	3k	424K	261K	61
64	4K	68k	21M	17M	80
80	4K	3k	1M	928K	78
96	4K	259k	26M	1882K	7
112	4K	383k	45M	4493K	9
128	4K	369	112K	66K	58
160	4K	18k	3M	276K	8
192	4K	387	204K	131K	64
224	4K	334	164K	91K	55
256	4K	271	512K	444K	86
320	8K	226	304K	233K	76
384	8K	165	208K	146K	70
448	4K	5k	2M	223K	9
512	4K	1k	1004K	340K	33
640	16K	213	272K	139K	51
768	16K	202	640K	488K	76
896	8K	225	544K	347K	63
1024	4K	87	204K	117K	57
1280	32K	2k	3M	1604K	45
1536	32K	73	256K	146K	56
1792	16K	17	352K	322K	91
2048	8K	155	416K	106K	25
2560	64K	83	576K	368K	63
3072	64K	82	512K	264K	51
3584	32K	8	544K	515K	94
4096	16K	70	2M	1M	84
5120	128K	17	768K	680K	88
6144	128K	12	512K	438K	85
7168	64K	6	576K	532K	92
8192	32K	2k	19M	2M	12
10240	64K	6	832K	770K	92
12288	64K	10	960K	840K	87
14336	128K	2	1M	1M	97
16384	64K	35k	1185M	636M	53
Page spans:
index	size	free	used	spans
0	4K	2M	110M	29k
1	8K	736K	1M	276
2	16K	336K	3M	210
3	32K	1M	26M	876
4	64K	320M	1189M	24k
5	128K	881M	17M	7k
6	256K	1G	2M	5k
7	512K	1011M	17M	2k
8	1M	526M	9M	535
9	2M	118M	2M	60
10	4M	0B	8M	2
11	8M	0B	0B	0
12	16M	0B	48M	3
13	32M	0B	0B	0
14	64M	0B	64M	1
15	128M	0B	0B	0
16	256M	0B	0B	0
17	512M	0B	0B	0
18	1G	0B	0B	0
19	2G	0B	0B	0
20	4G	0B	0B	0
21	8G	0B	0B	0
22	16G	0B	0B	0
23	32G	0B	0B	0
24	64G	0B	0B	0
25	128G	0B	0B	0
26	256G	0B	0B	0
27	512G	0B	0B	0
28	1T	0B	0B	0
29	2T	0B	0B	0
30	4T	0B	0B	0
31	8T	0B	0B	0

ERROR 2023-03-11 00:05:42,767 [shard 0] seastar - Failed to allocate 2359296 bytes
Aborting on shard 0.
Backtrace:
  0x51425cf
  0x5195e4f
  linux-vdso.so.1+0x7db
  /opt/redpanda/lib/libc.so.6+0x86067
  /opt/redpanda/lib/libc.so.6+0x3e87f
  /opt/redpanda/lib/libc.so.6+0x2aef7
  0x50becab
  0x50cd477
  0x42287f3
  0x41661ab
  0x416cc9b
  0x4167447
  0x416688f
  0x483170f
  0x486ba33
  0x515bd07
  0x515eb6f
  0x515cc53
  0x50934a3
  0x5091c9f
  0x1d23eaf
  0x53fb0db
  /opt/redpanda/lib/libc.so.6+0x2b1c7
  /opt/redpanda/lib/libc.so.6+0x2b29f
  0x1d1ee6f
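
The diagnostics above report 3987M free, yet a 2359296-byte (2.25 MiB) request fails. One way to read the "Page spans" table: assuming the allocator serves a large request from a free span class at or above the next power-of-two size (an assumption about Seastar's allocator, not something stated in this issue), the request needs the 4M class or larger, and every class from index 10 up shows 0B free. The C++ sketch below hard-codes the `free` column for indexes 0 to 11; the names `span_index_for` and `find_free_class` are made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

constexpr std::uint64_t KiB = 1024;
constexpr std::uint64_t MiB = 1024 * KiB;

// "free" column of the Page spans table above, indexes 0..11.
// Class i holds spans of (4 KiB << i); classes 10 and up are all 0B free.
constexpr std::array<std::uint64_t, 12> free_bytes_by_index = {
    2 * MiB,    736 * KiB,  336 * KiB, 1 * MiB,   320 * MiB, 881 * MiB,
    1024 * MiB, 1011 * MiB, 526 * MiB, 118 * MiB, 0,         0};

// Smallest span class whose span size covers the request
// (assumed rounding rule: next power of two, in 4 KiB pages).
constexpr std::size_t span_index_for(std::uint64_t bytes) {
    std::size_t index = 0;
    while (((4 * KiB) << index) < bytes) {
        ++index;
    }
    return index;
}

// First class at or above the required one that still has free memory,
// or an empty optional if no such class exists in the table.
inline std::optional<std::size_t> find_free_class(std::uint64_t bytes) {
    for (std::size_t i = span_index_for(bytes); i < free_bytes_by_index.size(); ++i) {
        if (free_bytes_by_index[i] > 0) {
            return i;
        }
    }
    return std::nullopt;
}
```

With these numbers, `find_free_class(2359296)` returns an empty optional, while a 1 MiB request would still find space in class 8. Under the stated assumption, the failure is a fragmentation problem rather than a shortage of total memory.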
@dlex dlex added the kind/bug Something isn't working label Mar 13, 2023
@dlex
Contributor Author

dlex commented Mar 14, 2023

The full log of the failed node: redpanda.log.zip

@dlex
Contributor Author

dlex commented Mar 14, 2023

@dlex
Contributor Author

dlex commented Mar 14, 2023

[Backtrace #0]
void seastar::backtrace<seastar::current_backtrace_tasklocal()::$_3>(seastar::current_backtrace_tasklocal()::$_3&&) at /v/build/v_deps_build/seastar-prefix/src/seastar/include/seastar/util/backtrace.hh:59
 (inlined by) seastar::current_backtrace_tasklocal() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/util/backtrace.cc:86
 (inlined by) seastar::current_tasktrace() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/util/backtrace.cc:137
 (inlined by) seastar::current_backtrace() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/util/backtrace.cc:170
seastar::memory::maybe_dump_memory_diagnostics(unsigned long, bool) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/memory.cc:1784
 (inlined by) seastar::memory::on_allocation_failure(unsigned long) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/memory.cc:1817
 (inlined by) seastar::memory::allocate(unsigned long) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/memory.cc:1410
operator new(unsigned long) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/memory.cc:2079
void* std::__1::__libcpp_operator_new<unsigned long>(unsigned long) at /vectorized/llvm/bin/../include/c++/v1/new:245
 (inlined by) std::__1::__libcpp_allocate(unsigned long, unsigned long) at /vectorized/llvm/bin/../include/c++/v1/new:271
 (inlined by) std::__1::allocator<cluster::rm_stm::seq_entry>::allocate(unsigned long) at /vectorized/llvm/bin/../include/c++/v1/__memory/allocator.h:105
 (inlined by) std::__1::allocator_traits<std::__1::allocator<cluster::rm_stm::seq_entry> >::allocate(std::__1::allocator<cluster::rm_stm::seq_entry>&, unsigned long) at /vectorized/llvm/bin/../include/c++/v1/__memory/allocator_traits.h:262
 (inlined by) __split_buffer at /vectorized/llvm/bin/../include/c++/v1/__split_buffer:306
 (inlined by) void std::__1::vector<cluster::rm_stm::seq_entry, std::__1::allocator<cluster::rm_stm::seq_entry> >::__push_back_slow_path<cluster::rm_stm::seq_entry>(cluster::rm_stm::seq_entry&&) at /vectorized/llvm/bin/../include/c++/v1/vector:1517
 (inlined by) std::__1::vector<cluster::rm_stm::seq_entry, std::__1::allocator<cluster::rm_stm::seq_entry> >::push_back(cluster::rm_stm::seq_entry&&) at /vectorized/llvm/bin/../include/c++/v1/vector:1549
 (inlined by) cluster::rm_stm::take_snapshot() at /var/lib/buildkite-agent/builds/buildkite-arm64-builders-i-042eabf25c6a0b86c-1/redpanda/redpanda/src/v/cluster/rm_stm.cc:2759
cluster::persisted_stm::do_make_snapshot() at /var/lib/buildkite-agent/builds/buildkite-arm64-builders-i-042eabf25c6a0b86c-1/redpanda/redpanda/src/v/cluster/persisted_stm.cc:140
operator() at /var/lib/buildkite-agent/builds/buildkite-arm64-builders-i-042eabf25c6a0b86c-1/redpanda/redpanda/src/v/cluster/persisted_stm.cc:154
 (inlined by) seastar::future<void> seastar::futurize<seastar::future<void> >::invoke<cluster::persisted_stm::make_snapshot()::$_2::operator()() const::{lambda()#1}>(cluster::persisted_stm::make_snapshot()::$_2::operator()() const::{lambda()#1}&&) at /vectorized/include/seastar/core/future.hh:2149
 (inlined by) seastar::future<void> seastar::futurize<seastar::future<void> >::invoke<cluster::persisted_stm::make_snapshot()::$_2::operator()() const::{lambda()#1}>(cluster::persisted_stm::make_snapshot()::$_2::operator()() const::{lambda()#1}&&, seastar::internal::monostate) at /vectorized/include/seastar/core/future.hh:1993
 (inlined by) seastar::future<void> seastar::future<void>::then_impl<cluster::persisted_stm::make_snapshot()::$_2::operator()() const::{lambda()#1}, seastar::future<void> >(cluster::persisted_stm::make_snapshot()::$_2::operator()() const::{lambda()#1}&&) at /vectorized/include/seastar/core/future.hh:1615
 (inlined by) seastar::internal::future_result<cluster::persisted_stm::make_snapshot()::$_2::operator()() const::{lambda()#1}, void>::future_type seastar::internal::call_then_impl<seastar::future<void> >::run<cluster::persisted_stm::make_snapshot()::$_2::operator()() const::{lambda()#1}>(seastar::future<void>&, cluster::persisted_stm::make_snapshot()::$_2::operator()() const::{lambda()#1}&&) at /vectorized/include/seastar/core/future.hh:1248
 (inlined by) seastar::future<void> seastar::future<void>::then<cluster::persisted_stm::make_snapshot()::$_2::operator()() const::{lambda()#1}, seastar::future<void> >(cluster::persisted_stm::make_snapshot()::$_2::operator()() const::{lambda()#1}&&) at /vectorized/include/seastar/core/future.hh:1534
 (inlined by) operator() at /var/lib/buildkite-agent/builds/buildkite-arm64-builders-i-042eabf25c6a0b86c-1/redpanda/redpanda/src/v/cluster/persisted_stm.cc:154
 (inlined by) seastar::future<void> seastar::futurize<seastar::future<void> >::invoke<cluster::persisted_stm::make_snapshot()::$_2>(cluster::persisted_stm::make_snapshot()::$_2&&) at /vectorized/include/seastar/core/future.hh:2149
 (inlined by) auto seastar::futurize_invoke<cluster::persisted_stm::make_snapshot()::$_2>(cluster::persisted_stm::make_snapshot()::$_2&&) at /vectorized/include/seastar/core/future.hh:2180
 (inlined by) operator()<seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::__1::chrono::steady_clock> > at /vectorized/include/seastar/core/semaphore.hh:742
seastar::future<void> seastar::futurize<seastar::future<void> >::invoke<seastar::with_semaphore<seastar::named_semaphore_exception_factory, cluster::persisted_stm::make_snapshot()::$_2, std::__1::chrono::steady_clock>(seastar::basic_semaphore<seastar::named_semaphore_exception_factory, std::__1::chrono::steady_clock>&, unsigned long, cluster::persisted_stm::make_snapshot()::$_2&&)::{lambda(auto:1)#1}, seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::__1::chrono::steady_clock> >(seastar::with_semaphore<seastar::named_semaphore_exception_factory, cluster::persisted_stm::make_snapshot()::$_2, std::__1::chrono::steady_clock>(seastar::basic_semaphore<seastar::named_semaphore_exception_factory, std::__1::chrono::steady_clock>&, unsigned long, cluster::persisted_stm::make_snapshot()::$_2&&)::{lambda(auto:1)#1}&&, seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::__1::chrono::steady_clock>&&) at /vectorized/include/seastar/core/future.hh:2149
 (inlined by) seastar::future<void> seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::__1::chrono::steady_clock> >::then_impl<seastar::with_semaphore<seastar::named_semaphore_exception_factory, cluster::persisted_stm::make_snapshot()::$_2, std::__1::chrono::steady_clock>(seastar::basic_semaphore<seastar::named_semaphore_exception_factory, std::__1::chrono::steady_clock>&, unsigned long, cluster::persisted_stm::make_snapshot()::$_2&&)::{lambda(auto:1)#1}, seastar::future<void> >(seastar::with_semaphore<seastar::named_semaphore_exception_factory, cluster::persisted_stm::make_snapshot()::$_2, std::__1::chrono::steady_clock>(seastar::basic_semaphore<seastar::named_semaphore_exception_factory, std::__1::chrono::steady_clock>&, unsigned long, cluster::persisted_stm::make_snapshot()::$_2&&)::{lambda(auto:1)#1}&&) at /vectorized/include/seastar/core/future.hh:1615
 (inlined by) seastar::internal::future_result<seastar::with_semaphore<seastar::named_semaphore_exception_factory, cluster::persisted_stm::make_snapshot()::$_2, std::__1::chrono::steady_clock>(seastar::basic_semaphore<seastar::named_semaphore_exception_factory, std::__1::chrono::steady_clock>&, unsigned long, cluster::persisted_stm::make_snapshot()::$_2&&)::{lambda(auto:1)#1}, seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::__1::chrono::steady_clock> >::future_type seastar::internal::call_then_impl<seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::__1::chrono::steady_clock> > >::run<seastar::with_semaphore<seastar::named_semaphore_exception_factory, cluster::persisted_stm::make_snapshot()::$_2, std::__1::chrono::steady_clock>(seastar::basic_semaphore<seastar::named_semaphore_exception_factory, std::__1::chrono::steady_clock>&, unsigned long, cluster::persisted_stm::make_snapshot()::$_2&&)::{lambda(auto:1)#1}>(seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::__1::chrono::steady_clock> >&, seastar::with_semaphore<seastar::named_semaphore_exception_factory, cluster::persisted_stm::make_snapshot()::$_2, std::__1::chrono::steady_clock>(seastar::basic_semaphore<seastar::named_semaphore_exception_factory, std::__1::chrono::steady_clock>&, unsigned long, cluster::persisted_stm::make_snapshot()::$_2&&)::{lambda(auto:1)#1}&&) at /vectorized/include/seastar/core/future.hh:1248
 (inlined by) seastar::future<void> seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::__1::chrono::steady_clock> >::then<seastar::with_semaphore<seastar::named_semaphore_exception_factory, cluster::persisted_stm::make_snapshot()::$_2, std::__1::chrono::steady_clock>(seastar::basic_semaphore<seastar::named_semaphore_exception_factory, std::__1::chrono::steady_clock>&, unsigned long, cluster::persisted_stm::make_snapshot()::$_2&&)::{lambda(auto:1)#1}, seastar::future<void> >(seastar::with_semaphore<seastar::named_semaphore_exception_factory, cluster::persisted_stm::make_snapshot()::$_2, std::__1::chrono::steady_clock>(seastar::basic_semaphore<seastar::named_semaphore_exception_factory, std::__1::chrono::steady_clock>&, unsigned long, cluster::persisted_stm::make_snapshot()::$_2&&)::{lambda(auto:1)#1}&&) at /vectorized/include/seastar/core/future.hh:1534
 (inlined by) seastar::futurize<std::__1::invoke_result<cluster::persisted_stm::make_snapshot()::$_2>::type>::type seastar::with_semaphore<seastar::named_semaphore_exception_factory, cluster::persisted_stm::make_snapshot()::$_2, std::__1::chrono::steady_clock>(seastar::basic_semaphore<seastar::named_semaphore_exception_factory, std::__1::chrono::steady_clock>&, unsigned long, cluster::persisted_stm::make_snapshot()::$_2&&) at /vectorized/include/seastar/core/semaphore.hh:741
 (inlined by) auto mutex::with<cluster::persisted_stm::make_snapshot()::$_2>(cluster::persisted_stm::make_snapshot()::$_2&&) at /var/lib/buildkite-agent/builds/buildkite-arm64-builders-i-042eabf25c6a0b86c-1/redpanda/redpanda/src/v/utils/mutex.h:42
 (inlined by) cluster::persisted_stm::make_snapshot() at /var/lib/buildkite-agent/builds/buildkite-arm64-builders-i-042eabf25c6a0b86c-1/redpanda/redpanda/src/v/cluster/persisted_stm.cc:152
operator() at /var/lib/buildkite-agent/builds/buildkite-arm64-builders-i-042eabf25c6a0b86c-1/redpanda/redpanda/src/v/cluster/persisted_stm.cc:148
 (inlined by) seastar::future<void> seastar::futurize<seastar::future<void> >::invoke<cluster::persisted_stm::make_snapshot_in_background()::$_4>(cluster::persisted_stm::make_snapshot_in_background()::$_4&&) at /vectorized/include/seastar/core/future.hh:2149
 (inlined by) auto seastar::futurize_invoke<cluster::persisted_stm::make_snapshot_in_background()::$_4>(cluster::persisted_stm::make_snapshot_in_background()::$_4&&) at /vectorized/include/seastar/core/future.hh:2180
 (inlined by) auto seastar::internal::invoke_func_with_gate<cluster::persisted_stm::make_snapshot_in_background()::$_4>(seastar::gate&, cluster::persisted_stm::make_snapshot_in_background()::$_4&&) at /vectorized/include/seastar/core/gate.hh:221
 (inlined by) auto seastar::try_with_gate<cluster::persisted_stm::make_snapshot_in_background()::$_4>(seastar::gate&, cluster::persisted_stm::make_snapshot_in_background()::$_4&&) at /vectorized/include/seastar/core/gate.hh:261
 (inlined by) auto ssx::spawn_with_gate_then<cluster::persisted_stm::make_snapshot_in_background()::$_4>(seastar::gate&, cluster::persisted_stm::make_snapshot_in_background()::$_4&&) at /var/lib/buildkite-agent/builds/buildkite-arm64-builders-i-042eabf25c6a0b86c-1/redpanda/redpanda/src/v/ssx/future-util.h:282
 (inlined by) void ssx::spawn_with_gate<cluster::persisted_stm::make_snapshot_in_background()::$_4>(seastar::gate&, cluster::persisted_stm::make_snapshot_in_background()::$_4&&) at /var/lib/buildkite-agent/builds/buildkite-arm64-builders-i-042eabf25c6a0b86c-1/redpanda/redpanda/src/v/ssx/future-util.h:301
 (inlined by) cluster::persisted_stm::make_snapshot_in_background() at /var/lib/buildkite-agent/builds/buildkite-arm64-builders-i-042eabf25c6a0b86c-1/redpanda/redpanda/src/v/cluster/persisted_stm.cc:148
storage::stm_manager::make_snapshot_in_background() at /var/lib/buildkite-agent/builds/buildkite-arm64-builders-i-042eabf25c6a0b86c-1/redpanda/redpanda/src/v/storage/types.h:136
 (inlined by) storage::disk_log_impl::wrote_stm_bytes(unsigned long) at /var/lib/buildkite-agent/builds/buildkite-arm64-builders-i-042eabf25c6a0b86c-1/redpanda/redpanda/src/v/storage/disk_log_impl.cc:1665
operator() at /var/lib/buildkite-agent/builds/buildkite-arm64-builders-i-042eabf25c6a0b86c-1/redpanda/redpanda/src/v/storage/disk_log_appender.cc:128
 (inlined by) decltype ((static_cast<storage::disk_log_appender::append_batch_to_segment(model::record_batch const&)::$_1&>({parm#1}))(static_cast<storage::append_result>({parm#2}))) std::__1::__invoke<storage::disk_log_appender::append_batch_to_segment(model::record_batch const&)::$_1&, storage::append_result>(storage::disk_log_appender::append_batch_to_segment(model::record_batch const&)::$_1&, storage::append_result&&) at /vectorized/llvm/bin/../include/c++/v1/type_traits:3640
 (inlined by) std::__1::invoke_result<storage::disk_log_appender::append_batch_to_segment(model::record_batch const&)::$_1&, storage::append_result>::type std::__1::invoke<storage::disk_log_appender::append_batch_to_segment(model::record_batch const&)::$_1&, storage::append_result>(storage::disk_log_appender::append_batch_to_segment(model::record_batch const&)::$_1&, storage::append_result&&) at /vectorized/llvm/bin/../include/c++/v1/__functional/invoke.h:93
 (inlined by) auto seastar::internal::future_invoke<storage::disk_log_appender::append_batch_to_segment(model::record_batch const&)::$_1&, storage::append_result>(storage::disk_log_appender::append_batch_to_segment(model::record_batch const&)::$_1&, storage::append_result&&) at /vectorized/include/seastar/core/future.hh:1225
 (inlined by) operator() at /vectorized/include/seastar/core/future.hh:1596
 (inlined by) void seastar::futurize<seastar::bool_class<seastar::stop_iteration_tag> >::satisfy_with_result_of<seastar::future<storage::append_result>::then_impl_nrvo<storage::disk_log_appender::append_batch_to_segment(model::record_batch const&)::$_1, seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > >(storage::disk_log_appender::append_batch_to_segment(model::record_batch const&)::$_1&&)::{lambda(seastar::internal::promise_base_with_type<seastar::bool_class<seastar::stop_iteration_tag> >&&, storage::disk_log_appender::append_batch_to_segment(model::record_batch const&)::$_1&, seastar::future_state<storage::append_result>&&)#1}::operator()(seastar::internal::promise_base_with_type<seastar::bool_class<seastar::stop_iteration_tag> >&&, storage::disk_log_appender::append_batch_to_segment(model::record_batch const&)::$_1&, seastar::future_state<storage::append_result>&&) const::{lambda()#1}>(seastar::internal::promise_base_with_type<seastar::bool_class<seastar::stop_iteration_tag> >&&, storage::disk_log_appender::append_batch_to_segment(model::record_batch const&)::$_1&&) at /vectorized/include/seastar/core/future.hh:2136
 (inlined by) operator() at /vectorized/include/seastar/core/future.hh:1589
 (inlined by) seastar::continuation<seastar::internal::promise_base_with_type<seastar::bool_class<seastar::stop_iteration_tag> >, storage::disk_log_appender::append_batch_to_segment(model::record_batch const&)::$_1, seastar::future<storage::append_result>::then_impl_nrvo<storage::disk_log_appender::append_batch_to_segment(model::record_batch const&)::$_1, seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > >(storage::disk_log_appender::append_batch_to_segment(model::record_batch const&)::$_1&&)::{lambda(seastar::internal::promise_base_with_type<seastar::bool_class<seastar::stop_iteration_tag> >&&, storage::disk_log_appender::append_batch_to_segment(model::record_batch const&)::$_1&, seastar::future_state<storage::append_result>&&)#1}, storage::append_result>::run_and_dispose() at /vectorized/include/seastar/core/future.hh:781
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2330
 (inlined by) seastar::reactor::run_some_tasks() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2737
seastar::reactor::do_run() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2906
seastar::reactor::run() at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/reactor.cc:2789
seastar::app_template::run_deprecated(int, char**, std::__1::function<void ()>&&) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/app-template.cc:265
seastar::app_template::run(int, char**, std::__1::function<seastar::future<int> ()>&&) at /v/build/v_deps_build/seastar-prefix/src/seastar/src/core/app-template.cc:156
application::run(int, char**) at /var/lib/buildkite-agent/builds/buildkite-arm64-builders-i-042eabf25c6a0b86c-1/redpanda/redpanda/src/v/redpanda/application.cc:326
main at /var/lib/buildkite-agent/builds/buildkite-arm64-builders-i-042eabf25c6a0b86c-1/redpanda/redpanda/src/v/redpanda/main.cc:22
addr2line: '/opt/redpanda/lib/libc.so.6': No such file
/opt/redpanda/lib/libc.so.6 0x2b1c7 
/opt/redpanda/lib/libc.so.6 0x2b29f 
_start at ??:?

@travisdowns
Member

This is the "big vector" in cluster::rm_stm::take_snapshot() and seems to be a duplicate of:

#8507
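
To illustrate why a "big vector" is a problem here: the backtrace shows `std::vector<cluster::rm_stm::seq_entry>::push_back` taking the slow path, which allocates one new contiguous buffer for the whole vector. A chunked container caps the size of any single element allocation instead. The following is a minimal sketch of that idea, not the Redpanda implementation and not necessarily what the fix for this issue does; `chunked_vector` and its 32 KiB default chunk size are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical container for illustration: elements live in fixed-size
// chunks, so growth never asks for one large contiguous element buffer.
template <typename T, std::size_t ChunkBytes = 32 * 1024>
class chunked_vector {
    static_assert(sizeof(T) <= ChunkBytes, "element must fit in one chunk");
    static constexpr std::size_t per_chunk = ChunkBytes / sizeof(T);

public:
    void push_back(T value) {
        if (_chunks.empty() || _chunks.back().size() == per_chunk) {
            // Allocate exactly one new chunk; existing elements stay put.
            _chunks.emplace_back();
            _chunks.back().reserve(per_chunk);
        }
        _chunks.back().push_back(std::move(value));
        ++_size;
    }

    const T& operator[](std::size_t i) const {
        assert(i < _size);
        return _chunks[i / per_chunk][i % per_chunk];
    }

    std::size_t size() const { return _size; }
    std::size_t chunk_count() const { return _chunks.size(); }

    // Upper bound on any single element-buffer allocation.
    static constexpr std::size_t max_chunk_bytes() {
        return per_chunk * sizeof(T);
    }

private:
    std::vector<std::vector<T>> _chunks;
    std::size_t _size = 0;
};
```

The outer index of chunks is still a contiguous vector, but it holds only a small header per chunk, so it grows far more slowly than the element storage would.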

@bharathv
Contributor

The test for this case is still in a feature branch. The test is TieredStorageWithLoadTest.test_restarts

@dlex I'm not able to access this link. Is there a way to run this test on #9484? I want to see if it helps.

@dlex
Contributor Author

dlex commented Mar 20, 2023

@bharathv I'm sorry, the link was to the branch in my fork; I'm giving you the permissions now. However, I think the easiest way would be for you to merge your change and then for me to run the test off a dev CI build. Also, for now, I'm going to check if I can do so off a feature branch CI build.

@dlex
Contributor Author

dlex commented Mar 20, 2023

Also, for now, I'm going to check if I can do so off a feature branch CI build.

Apparently CDT can only run against a deb package, so it will be the easiest to merge the PR first and then I will try the CDT against the latest CI build the next day.

Also the link was indeed broken, sorry for that. The branch is PRed now in #9491

@bharathv
Contributor

Okay, I merged the patch this morning, so it should be included in the next nightly build.

Apparently CDT can only run against a deb package, so it will be the easiest to merge the PR first and then I will try the CDT against the latest CI build the next day.

FWIW, I think /cdt on a PR can trigger the CDT run using deb packages from the PR artifacts.

@dlex
Contributor Author

dlex commented Mar 21, 2023

Hmm, right. I wonder how, because duck.py doesn't seem to support that and can only do release/rc/nightly.

@bharathv
Contributor

It does; more context here: https://github.com/redpanda-data/vtools/pull/1329

@dlex
Contributor Author

dlex commented Mar 22, 2023

Verified with https://buildkite.com/redpanda/redpanda/builds/25509#01870185-f1d7-44d3-9d48-24857061238e: no crashes in 4 runs so far.

@bharathv
Contributor

Fixed with #9484 and #9616.
