c/balancer_backend: first initialize planner and then call plan
This change is part of an effort to identify and fix a rare segmentation
fault in Redpanda that happens after the process has been suspended with
the `SIGSTOP` signal.
According to the C++ standard, a temporary is kept alive until the end of
the full-expression that creates it. The crash we are observing points to
a use-after-free (UAF). The only way the variable whose access causes the
segfault can be destroyed is by going out of scope, which in this
situation should not happen before the expression has finished.
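
To make the lifetime rule referred to above concrete, here is a minimal
sketch without coroutines. The `planner` struct is a hypothetical stand-in
that only records lifetime events; it is not the real
`partition_balancer_planner`. It contrasts a temporary, destroyed at the end
of the full-expression, with a named local, destroyed at scope exit.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the planner: it only logs its lifetime events.
struct planner {
    explicit planner(std::vector<std::string>& log) : _log(log) {
        _log.push_back("ctor");
    }
    ~planner() { _log.push_back("dtor"); }
    int plan_actions() {
        _log.push_back("plan");
        return 42;
    }
    std::vector<std::string>& _log;
};

// Shape before the change: the planner is a temporary and is destroyed
// at the end of the full-expression (the semicolon of the statement).
std::vector<std::string> temporary_style() {
    std::vector<std::string> log;
    int result = planner(log).plan_actions();
    log.push_back("after statement");
    (void)result;
    return log;
}

// Shape after the change: the planner is a named local and is destroyed
// only when its enclosing scope ends.
std::vector<std::string> named_style() {
    std::vector<std::string> log;
    {
        planner p(log);
        int result = p.plan_actions();
        log.push_back("after statement");
        (void)result;
    }
    return log;
}
```

In both shapes the planner outlives the call to `plan_actions`; the only
difference is how long it lives afterwards.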

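Given our experience with coroutines and the different kinds of lifecycle
bugs we have found in the past, this is a poor man's attempt to avoid the
issue: the planner is first constructed as a named local, and
`plan_actions` is called on it afterwards.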

Signed-off-by: Michał Maślanka <[email protected]>
mmaslankaprv committed Apr 26, 2024
1 parent 4182b85 commit a52d0ad
Showing 1 changed file with 20 additions and 18 deletions.
38 changes: 20 additions & 18 deletions src/v/cluster/partition_balancer_backend.cc
@@ -357,24 +357,26 @@ ss::future<> partition_balancer_backend::do_tick() {
// claim node unresponsive it doesn't responded to at least 7
// status requests by default 700ms
auto const node_responsiveness_timeout = _node_status_interval() * 7;
- auto plan_data
- = co_await partition_balancer_planner(
- planner_config{
- .mode = _mode(),
- .soft_max_disk_usage_ratio = soft_max_disk_usage_ratio,
- .hard_max_disk_usage_ratio = hard_max_disk_usage_ratio,
- .max_concurrent_actions = _max_concurrent_actions(),
- .node_availability_timeout_sec = _availability_timeout(),
- .ondemand_rebalance_requested
- = _cur_term->_ondemand_rebalance_requested,
- .segment_fallocation_step = _segment_fallocation_step(),
- .min_partition_size_threshold = get_min_partition_size_threshold(),
- .node_responsiveness_timeout = node_responsiveness_timeout,
- .topic_aware = _topic_aware(),
- },
- _state,
- _partition_allocator)
- .plan_actions(health_report.value(), _tick_in_progress.value());
+
+ partition_balancer_planner planner(
+ planner_config{
+ .mode = _mode(),
+ .soft_max_disk_usage_ratio = soft_max_disk_usage_ratio,
+ .hard_max_disk_usage_ratio = hard_max_disk_usage_ratio,
+ .max_concurrent_actions = _max_concurrent_actions(),
+ .node_availability_timeout_sec = _availability_timeout(),
+ .ondemand_rebalance_requested
+ = _cur_term->_ondemand_rebalance_requested,
+ .segment_fallocation_step = _segment_fallocation_step(),
+ .min_partition_size_threshold = get_min_partition_size_threshold(),
+ .node_responsiveness_timeout = node_responsiveness_timeout,
+ .topic_aware = _topic_aware(),
+ },
+ _state,
+ _partition_allocator);
+
+ auto plan_data = co_await planner.plan_actions(
+ health_report.value(), _tick_in_progress.value());

_cur_term->last_tick_time = clock_t::now();
_cur_term->last_violations = std::move(plan_data.violations);