[v22.3.x] cloud_storage: move eviction under remote_partition #9933

andrwng · 2023-04-10T17:02:35Z

Backport of #9590

CONFLICT:

- required adding an abort source to remote_partition
- the move induced a UAF without pulling in 580c8ac and c10e266

Previously each remote_partition would wait for an eviction barrier to pass through the eviction loop, ensuring all segments are destructed before stopping the partition. Each segment references members of the remote_partition, so it's important the shutdown sequence stops the segments before destructing the remote_partition. At the same time, having each partition wait for another set of partitions to finish flushing can result in a slow shutdown.

This commit moves the eviction loop into the remote_partition, allowing partition shutdown to entirely avoid waiting for any other partition to shut down, while still ensuring that each underlying segment is destructed before the remote_partition.

Without this commit, I witnessed the period of partition shutdown in a heavily loaded server take 30 minutes. With this commit I see a similarly shaped shutdown taking 10 seconds.

Related #9569

(cherry picked from commit 03587d8)
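The ownership change described in the commit message can be illustrated with a minimal sketch in plain C++. This is not the actual cloud_storage code: `partition_sketch`, `segment_sketch`, `record`, and `shutdown_order_demo` are hypothetical names, and the real implementation is asynchronous and built on Seastar. The sketch only shows the ordering property, assuming each partition drains its own eviction list in `stop()` so that every segment is gone before the partition is destroyed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; not the real cloud_storage types.
struct partition_sketch;

struct segment_sketch {
    segment_sketch(partition_sketch& owner, int id) : _owner(owner), _id(id) {}
    ~segment_sketch(); // touches the owning partition, so it must run first
    partition_sketch& _owner;
    int _id;
};

struct partition_sketch {
    explicit partition_sketch(std::vector<std::string>& log) : _log(log) {}

    void add_segment(int id) {
        _segments.push_back(std::make_unique<segment_sketch>(*this, id));
    }

    // Move a segment onto this partition's own eviction list.
    void evict(std::size_t idx) {
        _eviction_list.push_back(std::move(_segments.at(idx)));
        _segments.erase(_segments.begin() + static_cast<std::ptrdiff_t>(idx));
    }

    // Drain only this partition's segments; no other partition is involved.
    void stop() {
        _eviction_list.clear();
        _segments.clear();
        _stopped = true;
    }

    void record(const std::string& msg) { _log.push_back(msg); }

    ~partition_sketch() {
        assert(_stopped); // all segments must already be destroyed
        _log.push_back("partition destroyed");
    }

    std::vector<std::string>& _log;
    std::vector<std::unique_ptr<segment_sketch>> _segments;
    std::vector<std::unique_ptr<segment_sketch>> _eviction_list;
    bool _stopped = false;
};

// The segment destructor uses a member of the partition it references.
segment_sketch::~segment_sketch() {
    _owner.record("segment " + std::to_string(_id) + " destroyed");
}

// Returns the order in which objects were destroyed during shutdown.
std::vector<std::string> shutdown_order_demo() {
    std::vector<std::string> log;
    {
        partition_sketch p(log);
        p.add_segment(1);
        p.add_segment(2);
        p.evict(0); // segment 1 is now pending eviction
        p.stop();   // destroys evicted and live segments
    }               // partition destroyed last
    return log;
}
```

Calling `shutdown_order_demo()` yields a log in which both segments are destroyed before the partition, and no step waits on any other partition.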

Backports Required

Release Notes

Improvements

The shutdown sequence for partitions that use tiered storage is now faster in clusters with heavy read traffic that hydrates readers from object storage.

BenPope · 2023-04-18T21:00:52Z

@andrwng anything blocking triage of the failures, making this ready for review, and assigning reviewers?

andrwng · 2023-04-18T23:06:03Z

I need to spend a bit more time on this. The failures I'm seeing in CI indicate a real race/unsafe memory access with shutdown.

VladLazar · 2023-04-26T10:59:52Z

I don't see anything wrong with the code, but the test_many_partitions_shutdown failure in the release build makes me a bit suspicious.

VladLazar · 2023-04-26T11:01:57Z

/ci-repeat 3
skip-units
dt-repeat=5
tests/rptest/tests/e2e_shadow_indexing_test.py::ShadowIndexingManyPartitionsTest.test_many_partitions_shutdown

VladLazar · 2023-04-26T10:57:33Z

src/v/cloud_storage/remote_partition.h

@@ -186,11 +192,40 @@ class remote_partition
    retry_chain_node _rtc;
    retry_chain_logger _ctxlog;
    ss::gate _gate;
+    ss::abort_source _as;


Would it not have been possible to backport 632676c instead of adding the abort source manually?

andrwng · Apr 26, 2023

Yeah, that would have been possible. At the time I hadn't considered cherry-picking individual commits from the backport, but that is a better approach.

andrwng · 2023-04-26T16:44:31Z

The CI failures were because #10342 was missing. I triggered a rebuild to rebase, now that that's merged.

github-actions bot added the area/redpanda label Apr 10, 2023

andrwng changed the title ~~cloud_storage: move eviction under remote_partition~~ [v22.3.x] cloud_storage: move eviction under remote_partition Apr 10, 2023

vshtokman modified the milestones: v22.3.x-next, v22.3.16 Apr 11, 2023

BenPope added the kind/backport PRs targeting a stable branch label Apr 13, 2023

BenPope assigned andrwng Apr 18, 2023

piyushredpanda modified the milestones: v22.3.x-next, v22.3.17 Apr 22, 2023

andrwng and others added 2 commits April 25, 2023 01:13

cluster: add abort source to partition manager

c09f0c7

...and use it in remote_partition::erase. This is necessary because we now require a usable abort source in all cloud storage paths, and the partition's abort source has already fired by the time we get to removing the persistent state.
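The reasoning in this commit message can be sketched as follows. The types here (`abort_source_sketch`, `erasable_partition`, `manager_sketch`) and the function `delete_remote_state` are hypothetical stand-ins, not the Seastar or Redpanda APIs; the sketch assumes only that a cloud storage operation refuses to run once its abort source has fired.

```cpp
// Hypothetical stand-in for an abort source: a flag that can be fired once.
struct abort_source_sketch {
    void request_abort() { _aborted = true; }
    bool abort_requested() const { return _aborted; }
    bool _aborted = false;
};

// Hypothetical cloud storage operation that needs a usable abort source.
// Returns false (does nothing) if the abort source has already fired.
bool delete_remote_state(const abort_source_sketch& as) {
    if (as.abort_requested()) {
        return false;
    }
    return true; // the remote objects would be deleted here
}

struct erasable_partition {
    abort_source_sketch as;
    void stop() { as.request_abort(); } // fired as part of shutdown
};

struct manager_sketch {
    abort_source_sketch as; // outlives the individual partitions

    // Stop the partition, then erase its remote state using the manager's
    // abort source, because the partition's own source has already fired.
    bool remove(erasable_partition& p) {
        p.stop();
        return delete_remote_state(as);
    }
};
```

With this shape, `manager_sketch::remove` succeeds, whereas passing the stopped partition's own abort source to `delete_remote_state` would be rejected.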

andrwng force-pushed the v22.3.x-evict-from-partition branch from 9de1010 to c09f0c7 Compare April 25, 2023 17:03

andrwng marked this pull request as ready for review April 25, 2023 17:03

andrwng requested a review from jcsp April 25, 2023 17:04

VladLazar reviewed Apr 26, 2023

VladLazar approved these changes Apr 27, 2023

andrwng merged commit 2099e16 into redpanda-data:v22.3.x Apr 27, 2023
