refactor(barrier): explicitly maintain database barrier state separately in local barrier manager #19556
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking.
After this PR, I suppose we have N streaming graphs, 1 for each DB.
After backfill completes, the partial stream graph will be merged into one of these DB streaming graphs.
```diff
@@ -40,16 +41,23 @@ pub type UpDownActorIds = (ActorId, ActorId);
 pub type UpDownFragmentIds = (FragmentId, FragmentId);

 #[derive(Hash, Eq, PartialEq, Copy, Clone, Debug)]
-struct PartialGraphId(u64);
+pub(crate) struct PartialGraphId(u32);
```
I recall we used to have the database id as part of the partial graph id #19173 (comment).
I suppose now that we always have a separate graph per DB, this is no longer necessary.
Yes. The partial graph id should be unique only within the same database, and does not need to be unique globally.
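To illustrate (hypothetical names, not from this PR): if a globally-unique handle were ever needed, it would simply be the pair of the database id and the per-database partial graph id.

```rust
// Hypothetical illustration: PartialGraphId alone is only unique within one
// database; pairing it with the database id yields a globally-unique key.
#[derive(Hash, Eq, PartialEq, Copy, Clone, Debug)]
struct DatabaseId(u32);

#[derive(Hash, Eq, PartialEq, Copy, Clone, Debug)]
struct PartialGraphId(u32);

// Unique across the whole cluster only when paired with the database id.
type GlobalPartialGraphKey = (DatabaseId, PartialGraphId);
```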
b5d838a to 9defad3 (Compare)
Generally LGTM
```diff
 repeated uint32 actor_ids_to_collect = 4;
 repeated uint32 table_ids_to_sync = 5;
-uint64 partial_graph_id = 6;
+uint32 partial_graph_id = 6;
```
Is the partial_graph_id globally unique or only unique per database?
nvm. It is answered in your reply to noel's comment: "partial graph id should be unique only within the same database".
Does it mean that we won't support cross-database references in streaming queries (at least for the initial release)?
As discussed offline with @hzxa21, for the initial version of cross-database streaming queries, we will always read the L0 log store of the upstream table if the table is in another database, so we don't need a streaming dispatcher in this case.
LGTM
```rust
for (database_id, futures) in &mut *futures {
    if let Poll::Ready(Some((partial_graph_id, barrier, result))) =
        futures.poll_next_unpin(cx)
    {
        return Poll::Ready((*database_id, partial_graph_id, barrier, result));
    }
}
Poll::Pending
```
If there is a considerable number of databases, we may consider constructing them into an outer FuturesUnordered.
Might be difficult to do so. If we create a temporary FuturesUnordered in every select, it may hurt performance somewhat, since we need to allocate new memory for a new FuturesUnordered each time. If we instead store the FuturesUnordered as a field of the local barrier worker, the future has to be 'static, and otherwise we would need to self-reference the DatabaseManagedBarrierState stored in the local barrier worker. If we make the future 'static, we need to move the ownership of DatabaseManagedBarrierState into the future, and then we cannot access the DatabaseManagedBarrierState before the future returns.
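For illustration, here is a minimal, self-contained sketch of the pattern the code above uses (the function name, the `BarrierResult` placeholder, and the simplified types are assumptions, not the actual RisingWave code): the worker keeps one `FuturesUnordered` per database in a `HashMap` it owns, and a short-lived `poll_fn` borrows the map only while being polled, so no `'static` outer future ever needs to take ownership of the per-database state.

```rust
use std::collections::HashMap;
use std::future::poll_fn;
use std::task::Poll;

use futures::future::BoxFuture;
use futures::stream::{FuturesUnordered, StreamExt};

type DatabaseId = u32;
// Placeholder for the real per-barrier completion result.
type BarrierResult = Result<(), String>;

/// Wait for the next completed future from any database, without moving the
/// per-database state out of the caller.
async fn next_completed(
    per_db: &mut HashMap<DatabaseId, FuturesUnordered<BoxFuture<'static, BarrierResult>>>,
) -> (DatabaseId, BarrierResult) {
    // `poll_fn` borrows `per_db` only for the duration of this call, so the
    // worker keeps ownership of its state between polls; no self-reference
    // and no temporary outer FuturesUnordered allocation is needed.
    poll_fn(|cx| {
        for (database_id, futures) in per_db.iter_mut() {
            if let Poll::Ready(Some(result)) = futures.poll_next_unpin(cx) {
                return Poll::Ready((*database_id, result));
            }
        }
        Poll::Pending
    })
    .await
}
```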
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
To simplify the implementation of database failure isolation, we should isolate the fields of the local barrier worker between different databases. Previously, we isolated the barrier injection and collection of different partial graphs. Ideally, it would be great to isolate the fields of the local barrier worker by partial graph, so that the local barrier worker wouldn't need to be aware of the concept of a database.
However, after some effort, I found it complicated to directly support isolation between partial graphs. In snapshot backfill, during the backfill stage, the streaming job has its own partial graph, but after the backfill finishes, this partial graph is merged into the global partial graph, with its actors migrating from their own partial graph to the global one. In this case, partial graphs are not completely isolated from each other. If we instead never merge partial graphs and assume that actors always belong to a fixed partial graph, we could easily support isolation between partial graphs, but the related code in the global barrier manager would become unnecessarily complicated. Either way, isolating by partial graph is complicated.
Therefore, in this PR, we make the local barrier worker aware of the concept of database. Fields in the local barrier manager are now maintained mostly as `HashMap<DatabaseId, T>`, so the logic of database isolation can be implemented easily. The changes are mostly as follows (a sketch of the resulting layout is given after this list):

- `ManagedBarrierState` is renamed to `DatabaseManagedBarrierState`, and stores the barrier state per database. In the streaming control bidi-stream, each request specifies the `database_id` to operate on.
- `LocalBarrierManager`, which holds the two channel txs for actors to report barrier events or actor errors, becomes per-database, so that barrier events and actor errors can be reported independently. It was previously stored in the `SharedContext`; in this PR, we store it separately outside of the `SharedContext`.
- `SharedContext`, which stores pending exchange channels, also becomes per-database. This is based on the assumption that there won't be exchange channels between two different databases. In the exchange service, to get an exchange channel, we now need to specify the current `database_id` to locate the correct `SharedContext`.
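For illustration, a hedged sketch of the resulting per-database layout (the three type names come from this PR; the worker's field names and the struct bodies are assumptions, not the actual RisingWave definitions):

```rust
use std::collections::HashMap;

type DatabaseId = u32;

struct DatabaseManagedBarrierState { /* barrier state of one database */ }
struct LocalBarrierManager { /* per-database txs for barrier events / actor errors */ }
struct SharedContext { /* per-database pending exchange channels */ }

// The local barrier worker keys everything by DatabaseId, so a failure in one
// database only needs to touch that database's entries.
struct LocalBarrierWorker {
    managed_states: HashMap<DatabaseId, DatabaseManagedBarrierState>,
    barrier_managers: HashMap<DatabaseId, LocalBarrierManager>,
    shared_contexts: HashMap<DatabaseId, SharedContext>,
}
```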
Checklist

- `./risedev check` (or alias, `./risedev c`)

Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.