fix(source): pause source correctly #19148

Merged
merged 4 commits into from
Oct 30, 2024

Conversation

zwang28
Contributor

@zwang28 zwang28 commented Oct 28, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Unlike other executors, which pause and resume only in response to barrier commands, the source executor additionally pauses and resumes itself based on self_paused. However, there is a corner case in which the self_paused mechanism resumes the source executor incorrectly:

  1. A new stream job is created, resulting in 3 barriers: [pause, configchange, resume].
  2. The source executor receives the first [pause] barrier. For some reason, the following two barriers are delayed.
  3. Because of the delay, self_paused pauses the source executor.
  4. The source executor receives the [configchange] barrier, which causes the self_paused logic to resume the source executor. This is incorrect: the source executor should stay paused until the following [resume] barrier arrives.

This PR fixes this issue.
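
The corner case above can be replayed with a small model. This is only a sketch under stated assumptions, not the code in this PR: `PauseState`, `on_barrier_timeout`, `on_barrier`, and the simplified `Mutation` enum (including `ConfigChange`, named after the [configchange] barrier in the description) are hypothetical. It tracks the two pause reasons separately and treats the source as paused while either one holds, so clearing `self_paused` on the [configchange] barrier does not undo the pause requested by the [pause] barrier.

```rust
/// Hypothetical model of the two independent pause reasons.
#[derive(Debug, Default)]
struct PauseState {
    /// Set by a `Pause` barrier mutation, cleared by `Resume`.
    command_paused: bool,
    /// Set when the executor pauses itself because barriers are delayed (assumption).
    self_paused: bool,
}

/// Simplified stand-in for the barrier mutations in the scenario.
#[derive(Debug, Clone, Copy)]
enum Mutation {
    Pause,
    ConfigChange,
    Resume,
}

impl PauseState {
    /// The source stays paused while either reason holds.
    fn is_paused(&self) -> bool {
        self.command_paused || self.self_paused
    }

    /// Called when the next barrier has not arrived in time.
    fn on_barrier_timeout(&mut self) {
        self.self_paused = true;
    }

    /// Called for every barrier that arrives.
    fn on_barrier(&mut self, mutation: Mutation) {
        // A barrier arrived, so the self-imposed pause ends,
        // but this alone must not clear a command pause.
        self.self_paused = false;
        match mutation {
            Mutation::Pause => self.command_paused = true,
            Mutation::Resume => self.command_paused = false,
            Mutation::ConfigChange => {}
        }
    }
}

/// Replays steps 1 to 4 plus the final resume, recording `is_paused` after each event.
fn replay_corner_case() -> Vec<bool> {
    let mut state = PauseState::default();
    let mut trace = Vec::new();

    state.on_barrier(Mutation::Pause);
    trace.push(state.is_paused());

    state.on_barrier_timeout();
    trace.push(state.is_paused());

    state.on_barrier(Mutation::ConfigChange);
    trace.push(state.is_paused());

    state.on_barrier(Mutation::Resume);
    trace.push(state.is_paused());

    trace
}
```

In this model the source remains paused after the [configchange] barrier and resumes only on the [resume] barrier.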

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@github-actions github-actions bot added the type/fix Bug fix label Oct 28, 2024
@@ -564,9 +567,14 @@ impl<S: StateStore> SourceExecutor<S> {

if let Some(mutation) = barrier.mutation.as_deref() {
match mutation {
// XXX: Is it possible that the stream is self_paused, and we have pause mutation now? In this case, it will panic.
Contributor Author

pause_stream and resume_stream have been modified to warn instead of panic when seeing unexpected pause/resume.
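
A minimal sketch of the warn-instead-of-panic behavior described here, under assumptions: `ReaderHandle`, its fields, and the `warn` helper are hypothetical, and `eprintln!` stands in for whatever logging the real code uses. Only the names `pause_stream` and `resume_stream` come from the comment above.

```rust
/// Hypothetical handle that tracks whether the underlying stream is paused.
#[derive(Debug, Default)]
struct ReaderHandle {
    paused: bool,
    /// Collected warnings, kept only so the behavior can be inspected.
    warnings: Vec<String>,
}

impl ReaderHandle {
    /// Pauses the stream; a redundant pause is reported instead of panicking.
    fn pause_stream(&mut self) {
        if self.paused {
            self.warn("pause_stream called while already paused");
            return;
        }
        self.paused = true;
    }

    /// Resumes the stream; a redundant resume is reported instead of panicking.
    fn resume_stream(&mut self) {
        if !self.paused {
            self.warn("resume_stream called while not paused");
            return;
        }
        self.paused = false;
    }

    /// Stand-in for a real logging call (assumption).
    fn warn(&mut self, msg: &str) {
        eprintln!("warning: {msg}");
        self.warnings.push(msg.to_string());
    }
}
```

With this shape, an unexpected second pause or resume leaves the state unchanged and is visible only as a warning.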

@zwang28 zwang28 requested a review from hzxa21 October 28, 2024 06:21
@zwang28 zwang28 enabled auto-merge October 28, 2024 06:22
@@ -548,8 +549,10 @@ impl<S: StateStore> SourceExecutor<S> {
last_barrier_time = Instant::now();

if self_paused {
Collaborator

There are two other places we use self_paused:

It is better to fix them as well.

self_paused = false;
}

let mut split_changed = false;
if let Some(ref mutation) = barrier.mutation.as_deref() {
match mutation {
Mutation::Pause => {
command_paused = true;
pause_reader!();
Collaborator

I think we should also ensure that pause_reader!() is called only when only one of command_paused and self_paused is false. Otherwise, the right arm of the backfill stream will be lost. See the pause_reader implementation.

Contributor Author

If source backfilling is expected to strictly adhere to the barrier pause, it must pause here.
I'll look into pause_reader!.

Contributor Author

Got your idea. Fixed.
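
One reading of the concern in this thread, sketched under assumptions: if pausing works by swapping the live right arm of the stream for a placeholder and parking the original, then pausing twice parks the placeholder and drops the real arm. `BackfillReader`, `Executor`, and `set_flags` below are hypothetical and are not the actual `pause_reader!` macro or the fix applied in this PR. The guard calls pause and resume only on transitions of the combined paused state.

```rust
/// Hypothetical reader: `Vec<u32>` stands in for the right arm of the stream.
struct BackfillReader {
    /// The live data arm; `None` while paused.
    right_arm: Option<Vec<u32>>,
    /// Where the data arm is parked while paused.
    parked: Option<Vec<u32>>,
}

impl BackfillReader {
    fn new(data: Vec<u32>) -> Self {
        Self { right_arm: Some(data), parked: None }
    }

    /// Unguarded pause: parks whatever is currently in `right_arm`.
    /// A second call parks `None` and drops the previously parked arm.
    fn pause_reader(&mut self) {
        self.parked = self.right_arm.take();
    }

    /// Restores the parked arm.
    fn resume_reader(&mut self) {
        self.right_arm = self.parked.take();
    }
}

/// Hypothetical executor state combining both pause reasons.
struct Executor {
    reader: BackfillReader,
    command_paused: bool,
    self_paused: bool,
}

impl Executor {
    fn new(data: Vec<u32>) -> Self {
        Self { reader: BackfillReader::new(data), command_paused: false, self_paused: false }
    }

    /// Updates both flags and touches the reader only when the combined
    /// paused state actually changes.
    fn set_flags(&mut self, command_paused: bool, self_paused: bool) {
        let was_paused = self.command_paused || self.self_paused;
        self.command_paused = command_paused;
        self.self_paused = self_paused;
        let now_paused = self.command_paused || self.self_paused;
        match (was_paused, now_paused) {
            (false, true) => self.reader.pause_reader(),
            (true, false) => self.reader.resume_reader(),
            _ => {} // no transition, leave the reader alone
        }
    }
}
```

In this model, two unguarded `pause_reader` calls followed by one `resume_reader` leave `right_arm` empty, while the guarded `set_flags` path restores the original arm once both flags are cleared.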

@zwang28 zwang28 disabled auto-merge October 29, 2024 06:55
Collaborator

@hzxa21 hzxa21 left a comment

LGTM. Thanks for the fix.

@zwang28 zwang28 added this pull request to the merge queue Oct 30, 2024
Merged via the queue into main with commit 0e844fe Oct 30, 2024
28 of 29 checks passed
@zwang28 zwang28 deleted the wangzheng/fix_source_pause branch October 30, 2024 02:25