Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MDEV-35465 Async replication stops working on Galera async replica node when parallel replication is enabled #456

Closed
wants to merge 1 commit into from

Conversation

temeo
Copy link

@temeo temeo commented Nov 21, 2024

Parallel slave failed to retry in retry_event_group() with error

WSREP: Parallel slave worker failed at wsrep_before_command() hook

Fix wsrep transaction cleanup/restart in retry_event_group() to properly
clean up previous transaction by calling wsrep_after_statement().
Also move call to reset error after call to wsrep_after_statement()
to make sure that it remains effective.

Add a MTR test galera_as_slave_parallel_retry to reproduce the error
when the fix is not present.

Copy link

@janlindstrom janlindstrom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

I asked Ramesh to test this with more demanding load. When submitting to MariaDB maybe should state that fix should be applied to earliest version i.e. most likely in this case 10.5.

…de when parallel replication is enabled

Parallel slave failed to retry in retry_event_group() with error

    WSREP: Parallel slave worker failed at wsrep_before_command() hook

Fix wsrep transaction cleanup/restart in retry_event_group() to properly
clean up previous transaction by calling wsrep_after_statement().
Also move call to reset error after call to wsrep_after_statement()
to make sure that it remains effective.

Add a MTR test galera_as_slave_parallel_retry to reproduce the error
when the fix is not present.

Other issues which were detected when testing with sysbench:

Check if parallel slave is killed for retry before waiting for prior
commits in THD::wsrep_parallel_slave_wait_for_prior_commit(). This
is required with slave-parallel-mode=optimistic to avoid deadlock
when a slave later in commit order manages to reach prepare phase
before a lock conflict is detected.

Suppress wsrep applier specific warning for slave threads.
@temeo
Copy link
Author

temeo commented Nov 22, 2024

Backported to 10.5 and submitted in MariaDB#3655.

@temeo temeo closed this Nov 22, 2024
@temeo temeo deleted the 10.11-MDEV-35465 branch November 22, 2024 16:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants