BUG#27652526: REJOIN OLD PRIMARY NODE MAY DUPLICATE KEY WHEN RECOVERY
Group Replication implements conflict detection in multi-primary
mode to avoid write errors on parallel operations.
Conflict detection is also engaged in single-primary mode in the
particular case where the primary changes and the new primary still
has a backlog to apply. Until the backlog is flushed, conflict
detection stays enabled to prevent write errors between the backlog
and incoming transactions.

The conflict detection data, which we name certification info, is
also used to detect dependencies between accepted transactions;
these dependencies determine how transactions are scheduled on the
parallel applier.

To keep the certification info from growing forever, all members
periodically exchange their GTID_EXECUTED sets. The intersection of
those sets is the set of transactions that are applied on all
members. Future transactions cannot conflict with this set since all
members are operating on top of it, so we can safely remove from the
certification info all write-sets that belong to those transactions.
More details at WL#6833: Group Replication: Read-set free
Certification Module (DBSM Snapshot Isolation).
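
The stable set described above is a plain set intersection. The sketch below is an illustration only, assuming the GTIDs of a single UUID are modelled as integers in a `std::set`; the names `GtidSet` and `stable_set` are hypothetical and are not the plugin's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <utility>
#include <vector>

// Simplified stand-in for a GTID set: the transaction numbers of one UUID.
using GtidSet = std::set<long>;

// Intersects every member's GTID_EXECUTED. The result holds only the
// transactions that are applied on all members.
GtidSet stable_set(const std::vector<GtidSet>& executed_per_member) {
  if (executed_per_member.empty()) return GtidSet{};
  GtidSet result = executed_per_member.front();
  for (std::size_t i = 1; i < executed_per_member.size(); ++i) {
    const GtidSet& other = executed_per_member[i];
    GtidSet common;
    std::set_intersection(result.begin(), result.end(), other.begin(),
                          other.end(), std::inserter(common, common.begin()));
    result = std::move(common);
  }
  return result;
}
```

A member that lags behind shrinks the stable set, so nothing it has not yet applied can be purged on its account.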

However, a corner case was found in which the garbage collection was
purging more data than it should.
The scenario is:
 1) Group with 2 members;
 2) member1 executes:
      CREATE TABLE t1(a INT, b INT, PRIMARY KEY(a));
      INSERT INTO t1 VALUE(1, 1);
    Both members have GTID_EXECUTED= UUID:1-4
    Both members' certification info has:
       Hash of item in Writeset     snapshot version (Gtid_set)
       #1                           UUID:1-4
 3) member1 executes TA
      UPDATE t1 SET b=10 WHERE a=1;
    and blocks immediately before sending the transaction to the group.
    This transaction has snapshot_version: UUID:1-4
 4) member2 executes TB
      UPDATE t1 SET b=10 WHERE a=1;
    This transaction has snapshot_version: UUID:1-4
    It goes through the complete path and is committed.
    This transaction has GTID: UUID:1000002
    Both members have GTID_EXECUTED= UUID:1-4:1000002
    Both members' certification info has:
       Hash of item in Writeset     snapshot version (Gtid_set)
       #1                           UUID:1-4:1000002
 5) member2 becomes extremely slow in processing transactions; we
    simulate that by holding the transaction queue to the GR
    pipeline.
    Transaction delivery is still working, but transactions will
    be blocked before certification.
 6) member1 is able to send its TA transaction; recall that
    this transaction has snapshot_version: UUID:1-4.
    On conflict detection on member1, it will conflict with #1,
    since this snapshot_version does not contain the snapshot_version
    of #1, that is, TA was executed on an earlier version than TB.
    On member2 the transaction will be delivered and will be put on
    hold before conflict detection.
 7) Meanwhile the certification info garbage collection kicks in.
    Both members have GTID_EXECUTED= UUID:1-4:1000002
    Their intersection is UUID:1-4:1000002
    Both members' certification info has:
       Hash of item in Writeset     snapshot version (Gtid_set)
       #1                           UUID:1-4:1000002
    The condition to purge write-sets is:
       snapshot_version.is_subset(intersection)
    We have
       "UUID:1-4:1000002".is_subset("UUID:1-4:1000002")
    which is true, so we remove #1.
    Both members' certification info has:
       Hash of item in Writeset     snapshot version (Gtid_set)
       <empty>
 8) member2 gets back to normal and we release transaction TA; recall
    that this transaction has snapshot_version: UUID:1-4.
    On conflict detection, since the certification info is empty,
    the transaction will be allowed to proceed, which is incorrect:
    it must roll back (like on member1) since it conflicts with TB.
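
The conflict check that steps 6 and 8 depend on can be sketched as follows. This is a simplified model under stated assumptions, not the plugin's code: GTIDs of one UUID are integers in a `std::set`, the certification info is a map from write-set item hash to snapshot version, and `certify` and `CertificationInfo` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using GtidSet = std::set<long>;
// Write-set item hash -> snapshot version of the last accepted writer.
using CertificationInfo = std::map<std::string, GtidSet>;

// Returns true when the transaction may commit, false on conflict.
bool certify(const CertificationInfo& info,
             const std::vector<std::string>& write_set,
             const GtidSet& snapshot_version) {
  for (const auto& item : write_set) {
    auto it = info.find(item);
    if (it == info.end()) continue;  // no recorded writer: nothing to compare
    // std::includes(a, b) is true when every element of b is in a.
    if (!std::includes(snapshot_version.begin(), snapshot_version.end(),
                       it->second.begin(), it->second.end()))
      return false;  // executed on an older version than the recorded writer
  }
  return true;
}
```

With the entry for #1 present, TA (snapshot UUID:1-4) is rejected; once the entry has been purged, the same transaction passes, which is the incorrect outcome of step 8.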

The problem is in the certification garbage collection, more
precisely in the condition used to purge data: we cannot leave the
certification info empty, otherwise this situation can happen.
The condition must be changed to
       snapshot_version.is_subset_not_equals(intersection)
which will always leave a placeholder to detect delayed conflicting
transactions.
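
The difference between the two conditions is the difference between a subset and a proper subset. A minimal sketch, assuming GTIDs of one UUID modelled as integers; the free functions below only borrow the names used in this message and are stand-ins, not the real `Gtid_set` methods:

```cpp
#include <algorithm>
#include <cassert>
#include <set>

using GtidSet = std::set<long>;

// True when every GTID in `sub` is also in `super` (equal sets included).
bool is_subset(const GtidSet& sub, const GtidSet& super) {
  return std::includes(super.begin(), super.end(), sub.begin(), sub.end());
}

// True only when `sub` is a subset of `super` and the two sets differ.
bool is_subset_not_equals(const GtidSet& sub, const GtidSet& super) {
  return is_subset(sub, super) && sub != super;
}
```

For a snapshot version equal to the intersection, the first predicate holds and the second does not, so the newest write-set entry survives garbage collection.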

So a trace of the solution is (starting at step 7):
 7) Meanwhile the certification info garbage collection kicks in.
    Both members have GTID_EXECUTED= UUID:1-4:1000002
    Their intersection is UUID:1-4:1000002
    Both members' certification info has:
       Hash of item in Writeset     snapshot version (Gtid_set)
       #1                           UUID:1-4:1000002
    The condition to purge write-sets is:
       snapshot_version.is_subset_not_equals(intersection)
    We have
       "UUID:1-4:1000002".is_subset_not_equals("UUID:1-4:1000002")
    which is false, so we do not remove #1.
    Both members' certification info has:
       Hash of item in Writeset     snapshot version (Gtid_set)
       #1                           UUID:1-4:1000002
 8) member2 gets back to normal and we release transaction TA; recall
    that this transaction has snapshot_version: UUID:1-4.
    On conflict detection on member2, it will conflict with #1,
    since this snapshot_version does not contain the snapshot_version
    of #1, that is, TA was executed on an earlier version than TB.
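
The purge loop traced in step 7 can be sketched as below. It is a simplified model that assumes a plain map from write-set item to snapshot version and ignores the reference counting done by the real code; `garbage_collect` with a `strict` flag is a hypothetical signature used only to compare the old and new conditions side by side.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>

using GtidSet = std::set<long>;
using CertificationInfo = std::map<std::string, GtidSet>;

// Removes entries whose snapshot version is covered by the stable set.
// With `strict` set, an entry equal to the stable set is kept, which
// models the is_subset_not_equals condition; without it, the old
// is_subset condition applies. Returns the number of purged entries.
std::size_t garbage_collect(CertificationInfo& info, const GtidSet& stable,
                            bool strict) {
  std::size_t purged = 0;
  for (auto it = info.begin(); it != info.end();) {
    const GtidSet& version = it->second;
    const bool covered = std::includes(stable.begin(), stable.end(),
                                       version.begin(), version.end());
    if (covered && !(strict && version == stable)) {
      it = info.erase(it);  // erase returns the next valid iterator
      ++purged;
    } else {
      ++it;
    }
  }
  return purged;
}
```

Under the strict condition the entry for #1 stays in place, so a delayed transaction with snapshot UUID:1-4 still has something to conflict with.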

This is the same scenario that we see in this bug, though here the
pipeline is being blocked by the distributed recovery procedure,
that is, while the joining member is applying the missing data
through the recovery channel, the incoming data is being queued.
Meanwhile the certification info garbage collection kicks in and
purges more data than it should; the result is that conflicts are
not detected.
nacarvalho committed May 18, 2018
1 parent 6e40ff2 commit f63fbd3
Showing 8 changed files with 239 additions and 12 deletions.
7 changes: 6 additions & 1 deletion rapid/plugin/group_replication/src/applier.cc
@@ -1,4 +1,4 @@
- /* Copyright (c) 2014, 2017, Oracle and/or its affiliates. All rights reserved.
+ /* Copyright (c) 2014, 2018, Oracle and/or its affiliates. All rights reserved.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@@ -299,6 +299,11 @@ int Applier_module::apply_data_packet(Data_packet *data_packet,
uchar* payload= data_packet->payload;
uchar* payload_end= data_packet->payload + data_packet->len;

+ DBUG_EXECUTE_IF("group_replication_before_apply_data_packet", {
+ const char act[] = "now wait_for continue_apply";
+ DBUG_ASSERT(!debug_sync_set_action(current_thd, STRING_WITH_LEN(act)));
+ });

if (check_single_primary_queue_status())
return 1; /* purecov: inspected */

4 changes: 2 additions & 2 deletions rapid/plugin/group_replication/src/certifier.cc
@@ -1,4 +1,4 @@
- /* Copyright (c) 2014, 2017, Oracle and/or its affiliates. All rights reserved.
+ /* Copyright (c) 2014, 2018, Oracle and/or its affiliates. All rights reserved.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@@ -1230,7 +1230,7 @@ void Certifier::garbage_collect()
stable_gtid_set_lock->wrlock();
while (it != certification_info.end())
{
- if (it->second->is_subset(stable_gtid_set))
+ if (it->second->is_subset_not_equals(stable_gtid_set))
{
if (it->second->unlink() == 0)
delete it->second;
@@ -0,0 +1,74 @@
include/group_replication.inc
Warnings:
Note #### Sending passwords in plain text without SSL/TLS is extremely insecure.
Note #### Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
[connection server1]

############################################################
# 1. Create a table on server1.
CREATE TABLE t1(a INT, b INT, PRIMARY KEY(a));
INSERT INTO t1 VALUE(1, 1);
include/rpl_sync.inc

############################################################
# 2. Set a debug sync before broadcast message to group on
# connection server_1.
# Commit a transaction that will be block before broadcast.
[connection server_1]
SET @@GLOBAL.DEBUG='+d,group_replication_before_message_broadcast';
BEGIN;
UPDATE t1 SET b=10 WHERE a=1;
COMMIT;

############################################################
# 3. Wait until server_1 connection reaches the
# group_replication_before_message_broadcast debug sync point.
[connection server1]

############################################################
# 4. Execute a transaction on server2, that will reach first
# certification, since server_1 is blocked before broadcast.
[connection server2]
UPDATE t1 SET b=20 WHERE a=1;

############################################################
# 5. Suspend pipeline on server2.
SET @@GLOBAL.DEBUG='+d,group_replication_before_apply_data_packet';

############################################################
# 6. Resume the transaction on server_1
[connection server1]
SET DEBUG_SYNC='now SIGNAL waiting';
SET @@GLOBAL.DEBUG='-d,group_replication_before_message_broadcast';
[connection server_1]
ERROR HY000: Plugin instructed the server to rollback the current transaction.

############################################################
# 7. Make sure the pipeline is suspended on server2.
[connection server2]

############################################################
# 8. Wait until certification info garbage collector does
# its work.

############################################################
# 9. Resume the pipeline on server2.
SET DEBUG_SYNC='now SIGNAL continue_apply';
SET @@GLOBAL.DEBUG='-d,group_replication_before_apply_data_packet';

############################################################
# 10. Execute a new transaction in order to have a sync point
# to make the test deterministic,
# Validate that data and GTIDs are correct.
[connection server1]
INSERT INTO t1 VALUE(2, 2);
include/rpl_sync.inc
include/assert.inc [GTID_EXECUTED must contain 6 transactions]
[connection server2]
include/assert.inc [GTID_EXECUTED must contain 6 transactions]
include/diff_tables.inc [server1:t1, server2:t1]

############################################################
# 11. Clean up.
DROP TABLE t1;
include/group_replication_end.inc
@@ -53,7 +53,7 @@ server1
include/assert.inc [The value of member_id should be equal to server UUID after starting group replication]
include/assert.inc [The value of Count_Transactions_checked should be 6 after starting group replication]
include/assert.inc [The value of Count_conflicts_detected should be 0 after starting group replication]
- include/assert.inc [The value of Count_Transactions_rows_validating should be 4 after starting group replication]
+ include/assert.inc [The value of Count_Transactions_rows_validating should be 6 after starting group replication]
include/assert.inc [The value of Transactions_committed_all_members should have server 1 GTIDs before server2 start]
include/assert.inc [The value of Last_Conflict_free_transaction should be the gtid of the last applied transaction.]
SET SESSION sql_log_bin= 0;
@@ -72,14 +72,14 @@ include/assert.inc ['There is a value 3 in table t2']
# 6. Check that stable set and certification info size are
# properly updated after stable set propagation and
# certification info garbage collection on server 1.
- include/assert.inc ['Count_transactions_rows_validating must be 0']
+ include/assert.inc ['Count_transactions_rows_validating must be 2']
include/assert.inc ['Transactions_committed_all_members must be equal to GTID_EXECUTED']

############################################################
# 7. Check that stable set and certification info size are
# properly updated after stable set propagation and
# certification info garbage collection on server 2.
- include/assert.inc ['Count_transactions_rows_validating must be 0']
+ include/assert.inc ['Count_transactions_rows_validating must be 2']
include/assert.inc ['Transactions_committed_all_members must be equal to GTID_EXECUTED']

############################################################
@@ -0,0 +1,148 @@
################################################################################
# Validate that certification info garbage collection do not purge more data
# than it should.
#
# Test:
# 0. The test requires two servers: M1 and M2.
# 1. Create a table on server1.
# 2. Set a debug sync before broadcast message to group on
# connection server_1.
# Commit a transaction that will be block before broadcast.
# 3. Wait until server_1 connection reaches the
# group_replication_before_message_broadcast debug sync point.
# 4. Execute a transaction on server2, that will reach first
# certification, since server_1 is blocked before broadcast.
# 5. Suspend pipeline on server2.
# 6. Resume the transaction on server_1
# 7. Make sure the pipeline is suspended on server2.
# 8. Wait until certification info garbage collector does
# its work.
# 9. Resume the pipeline on server2.
# 10. Execute a new transaction in order to have a sync point
# to make the test deterministic,
# Validate that data and GTIDs are correct.
# 11. Clean up.
################################################################################
--source include/have_debug_sync.inc
--source include/big_test.inc
--source ../inc/have_group_replication_plugin.inc
--source ../inc/group_replication.inc

--echo
--echo ############################################################
--echo # 1. Create a table on server1.
CREATE TABLE t1(a INT, b INT, PRIMARY KEY(a));
INSERT INTO t1 VALUE(1, 1);
--source include/rpl_sync.inc

--echo
--echo ############################################################
--echo # 2. Set a debug sync before broadcast message to group on
--echo # connection server_1.
--echo # Commit a transaction that will be block before broadcast.
--let $rpl_connection_name= server_1
--source include/rpl_connection.inc
SET @@GLOBAL.DEBUG='+d,group_replication_before_message_broadcast';
BEGIN;
UPDATE t1 SET b=10 WHERE a=1;
--send COMMIT

--echo
--echo ############################################################
--echo # 3. Wait until server_1 connection reaches the
--echo # group_replication_before_message_broadcast debug sync point.
--let $rpl_connection_name= server1
--source include/rpl_connection.inc
--let $wait_condition=SELECT COUNT(*)=1 FROM INFORMATION_SCHEMA.PROCESSLIST WHERE State = 'debug sync point: now'
--source include/wait_condition.inc

--echo
--echo ############################################################
--echo # 4. Execute a transaction on server2, that will reach first
--echo # certification, since server_1 is blocked before broadcast.
--let $rpl_connection_name= server2
--source include/rpl_connection.inc
UPDATE t1 SET b=20 WHERE a=1;

--echo
--echo ############################################################
--echo # 5. Suspend pipeline on server2.
SET @@GLOBAL.DEBUG='+d,group_replication_before_apply_data_packet';

--echo
--echo ############################################################
--echo # 6. Resume the transaction on server_1
--let $rpl_connection_name= server1
--source include/rpl_connection.inc
SET DEBUG_SYNC='now SIGNAL waiting';
SET @@GLOBAL.DEBUG='-d,group_replication_before_message_broadcast';

--let $rpl_connection_name= server_1
--source include/rpl_connection.inc
--error ER_TRANSACTION_ROLLBACK_DURING_COMMIT
--reap

--echo
--echo ############################################################
--echo # 7. Make sure the pipeline is suspended on server2.
--let $rpl_connection_name= server2
--source include/rpl_connection.inc
--let $wait_condition=SELECT COUNT(*)=1 FROM INFORMATION_SCHEMA.PROCESSLIST WHERE State = 'debug sync point: now'
--source include/wait_condition.inc

--echo
--echo ############################################################
--echo # 8. Wait until certification info garbage collector does
--echo # its work.
--let $gtid_assignment_block_size= `SELECT @@GLOBAL.group_replication_gtid_assignment_block_size;`
--let $expected_gtid_set= $group_replication_group_name:1-4:1000002
if ($gtid_assignment_block_size == 1)
{
--let $expected_gtid_set= $group_replication_group_name:1-5
}
--let $wait_condition= SELECT transactions_committed_all_members = "$expected_gtid_set" from performance_schema.replication_group_member_stats;
--let $wait_timeout= 150
--source include/wait_condition.inc

--echo
--echo ############################################################
--echo # 9. Resume the pipeline on server2.
SET DEBUG_SYNC='now SIGNAL continue_apply';
SET @@GLOBAL.DEBUG='-d,group_replication_before_apply_data_packet';

--echo
--echo ############################################################
--echo # 10. Execute a new transaction in order to have a sync point
--echo # to make the test deterministic,
--echo # Validate that data and GTIDs are correct.
--let $rpl_connection_name= server1
--source include/rpl_connection.inc
INSERT INTO t1 VALUE(2, 2);
--source include/rpl_sync.inc

--let $expected_gtid_set= $group_replication_group_name:1-5:1000002
if ($gtid_assignment_block_size == 1)
{
--let $expected_gtid_set= $group_replication_group_name:1-6
}

--let $assert_text= GTID_EXECUTED must contain 6 transactions
--let $assert_cond= "[SELECT @@GLOBAL.GTID_EXECUTED]" = "$expected_gtid_set";
--source include/assert.inc

--let $rpl_connection_name= server2
--source include/rpl_connection.inc
--let $assert_text= GTID_EXECUTED must contain 6 transactions
--let $assert_cond= "[SELECT @@GLOBAL.GTID_EXECUTED]" = "$expected_gtid_set";
--source include/assert.inc

--let $diff_tables=server1:t1, server2:t1
--source include/diff_tables.inc


--echo
--echo ############################################################
--echo # 11. Clean up.
DROP TABLE t1;

--source ../inc/group_replication_end.inc
@@ -182,8 +182,8 @@ START SLAVE SQL_THREAD FOR CHANNEL "group_replication_applier";
--source include/assert.inc

--let $certification_db_size= query_get_value(SELECT Count_Transactions_rows_validating from performance_schema.replication_group_member_stats, Count_Transactions_rows_validating, 1)
- --let $assert_text= The value of Count_Transactions_rows_validating should be 4 after starting group replication
- --let $assert_cond= "$certification_db_size" = 4
+ --let $assert_text= The value of Count_Transactions_rows_validating should be 6 after starting group replication
+ --let $assert_cond= "$certification_db_size" = 6
--source include/assert.inc

--let $stable_set= query_get_value(SELECT Transactions_committed_all_members from performance_schema.replication_group_member_stats, Transactions_committed_all_members, 1)
@@ -168,8 +168,8 @@ INSERT INTO t2 VALUES (3);
--connection server1

--let $count_transactions_validating= query_get_value(SELECT Count_transactions_rows_validating from performance_schema.replication_group_member_stats, Count_transactions_rows_validating, 1)
- --let $assert_text= 'Count_transactions_rows_validating must be 0'
- --let $assert_cond= $count_transactions_validating = 0
+ --let $assert_text= 'Count_transactions_rows_validating must be 2'
+ --let $assert_cond= $count_transactions_validating = 2
--source include/assert.inc

--let $transactions_committed_all_members= query_get_value(SELECT Transactions_committed_all_members from performance_schema.replication_group_member_stats, Transactions_committed_all_members, 1)
@@ -186,8 +186,8 @@ INSERT INTO t2 VALUES (3);
--connection server2

--let $count_transactions_validating= query_get_value(SELECT Count_transactions_rows_validating from performance_schema.replication_group_member_stats, Count_transactions_rows_validating, 1)
- --let $assert_text= 'Count_transactions_rows_validating must be 0'
- --let $assert_cond= $count_transactions_validating = 0
+ --let $assert_text= 'Count_transactions_rows_validating must be 2'
+ --let $assert_cond= $count_transactions_validating = 2
--source include/assert.inc

--let $transactions_committed_all_members= query_get_value(SELECT Transactions_committed_all_members from performance_schema.replication_group_member_stats, Transactions_committed_all_members, 1)