BUG#27652526: REJOIN OLD PRIMARY NODE MAY DUPLICATE KEY WHEN RECOVERY

Group Replication implements conflict detection in multi-primary mode to avoid write errors between parallel operations. Conflict detection is also engaged in single-primary mode in one particular case: after a primary change, while the new primary still has a backlog to apply. Until the backlog is flushed, conflict detection stays enabled to prevent write errors between the backlog and incoming transactions.

The conflict detection data, which we call certification info, is also used to detect dependencies between accepted transactions; these dependencies rule the transaction schedule on the parallel applier. To prevent the certification info from growing forever, all members periodically exchange their GTID_EXECUTED sets, whose full intersection is the set of transactions that are applied on all members. Future transactions cannot conflict with this set, since all members are operating on top of it, so we can safely remove from the certification info all write-sets that belong to those transactions. More details in WL#6833: Group Replication: Read-set free Certification Module (DBSM Snapshot Isolation).

However, a corner case was found in which the garbage collection purges more data than it should. The scenario is:

1) Group with 2 members.
2) member1 executes:
     CREATE TABLE t1(a INT, b INT, PRIMARY KEY(a));
     INSERT INTO t1 VALUE(1, 1);
   Both members have GTID_EXECUTED= UUID:1-4
   Both members' certification info has:
     Hash of item in Writeset   snapshot version (Gtid_set)
     #1                         UUID:1-4
3) member1 executes TA:
     UPDATE t1 SET b=10 WHERE a=1;
   and blocks immediately before sending the transaction to the group.
   This transaction has snapshot_version: UUID:1-4
4) member2 executes TB:
     UPDATE t1 SET b=10 WHERE a=1;
   This transaction has snapshot_version: UUID:1-4
   It goes through the complete path and is committed.
   This transaction has GTID: UUID:1000002
   Both members have GTID_EXECUTED= UUID:1-4:1000002
   Both members' certification info has:
     Hash of item in Writeset   snapshot version (Gtid_set)
     #1                         UUID:1-4:1000002
5) member2 becomes extremely slow at processing transactions; we simulate that by holding the transaction queue to the GR pipeline. Transaction delivery still works, but transactions are blocked before certification.
6) member1 is able to send TA; recall that TA has snapshot_version: UUID:1-4. During conflict detection on member1, TA conflicts with #1, since its snapshot_version does not contain the snapshot_version of #1, that is, TA was executed on an older version than TB. On member2, TA is delivered and put on hold before conflict detection.
7) Meanwhile the certification info garbage collection kicks in.
   Both members have GTID_EXECUTED= UUID:1-4:1000002
   Their intersection is UUID:1-4:1000002
   Both members' certification info has:
     Hash of item in Writeset   snapshot version (Gtid_set)
     #1                         UUID:1-4:1000002
   The condition to purge write-sets is:
     snapshot_version.is_subset(intersection)
   We have "UUID:1-4:1000002".is_subset("UUID:1-4:1000002"), which is true, so #1 is removed.
   Both members' certification info has:
     Hash of item in Writeset   snapshot version (Gtid_set)
     <empty>
8) member2 gets back to normal and TA is released; recall that TA has snapshot_version: UUID:1-4. During conflict detection, since the certification info is empty, TA is allowed to proceed. This is incorrect: TA must be rolled back (as on member1) since it conflicts with TB.

The problem is in the certification garbage collection, more precisely in the condition used to purge data: we cannot leave the certification info empty, otherwise this situation can happen.
The condition must be changed to
  snapshot_version.is_subset_not_equals(intersection)
which always leaves a placeholder to detect delayed conflicting transactions. A trace of the solution (starting at step 7) is:

7) Meanwhile the certification info garbage collection kicks in.
   Both members have GTID_EXECUTED= UUID:1-4:1000002
   Their intersection is UUID:1-4:1000002
   Both members' certification info has:
     Hash of item in Writeset   snapshot version (Gtid_set)
     #1                         UUID:1-4:1000002
   The condition to purge write-sets is:
     snapshot_version.is_subset_not_equals(intersection)
   We have "UUID:1-4:1000002".is_subset_not_equals("UUID:1-4:1000002"), which is false, so #1 is not removed.
   Both members' certification info has:
     Hash of item in Writeset   snapshot version (Gtid_set)
     #1                         UUID:1-4:1000002
8) member2 gets back to normal and TA is released; recall that TA has snapshot_version: UUID:1-4. During conflict detection on member2, TA conflicts with #1, since its snapshot_version does not contain the snapshot_version of #1, that is, TA was executed on an older version than TB.

This is the same scenario seen in this bug, except that here the pipeline is blocked by the distributed recovery procedure: while the joining member applies the missing data through the recovery channel, the incoming data is queued. Meanwhile the certification info garbage collection kicks in and purges more data than it should, with the result that conflicts are not detected.
lulabs · May 18, 2018 · f63fbd3
1 parent 6e40ff2

Showing 8 changed files with 239 additions and 12 deletions.
diff --git a/rapid/plugin/group_replication/src/applier.cc b/rapid/plugin/group_replication/src/applier.cc
@@ -1,4 +1,4 @@
-/* Copyright (c) 2014, 2017, Oracle and/or its affiliates. All rights reserved.
+/* Copyright (c) 2014, 2018, Oracle and/or its affiliates. All rights reserved.
 
    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
@@ -299,6 +299,11 @@ int Applier_module::apply_data_packet(Data_packet *data_packet,
   uchar* payload= data_packet->payload;
   uchar* payload_end= data_packet->payload + data_packet->len;
 
+  DBUG_EXECUTE_IF("group_replication_before_apply_data_packet", {
+    const char act[] = "now wait_for continue_apply";
+    DBUG_ASSERT(!debug_sync_set_action(current_thd, STRING_WITH_LEN(act)));
+  });
+
   if (check_single_primary_queue_status())
     return 1; /* purecov: inspected */
 

diff --git a/rapid/plugin/group_replication/src/certifier.cc b/rapid/plugin/group_replication/src/certifier.cc
@@ -1,4 +1,4 @@
-/* Copyright (c) 2014, 2017, Oracle and/or its affiliates. All rights reserved.
+/* Copyright (c) 2014, 2018, Oracle and/or its affiliates. All rights reserved.
 
    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
@@ -1230,7 +1230,7 @@ void Certifier::garbage_collect()
   stable_gtid_set_lock->wrlock();
   while (it != certification_info.end())
   {
-    if (it->second->is_subset(stable_gtid_set))
+    if (it->second->is_subset_not_equals(stable_gtid_set))
     {
       if (it->second->unlink() == 0)
         delete it->second;

diff --git a/rapid/plugin/group_replication/tests/mtr/r/gr_certifier_garbage_collection2.result b/rapid/plugin/group_replication/tests/mtr/r/gr_certifier_garbage_collection2.result
@@ -0,0 +1,74 @@
+include/group_replication.inc
+Warnings:
+Note	####	Sending passwords in plain text without SSL/TLS is extremely insecure.
+Note	####	Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
+[connection server1]
+
+############################################################
+#  1. Create a table on server1.
+CREATE TABLE t1(a INT, b INT, PRIMARY KEY(a));
+INSERT INTO t1 VALUE(1, 1);
+include/rpl_sync.inc
+
+############################################################
+#  2. Set a debug sync before broadcast message to group on
+#     connection server_1.
+#     Commit a transaction that will be block before broadcast.
+[connection server_1]
+SET @@GLOBAL.DEBUG='+d,group_replication_before_message_broadcast';
+BEGIN;
+UPDATE t1 SET b=10 WHERE a=1;
+COMMIT;
+
+############################################################
+#  3. Wait until server_1 connection reaches the
+#     group_replication_before_message_broadcast debug sync point.
+[connection server1]
+
+############################################################
+#  4. Execute a transaction on server2, that will reach first
+#     certification, since server_1 is blocked before broadcast.
+[connection server2]
+UPDATE t1 SET b=20 WHERE a=1;
+
+############################################################
+#  5. Suspend pipeline on server2.
+SET @@GLOBAL.DEBUG='+d,group_replication_before_apply_data_packet';
+
+############################################################
+#  6. Resume the transaction on server_1
+[connection server1]
+SET DEBUG_SYNC='now SIGNAL waiting';
+SET @@GLOBAL.DEBUG='-d,group_replication_before_message_broadcast';
+[connection server_1]
+ERROR HY000: Plugin instructed the server to rollback the current transaction.
+
+############################################################
+#  7. Make sure the pipeline is suspended on server2.
+[connection server2]
+
+############################################################
+#  8. Wait until certification info garbage collector does
+#     its work.
+
+############################################################
+#  9. Resume the pipeline on server2.
+SET DEBUG_SYNC='now SIGNAL continue_apply';
+SET @@GLOBAL.DEBUG='-d,group_replication_before_apply_data_packet';
+
+############################################################
+# 10. Execute a new transaction in order to have a sync point
+#     to make the test deterministic,
+#     Validate that data and GTIDs are correct.
+[connection server1]
+INSERT INTO t1 VALUE(2, 2);
+include/rpl_sync.inc
+include/assert.inc [GTID_EXECUTED must contain 6 transactions]
+[connection server2]
+include/assert.inc [GTID_EXECUTED must contain 6 transactions]
+include/diff_tables.inc [server1:t1, server2:t1]
+
+############################################################
+# 11. Clean up.
+DROP TABLE t1;
+include/group_replication_end.inc
diff --git a/rapid/plugin/group_replication/tests/mtr/r/gr_perfschema_group_member_stats.result b/rapid/plugin/group_replication/tests/mtr/r/gr_perfschema_group_member_stats.result
@@ -53,7 +53,7 @@ server1
 include/assert.inc [The value of member_id should be equal to server UUID after starting group replication]
 include/assert.inc [The value of Count_Transactions_checked should be 6 after starting group replication]
 include/assert.inc [The value of Count_conflicts_detected should be 0 after starting group replication]
-include/assert.inc [The value of Count_Transactions_rows_validating should be 4 after starting group replication]
+include/assert.inc [The value of Count_Transactions_rows_validating should be 6 after starting group replication]
 include/assert.inc [The value of Transactions_committed_all_members should have server 1 GTIDs before server2 start]
 include/assert.inc [The value of Last_Conflict_free_transaction should be the gtid of the last applied transaction.]
 SET SESSION sql_log_bin= 0;

diff --git a/rapid/plugin/group_replication/tests/mtr/r/gr_set_gtid_next.result b/rapid/plugin/group_replication/tests/mtr/r/gr_set_gtid_next.result
@@ -72,14 +72,14 @@ include/assert.inc ['There is a value 3 in table t2']
 # 6. Check that stable set and certification info size are
 #    properly updated after stable set propagation and
 #    certification info garbage collection on server 1.
-include/assert.inc ['Count_transactions_rows_validating must be 0']
+include/assert.inc ['Count_transactions_rows_validating must be 2']
 include/assert.inc ['Transactions_committed_all_members must be equal to GTID_EXECUTED']
 
 ############################################################
 # 7. Check that stable set and certification info size are
 #    properly updated after stable set propagation and
 #    certification info garbage collection on server 2.
-include/assert.inc ['Count_transactions_rows_validating must be 0']
+include/assert.inc ['Count_transactions_rows_validating must be 2']
 include/assert.inc ['Transactions_committed_all_members must be equal to GTID_EXECUTED']
 
 ############################################################

diff --git a/rapid/plugin/group_replication/tests/mtr/t/gr_certifier_garbage_collection2.test b/rapid/plugin/group_replication/tests/mtr/t/gr_certifier_garbage_collection2.test
@@ -0,0 +1,148 @@
+################################################################################
+# Validate that certification info garbage collection does not purge more data
+# than it should.
+#
+# Test:
+#  0. The test requires two servers: M1 and M2.
+#  1. Create a table on server1.
+#  2. Set a debug sync before broadcast message to group on
+#     connection server_1.
+#     Commit a transaction that will be blocked before broadcast.
+#  3. Wait until server_1 connection reaches the
+#     group_replication_before_message_broadcast debug sync point.
+#  4. Execute a transaction on server2, that will reach first
+#     certification, since server_1 is blocked before broadcast.
+#  5. Suspend pipeline on server2.
+#  6. Resume the transaction on server_1
+#  7. Make sure the pipeline is suspended on server2.
+#  8. Wait until certification info garbage collector does
+#     its work.
+#  9. Resume the pipeline on server2.
+# 10. Execute a new transaction in order to have a sync point
+#     to make the test deterministic,
+#     Validate that data and GTIDs are correct.
+# 11. Clean up.
+################################################################################
+--source include/have_debug_sync.inc
+--source include/big_test.inc
+--source ../inc/have_group_replication_plugin.inc
+--source ../inc/group_replication.inc
+
+--echo
+--echo ############################################################
+--echo #  1. Create a table on server1.
+CREATE TABLE t1(a INT, b INT, PRIMARY KEY(a));
+INSERT INTO t1 VALUE(1, 1);
+--source include/rpl_sync.inc
+
+--echo
+--echo ############################################################
+--echo #  2. Set a debug sync before broadcast message to group on
+--echo #     connection server_1.
+--echo #     Commit a transaction that will be block before broadcast.
+--let $rpl_connection_name= server_1
+--source include/rpl_connection.inc
+SET @@GLOBAL.DEBUG='+d,group_replication_before_message_broadcast';
+BEGIN;
+UPDATE t1 SET b=10 WHERE a=1;
+--send COMMIT
+
+--echo
+--echo ############################################################
+--echo #  3. Wait until server_1 connection reaches the
+--echo #     group_replication_before_message_broadcast debug sync point.
+--let $rpl_connection_name= server1
+--source include/rpl_connection.inc
+--let $wait_condition=SELECT COUNT(*)=1 FROM INFORMATION_SCHEMA.PROCESSLIST WHERE State = 'debug sync point: now'
+--source include/wait_condition.inc
+
+--echo
+--echo ############################################################
+--echo #  4. Execute a transaction on server2, that will reach first
+--echo #     certification, since server_1 is blocked before broadcast.
+--let $rpl_connection_name= server2
+--source include/rpl_connection.inc
+UPDATE t1 SET b=20 WHERE a=1;
+
+--echo
+--echo ############################################################
+--echo #  5. Suspend pipeline on server2.
+SET @@GLOBAL.DEBUG='+d,group_replication_before_apply_data_packet';
+
+--echo
+--echo ############################################################
+--echo #  6. Resume the transaction on server_1
+--let $rpl_connection_name= server1
+--source include/rpl_connection.inc
+SET DEBUG_SYNC='now SIGNAL waiting';
+SET @@GLOBAL.DEBUG='-d,group_replication_before_message_broadcast';
+
+--let $rpl_connection_name= server_1
+--source include/rpl_connection.inc
+--error ER_TRANSACTION_ROLLBACK_DURING_COMMIT
+--reap
+
+--echo
+--echo ############################################################
+--echo #  7. Make sure the pipeline is suspended on server2.
+--let $rpl_connection_name= server2
+--source include/rpl_connection.inc
+--let $wait_condition=SELECT COUNT(*)=1 FROM INFORMATION_SCHEMA.PROCESSLIST WHERE State = 'debug sync point: now'
+--source include/wait_condition.inc
+
+--echo
+--echo ############################################################
+--echo #  8. Wait until certification info garbage collector does
+--echo #     its work.
+--let $gtid_assignment_block_size= `SELECT @@GLOBAL.group_replication_gtid_assignment_block_size;`
+--let $expected_gtid_set= $group_replication_group_name:1-4:1000002
+if ($gtid_assignment_block_size == 1)
+{
+  --let $expected_gtid_set= $group_replication_group_name:1-5
+}
+--let $wait_condition= SELECT transactions_committed_all_members = "$expected_gtid_set" from performance_schema.replication_group_member_stats;
+--let $wait_timeout= 150
+--source include/wait_condition.inc
+
+--echo
+--echo ############################################################
+--echo #  9. Resume the pipeline on server2.
+SET DEBUG_SYNC='now SIGNAL continue_apply';
+SET @@GLOBAL.DEBUG='-d,group_replication_before_apply_data_packet';
+
+--echo
+--echo ############################################################
+--echo # 10. Execute a new transaction in order to have a sync point
+--echo #     to make the test deterministic,
+--echo #     Validate that data and GTIDs are correct.
+--let $rpl_connection_name= server1
+--source include/rpl_connection.inc
+INSERT INTO t1 VALUE(2, 2);
+--source include/rpl_sync.inc
+
+--let $expected_gtid_set= $group_replication_group_name:1-5:1000002
+if ($gtid_assignment_block_size == 1)
+{
+  --let $expected_gtid_set= $group_replication_group_name:1-6
+}
+
+--let $assert_text= GTID_EXECUTED must contain 6 transactions
+--let $assert_cond= "[SELECT @@GLOBAL.GTID_EXECUTED]" = "$expected_gtid_set";
+--source include/assert.inc
+
+--let $rpl_connection_name= server2
+--source include/rpl_connection.inc
+--let $assert_text= GTID_EXECUTED must contain 6 transactions
+--let $assert_cond= "[SELECT @@GLOBAL.GTID_EXECUTED]" = "$expected_gtid_set";
+--source include/assert.inc
+
+--let $diff_tables=server1:t1, server2:t1
+--source include/diff_tables.inc
+
+
+--echo
+--echo ############################################################
+--echo # 11. Clean up.
+DROP TABLE t1;
+
+--source ../inc/group_replication_end.inc
diff --git a/rapid/plugin/group_replication/tests/mtr/t/gr_perfschema_group_member_stats.test b/rapid/plugin/group_replication/tests/mtr/t/gr_perfschema_group_member_stats.test
@@ -182,8 +182,8 @@ START SLAVE SQL_THREAD FOR CHANNEL "group_replication_applier";
 --source include/assert.inc
 
 --let $certification_db_size= query_get_value(SELECT Count_Transactions_rows_validating from performance_schema.replication_group_member_stats, Count_Transactions_rows_validating, 1)
---let $assert_text= The value of Count_Transactions_rows_validating should be 4 after starting group replication
---let $assert_cond= "$certification_db_size" = 4
+--let $assert_text= The value of Count_Transactions_rows_validating should be 6 after starting group replication
+--let $assert_cond= "$certification_db_size" = 6
 --source include/assert.inc
 
 --let $stable_set= query_get_value(SELECT Transactions_committed_all_members from performance_schema.replication_group_member_stats, Transactions_committed_all_members, 1)

diff --git a/rapid/plugin/group_replication/tests/mtr/t/gr_set_gtid_next.test b/rapid/plugin/group_replication/tests/mtr/t/gr_set_gtid_next.test
@@ -168,8 +168,8 @@ INSERT INTO t2 VALUES (3);
 --connection server1
 
 --let $count_transactions_validating= query_get_value(SELECT Count_transactions_rows_validating from performance_schema.replication_group_member_stats, Count_transactions_rows_validating, 1)
---let $assert_text= 'Count_transactions_rows_validating must be 0'
---let $assert_cond= $count_transactions_validating = 0
+--let $assert_text= 'Count_transactions_rows_validating must be 2'
+--let $assert_cond= $count_transactions_validating = 2
 --source include/assert.inc
 
 --let $transactions_committed_all_members= query_get_value(SELECT Transactions_committed_all_members from performance_schema.replication_group_member_stats, Transactions_committed_all_members, 1)
@@ -186,8 +186,8 @@ INSERT INTO t2 VALUES (3);
 --connection server2
 
 --let $count_transactions_validating= query_get_value(SELECT Count_transactions_rows_validating from performance_schema.replication_group_member_stats, Count_transactions_rows_validating, 1)
---let $assert_text= 'Count_transactions_rows_validating must be 0'
---let $assert_cond= $count_transactions_validating = 0
+--let $assert_text= 'Count_transactions_rows_validating must be 2'
+--let $assert_cond= $count_transactions_validating = 2
 --source include/assert.inc
 
 --let $transactions_committed_all_members= query_get_value(SELECT Transactions_committed_all_members from performance_schema.replication_group_member_stats, Transactions_committed_all_members, 1)