Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
BUG#27652526: REJOIN OLD PRIMARY NODE MAY DUPLICATE KEY WHEN RECOVERY
Group Replication does implement conflict detection on multi-primary to avoid write errors on parallel operations. The conflict detection is also engaged in single-primary mode on the particular case of primary change and the new primary still has a backlog to apply. Until the backlog is flushed, conflict detection is enabled to prevent write errors between the backlog and incoming transactions. The conflict detection data, which we name certification info, is also used to detected dependencies between accepted transactions, dependencies which will rule the transactions schedule on the parallel applier. In order to avoid that the certification info grows forever, periodically all members exchange their GTID_EXECUTED set, which full intersection will provide the set of transactions that are applied on all members. Future transactions cannot conflict with this set since all members are operating on top of it, so we can safely remove all write-sets from the certification info that do belong to those transactions. More details at WL#6833: Group Replication: Read-set free Certification Module (DBSM Snapshot Isolation). Though a corner case was found on which the garbage collection was purging more data than it should. The scenario is: 1) Group with 2 members; 2) Member1 executes: CREATE TABLE t1(a INT, b INT, PRIMARY KEY(a)); INSERT INTO t1 VALUE(1, 1); Both members have a GTID_EXECUTED= UUID:1-4 Both members certification info has: Hash of item in Writeset snapshot version (Gtid_set) xelabs#1 UUID1:1-4 3) member1 executes TA UPDATE t1 SET b=10 WHERE a=1; and blocks immediately before send the transaction to the group. This transaction has snapshot_version: UUID:1-4 4) member2 executes TB UPDATE t1 SET b=10 WHERE a=1; This transaction has snapshot_version: UUID:1-4 It goes through the complete patch and it is committed. This transaction has GTID: UUID:1000002 Both members have a GTID_EXECUTED= UUID:1-4:1000002 Both members certification info has: Hash of item in Writeset snapshot version (Gtid_set) xelabs#1 UUID1:1-4:1000002 5) member2 becomes extremely slow in processing transactions, we simulate that by holding the transaction queue to the GR pipeline. Transaction delivery is still working, but the transaction will be block before certification. 6) member1 is able to send its TA transaction, lets recall that this transaction has snapshot_version: UUID:1-4. On conflict detection on member1, it will conflict with xelabs#1, since this snapshot_version does not contain the snapshot_version of xelabs#1, that is TA was executed on a previous version than TB. On member2 the transaction will be delivered and will be put on hold before conflict detection. 7) meanwhile the certification info garbage collection kicks in. Both members have a GTID_EXECUTED= UUID:1-4:1000002 Its intersection is UUID:1-4:1000002 Both members certification info has: Hash of item in Writeset snapshot version (Gtid_set) xelabs#1 UUID1:1-4:1000002 The condition to purge write-sets is: snapshot_version.is_subset(intersection) We have "UUID:1-4:1000002".is_subset("UUID:1-4:1000002) which is true, so we remove xelabs#1. Both members certification info has: Hash of item in Writeset snapshot version (Gtid_set) <empty> 8) member2 gets back to normal, we release transaction TA, lets recall that this transaction has snapshot_version: UUID:1-4. On conflict detection, since the certification info is empty, the transaction will be allowed to proceed, which is incorrect, it must rollback (like on member1) since it conflicts with TB. The problem it is on certification garbage collection, more precisely on the condition used to purge data, we cannot leave the certification info empty otherwise this situation can happen. The condition must be changed to snapshot_version.is_subset_not_equals(intersection) which will always leave a placeholder to detect delayed conflicting transaction. So a trace of the solution is (starting on step 7): 7) meanwhile the certification info garbage collection kicks in. Both members have a GTID_EXECUTED= UUID:1-4:1000002 Its intersection is UUID:1-4:1000002 Both members certification info has: Hash of item in Writeset snapshot version (Gtid_set) xelabs#1 UUID1:1-4:1000002 The condition to purge write-sets is: snapshot_version.is_subset_not_equals(intersection) We have "UUID:1-4:1000002".is_subset_not_equals("UUID:1-4:1000002) which is false, so we do not remove xelabs#1. Both members certification info has: Hash of item in Writeset snapshot version (Gtid_set) xelabs#1 UUID1:1-4:1000002 8) member2 gets back to normal, we release transaction TA, lets recall that this transaction has snapshot_version: UUID:1-4. On conflict detection on member2, it will conflict with xelabs#1, since this snapshot_version does not contain the snapshot_version of xelabs#1, that is TA was executed on a previous version than TB. This is the same scenario that we see on this bug, though here the pipeline is being blocked by the distributed recovery procedure, that is, while the joining member is applying the missing data through the recovery channel, the incoming data is being queued. Meanwhile the certification info garbage collection kicks in and purges more data that it should, the result it is that conflicts are not being detected.
- Loading branch information