This repository has been archived by the owner on Sep 30, 2024. It is now read-only.

clusterMaster.SelfBinlogCoordinates might not be the latest binlog after set read_only #1213

Closed
jianhaiqing opened this issue Jul 22, 2020 · 11 comments

Comments

@jianhaiqing

demotedMasterSelfBinlogCoordinates := &clusterMaster.SelfBinlogCoordinates

The BinlogCoordinates might not be the latest, so the designatedInstance only catches up to clusterMaster.SelfBinlogCoordinates, and the cluster master ends up with different GTIDs from the designatedInstance.
For example:
2020-07-22 08:34:20 ERROR GracefulMasterTakeover: sanity problem. Demoted master's coordinates changed from mysql-bin.000106:1176521 to mysql-bin.000106:1177259 while supposed to have been frozen

This happens often, since I use pt-heartbeat to monitor the delay between master and replica. Why not get the latest BinlogCoordinates after read_only is set?
@shlomi-noach
Collaborator

shlomi-noach commented Jul 22, 2020

I'm not sure I understand the question.

Why not get the latest BinlogCoordinates after read_only is set?

log.Infof("GracefulMasterTakeover: Will set %+v as read_only", clusterMaster.Key)
if clusterMaster, err = inst.SetReadOnly(&clusterMaster.Key, true); err != nil {
	return nil, nil, err
}
demotedMasterSelfBinlogCoordinates := &clusterMaster.SelfBinlogCoordinates

We do get the latest binlog coordinates after read_only is set.
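The "sanity problem" error quoted in this thread comes from comparing the coordinates captured right after read_only is set with a later reading. A minimal sketch of that comparison follows. The `BinlogCoordinates` struct here is a simplified, hypothetical stand-in rather than orchestrator's actual type, and `checkFrozen` is an illustrative helper, not a function from the codebase.

```go
package main

import "fmt"

// BinlogCoordinates is a simplified, hypothetical stand-in for orchestrator's
// own type; only the binary log file name and position are modeled here.
type BinlogCoordinates struct {
	LogFile string
	LogPos  int64
}

// String renders the coordinates in the file:position form used in the logs.
func (c BinlogCoordinates) String() string {
	return fmt.Sprintf("%s:%d", c.LogFile, c.LogPos)
}

// checkFrozen (hypothetical helper) compares the coordinates captured right
// after read_only was set with a later reading, and returns an error if the
// demoted master kept writing to its binary log in between.
func checkFrozen(captured, current BinlogCoordinates) error {
	if captured != current {
		return fmt.Errorf("coordinates changed from %s to %s while supposed to have been frozen", captured, current)
	}
	return nil
}

func main() {
	// Values taken from the error message reported in this issue.
	captured := BinlogCoordinates{LogFile: "mysql-bin.000106", LogPos: 1176521}
	current := BinlogCoordinates{LogFile: "mysql-bin.000106", LogPos: 1177259}
	if err := checkFrozen(captured, current); err != nil {
		fmt.Println("ERROR:", err)
	}
}
```

If the two readings differ, something wrote to the master after it was supposedly made read-only, which is the situation described in this issue.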

On an unrelated note, I suspect you run pt-heartbeat with SUPER user; please consider using "UseSuperReadOnly": true in your config file.

@jianhaiqing
Author

The pt-heartbeat user doesn't have the SUPER privilege, as you can see from its grants:

| GRANT REPLICATION CLIENT ON *.* TO 'sys_heartbeat'@'127.0.0.1'                                         |
| GRANT SELECT, INSERT, UPDATE, DELETE, CREATE ON `heartbeat`.`heartbeat` TO 'sys_heartbeat'@'127.0.0.1' |

clusterMaster is set at the beginning of the GracefulMasterTakeover function, so it seems that clusterMaster is fetched before read_only is set. I'm not sure I fully understand the code; could you help me confirm the logic?

@jianhaiqing
Author

Yes, "We do get the latest binlog coordinates after read_only is set."
I will investigate the issue.

@jianhaiqing
Author

jianhaiqing commented Jul 22, 2020

It's very strange: read_only is set, yet clusterMaster is still being updated, even though my pt-heartbeat user doesn't have the SUPER privilege, as you can see in my previous comment.

2020-07-22 08:34:17 INFO GracefulMasterTakeover: Will wait for nodetest-mysql-0.nodetest-mysql.jonathan:3306 to reach master coordinates mysql-bin.000106:1176521

2020-07-22 08:34:20 ERROR GracefulMasterTakeover: sanity problem. Demoted master's coordinates changed from mysql-bin.000106:1176521 to mysql-bin.000106:1177259 while supposed to have been frozen
ash-4.4# orchestrator-client  -c topology -alias nodetest.jonathan
nodetest-mysql-0.nodetest-mysql.jonathan:3306   [0s,ok,5.7.30-33-log,rw,ROW,>>,GTID]
+ nodetest-mysql-1.nodetest-mysql.jonathan:3306 [0s,ok,5.7.30-33-log,ro,ROW,>>,GTID:errant]
+ nodetest-mysql-2.nodetest-mysql.jonathan:3306 [0s,ok,5.7.30-33-log,ro,ROW,>>,GTID]
+ nodetest-mysql-3.nodetest-mysql.jonathan:3306 [0s,ok,5.7.30-33-log,ro,ROW,>>,GTID]
+ nodetest-mysql-4.nodetest-mysql.jonathan:3306 [0s,ok,5.7.30-33-log,ro,ROW,>>,GTID]
+ nodetest-mysql-5.nodetest-mysql.jonathan:3306 [0s,ok,5.7.30-33-log,ro,ROW,>>,GTID]

I will try the general_log. Do you have any other clues?

@shlomi-noach
Collaborator

You have the details of the binary log name and position. Can you look into the binary log to see what statement was executed?
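One way to do this is with the `mysqlbinlog` utility, limited to the range between the two positions from the error message. The sketch below only builds the argument list. `mysqlbinlogArgs` is a hypothetical helper, and the bare file name assumes the command is run from the directory that holds the binary logs.

```go
package main

import (
	"fmt"
	"strings"
)

// mysqlbinlogArgs (hypothetical helper) builds arguments for the mysqlbinlog
// utility so that it prints only the events between two positions of a single
// binary log file.
func mysqlbinlogArgs(logFile string, startPos, stopPos int64) []string {
	return []string{
		fmt.Sprintf("--start-position=%d", startPos),
		fmt.Sprintf("--stop-position=%d", stopPos),
		"--base64-output=decode-rows", // suppress base64 blobs for row events
		"--verbose",                   // show row events as commented pseudo-SQL
		logFile,                       // assumption: path relative to the binlog directory
	}
}

func main() {
	// Positions taken from the "sanity problem" error in this issue.
	args := mysqlbinlogArgs("mysql-bin.000106", 1176521, 1177259)
	fmt.Println("mysqlbinlog " + strings.Join(args, " "))
}
```

Because the master in this thread uses ROW format, the decoded output shows which table and row changed rather than the original statement text.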

@jianhaiqing
Author

I have another shell script that runs REPLACE statements in an endless loop, so that I can see the impact while graceful-master-takeover-auto runs.

| GRANT USAGE ON *.* TO 'sbtest'@'%'                                 |
| GRANT SELECT, INSERT, UPDATE, DELETE ON `sbtest`.* TO 'sbtest'@'%' |
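#!/bin/bash
# Probe a MySQL endpoint once per second: report @@read_only, read row 1, then rewrite it.
# Arguments: $1 = MySQL host, $2 = MySQL port

mysqlhost=$1
mysqlport=$2
mysqluser="sbtest"
mysqlpswd="FDKVu4XIjw"

while true
do
    current=$(date)
    #output=$(mysql -hali-mysql2tidb-care-master -upmm -P3306  -ppmm@cvte -e "select @@hostname")

    # Which server answered, and is it currently read-only?
    output=$(mysql -h "${mysqlhost}" -P "${mysqlport}" -u"${mysqluser}" -p"${mysqlpswd}" -NB -e "select @@hostname,@@port,@@read_only" 2>/dev/null)
    echo $output   # unquoted on purpose: tabs collapse to single spaces
    # Read the row written by the previous iteration.
    output=$(mysql -h "${mysqlhost}" -P "${mysqlport}" -u"${mysqluser}" -p"${mysqlpswd}" -NB sbtest -e "select id,pad from sbtest1 where id=1" 2>/dev/null)
    echo $output
    # Attempt a write; it is rejected while read_only is on, since this user lacks SUPER.
    output=$(mysql -h "${mysqlhost}" -P "${mysqlport}" -u"${mysqluser}" -p"${mysqlpswd}" -NB sbtest -e "replace into sbtest1 (id, pad) values (1,\"${current}\")" 2>/dev/null)
    echo $output
    echo
    sleep 1
done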

9f704db9-c806-11ea-90f1-9e1e63295ed3:509525

@jianhaiqing
Author

As you can see from the shell script's output, something else sets read_only back to false:

nodetest-mysql-1 3306 0
1 Wed Jul 22 16:34:16 CST 2020

nodetest-mysql-1 3306 1
1 Wed Jul 22 16:34:17 CST 2020

nodetest-mysql-1 3306 0
1 Wed Jul 22 16:34:17 CST 2020

1 Wed Jul 22 16:34:17 CST 2020

nodetest-mysql-0 3306 0
1 Wed Jul 22 16:34:20 CST 2020
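The flip is easy to miss by eye. As a rough sketch, the probe output above can be scanned for hosts whose read_only value goes from 1 back to 0. The `sample` type and both functions are hypothetical helpers written for this illustration, and the parser assumes the "hostname port read_only" line shape printed by the script.

```go
package main

import (
	"fmt"
	"strings"
)

// sample is one "hostname port read_only" line printed by the probe script.
type sample struct {
	host     string
	readOnly bool
}

// parseSamples keeps only lines with exactly three fields whose last field is
// 0 or 1; the timestamped row lines and blank lines are skipped.
func parseSamples(output string) []sample {
	var out []sample
	for _, line := range strings.Split(output, "\n") {
		f := strings.Fields(line)
		if len(f) != 3 || (f[2] != "0" && f[2] != "1") {
			continue
		}
		out = append(out, sample{host: f[0], readOnly: f[2] == "1"})
	}
	return out
}

// revertedHosts returns every host that was seen with read_only=1 and then,
// in a later sample, with read_only=0 again.
func revertedHosts(samples []sample) []string {
	last := map[string]bool{}
	var reverted []string
	for _, s := range samples {
		if prev, seen := last[s.host]; seen && prev && !s.readOnly {
			reverted = append(reverted, s.host)
		}
		last[s.host] = s.readOnly
	}
	return reverted
}

func main() {
	// Probe output copied from this comment.
	output := `nodetest-mysql-1 3306 0
1 Wed Jul 22 16:34:16 CST 2020

nodetest-mysql-1 3306 1
1 Wed Jul 22 16:34:17 CST 2020

nodetest-mysql-1 3306 0
1 Wed Jul 22 16:34:17 CST 2020

nodetest-mysql-0 3306 0
1 Wed Jul 22 16:34:20 CST 2020`
	fmt.Println(revertedHosts(parseSamples(output)))
}
```

A host reported here was made writable again by something other than the takeover itself, which is worth ruling out before suspecting the coordinates logic.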

@jianhaiqing
Author

It was changed by my mysql-operator orchestrator controller, which inspects the topology of the MySQL cluster and its writable status. Thanks @shlomi-noach

@shlomi-noach
Collaborator

ah, cool. Out of curiosity, do you have a link to that mysql-operator? Is it a public operator?

@jianhaiqing
Author

presslabs/mysql-operator. I am developing new features based on that project. The new features are discussed in bitpoke/mysql-operator#562. I hope someone will discuss them with me soon, but the discussion is moving very slowly.

@jianhaiqing
Author

The cause is now clear, so I'm going to close this issue. If anyone has a question, feel free to reopen it.
