
markReadOnlyNodesInOrc broke gracefulfailover process #566

Open
jianhaiqing opened this issue Jul 22, 2020 · 5 comments

@jianhaiqing (Contributor) commented Jul 22, 2020

https://github.com/presslabs/mysql-operator/blob/28ef010d9c7e6acc2518b78ef44ae220040818f0/pkg/controller/orchestrator/orchestrator_reconcile.go#L515

As you can see in the following orchestrator logs, the set-writeable logic breaks the graceful failover process: during the failover the master is set to read_only=true while orchestrator waits for the candidate replica to catch up to the given binlog position, and the operator then makes the master writable again.
Can we check whether the cluster is ready before setting the master writable?

2020-07-22 08:34:18 INFO topology_recovery: will handle DeadMaster event on nodetest-mysql-1.nodetest-mysql.jonathan:3306
2020-07-22 08:34:18 INFO auditType:recover-dead-master instance:nodetest-mysql-1.nodetest-mysql.jonathan:3306 cluster:nodetest-mysql-1.nodetest-mysql.jonathan:3306 message:problem found; will recover
2020-07-22 08:34:18 INFO topology_recovery: Running 1 PreFailoverProcesses hooks
2020-07-22 08:34:18 INFO topology_recovery: Running PreFailoverProcesses hook 1 of 1: /usr/local/bin/orc-helper failover-in-progress 'nodetest.jonathan' '' || true
2020-07-22 08:34:18 INFO CommandRun(/usr/local/bin/orc-helper failover-in-progress 'nodetest.jonathan' '' || true,[])
2020-07-22 08:34:18 INFO CommandRun/running: bash /tmp/orchestrator-process-cmd-033210013
[martini] Completed 200 OK in 11.10911ms
[martini] Started GET /api/audit-recovery/nodetest.jonathan for 172.29.1.36:35640
[martini] Completed 200 OK in 6.968061ms
[martini] Started GET /api/cluster/nodetest.jonathan for 172.29.1.36:35640
[martini] Completed 200 OK in 13.440635ms
[martini] Started GET /api/master/nodetest.jonathan for 172.29.1.36:35640
[martini] Completed 200 OK in 12.028423ms
## here the failover process is broken by this logic
[martini] Started GET /api/set-writeable/nodetest-mysql-1.nodetest-mysql.jonathan/3306 for 172.29.1.36:35640

There is some discussion of this issue on the orchestrator side: openark/orchestrator#1213

@AMecea (Contributor) commented Jul 22, 2020

Hi @jianhaiqing, thank you for investigating this and reporting it!

The cluster condition FailoverInProgress, which is set via the Orchestrator pre-failover hook, should prevent this. Maybe the cluster object is not updated in time during reconciliation; we need to investigate this a little bit more.

@jianhaiqing (Contributor, Author) commented:

As you know, orchestrator's failover process is very fast. The reconcile process and the failover process are completely independent, so you can't guarantee their execution order.
What's more, I think there is no need to repeatedly set the master writable and the replicas read-only once the cluster is ready.

func (ou *orcUpdater) markReadOnlyNodesInOrc(insts InstancesSet, master *orc.Instance) {
	// If there is an in-progress failover, do not interfere with the readable/writable status on this iteration.
	fip := ou.cluster.GetClusterCondition(api.ClusterConditionFailoverInProgress)
	if fip != nil && fip.Status == core.ConditionTrue {
		ou.log.Info("cluster has a failover in progress, will delay setting readable/writeable status until failover is complete",
			"since", fip.LastTransitionTime)
		return
	}
	// A graceful failover should not be interfered with either, so I think it is
	// worth checking the cluster status before issuing set-writeable repeatedly.
	if ou.cluster.IsClusterReady() {
		return
	}
	...
}
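
For context, a minimal sketch of what the IsClusterReady helper used above could look like, assuming the cluster wrapper exposes GetClusterCondition and the API defines a ClusterConditionReady condition; the helper name and its exact placement are hypothetical, not part of the current codebase:

func (c *MysqlCluster) IsClusterReady() bool {
	// Consider the cluster ready only when the Ready condition is explicitly true,
	// i.e. a master is elected, replicas are replicating, and no failover is pending.
	ready := c.GetClusterCondition(api.ClusterConditionReady)
	return ready != nil && ready.Status == core.ConditionTrue
}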

@jianhaiqing (Contributor, Author) commented Jul 23, 2020

Let me elaborate on the execution sequence of these two processes: orchestrator and the orchestrator controller.

1. orchestrator failover: topology before failover

master-A
 + replica-B
 + replica-C

  1. orchestrator picks B as the candidate master, based on which of B and C has executed more of the binlog
  2. moves C below B
  3. sets A read_only=true and reads A's binlog position, call it currentPosition
  4. waits for B to reach currentPosition
  5. stops and resets replication on B; at this moment A and B are completely detached from each other
  6. sets B read_only=false
  7. changes A to replicate from B; at this point the topology of the MySQL cluster is as follows

Master-B
 + replica-C
 + replica-A

2. orchestrator_controller: just repeatedly sets the master writable and the replicas read-only.

The pre-failover hook sets the cluster into a failover-in-progress state, which is supposed to prevent the read_only status of the master and the replicas from being touched.
But, as you know, orchestrator and orchestrator_controller are completely independent:
orchestrator_controller may have just passed the failover check when the orchestrator failover kicks in; while the failover is in phase 5, the controller goes on to set the old master writable again. Then the GTID set of A ends up larger than that of B.
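
To make the timing concrete, here is a small, self-contained Go sketch that only models the check-then-act window described above; it does not use the operator's or orchestrator's real APIs, and all names in it are illustrative. The controller samples the failover flag before the pre-failover hook sets it, and applies that stale decision after the failover has already demoted A.

package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var mu sync.Mutex
	failoverInProgress := false // set by the pre-failover hook
	masterAWritable := true     // writable state of the old master A

	var wg sync.WaitGroup
	wg.Add(2)

	// orchestrator: starts the failover just after the controller's check.
	go func() {
		defer wg.Done()
		time.Sleep(10 * time.Millisecond)
		mu.Lock()
		failoverInProgress = true // pre-failover hook fires, but the controller already checked
		masterAWritable = false   // phase 3: A is set read_only=true
		mu.Unlock()
		// ... phases 4-7: wait for B to catch up, detach and promote B ...
	}()

	// orchestrator_controller: samples the flag first, acts later in the same reconcile.
	go func() {
		defer wg.Done()
		mu.Lock()
		stale := failoverInProgress // still false at this point
		mu.Unlock()

		time.Sleep(20 * time.Millisecond) // other reconcile work in between

		if !stale {
			mu.Lock()
			masterAWritable = true // set-writeable on A, after the failover has started
			mu.Unlock()
		}
	}()

	wg.Wait()
	fmt.Printf("failoverInProgress=%v masterAWritable=%v -> A can take writes and diverge from B\n",
		failoverInProgress, masterAWritable)
}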

@AMecea (Contributor) commented Jul 24, 2020

One solution could be to let orchestrator set the nodes read-only, as @dougfales proposes in #522. But then we can't make proper use of the cluster read-only mode, because after a failover the cluster would end up writable, and avoiding that is exactly why the operator sets the read-only state itself.

Please let me know what you propose.
Thank you!

@dougfales (Contributor) commented:

Yes, @AMecea is right: our change trades the cluster readOnly mode for a promise that the operator won't touch the readable/writable status of the master, during failover or at any other time. It's a trade-off, but we didn't need the read-only mode, so it was worth it for us. I think we may have seen the scenario you describe, @jianhaiqing, during some of our testing before we switched to the approach in #522.
