`orchestrator` is able to recover from a set of failure scenarios. Notably, it can recover a failed master or a failed intermediate master.

`orchestrator` supports:
- Automated recovery (takes action on unexpected failures).
- Graceful, planned master promotion.
- Manual recovery.
- Manual, forced/panic failovers.
To run any kind of failover, your topologies must support one of:
- Oracle GTID (with `MASTER_AUTO_POSITION=1`)
- MariaDB GTID
- Pseudo-GTID
- Binlog Servers
See MySQL Configuration for more details.
Automated recovery is opt-in. Please consult recovery configuration.
Based on a failure detection, a recovery is composed of a sequence of events:
- Pre-recovery hooks (external process execution)
- Healing of the topology
- Post-recovery hooks

Where:
- Pre-recovery hooks are configured by the user.
  - They are executed sequentially.
  - Failure of any of these hooks (nonzero exit code) aborts the failover.
- Topology healing is managed by `orchestrator` and is state-based rather than configuration-based. `orchestrator` tries to make the best out of a bad situation, taking into consideration the existing topology, versions, server configurations, etc.
- Post-recovery hooks are, again, configured by the user.
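To make the abort semantics concrete, here is a minimal pre-recovery hook sketch; the script path and the flag file are hypothetical, and the script is assumed to be listed in `PreFailoverProcesses`:

```bash
#!/bin/bash
# pre-failover-gate.sh -- hypothetical pre-recovery hook.
# orchestrator runs PreFailoverProcesses entries sequentially;
# a nonzero exit code from any of them aborts the failover.

# Example gate: a human has placed a "hold" file to block failovers.
if [ -f /etc/orchestrator/failover-hold ]; then
  echo "failover hold flag present; aborting failover" >&2
  exit 1   # nonzero: orchestrator aborts the recovery
fi
exit 0
```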
The following highlights some of the complexity of a recovery.
A "simple" recovery case is that of a DeadIntermediateMaster
. Its replicas are orphaned, but when
using GTID or Pseudo-GTID they can still be re-connected to the topology. We might choose to:
- Find a sibling of the dead intermediate master, and move orphaned replicas below said sibling
- Promote a replica from among the orphaned replicas, make it intermediate master of its siblings, then connect promoted replica up the topology
- relocate all orphaned replicas
- Combine parts of the above
The exact implementation greatly depends on the topology setup (which instances have `log-slave-updates`? Are instances lagging? Do they have replication filters? Which versions of MySQL? etc.). It is very (very) likely your topology will support at least one of the above (in particular, matching up the replicas is a trivial solution, unless replication filters are in place).
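For instance, the first option above can also be requested manually via `orchestrator-client` (hostnames here are placeholders):

```bash
# Move the orphaned replicas of a dead intermediate master below
# one of its siblings (hostnames are placeholders).
orchestrator-client -c relocate-replicas \
  -i dead.intermediate.master:3306 \
  -d sibling.intermediate.master:3306
```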
Recovering from a dead master is a much more complex operation, for various reasons:

- There is an outage implied, and recovery is expected to be as fast as possible.
- Some servers may be lost in the process. `orchestrator` must determine which, if any.
- The state of the topology may be such that the user wishes to prevent the recovery.
- Master service discovery must take place: the app must be able to communicate with the new master (potentially be told that the master has changed).
- Find the best replica to promote.
  - A naive approach would be to pick the most up-to-date replica, but that may not always be the right choice.
  - It may so happen that the most up-to-date replica does not have the necessary configuration to act as a master to other replicas (e.g. binlog format, MySQL version, replication filters and more). By blindly promoting the most up-to-date replica one may lose replica capacity. `orchestrator` attempts to promote a replica that will retain the most serving capacity.
- Promote said replica, taking over its siblings.
- Bring the siblings up to date.
- Possibly, do a second-phase promotion; the user may have tagged specific servers to be promoted if possible (see the `register-candidate` command).
- Call upon hooks (read further).
Master service discovery is largely the user's responsibility to implement. Common solutions are:

- DNS-based discovery; `orchestrator` will need to invoke a hook that modifies DNS entries.
- ZooKeeper/Consul KV/etcd/other key-value based discovery; `orchestrator` has built-in support for Consul KV; otherwise an external hook must update the KV store.
- Proxy-based discovery; `orchestrator` will call upon an external hook that modifies the proxy configuration, or will update Consul/ZooKeeper/etcd as above, which will itself trigger an update to the proxy configuration.
- Other solutions...

`orchestrator` attempts to be a generic solution, and hence takes no stance on your service discovery method.
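As a minimal sketch of a KV-style discovery hook (the script path and key layout are hypothetical; `ORC_FAILURE_CLUSTER_ALIAS` and `ORC_SUCCESSOR_HOST` are assumed to be among the environment variables `orchestrator` exports to post-failover hooks), a `PostMasterFailoverProcesses` entry could publish the new master like so:

```bash
#!/bin/bash
# publish-new-master.sh -- hypothetical PostMasterFailoverProcesses hook.
# Reads failover details from the ORC_* environment variables that
# orchestrator exports to hooks, and publishes the new master to Consul.
set -eu

cluster="${ORC_FAILURE_CLUSTER_ALIAS}"
new_master="${ORC_SUCCESSOR_HOST}"

# Hypothetical KV layout: one key per cluster holding the master hostname.
# Proxies or applications would watch this key for changes.
consul kv put "mysql/master/${cluster}/hostname" "${new_master}"
```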
Opt-in: automated recovery may be applied to all (`"*"`) clusters or to specific ones.

Recovery follows detection, assuming nothing blocks the recovery (read further below).

For greater resolution, different configuration applies to master recovery and to intermediate-master recovery. A detailed breakdown of recovery-related configuration follows.
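For illustration, a minimal configuration sketch (the cluster patterns below are made-up examples; `"*"` matches all clusters):

```json
{
  "RecoverMasterClusterFilters": ["*"],
  "RecoverIntermediateMasterClusterFilters": ["myoltp", "staging.*"]
}
```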
The analysis mechanism runs at all times and periodically checks for failure/recovery scenarios. It will run an automated recovery:
- For an actionable type of scenario
- For an instance that is not downtimed
- For an instance belonging to a cluster for which recovery is explicitly enabled via configuration
- For an instance in a cluster that has not recently been recovered, unless such recent recoveries were acknowledged
- Where global recoveries are enabled
a.k.a. graceful master takeover

TL;DR use this for replacing a master in an orderly, planned operation.

Typically, for purposes of upgrade, host maintenance, etc., you may wish to replace an existing master with another server. This is a graceful promotion (a.k.a. graceful master takeover). It is essentially a rotation of roles: an existing replica turns into the new master; the old master turns into a replica.
In a graceful takeover:
- Either the user or `orchestrator` chooses an existing replica as the designated new master.
- `orchestrator` ensures the designated replica takes over its siblings as intermediate master.
- `orchestrator` turns the master `read-only` (possibly also `super-read-only`).
- `orchestrator` makes sure your designated server is caught up with replication.
- `orchestrator` promotes your designated server as the new master.
- `orchestrator` turns the promoted server writable.
- `orchestrator` demotes the old master and places it as a direct replica of the new master.
- If possible, `orchestrator` sets the replication user/password for the demoted master.
- In the `graceful-master-takeover-auto` variant (see following), `orchestrator` starts replication on the demoted master.
The operation can take a few seconds, during which time your app is expected to complain, seeing that the master is `read-only`.
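To observe the switchover from the outside, a simple probe such as the following (hostname and credentials are placeholders) will show `read_only` flipping to `1` on the old master during the takeover:

```bash
# Poll the master's read_only flag once a second during the takeover
# (hostname and credentials are placeholders).
while true; do
  mysql -h current.master.com -u monitor -p"${MONITOR_PASSWORD}" \
    -NBe "SELECT @@global.read_only" || echo "connect error"
  sleep 1
done
```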
`orchestrator` provides you with specialized hooks to run a graceful takeover:

- `PreGracefulTakeoverProcesses`
- `PostGracefulTakeoverProcesses`

These hooks run in addition to the standard hooks. `orchestrator` will run `PreGracefulTakeoverProcesses`, then go through a `DeadMaster` flow, running the normal pre- and post- hooks for `DeadMaster`, and at last follow up with `PostGracefulTakeoverProcesses`.
We find that some operations are similar between a graceful takeover and a real failover, and some are different. The `PreGracefulTakeoverProcesses` and `PostGracefulTakeoverProcesses` hooks can be used, for example, to silence alerts. You may want to disable the pager for the duration of a planned failover. Advanced usage may include stalling traffic at the proxy layer.
From within the normal pre- and post-failover processes, you may use the `{command}` placeholder or the `ORC_COMMAND` environment variable to check whether this is a graceful takeover. You will see the value `graceful-master-takeover`.
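A minimal sketch of such a check inside a standard pre-failover hook (the pager-silencing command is hypothetical):

```bash
#!/bin/bash
# Inside a normal pre-failover hook: detect a planned, graceful takeover.
# orchestrator exposes the triggering command via ORC_COMMAND.
if [ "${ORC_COMMAND:-}" = "graceful-master-takeover" ]; then
  # Planned failover: silence the pager (hypothetical command).
  silence-pager --duration 5m
fi
```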
There are two variations of graceful takeover:

`graceful-master-takeover`:
- The user must indicate the designated replica to be promoted,
- or set up the topology such that the master has only a single direct replica, which implicitly makes it the designated replica.
- The demoted master is placed as a replica to the new master, but `orchestrator` does not start replication on the server.

`graceful-master-takeover-auto`:
- The user may indicate the designated replica to be promoted, which `orchestrator` must respect,
- or the user may omit the identity of the designated replica, in which case `orchestrator` picks the best replica to promote.
- The demoted master is placed as a replica to the new master, and `orchestrator` starts replication on the server.
Invoke graceful takeover via:

- Command line; examples:
  - `orchestrator-client -c graceful-master-takeover -alias mycluster -d designated.master.to.promote:3306`: indicate the designated replica; `orchestrator` does not start replication on the demoted master.
  - `orchestrator-client -c graceful-master-takeover-auto -alias mycluster -d designated.master.to.promote:3306`: indicate the designated replica; `orchestrator` starts replication on the demoted master.
  - `orchestrator-client -c graceful-master-takeover-auto -alias mycluster`: let `orchestrator` choose the replica to promote; `orchestrator` starts replication on the demoted master.
- Web API; examples:
  - `/api/graceful-master-takeover/:clusterHint/:designatedHost/:designatedPort`: gracefully promote a new master (planned failover), indicating the designated server to promote.
  - `/api/graceful-master-takeover/:clusterHint`: gracefully promote a new master (planned failover). Designated server not indicated; works when the master has exactly one direct replica.
  - `/api/graceful-master-takeover-auto/:clusterHint`: gracefully promote a new master (planned failover). `orchestrator` picks the replica to promote and starts replication on the demoted master.
- Web interface: drag a direct replica of the master onto the left half of the master's box. The web interface uses the `graceful-master-takeover` variation; replication on the demoted master will not kick in.
TL;DR use this when an instance is recognized as failed but auto-recovery is disabled or blocked.

You may choose to ask `orchestrator` to recover a failure by providing the specific instance that has failed. This instance must be recognized as having a failure. It is possible to request recovery for an instance that is downtimed (as this is a manual recovery, it overrides automatic assumptions).
Recover via:
- Command line: `orchestrator-client -c recover -i dead.instance.com:3306 --debug`
- Web API: `/api/recover/dead.instance.com/3306`
- Web: the instance is colored black; click the `Recover` button
Manual recoveries don't block on `RecoveryPeriodBlockSeconds` (read more in the next section). They also override `RecoverMasterClusterFilters` and `RecoverIntermediateMasterClusterFilters`. Thus, a human can always invoke a recovery on demand. A recovery may only block on another recovery running at that time on the same database instance.
TL;DR force a master failover right now, regardless of what `orchestrator` thinks.

Perhaps `orchestrator` doesn't see that the instance has failed, or you have some app logic that requires the master to change right now, or perhaps the type of failure is one `orchestrator` is unsure about. You wish to kick off a master failover right now. You will run:
- Command line: `orchestrator-client -c force-master-failover --alias mycluster` or `orchestrator-client -c force-master-failover -i instance.in.that.cluster`
- Web API: `/api/force-master-failover/mycluster` or `/api/force-master-failover/instance.in.that.cluster/3306`
Recoveries are audited via:

- `/web/audit-recovery`
- `/api/audit-recovery`
- `/api/audit-recovery-steps/:uid`
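For example, querying the audit API (assuming `orchestrator` listens on `localhost:3000`, the default; the recovery UID is a placeholder, and `jq` is used only for readability):

```bash
# List recent recoveries, then drill into the steps of one recovery.
curl -s http://localhost:3000/api/audit-recovery | jq .
curl -s http://localhost:3000/api/audit-recovery-steps/some-recovery-uid | jq .
```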
Nuanced auditing and control is available via:

- `/api/blocked-recoveries`: see blocked recoveries
- `/api/ack-recovery/cluster/:clusterHint`: acknowledge a recovery on a given cluster
- `/api/ack-all-recoveries`: acknowledge all recoveries
- `/api/disable-global-recoveries`: global switch to prevent `orchestrator` from running any recoveries
- `/api/enable-global-recoveries`: re-enable recoveries
- `/api/check-global-recoveries`: check whether global recoveries are enabled
Running manual recoveries (see next sections):

- `/api/recover/:host/:port`: recover a specific host, assuming `orchestrator` agrees there is a failure.
- `/api/recover-lite/:host/:port`: same, but do not invoke external hooks (can be useful for testing).
- `/api/graceful-master-takeover/:clusterHint/:designatedHost/:designatedPort`: gracefully promote a new master (planned failover), indicating the designated server to promote.
- `/api/graceful-master-takeover/:clusterHint`: gracefully promote a new master (planned failover). Designated server not indicated; works when the master has exactly one direct replica.
- `/api/force-master-failover/:clusterHint`: panic; force master failover for the given cluster.
Some corresponding command line invocations:

- `orchestrator-client -c recover -i some.instance:3306`
- `orchestrator-client -c graceful-master-takeover -i some.instance.in.somecluster:3306`
- `orchestrator-client -c graceful-master-takeover -alias somecluster`
- `orchestrator-client -c force-master-failover -alias somecluster`
- `orchestrator-client -c ack-cluster-recoveries -alias somecluster`
- `orchestrator-client -c ack-all-recoveries`
- `orchestrator-client -c disable-global-recoveries`
- `orchestrator-client -c enable-global-recoveries`
- `orchestrator-client -c check-global-recoveries`
`orchestrator` avoids flapping (cascading failures causing continuous outage and exhaustion of resources) by introducing a block period: on any given cluster, `orchestrator` will not kick off automated recovery within an interval smaller than said period since the previous recovery, unless cleared to do so by a human.

The block period is indicated by `RecoveryPeriodBlockSeconds`. It only applies to recoveries on the same cluster. There is nothing to prevent concurrent recoveries running on different clusters.

Pending recoveries are unblocked either once `RecoveryPeriodBlockSeconds` has passed or once such a recovery has been acknowledged.

Acknowledging a recovery is possible either via the web API/interface (see the audit/recovery page) or via the command line interface (`orchestrator-client -c ack-cluster-recoveries -alias somealias`).

Note that manual recoveries (e.g. `orchestrator-client -c recover` or `orchestrator-client -c force-master-failover`) ignore the blocking period.
Some servers are better candidates for promotion in the event of a failover. Some servers aren't good picks. Examples:
- A server has a weaker hardware configuration. You prefer not to promote it.
- A server is in a remote data center and you don't want to promote it.
- A server is used as your backup source and has LVM snapshots open at all times. You don't want to promote it.
- A server has a good setup and is an ideal candidate. You prefer to promote it.
- A server is OK, and you don't have any particular opinion.

You will announce your preference for a given server to `orchestrator` in the following way:

`orchestrator-client -c register-candidate -i ${::fqdn} --promotion-rule ${promotion_rule}`
Supported promotion rules are:
- `prefer`
- `neutral`
- `prefer_not`
- `must_not`
Promotion rules expire after an hour. That's the dynamic nature of `orchestrator`. You will want to set up a cron job that announces the promotion rule for a server:

`*/2 * * * * root "/usr/bin/perl -le 'sleep rand 10' && /usr/bin/orchestrator-client -c register-candidate -i this.hostname.com --promotion-rule prefer"`
This setup comes from production environments. The cron entries get updated by `puppet` to reflect the appropriate `promotion_rule`. A server may have `prefer` at this time, and `prefer_not` five minutes from now. Integrate your own service discovery method and your own scripting to provide an up-to-date `promotion-rule`.
All failure/recovery scenarios are analyzed. However, the downtime status of an instance is also taken into consideration. An instance can be downtimed (via `orchestrator-client -c begin-downtime`), and this is noted in the analysis summary. When considering automated recovery, downtimed servers are skipped.

Downtime was, in fact, explicitly created for this very purpose, and gives the DBA a way to suppress automated failover for a specific server.

Note that manual recovery (e.g. `orchestrator-client -c recover`) overrides downtime.
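For example (flag names follow common `orchestrator-client` usage; consult your version):

```bash
# Downtime an instance for 30 minutes: automated recovery will skip it.
orchestrator-client -c begin-downtime -i my.instance.com:3306 \
  --duration 30m --owner dba --reason "planned maintenance"

# End the downtime early, making the instance eligible for recovery again.
orchestrator-client -c end-downtime -i my.instance.com:3306
```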
`orchestrator` supports hooks -- external scripts invoked throughout the recovery process. These are arrays of commands invoked via shell, in particular `bash`. See hook configuration details in recovery configuration.

- `OnFailureDetectionProcesses`: described in failure detection.
- `PreGracefulTakeoverProcesses`: invoked on the `graceful-master-takeover` command, before the master turns `read-only`.
- `PreFailoverProcesses`
- `PostMasterFailoverProcesses`
- `PostIntermediateMasterFailoverProcesses`
- `PostFailoverProcesses`
- `PostUnsuccessfulFailoverProcesses`
- `PostGracefulTakeoverProcesses`: executed on planned, graceful master takeover, after the old master is positioned under the newly promoted master.
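A configuration sketch showing how hooks are wired up (the `echo` commands are illustrative; `{failureType}`, `{failureCluster}`, `{failedHost}`, `{failedPort}`, `{successorHost}` and `{successorPort}` are among the placeholders `orchestrator` expands before invoking each command):

```json
{
  "OnFailureDetectionProcesses": [
    "echo 'Detected {failureType} on {failureCluster}' >> /tmp/recovery.log"
  ],
  "PreFailoverProcesses": [
    "echo 'Will recover from {failureType} on {failureCluster}' >> /tmp/recovery.log"
  ],
  "PostFailoverProcesses": [
    "echo 'Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log"
  ],
  "PostUnsuccessfulFailoverProcesses": [
    "echo 'Recovery of {failureType} on {failureCluster} was unsuccessful' >> /tmp/recovery.log"
  ]
}
```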