Implementation of https://blueprints.launchpad.net/percona-server/+spec/backup-safe-binlog-info #122

Merged
merged 1 commit into percona:5.6 from backup_safe_binlog_info on Jul 28, 2015

Conversation

akopytov
Contributor

One inefficiency of the backup locks feature is that even though LOCK
TABLES FOR BACKUP, as a light-weight FTWRL alternative, does not affect
DML statements updating InnoDB tables, LOCK BINLOG FOR BACKUP does
affect them by blocking commits.

XtraBackup uses LOCK BINLOG FOR BACKUP to:

  1. retrieve consistent binary log coordinates with SHOW MASTER
    STATUS. More precisely, the binary log coordinates must be consistent
    with the REDO log copy and with non-transactional tables. Therefore, no
    updates can be done to non-transactional tables (this is achieved by
    an active LOCK TABLES FOR BACKUP lock), and no commits can be
    performed between SHOW MASTER STATUS and finalizing the redo log
    copy, which is achieved by LOCK BINLOG FOR BACKUP (see the sketch
    after this list).
  2. retrieve consistent master connection information for a replication
    slave. More precisely, the binary log coordinates on the master as
    reported by SHOW SLAVE STATUS must be consistent with the REDO log
    copy, so LOCK BINLOG FOR BACKUP also blocks the I/O replication thread.
  3. include the last binary log file in the SST snapshot for a
    GTID-enabled PXC node. This is a rather artificial limitation on the
    WSREP side, but XtraBackup still obeys it by blocking commits with
    LOCK BINLOG FOR BACKUP to ensure the integrity of the binary log
    file copy.
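
The sketch below illustrates the locking sequence that case #1 currently
implies. It is a simplified, hypothetical ordering of the statements
XtraBackup issues, interleaved in reality with file copying; it is not
the literal tool code.

```sql
LOCK TABLES FOR BACKUP;   -- block non-transactional updates and DDL
-- ... copy InnoDB data files while following the redo log ...
LOCK BINLOG FOR BACKUP;   -- block commits (and the slave I/O thread)
SHOW MASTER STATUS;       -- coordinates are now consistent with the redo log copy
-- ... finalize the redo log copy ...
UNLOCK BINLOG;
UNLOCK TABLES;
```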

Depending on the write rate on the server, finalizing the REDO log copy
may take a long time, so blocking commits for that duration may still
affect server availability considerably.

This task is to make the necessary server-side change so that XtraBackup
can avoid LOCK BINLOG FOR BACKUP in case #1, when cases #2 and #3 do not
apply, i.e. when --slave-info is not requested in the XtraBackup options
and the server is not a GTID-enabled PXC node.

Lifting limitations for cases #2 and #3 is also possible, but that is
outside the scope of this task.

The idea of the optimization is that even though InnoDB provides
transactional storage for the binary log information (i.e. the current
file name and offset), it cannot be fully trusted by XtraBackup, because
that information is only updated on an InnoDB commit. This means that if
the last operation before LOCK TABLES FOR BACKUP was an update to a
non-transactional storage engine, and no InnoDB commits occur before the
backup is finalized by XtraBackup, the InnoDB system header will contain
stale binary log coordinates.
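
For example (a hypothetical session; the table names and storage engines
are assumptions used only for illustration), the following sequence leaves
the coordinates stored in the InnoDB system header pointing at the last
InnoDB commit rather than at the current end of the binary log:

```sql
-- assumes binary logging is enabled and both tables already exist
UPDATE innodb_tbl SET c = c + 1;  -- InnoDB commit: binlog coordinates are written
                                  -- to the InnoDB system header as part of the commit
UPDATE myisam_tbl SET c = c + 1;  -- non-transactional update: the binary log
                                  -- advances, but the coordinates stored in InnoDB do not
LOCK TABLES FOR BACKUP;           -- without this change, the system header still
                                  -- holds the stale (pre-MyISAM-update) coordinates
```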

One way to fix that would be to force a binlog coordinates update in the
InnoDB system header on every update, regardless of the storage engine(s)
involved. This is what a Galera node does to ensure consistency of the
XID, which is stored in the same way as the binary log coordinates: it
forces an XID update in the InnoDB system header on each TOI operation,
in particular on each non-transactional update.

Another approach is less invasive: XtraBackup blocks all
non-transactional updates with LOCK TABLES FOR BACKUP anyway, so instead
of having all non-transactional updates flush binlog coordinates to
InnoDB unconditionally, LTFB can be modified to flush (and redo-log) the
current binlog coordinates to InnoDB. In that case the binlog coordinates
provided by InnoDB are consistent with the REDO log under any
circumstances.

This patch implements the latter approach.
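
With this change in place, a backup that only needs case #1 can look
roughly like the sketch below. This is a hypothetical simplification: how
exactly XtraBackup consumes the stored coordinates (e.g. reading them back
from the InnoDB system header of the backup during the prepare phase) is
part of the XtraBackup-side change, not of this patch.

```sql
LOCK TABLES FOR BACKUP;   -- now also flushes (and redo-logs) the current binlog
                          -- coordinates into the InnoDB system header
-- ... finalize the redo log copy; commits are never blocked ...
UNLOCK TABLES;
-- the binlog coordinates are later recovered from the InnoDB system header of the
-- backup instead of from SHOW MASTER STATUS under LOCK BINLOG FOR BACKUP
```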

@akopytov akopytov merged commit 1426d2b into percona:5.6 Jul 28, 2015
@akopytov akopytov deleted the backup_safe_binlog_info branch July 28, 2015 16:12

george-lorch pushed a commit to george-lorch/percona-server that referenced this pull request May 7, 2016
Summary:
RocksDB has some static object dependencies, and without following those
dependencies it hits SIGABRT. This diff has two fixes.
1. Updating the RocksDB revision. D51789 in RocksDB changes the problematic
mutex from a global variable to a local (function-level) static variable.
2. Changing MyRocks not to call ddl_manager.persist_stats() from
kill_server_thread (which calls rocksdb_done_func()). By calling
ddl_manager.persist_stats(),
rocksdb::ThreadLocalPtr::StaticMeta::OnThreadExit() (which calls
pthread_mutex_lock on "static port::Mutex mutex") is invoked at the end
of the thread. However, there is a race between the main thread and
kill_server_thread: the main thread calls pthread_mutex_destroy for the
mutex. To guarantee that pthread_mutex_lock() is never called after
pthread_mutex_destroy(), kill_server_thread must not call
ddl_manager.persist_stats(). This diff fixes the issue.

@update-submodule: rocksdb

Test Plan: mtr cardinality --repeat=100

Reviewers: hermanlee4, spetrunia, jkedgar

Reviewed By: jkedgar

Subscribers: yhchiang, webscalesql-eng

Differential Revision: https://reviews.facebook.net/D51933