FB8-276 - Fix test tmp_table_bytes_written #9

Closed
wants to merge 1 commit into from

@ldonoso ldonoso commented Mar 15, 2022


Problem:

The `select ... group by ...` statements generate a tmp table because the
`group by` order does not match the table order.

https://dev.mysql.com/doc/refman/8.0/en/group-by-optimization.html

Solution:

Create an index on the table so that the rows are ordered by the same
criteria as the `group by` clause. This way no tmp table is generated.
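
To see why a matching order removes the need for a tmp table, here is a minimal C++ sketch. It is not MySQL's implementation; the `Row` type and `group_sum_sorted` are hypothetical names used only for illustration. When rows arrive already ordered by the grouping key, as an index scan would deliver them, a group is complete as soon as the key changes, so no intermediate table is needed.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row shape, for illustration only.
struct Row {
  std::string key;
  long value;
};

// Streaming "GROUP BY key" with SUM(value).
// Precondition: rows are sorted by key, as an index scan would return them.
// A finished group could be sent to the client right away; the vector
// only collects the output for this example.
std::vector<std::pair<std::string, long>> group_sum_sorted(
    const std::vector<Row> &rows) {
  std::vector<std::pair<std::string, long>> out;
  for (const Row &r : rows) {
    if (out.empty() || out.back().first != r.key) {
      // Unsorted input would let a group reappear later, which is
      // exactly the case that needs a temporary table.
      assert(out.empty() || out.back().first < r.key);
      out.emplace_back(r.key, 0);
    }
    out.back().second += r.value;
  }
  return out;
}
```

With unsorted input the same query would have to remember every group seen so far, which is the role the tmp table plays.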

@inikep
Owner

inikep commented Mar 15, 2022

@inikep inikep closed this Mar 15, 2022
inikep pushed a commit that referenced this pull request Mar 30, 2022
… enabled

Summary:
On secondaries, when both enable_super_log_bin_read_only and read_only are on, installing or uninstalling a plugin at run time is currently forbidden.

install/uninstall plugin doesn't generate an event in the binlog, even though the thread's thd carries the OPTION_BIN_LOG flag because log_slave_updates is on by default on secondaries. It should therefore be safe to execute install/uninstall plugin.

The change is to call set_skip_readonly_check() before install/uninstall plugin and to call reset_skip_readonly_check() afterwards (for completeness).

BTW, mysql always calls reset_skip_readonly_check() at the beginning of each statement, so set_skip_readonly_check() won't affect other statements.
```
#0  THD::reset_skip_readonly_check (this=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_class.h:1754
#1  0x0000000005a500e1 in THD::reset_for_next_command (this=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:5892
#2  0x0000000005a517a5 in mysql_reset_thd_for_next_command (thd=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:5817
#3  0x0000000005a50466 in mysql_parse (thd=0x7fb5c1a0b000, parser_state=0x7fb5f6bb4560, last_timer=0x7fb5f6bb39b0) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:6056
#4  0x0000000005a4c7c9 in dispatch_command (thd=0x7fb5c1a0b000, com_data=0x7fb5f6bb4d98, command=COM_QUERY) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:2222
#5  0x0000000005a4f991 in do_command (thd=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:1556
#6  0x0000000005ccd4f1 in handle_connection (arg=0x7fb5cc85b740) at /home/luqun/mysql/mysql-8.0.20/sql/conn_handler/connection_handler_per_thread.cc:330
#7  0x00000000078eb95b in pfs_spawn_thread (arg=0x7fb5f8c89720) at /home/luqun/mysql/mysql-8.0.20/storage/perfschema/pfs.cc:2884
#8  0x00007fb5f957020c in start_thread (arg=0x7fb5f6bb6700) at pthread_create.c:479
#9  0x00007fb5f971881f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
```
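
The set/reset pairing described in this commit message can be sketched as a scope guard. This is an illustration only: the commit itself calls the two methods explicitly, and `Session`, `SkipReadonlyGuard`, and `install_plugin_sketch` are hypothetical names, not server code.

```cpp
#include <cassert>

// Hypothetical stand-in for the server's THD; only the flag is modelled.
struct Session {
  bool skip_readonly_check = false;
  void set_skip_readonly_check() { skip_readonly_check = true; }
  void reset_skip_readonly_check() { skip_readonly_check = false; }
};

// Scope guard: sets the flag on entry and resets it on every exit path.
class SkipReadonlyGuard {
 public:
  explicit SkipReadonlyGuard(Session &s) : s_(s) {
    s_.set_skip_readonly_check();
  }
  ~SkipReadonlyGuard() { s_.reset_skip_readonly_check(); }
  SkipReadonlyGuard(const SkipReadonlyGuard &) = delete;
  SkipReadonlyGuard &operator=(const SkipReadonlyGuard &) = delete;

 private:
  Session &s_;
};

// Hypothetical plugin operation. Returns whether the read-only check
// was being skipped while the operation ran.
bool install_plugin_sketch(Session &s) {
  SkipReadonlyGuard guard(s);
  return s.skip_readonly_check;
}
```

A guard of this kind would reset the flag on early returns as well, which is the "for completeness" case the message mentions.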

Reviewed By: george-reynya

Differential Revision: D27213990
inikep added a commit that referenced this pull request Mar 31, 2022
recover raft logs by removing partial trxs

Summary:
Port D24628821

mysqld removes partial trxs from the tail of the trx log (named binary-logs on
primaries and apply-logs on secondaries) during startup. Relay logs, however,
were not of much importance since they were discarded anyway and a new one
was created.
With raft, this is not ideal. Relay logs are raft logs on secondaries
and have to be kept around (and kept sane and consistent). This diff adds the
ability to remove partial trxs from raft/relay logs.
Much of the code to open the last relay log (based on the relay log index) and
identify partial trxs is borrowed from the existing logic in
MYSQL_BIN_LOG::open_binlog() and binlog_recover().
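
The idea of trimming partial trxs from the log tail can be sketched as follows. This is a simplified illustration, not the code from the diff: `EventKind`, `LogEvent`, and `last_valid_pos` are hypothetical, and a transaction is assumed to start with a Gtid event and end with an Xid event.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified event kinds; real binlog events carry far more detail.
enum class EventKind { Gtid, Query, Rows, Xid };

struct LogEvent {
  EventKind kind;
  std::size_t end_pos;  // byte offset just past this event
};

// Returns the offset at which the log should be truncated: the end of
// the last complete transaction. header_end is the offset at which the
// first transaction can start; it is returned when no transaction is
// complete.
std::size_t last_valid_pos(const std::vector<LogEvent> &events,
                           std::size_t header_end) {
  std::size_t valid = header_end;
  bool in_trx = false;
  for (const LogEvent &ev : events) {
    if (ev.kind == EventKind::Gtid) {
      in_trx = true;
    } else if (ev.kind == EventKind::Xid && in_trx) {
      in_trx = false;
      valid = ev.end_pos;  // the transaction is complete up to here
    }
  }
  return valid;
}
```

Everything after the returned offset belongs to a transaction that never reached its end event and can be cut off.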

Reviewed By: Pushapgl

Differential Revision: D26447448

---------------------------------------------------------------------

Checking for inited bool to make sure global_init_info was successful

Summary:
Port D25584004

A master.info and relay.info file can be present
but needs to be properly inited for use. We were bypassing the inited
check which could lead to issues in Raft.
In case there is an error in global_init_info, Raft will do a
raft_reset_slave and make another attempt at it. If both recourses
fail, the init of the plugin would fail.

Reviewed By: Pushapgl

Differential Revision: D26447457

---------------------------------------------------------------------

Support for dumping raft logs to vanilla async replicas

Summary:
[Porting Notes]
We want to dump raft logs to vanilla async replicas regardless
of whether it's the relay log or binlog. Effectively after this change
we'll dump relay logs on the followers and binlogs on the leader. When
the raft role changes, the log to be dumped changes as well.
The Dump_log class is introduced as a thin wrapper/container around
mysql_bin_log or rli->relay_log and is inited with mysql_bin_log
to emulate vanilla mysql behavior. Dump threads use the global
dump_log object instead of mysql_bin_log directly. We switch the log
in dump log only when the raft role changes (in binlog_change_to_binlog()
and binlog_change_to_apply_log()).
During a raft role change we take all log related locks (LOCK_log,
LOCK_index, LOCK_binlog_end_pos, and dump log lock) to serialize it with
other log operations like dumping logs.
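
A rough sketch of the wrapper idea, under stated assumptions: `LogSketch` and `DumpLogSketch` are hypothetical stand-ins, and a single mutex takes the place of the several locks listed above.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical stand-in for MYSQL_BIN_LOG; only a name is modelled.
struct LogSketch {
  std::string name;
};

// Thin wrapper that points at whichever log the dump threads should read.
class DumpLogSketch {
 public:
  // Starts out pointing at the binlog, mirroring vanilla behavior.
  explicit DumpLogSketch(LogSketch *binlog) : log_(binlog) {
    assert(binlog != nullptr);
  }

  // Called on a raft role change; serialized with readers by the lock.
  void switch_log(LogSketch *new_log) {
    assert(new_log != nullptr);
    std::lock_guard<std::mutex> guard(lock_);
    log_ = new_log;
  }

  // Dump threads go through the wrapper instead of a global log object.
  std::string current_name() {
    std::lock_guard<std::mutex> guard(lock_);
    return log_->name;
  }

 private:
  std::mutex lock_;
  LogSketch *log_;
};
```

The indirection keeps dump threads unaware of which physical log they read; only the role-change path needs to know.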

Related doc - https://fb.quip.com/oTVAAdgEi4zY

This diff contains below 7 patches:
D23013977
D24766787
D24716539
D24900223
D24955284
D25174166
D25775525

Reviewed By: luqun

Differential Revision: D26141496

---------------------------------------------------------------------

Fixed the flaky raft test suite

Summary:
First clean run of entire raft test suite :)

**Changes**
* Reset apply_logs on raft secondaries before the start of every test
* Increased the lock timeouts and changed isolation level in rpl_raft_slave_out_of_order_commit.

Reviewed By: luqun

Differential Revision: D26651257

---------------------------------------------------------------------

return error from Raft_replication_delegate when plugin is not available

Summary:
Port D23065441 (facebook@b9067f7)

The new macro is used to call into the raft plugin. If the plugin gets unloaded
accidentally while enable_raft_plugin is ON, this STRICT version returns
failure. Currently this is to be called only by raft plugin

Reviewed By: Pushapgl

Differential Revision: D26447523

---------------------------------------------------------------------

Adding timestamps for raft rotates which happen in the context of listener thread

Summary:
Port D25572614

The timestamp of a binlog event is picked up from the when field in
the event. In most rotations, the when field is left unpopulated
for the top 3 events (fd, pgtid, metadata). Even so, a normal rotate
(flush binary logs) still manages to get a valid timestamp, since the
thread in which the flush binary logs happens has a valid start time.
Now enter Raft relay log rotations. In those cases, and in the case
of a config change rotate, the rotations happen in the context
of a raft listener queue thread. In that context, the when field and the
start time of the thread are both 0. The diff handles this case by
populating the when field appropriately.
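
The fallback chain described above might look like the sketch below. The function name is hypothetical, and using the current time as the last resort is an assumption; the message does not say which value the diff actually writes into the when field.

```cpp
#include <cassert>
#include <ctime>

// Picks the timestamp for an event written during a rotation.
// 1. Use the event's own `when` if it was populated.
// 2. Otherwise use the start time of the executing thread.
// 3. Otherwise (both are 0, as on the raft listener queue thread) fall
//    back to the current time. Step 3 is an assumption of this sketch.
std::time_t effective_when(std::time_t event_when, std::time_t thread_start,
                           std::time_t now) {
  assert(now != 0);
  if (event_when != 0) return event_when;
  if (thread_start != 0) return thread_start;
  return now;
}
```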

Reviewed By: bhatvinay

Differential Revision: D26194612

---------------------------------------------------------------------

Raft abrupt stepdown and trim binlog file / gtid test

Summary: binlog file should get trimmed for abrupt stepdown

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D26169975

---------------------------------------------------------------------

Port: Fixes around raft log truncation.

Summary:
**Notes**
* New functions to block and unblock dump threads for plugin to use
during raft log truncation.

The check below is already done in the raft plugin as part of D26866429.
* Re-init rli->cur_log in Relay_log_info::rli_init_info() instead of
just seeking to the beginning to handle the case when raft log is
truncated before starting the applier

Reviewed By: luqun

Differential Revision: D26759813

---------------------------------------------------------------------

rebase due to relay log Format_description_event event size difference

Summary:
WL#3549: Binlog Compression adds 1 extra byte of data to Format_description_event

@@ -134,6 +137,7 @@ Format_description_event::Format_description_event(uint8_t binlog_ver,
           VIEW_CHANGE_HEADER_LEN,
           XA_PREPARE_HEADER_LEN,
           ROWS_HEADER_LEN_V2,
+          TRANSACTION_PAYLOAD_EVENT,
       };

Reviewed By: Pushapgl

Differential Revision: D27145095

---------------------------------------------------------------------

fix raft change string metadata event

Summary:
Saw a bunch of stage-1 replicaset instances fail due to a config change string failure during add/remove instance:
```
I0401 00:16:10.894434 1823842 IoCacheUtils.cpp:333] getConfigChangeString eventType = ^G
E0401 00:16:10.894451 1823842 IoCacheUtils.cpp:363] Failed to read metadata event body from cache
E0401 00:16:10.894456 1823842 MysqlRaft.cpp:659] [replicate] Failed to get config change string from iocache
2021-04-01T00:16:10.894464-07:00 8307 [ERROR] [MY-010207] [Repl] Run function 'before_flush' in plugin 'RPL_RAFT' failed
2021-04-01T00:16:10.894478-07:00 8307 [ERROR] [MY-000000] [Server] Failed to rotate binary log
```

After some investigation, the cause turned out to be that the metadata event length was calculated including the config change string, but the config change string was never written into the event body.
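
This class of bug (a declared length that includes a field the writer never emits) can be guarded against by checking the written size against the computed size. The sketch below uses a hypothetical type/length/value layout and a made-up tag value; it is not the actual metadata event format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical tag value, chosen for this sketch only.
constexpr std::uint8_t kConfigChangeType = 1;

// Length of the encoded field: 1 byte type + 2 bytes length + payload.
std::size_t encoded_length(const std::string &config_change) {
  return 1 + 2 + config_change.size();
}

// Writes type, little-endian length, and the payload itself.
std::vector<std::uint8_t> encode(const std::string &config_change) {
  assert(config_change.size() <= 0xFFFF);
  std::vector<std::uint8_t> buf;
  buf.reserve(encoded_length(config_change));
  buf.push_back(kConfigChangeType);
  buf.push_back(static_cast<std::uint8_t>(config_change.size() & 0xFF));
  buf.push_back(static_cast<std::uint8_t>(config_change.size() >> 8));
  // The step that was forgotten in the bug: write the string body.
  buf.insert(buf.end(), config_change.begin(), config_change.end());
  // Guard: the bytes written must match the length that was computed.
  assert(buf.size() == encoded_length(config_change));
  return buf;
}
```

If the `insert` line were dropped, the final assertion would fire at write time instead of the reader failing later on a truncated body.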

Reviewed By: Pushapgl

Differential Revision: D27504157

---------------------------------------------------------------------

Tells raft mode in binlog

Summary:
Win-Win: Tell whether this binlog was generated while this host was in raft mode.

We already have the bit in the FD event, and this diff just spells it out.

Reviewed By: mpercy

Differential Revision: D28664300

---------------------------------------------------------------------

fix flaky raft testcase

Summary:
There are multiple issues in MTR:
1. In sql/rpl_binlog_sender.cc, if a secondary's IO thread receives fatal_error,
   it quits instead of reconnecting. Use an unknown error so that the
   secondary's IO thread can try to reconnect.
2. mtr.add_suppression() call: mtr.add_suppression() executes an insert
   statement into the mtr.test_suppressions table, which has no primary key,
   so during idempotent recovery the secondary fails to execute
   mtr.add_suppression() due to the missing PK. Move all mtr.add_suppression()
   calls to the end of the testcase to work around the idempotent recovery failure.
3. For promotion, use raft_promote_to_leader.inc instead of `set rpl_raft_new_leader_uuid`,
   since raft_promote_to_leader waits until the new primary becomes writeable.
4. Pass a specific warning instead of '.*' to mtr.add_suppression.
5. etc.

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D28774820

---------------------------------------------------------------------

Using locks in Dump_log methods only when raft is enabled

Summary:
We don't need to take locks in non-raft mode since the
underlying MYSQL_BIN_LOG obj will never change. To handle the race between
updating the enable_raft_plugin var and using its value in Dump_log, we
kill and block all dump threads while updating the var and unblock them
once the var is updated.

Reviewed By: bhatvinay

Differential Revision: D28905522

---------------------------------------------------------------------

leader election to be a sync point in the new leader

Summary:
Port of D27582002 (facebook@39c70ca) from 5.6.35 to 8.0

The newly elected raft leader makes sure that all trxs from the previous
leader are committed by the sql appliers. It then switches the server's trx
logs from apply-log-* to binary-log-*. To other parts of the system this
looks like a rotation, but the necessary sync calls are not made here.
So, if the server (or os) restarts, the storage engine could lose
the commit markers of the last batch of trxs. This results in silent
data drift.
This diff fixes the problem by making an explicit call to
ha_flush_logs() before switching the server's trx logs.

Reviewed By: luqun

Differential Revision: D28880407

---------------------------------------------------------------------

Error out SQL thread on out of order opids in raft logs

Summary:
OpIds should always be in order in the raft logs. Added a check
in SQL threads that throws an error and stops the applier when out of
order opids are detected.
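
A minimal sketch of such an ordering check, assuming an OpId is a (term, index) pair as is common in Raft implementations. The `OpId` layout and both function names are assumptions of this sketch, not taken from the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Assumed OpId shape: ordered by term first, then by index.
struct OpId {
  std::int64_t term;
  std::int64_t index;
};

bool opid_less(const OpId &a, const OpId &b) {
  return std::tie(a.term, a.index) < std::tie(b.term, b.index);
}

// Returns the position of the first OpId that is not strictly greater
// than its predecessor, or -1 if the whole sequence is in order.
long first_out_of_order(const std::vector<OpId> &ops) {
  for (std::size_t i = 1; i < ops.size(); ++i) {
    if (!opid_less(ops[i - 1], ops[i])) return static_cast<long>(i);
  }
  return -1;
}
```

An applier would stop with an error at the returned position rather than continue past it.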

Reviewed By: li-chi

Differential Revision: D28810840

---------------------------------------------------------------------

add GET_COMMITTED_GTIDS for raft

Summary:
During recovery, Raft Log::Init needs to check with the server
which gtids have been committed. Before doing that, it finds the entire
set of trxs in the last raft log. The difference (logged minus
committed) gives the pending opids.
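
The "logged minus committed" step is a set difference. In the sketch below transactions are reduced to bare sequence numbers, which is a simplification: real GTID sets are interval lists per server UUID, and `pending_trxs` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Returns the transactions that appear in the last raft log but are
// not yet committed according to the server.
std::vector<std::int64_t> pending_trxs(std::vector<std::int64_t> logged,
                                       std::vector<std::int64_t> committed) {
  // std::set_difference requires both ranges to be sorted.
  std::sort(logged.begin(), logged.end());
  std::sort(committed.begin(), committed.end());
  std::vector<std::int64_t> pending;
  std::set_difference(logged.begin(), logged.end(), committed.begin(),
                      committed.end(), std::back_inserter(pending));
  return pending;
}
```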

Reviewed By: Pushapgl

Differential Revision: D29622786

---------------------------------------------------------------------

raft: skip mts_recovery_groups during start slave

Summary:
During MySQL8+Raft DMP, some instances fail to switch to Leader or to start slave:

```
2021-06-24T17:56:38.627423-07:00 431 [Note] [MY-010574] [Repl] Slave: MTS group recovery relay log info group_master_log_name /data/mysql/3127/bls-unittestdb658.frc2-3305-mysql.replicaset.180021/binary-logs-3727.000033, event_master_log_pos 1129.
2021-06-24T17:56:38.627473-07:00 431 [ERROR] [MY-010575] [Repl] Error looking for file after /binlogs/binary-logs-3307.000120.
2021-06-24T17:56:38.627516-07:00 431 [ERROR] [MY-000000] [Repl] load_mi_and_rli_from_repositories: rli_init_info returned error
```

Similar to 5.6, we don't need to run mts_recovery_groups because GTID_MODE is always enabled.

Reviewed By: Pushapgl

Differential Revision: D29520066

---------------------------------------------------------------------

update RaftListenerCallbackArg struct

Summary: Due to a change in the contract between the raft plugin and mysql, update the RaftListenerCallbackArg struct to add a master_uuid field

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D29981349

---------------------------------------------------------------------

always check mi during raft_reset_slave

Summary: add similar nullptr check for raft_reset_slave as raft_stop_sql_thread

Reviewed By: bhatvinay

Differential Revision: D29967915

---------------------------------------------------------------------

raft mtr: update rpl_end_raft.inc for disable-raft

Summary:
rpl_end_raft.inc unregisters the raft plugin, which calls reset slave when online disable-raft is enabled. Move rpl_end_raft.inc after rpl_stop_slaves.inc so that the slaves are stopped correctly first.

After the slaves are stopped, mtr.add_suppression("") calls no longer replicate to them, so call mtr.add_suppression("") on all instances (replicas).

Reviewed By: yizhang82

Differential Revision: D30062236

---------------------------------------------------------------------

fix master_uuid

Summary: fix master_uuid in raft mode.

Reviewed By: luqun

Differential Revision: D30261465

---------------------------------------------------------------------

incorrect File_size value in show raft logs result

Summary:
In the show raft logs result, the File_size for the latest log file isn't updated correctly.

For example, show raft logs; returns:

```
+-------------------------+------------
| Log_name                | File_size  |
| binary-logs-3304.000670 |       1529 |
```

in fact
```
-rw-r----- 1 mysql backup 1669180487 Aug  4 14:49 binary-logs-3304.000670
-rw-r----- 1 mysql backup 1723994154 Aug  4 14:49 binary-logs-3304.000670
```
The File_size from show raft logs is always 1529, but the real file size is 1669180487.

The issue arises because, when IO_CACHE is written directly, the position of its wrapper ostream isn't updated.

Reviewed By: yizhang82

Differential Revision: D30116345

---------------------------------------------------------------------

Add gtid_purged_for_tailing

Summary:
Add a new global variable gtid_purged_for_tailing.

It shows the purged GTID for binlog in non-raft mode, and purged GTID for raft log in raft mode.

Reviewed By: abhinav04sharma

Differential Revision: D29372776

---------------------------------------------------------------------

disable skip_setup_replica_if_unnecessary

Summary: After syncing with the latest official raft plugin, most MTR tests failed due to the skip_setup_replica_if_unnecessary optimization.

Reviewed By: yizhang82

Differential Revision: D30821648

---------------------------------------------------------------------

latency histograms for raft trx wait

Summary: Port of the support added for the same in mysql 5.6. Should help monitor latencies in 8.0 + raft stage-1 replicasets.

Reviewed By: luqun

Differential Revision: D31064764

---------------------------------------------------------------------

handle cases where clean shutdown in raft aborts trxs

Summary: This is a port of the feature in 5.6.

Reviewed By: anirbanr-fb

Differential Revision: D31070593

---------------------------------------------------------------------

fix crash during binlog purging

Summary:
Any error returned by the plugin during binlog purging results in a
crash in mysql8 as the server tries to execute
binlog_error_action_abort. We need to differentiate explicitly between a
plugin error and other errors (such as errors related to disk IO).
In this particular case, the crash is caused by trying to purge a
file that does not exist (i.e. one that was already purged previously) and
that raft cannot find in its index chunk (so it returns an error).

Reviewed By: anirbanr-fb

Differential Revision: D31149997

---------------------------------------------------------------------

update flaky rpl_raft_purged_gtids_dump_threads

Summary:
rpl_raft_purged_gtids_dump_threads MTR failed with "Cannot replicate because the master purged required binary logs" after server4 started tailing server2.

Try to sync server4 before switching it to tail server2.

Reviewed By: bhatvinay

Differential Revision: D30818078

---------------------------------------------------------------------

fix various porting issues in mysql8 (raft) crash recovery

Summary:
1. Trimming of the binlog is left to the raft plugin (based on the current
   leader's log). The server should skip this step as part of recovery. This
   essentially means setting 'valid_pos' to the last successfully parsed
   trx in the trx log (instead of the engine's view of the trx log) in
   MYSQL_BIN_LOG::recover().
2. executed_gtid_set should be initialized based on the engine's view of
   the trx log file coordinates. So, during startup, appropriate flags need
   to be passed into MYSQL_BIN_LOG::init_gtid_sets(). init_gtid_sets() was
   already changed to handle this, but the flag was not set correctly
   during server startup.
3. Another fix in MYSQL_BIN_LOG::init_gtid_sets() correctly sets the position
   to read and calculate executed-gtid-set (based on the file name read from the
   engine).

Reviewed By: anirbanr-fb

Differential Revision: D31284902

---------------------------------------------------------------------

show_raft_status should take LOCK_status

Summary:
This is a straight port of a 5.6 patch to 8.0

SHOW RAFT STATUS and SHOW GLOBAL STATUS go to the
same functions in the plugin and access shared data structures.
These functions return internal char * pointers to plugin global
variables. They need to be serialized with LOCK_status otherwise
it leads to race conditions.

Reviewed By: li-chi

Differential Revision: D31318340

---------------------------------------------------------------------

Disallowing reset master on raft replicasets

Summary:
This is a straight port of a similar patch on 5.6 raft.

reset master is inherently not raft compliant.
It gets rid of all binlogs and makes an instance appear like
a fresh single instance database without consulting the ring.
We need to block it.

However, under certain circumstances (in the future), e.g. during
the first-time replicaset build, or when we can guarantee that the instance
is a single node raft ring, we can potentially do a reset master
followed by a rebootstrap of raft. This can also be achieved by
disabling raft, doing a reset master, and then re-enabling raft. However,
to keep the door open, I have left a force option and will thread
a mechanism from the plugin to call reset_master(force=true).

Reviewed By: li-chi

Differential Revision: D31299214

---------------------------------------------------------------------

Setting topology config in MTR

Summary:
Setting topology config in MTR so that features that use
topology config can be tested easily. Setting rpl_raft_skip_smc_updates
to avoid unnecessary calls to SMC (even though we supply a dummy
replicaset name).

Reviewed By: li-chi

Differential Revision: D31543877

---------------------------------------------------------------------

Do not allow sql thread start when raft is doing a stop->{}->start transition

Summary:
This is a port of an existing patch made to 5.6 for Raft.

Raft will do a stop of the SQL thread during StopAllWrites.
Then it will repoint the binlog files and during that action, the
SQL threads have to remain stopped. We block it out in this diff by
keeping an atomic bool which can be checked from other functions.

This only applies to raft mode i.e. enable_raft_plugin = true.
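
A minimal sketch of the gating described above. The flag and function names are hypothetical; only the idea of an atomic bool consulted before starting the SQL thread comes from the commit message.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical flag: true while raft is between stopping the SQL thread
// and restarting it (for example while binlog files are repointed).
std::atomic<bool> raft_transition_in_progress{false};

// Returns true if the SQL thread may be started right now.
// The block applies only in raft mode (enable_raft_plugin = true).
bool may_start_sql_thread(bool enable_raft_plugin) {
  if (enable_raft_plugin && raft_transition_in_progress.load()) {
    return false;
  }
  return true;
}
```

The transition code would set the flag before stopping the applier and clear it once the restart is done.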

Reviewed By: li-chi

Differential Revision: D31319499

---------------------------------------------------------------------

Handle printing of FD event generated by slave SQL thread

Summary:
Returning early in the FD:print() function prevents mysqlbinlog from parsing Raft logs on secondaries.

The original commit which added this is d048c0f (P173872135)

To comply with the intent of the original bug fix, we avoid printing the FD event of a relay log as a 'BINLOG'.

Reviewed By: anirbanr-fb

Differential Revision: D26359417

---------------------------------------------------------------------

Add exponential backoff for smart restart

Summary:
RAFT Instance crash and Tx Logs filling up.

rocksdb can fill up txlogs.
We should stop restarting mysqld if it has already restarted many times in a day.
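
One way to express such a policy is sketched below. All constants and the function name are illustrative and not taken from the commit; the real logic lives in the mysqld_safe script rather than in C++.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Illustrative constants, not taken from the commit.
constexpr std::int64_t kBaseDelaySec = 1;
constexpr std::int64_t kMaxDelaySec = 300;
constexpr int kMaxRestartsPerDay = 10;

// Returns the number of seconds to wait before the next restart, given
// how many restarts already happened today, or -1 to stop restarting.
std::int64_t restart_delay(int restarts_today) {
  assert(restarts_today >= 0);
  if (restarts_today >= kMaxRestartsPerDay) return -1;
  // Double the delay with every restart, capped at kMaxDelaySec.
  std::int64_t delay = kBaseDelaySec << restarts_today;
  return std::min(delay, kMaxDelaySec);
}
```

Capping the delay bounds the time an operator has to wait, while the daily limit stops a crash loop from filling the tx logs.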

Reviewed By: anirbanr-fb

Differential Revision: D27372750

---------------------------------------------------------------------

always release data_lock mutex to avoid deadlock

Summary: in a stage-1 replicaset, when a secondary instance is killed, the instance sometimes runs into a deadlock because the process_raft_queue thread forgot to release the mutex it acquired in raft_change_master

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D27602667

---------------------------------------------------------------------

Gracefully exit mysqld_safe loop during backoff

Summary:
Currently systemctl stop [email protected] can take 10 mins when mysqld_safe is in its backoff period.

D28517599 adds an interrupt to the sleep in mysql_stop, and mysqld_safe immediately breaks out of the retry loop if the sleep is interrupted.

Reviewed By: anirbanr-fb

Differential Revision: D28606439

---------------------------------------------------------------------

Fix heap overflow in group_relay_log_name handling

Summary:
We were accessing group_relay_log_name in
Query_log_event::do_apply_event_worker() but it's assigned only after
the coordinator thread encounters an end event (i.e. xid event or a
query event with "COMMIT" or "ROLLBACK" query). This was causing a race
between accessing group_relay_log_name in the worker thread and writing
it on the coordinator thread. We don't need to set transaction position
in events other than end event, so now we set transaction position in
query event only if it's an end event. The race is eliminated because
group_relay_log_name is set before enqueuing the event to the worker
thread (in both dep repl and vanilla mts).

Reviewed By: lth

Differential Revision: D28767430

---------------------------------------------------------------------

fix memory during MYSQL_BIN_LOG::open_existing_binlog

Summary:
asandebug complains of memory leaks during MYSQL_BIN_LOG open:

Direct leak of 50 byte(s) in 1 object(s) allocated from:
    #0 0x67460ef in malloc
    #1 0x93f0777 in my_raw_malloc(unsigned long, int)
    #2 0x93f064a in my_malloc(unsigned int, unsigned long, int)
    #3 0x93f0eb0 in my_strdup(unsigned int, char const*, int)
    #4 0x8af01a6 in MYSQL_BIN_LOG::open(unsigned int, char const*, char const*, unsigned int)
    #5 0x8af8064 in MYSQL_BIN_LOG::open_binlog(char const*, char const*, unsigned long, bool, bool, bool, Format_description_log_event*, unsigned int, RaftRotateInfo*, bool)
    #6 0x8b00c00 in MYSQL_BIN_LOG::new_file_impl(bool, Format_description_log_event*, RaftRotateInfo*)
    #7 0x8d65e47 in rotate_relay_log(Master_info*, bool, bool, bool, RaftRotateInfo*)
    #8 0x8d661c0 in rotate_relay_log_for_raft(RaftRotateInfo*)
    #9 0x8c7696a in process_raft_queue
    #10 0xa0fa1fd in pfs_spawn_thread(void*)
    #11 0x7f8c9a12b20b in start_thread

Release the existing allocations before assigning new ones.

Reviewed By: Pushapgl

Differential Revision: D28819752
inikep pushed a commit that referenced this pull request Apr 1, 2022
… enabled

Summary:
For secondaries, when enable_super_log_bin_read_only is on and read_only is on, installing or uninstalling a plugin at run time is currently forbidden.

Installing or uninstalling a plugin doesn't generate an event in the binlog, although the thread's thd contains the OPTION_BIN_LOG flag because log_slave_updates is on by default on secondaries. It should be safe to execute install/uninstall plugin.

The change is to call set_skip_readonly_check() before install/uninstall plugin and to call reset_skip_readonly_check() (for completeness) afterwards.

BTW, mysql always calls reset_skip_readonly_check() at the beginning of each statement, so set_skip_readonly_check() won't affect other statements.
```
#0  THD::reset_skip_readonly_check (this=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_class.h:1754
#1  0x0000000005a500e1 in THD::reset_for_next_command (this=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:5892
#2  0x0000000005a517a5 in mysql_reset_thd_for_next_command (thd=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:5817
#3  0x0000000005a50466 in mysql_parse (thd=0x7fb5c1a0b000, parser_state=0x7fb5f6bb4560, last_timer=0x7fb5f6bb39b0) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:6056
#4  0x0000000005a4c7c9 in dispatch_command (thd=0x7fb5c1a0b000, com_data=0x7fb5f6bb4d98, command=COM_QUERY) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:2222
#5  0x0000000005a4f991 in do_command (thd=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:1556
#6  0x0000000005ccd4f1 in handle_connection (arg=0x7fb5cc85b740) at /home/luqun/mysql/mysql-8.0.20/sql/conn_handler/connection_handler_per_thread.cc:330
#7  0x00000000078eb95b in pfs_spawn_thread (arg=0x7fb5f8c89720) at /home/luqun/mysql/mysql-8.0.20/storage/perfschema/pfs.cc:2884
#8  0x00007fb5f957020c in start_thread (arg=0x7fb5f6bb6700) at pthread_create.c:479
#9  0x00007fb5f971881f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
```

Reviewed By: george-reynya

Differential Revision: D27213990
inikep added a commit that referenced this pull request Apr 1, 2022
recover raft logs by removing partial trxs

Summary:
Port D24628821

mysqld removes partial trxs in the tail of trx log (named binary-logs on
primaries and apply-logs on secondaries) during startup. However, relay logs
were not of much importance since they were discarded anyway and new ones would
be created.
However, with raft, this is not ideal. Relay logs are raft logs on secondaries
and have to be kept around (and kept sane and consistent). This diff adds the
ability to remove partial trxs from raft/relay logs.
Much of the code to open the last relay log (based on relay log index) and
identify partial trxs is borrowed from existing logic in
MYSQL_BIN_LOG::open_binlog() and binlog_recover()

Reviewed By: Pushapgl

Differential Revision: D26447448

---------------------------------------------------------------------

Checking for inited bool to make sure global_init_info was successful

Summary:
Port D25584004

A master.info and relay.info file can be present
but needs to be properly inited for use. We were bypassing the inited
check which could lead to issues in Raft.
In case there is an error in global_init_info, Raft will do a
raft_reset_slave and make another attempt at it. If both recourses
fail, the init of the plugin would fail.

Reviewed By: Pushapgl

Differential Revision: D26447457

---------------------------------------------------------------------

Support for dumping raft logs to vanilla async replicas

Summary:
[Porting Notes]
We want to dump raft logs to vanilla async replicas regardless
of whether it's the relay log or binlog. Effectively after this change
we'll dump relay logs on the followers and binlogs on the leader. When
the raft role changes, the logs to be dumped are also changed.
Dump_log class is introduced as a thin wrapper/container around
mysql_bin_log or rli->relay_log and is inited with mysql_bin_log
to emulate vanilla mysql behavior. Dump threads use the global
dump_log object instead of mysql_bin_log directly. We switch the log
in dump log only when raft role changes (in binlog_change_to_binlog()
and binlog_change_to_apply_log()).
During raft role change we take all log related locks (LOCK_log,
LOCK_index, LOCK_binlog_end_pos, and dump log lock) to serialize it with
other log operations like dumping logs.
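
The wrapper idea above can be sketched roughly as follows. This is a simplified illustration, not the server code: `Log`, `DumpLog`, `switch_log()` and `current_name()` are hypothetical names, and the sketch takes only a single lock, whereas the change described above takes all log related locks during a role change.

```cpp
#include <mutex>
#include <string>

// Hypothetical minimal log type; the real server uses MYSQL_BIN_LOG.
struct Log {
  std::string name;
};

// Thin wrapper that dump threads read through instead of touching a log
// directly. It starts out pointing at the binlog to mimic vanilla behavior.
class DumpLog {
 public:
  explicit DumpLog(Log *initial) : log_(initial) {}

  // Called on a raft role change to repoint dump threads at the other log.
  void switch_log(Log *next) {
    std::lock_guard<std::mutex> guard(lock_);
    log_ = next;
  }

  // Returns the name of the log currently being dumped.
  std::string current_name() {
    std::lock_guard<std::mutex> guard(lock_);
    return log_->name;
  }

 private:
  std::mutex lock_;  // stands in for the dump log lock
  Log *log_;
};
```

Because readers and the role-change path share one lock here, a dump thread never observes a half-switched pointer.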

Related doc - https://fb.quip.com/oTVAAdgEi4zY

This diff contains below 7 patches:
D23013977
D24766787
D24716539
D24900223
D24955284
D25174166
D25775525

Reviewed By: luqun

Differential Revision: D26141496

---------------------------------------------------------------------

Fixed the flaky raft test suite

Summary:
First clean run of entire raft test suite :)

**Changes**
* Reset apply_logs on raft secondaries before the start of every test
* Increased the lock timeouts and changed isolation level in rpl_raft_slave_out_of_order_commit.

Reviewed By: luqun

Differential Revision: D26651257

---------------------------------------------------------------------

return error from Raft_replication_delegate when plugin is not available

Summary:
Port D23065441 (facebook@b9067f7)

The new macro is used to call into raft plugin. If plugin gets unloaded
accidentally when enable_raft_plugin is ON, then this STRICT version returns
failure. This is to be called only by raft plugin currently

Reviewed By: Pushapgl

Differential Revision: D26447523

---------------------------------------------------------------------

Adding timestamps for raft rotates which happen in the context of listener thread

Summary:
Port D25572614

The timestamp of a binlog event is picked up from the when field in
the event. In most cases of rotation, the when is left unpopulated
during rotation for the top 3 events (fd, pgtid, metadata). However
in such a situation, a normal rotate (flush binary logs) still manages
to get a valid timestamp, since the thread in which the flush binary
logs happens has a valid start time.
Now enter Raft relay log rotations. In those cases and in the case
of config change rotate, the rotations are happening in the context
of a raft listener queue thread. In that context, the when and start
time of the thread are both 0. The diff handles this case by populating
the when field appropriately.

Reviewed By: bhatvinay

Differential Revision: D26194612

---------------------------------------------------------------------

Raft abrupt stepdown and trim binlog file / gtid test

Summary: binlog file should get trimmed for abrupt stepdown

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D26169975

---------------------------------------------------------------------

Port: Fixes around raft log truncation.

Summary:
**Notes**
* New functions to block and unblock dump threads for plugin to use
during raft log truncation.

The check below is already done in the raft plugin as part of D26866429.
* Re-init rli->cur_log in Relay_log_info::rli_init_info() instead of
just seeking to the beginning to handle the case when raft log is
truncated before starting the applier

Reviewed By: luqun

Differential Revision: D26759813

---------------------------------------------------------------------

rebase due to relay log Format_description_event event size difference

Summary:
WL#3549: Binlog Compression adds 1 extra byte of data to Format_description_event

  @@ -134,6 +137,7 @@ Format_description_event::Format_description_event(uint8_t binlog_ver,
             VIEW_CHANGE_HEADER_LEN,
             XA_PREPARE_HEADER_LEN,
             ROWS_HEADER_LEN_V2,
  +          TRANSACTION_PAYLOAD_EVENT,
         };

Reviewed By: Pushapgl

Differential Revision: D27145095

---------------------------------------------------------------------

fix raft change string metadata event

Summary:
Saw a bunch of stage-1 replicaset instances fail due to a config change string failure during add/remove instance:
```
I0401 00:16:10.894434 1823842 IoCacheUtils.cpp:333] getConfigChangeString eventType = ^G
E0401 00:16:10.894451 1823842 IoCacheUtils.cpp:363] Failed to read metadata event body from cache
E0401 00:16:10.894456 1823842 MysqlRaft.cpp:659] [replicate] Failed to get config change string from iocache
2021-04-01T00:16:10.894464-07:00 8307 [ERROR] [MY-010207] [Repl] Run function 'before_flush' in plugin 'RPL_RAFT' failed
2021-04-01T00:16:10.894478-07:00 8307 [ERROR] [MY-000000] [Server] Failed to rotate binary log
```

After some investigation, the issue is that the metadata event length is calculated including the config change string, but the config change string is never written into the event body.

Reviewed By: Pushapgl

Differential Revision: D27504157

---------------------------------------------------------------------

Tells raft mode in binlog

Summary:
Win-Win: Tell whether this binlog was generated while this host was in raft mode.

We already have the bit in the FD event, and this diff just reports it.

Reviewed By: mpercy

Differential Revision: D28664300

---------------------------------------------------------------------

fix flaky raft testcase

Summary:
There are multiple issues for MTR:
1. In sql/rpl_binlog_sender.cc, if a secondary's IO thread receives fatal_error,
   it quits instead of reconnecting. Use an unknown error so that the
   secondary's IO thread can try to reconnect.
2. mtr.add_suppression() call: mtr.add_suppression() executes an insert
   statement into the mtr.test_suppressions table, which doesn't have a
   primary key, so during idempotent recovery the secondary fails to
   execute mtr.add_suppression() due to the missing PK. Move all
   mtr.add_suppression() calls to the end of the testcase to work around the idempotent recovery failure.
3. For promotion, use raft_promote_to_leader.inc instead of `set rpl_raft_new_leader_uuid`,
   since raft_promote_to_leader waits until the new primary becomes writeable.
4. Pass a specific warning instead of '.*' to mtr.add_suppression.
5. etc

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D28774820

---------------------------------------------------------------------

Using locks in Dump_log methods only when raft is enabled

Summary:
We don't need to take locks in non-raft mode since the
underlying MYSQL_BIN_LOG obj will never change. To handle the race between
updating the enable_raft_plugin var and using its value in Dump_log we
kill and block all dump threads while updating the var and unblock them
once the var is updated.

Reviewed By: bhatvinay

Differential Revision: D28905522

---------------------------------------------------------------------

leader election to be a sync point in the new leader

Summary:
Port of D27582002 (facebook@39c70ca) from 5.6.35 to 8.0

A newly elected raft leader makes sure that all trxs from the previous
leader are committed by sql appliers. It then switches the server's trx
logs from apply-log-* to binary-log-*. To other parts of the system this
looks like a rotation, but the necessary sync calls are not made here.
So, if the server (or os) restarts, then the storage engine could lose
the commit markers of the last batch of trxs. This will result in silent
data drift.
This diff fixes the problem by making an explicit call to
ha_flush_logs() before switching the server's trx logs

Reviewed By: luqun

Differential Revision: D28880407

---------------------------------------------------------------------

Error out SQL thread on out of order opids in raft logs

Summary:
OpIds should always be in order in the raft logs. Added a check
in SQL threads that throws an error and stops the applier when out of
order opids are detected.
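
A minimal sketch of such an ordering check is shown below. The `OpId` layout (a term plus an index, compared lexicographically), the class name, and the choice to treat a repeated OpId as out of order are all assumptions made for illustration; the actual check in the SQL thread may differ.

```cpp
#include <cstdint>
#include <tuple>

// Hypothetical stand-in for a raft OpId: (term, index), ordered
// lexicographically.
struct OpId {
  std::int64_t term;
  std::int64_t index;
};

inline bool operator<(const OpId &a, const OpId &b) {
  return std::tie(a.term, a.index) < std::tie(b.term, b.index);
}

// Tracks the last OpId seen by an applier and rejects any that do not
// move strictly forward.
class OpIdOrderChecker {
 public:
  // Returns true if `next` is strictly after the previous OpId (or is the
  // first one seen). Returns false on an out-of-order OpId; the caller
  // would then raise an error and stop the applier.
  bool observe(const OpId &next) {
    if (has_last_ && !(last_ < next)) return false;
    last_ = next;
    has_last_ = true;
    return true;
  }

 private:
  bool has_last_ = false;
  OpId last_{0, 0};
};
```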

Reviewed By: li-chi

Differential Revision: D28810840

---------------------------------------------------------------------

add GET_COMMITTED_GTIDS for raft

Summary:
During recovery Raft Log::Init needs to check with server
what gtids have been committed. Before doing that it finds the entire
set of trxs in the last raft log. The difference between logged -
committed are the pending opids.
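
The "logged minus committed" step is a plain set difference. The sketch below assumes, purely for illustration, that each transaction is identified by a single integer; real GTIDs and opids are richer, and `pending_trxs` is a hypothetical helper, not a function from the server or the plugin.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <set>

// Simplified: each transaction is identified by a single integer rather
// than a full GTID.
using TrxId = std::uint64_t;

// Returns the transactions that are in the raft log but have not been
// committed by the server.
std::set<TrxId> pending_trxs(const std::set<TrxId> &logged,
                             const std::set<TrxId> &committed) {
  std::set<TrxId> pending;
  std::set_difference(logged.begin(), logged.end(), committed.begin(),
                      committed.end(),
                      std::inserter(pending, pending.end()));
  return pending;
}
```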

Reviewed By: Pushapgl

Differential Revision: D29622786

---------------------------------------------------------------------

raft: skip mts_recovery_groups during start slave

Summary:
During MySQL8+Raft DMP, some instances fail to switch to Leader or start slave:

```
2021-06-24T17:56:38.627423-07:00 431 [Note] [MY-010574] [Repl] Slave: MTS group recovery relay log info group_master_log_name /data/mysql/3127/bls-unittestdb658.frc2-3305-mysql.replicaset.180021/binary-logs-3727.000033, event_master_log_pos 1129.
2021-06-24T17:56:38.627473-07:00 431 [ERROR] [MY-010575] [Repl] Error looking for file after /binlogs/binary-logs-3307.000120.
2021-06-24T17:56:38.627516-07:00 431 [ERROR] [MY-000000] [Repl] load_mi_and_rli_from_repositories: rli_init_info returned error
```

Similar to 5.6, we don't need to run mts_recovery_groups because GTID_MODE is always enabled.

Reviewed By: Pushapgl

Differential Revision: D29520066

---------------------------------------------------------------------

update RaftListenerCallbackArg struct

Summary: Because the contract between the raft plugin and mysql changed, update the RaftListenerCallbackArg struct to add a master_uuid field

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D29981349

---------------------------------------------------------------------

always check mi during raft_reset_slave

Summary: Add a nullptr check to raft_reset_slave similar to the one in raft_stop_sql_thread

Reviewed By: bhatvinay

Differential Revision: D29967915

---------------------------------------------------------------------

raft mtr: update rpl_end_raft.inc for disable-raft

Summary:
rpl_end_raft.inc unregisters the raft plugin, which calls reset slaves when online disable-raft is enabled. Move rpl_end_raft.inc after rpl_stop_slaves.inc so that the slaves are stopped correctly first.

After the slaves are stopped, a call to mtr.add_suppression("") won't replicate to them, so just call mtr.add_suppression("") on all instances (replicas).

Reviewed By: yizhang82

Differential Revision: D30062236

---------------------------------------------------------------------

fix master_uuid

Summary: fix master_uuid in raft mode.

Reviewed By: luqun

Differential Revision: D30261465

---------------------------------------------------------------------

incorrect File_size value in show raft logs result

Summary:
In the show raft logs result, the File_size for the latest log file isn't updated correctly.

For example, show raft logs;

```
+-------------------------+------------
| Log_name                | File_size  |
| binary-logs-3304.000670 |       1529 |
```

in fact
```
-rw-r----- 1 mysql backup 1669180487 Aug  4 14:49 binary-logs-3304.000670
-rw-r----- 1 mysql backup 1723994154 Aug  4 14:49 binary-logs-3304.000670
```
The file_size from show raft logs is always 1529, but the real file_size is 1669180487.

The issue is that when writing to IO_CACHE directly, its wrapper ostream's position isn't updated.

Reviewed By: yizhang82

Differential Revision: D30116345

---------------------------------------------------------------------

Add gtid_purged_for_tailing

Summary:
Add a new global variable gtid_purged_for_tailing.

It shows the purged GTID for binlog in non-raft mode, and purged GTID for raft log in raft mode.

Reviewed By: abhinav04sharma

Differential Revision: D29372776

---------------------------------------------------------------------

disable skip_setup_replica_if_unnecessary

Summary: After syncing with the latest official raft plugin, most MTR tests failed due to the skip_setup_replica_if_unnecessary optimization.

Reviewed By: yizhang82

Differential Revision: D30821648

---------------------------------------------------------------------

latency histograms for raft trx wait

Summary: Port of the support added for the same in mysql 5.6. Should help monitor latencies in 8.0 + raft stage-1 replicasets.

Reviewed By: luqun

Differential Revision: D31064764

---------------------------------------------------------------------

handle cases where clean shutdown in raft aborts trxs

Summary: This is a port of the feature in 5.6.

Reviewed By: anirbanr-fb

Differential Revision: D31070593

---------------------------------------------------------------------

fix crash during binlog purging

Summary:
Any error returned by the plugin during binlog purging results in a
crash in mysql8 as the server tries to execute
binlog_error_action_abort. We need to differentiate explicitly between a
plugin error and other errors (such as errors related to doing disk IO
etc). In this particular case, the crash is because of trying to purge a
file that does not exist (i.e. which was already purged previously) and
raft cannot find it in its index chunk (so it returns an error).

Reviewed By: anirbanr-fb

Differential Revision: D31149997

---------------------------------------------------------------------

update flaky rpl_raft_purged_gtids_dump_threads

Summary:
rpl_raft_purged_gtids_dump_threads MTR failed due to "Cannot replicate because the master purged required binary logs" after server4 starts tailing server2.

Sync server4 before switching it to tail server2.

Reviewed By: bhatvinay

Differential Revision: D30818078

---------------------------------------------------------------------

fix various porting issues in mysql8 (raft) crash recovery

Summary:
1. Trimming of the binlog is left to raft plugin (based on the current
   leader's log). Server should skip this step as part of recovery. This
   essentially means setting 'valid_pos' to the last successfully parsed
   trx in the trx log (instead of the engine's view of the trx log) in
   MYSQL_BIN_LOG::recover()
2. executed_gtid_set should be initialized based on the engine's view of
   the trx log file coordinates. So, during startup appropriate flags need
   to be passed into MYSQL_BIN_LOG::init_gtid_sets(). init_gtid_sets() is
   already changed to handle this, but the flag was not set correctly
   during server startup
3. Another fix is in MYSQL_BIN_LOG::init_gtid_sets() to correctly set the position
   to read and calculate executed-gtid-set (based on the file name read from the
   engine)

Reviewed By: anirbanr-fb

Differential Revision: D31284902

---------------------------------------------------------------------

show_raft_status should take LOCK_status

Summary:
This is a straight port of a 5.6 patch to 8.0

SHOW RAFT STATUS and SHOW GLOBAL STATUS go to the
same functions in the plugin and access shared data structures.
These functions return internal char * pointers to plugin global
variables. They need to be serialized with LOCK_status otherwise
it leads to race conditions.

Reviewed By: li-chi

Differential Revision: D31318340

---------------------------------------------------------------------

Disallowing reset master on raft replicasets

Summary:
This is a straight port of a similar patch on 5.6 raft

reset master is inherently not raft compliant.
It gets rid of all binlogs and makes an instance appear like
a fresh single instance database without consulting the ring.
We need to block it.

However, under certain circumstances (in the future), e.g. during
first time replicaset build, or when we can guarantee that instance
is single node raft ring, we can potentially do a reset master
followed by rebootstrap of raft. This can also be achieved by
disabling raft, reset master and then re-enabling raft, however
to keep open the door, I have left a force option and will thread
a mechanism from the plugin to call reset_master(force=true)

Reviewed By: li-chi

Differential Revision: D31299214

---------------------------------------------------------------------

Setting topology config in MTR

Summary:
Setting topology config in MTR so that features that use
topology config can be tested easily. Setting rpl_raft_skip_smc_updates
to avoid unnecessary calls to SMC (even though we supply a dummy
replicaset name).

Reviewed By: li-chi

Differential Revision: D31543877

---------------------------------------------------------------------

Do not allow sql thread start when raft is doing a stop->{}->start transition

Summary:
This is a port of an existing patch made to 5.6 for Raft.

Raft will do a stop of the SQL thread during StopAllWrites.
Then it will repoint the binlog files and during that action, the
SQL threads have to remain stopped. We block it out in this diff by
keeping an atomic bool which can be checked from other functions.

This only applies to raft mode i.e. enable_raft_plugin = true.
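
The guard described above can be sketched with a single atomic flag. The flag and function names below are hypothetical, chosen only to illustrate the idea; `raft_enabled` stands in for enable_raft_plugin.

```cpp
#include <atomic>

// Hypothetical flag name; set while raft is between stopping and
// restarting the applier.
static std::atomic<bool> sql_thread_start_blocked{false};

// Called by raft when it stops the SQL thread and begins repointing logs.
void begin_raft_transition() { sql_thread_start_blocked.store(true); }

// Called by raft once the logs are repointed and the applier may run again.
void end_raft_transition() { sql_thread_start_blocked.store(false); }

// Returns true if the SQL thread may be started right now.
// `raft_enabled` mirrors enable_raft_plugin; outside raft mode the flag
// is ignored.
bool can_start_sql_thread(bool raft_enabled) {
  if (!raft_enabled) return true;
  return !sql_thread_start_blocked.load();
}
```

Any code path that starts the SQL thread would consult `can_start_sql_thread()` first and refuse to start while a transition is in progress.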

Reviewed By: li-chi

Differential Revision: D31319499

---------------------------------------------------------------------

Handle printing of FD event generated by slave SQL thread

Summary:
Returning early in the FD:print() function prevents mysqlbinlog from parsing Raft logs on secondaries.

The original commit which added this is d048c0f (P173872135)

To comply with the intent of the original bug fix, we avoid printing the FD event of a relay log as a 'BINLOG'.

Reviewed By: anirbanr-fb

Differential Revision: D26359417

---------------------------------------------------------------------

Add exponential backoff for smart restart

Summary:
RAFT Instance crash and Tx Logs filling up.

rocksdb can fill up txlogs.
we should stop restarting mysqld if we have restarted many times in a day

Reviewed By: anirbanr-fb

Differential Revision: D27372750

---------------------------------------------------------------------

always release data_lock mutex to avoid deadlock

Summary: In a stage-1 replicaset, killing a secondary instance sometimes causes the instance to deadlock because the process_raft_queue thread does not release the mutex it acquired in raft_change_master.

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D27602667

---------------------------------------------------------------------

Gracefully exit mysqld_safe loop during backoff

Summary:
Currently, systemctl stop [email protected] can take 10 mins when mysqld_safe is in its backoff period.

D28517599 adds an interrupt to the sleep in mysql_stop, and mysqld_safe immediately breaks out of the retry loop if the sleep is interrupted.

Reviewed By: anirbanr-fb

Differential Revision: D28606439

---------------------------------------------------------------------

Fix heap overflow in group_relay_log_name handling

Summary:
We were accessing group_relay_log_name in
Query_log_event::do_apply_event_worker() but it's assigned only after
the coordinator thread encounters an end event (i.e. xid event or a
query event with "COMMIT" or "ROLLBACK" query). This was causing a race
between accessing group_relay_log_name in the worker thread and writing
it on the coordinator thread. We don't need to set transaction position
in events other than end event, so now we set transaction position in
query event only if it's an end event. The race is eliminated because
group_relay_log_name is set before enqueuing the event to the worker
thread (in both dep repl and vanilla mts).

Reviewed By: lth

Differential Revision: D28767430

---------------------------------------------------------------------

fix memory during MYSQL_BIN_LOG::open_existing_binlog

Summary:
asandebug complains of memory leaks during MYSQL_BIN_LOG open:

Direct leak of 50 byte(s) in 1 object(s) allocated from:
    #0 0x67460ef in malloc
    #1 0x93f0777 in my_raw_malloc(unsigned long, int)
    #2 0x93f064a in my_malloc(unsigned int, unsigned long, int)
    #3 0x93f0eb0 in my_strdup(unsigned int, char const*, int)
    #4 0x8af01a6 in MYSQL_BIN_LOG::open(unsigned int, char const*, char const*, unsigned int)
    #5 0x8af8064 in MYSQL_BIN_LOG::open_binlog(char const*, char const*, unsigned long, bool, bool, bool, Format_description_log_event*, unsigned int, RaftRotateInfo*, bool)
    #6 0x8b00c00 in MYSQL_BIN_LOG::new_file_impl(bool, Format_description_log_event*, RaftRotateInfo*)
    #7 0x8d65e47 in rotate_relay_log(Master_info*, bool, bool, bool, RaftRotateInfo*)
    #8 0x8d661c0 in rotate_relay_log_for_raft(RaftRotateInfo*)
    #9 0x8c7696a in process_raft_queue
    #10 0xa0fa1fd in pfs_spawn_thread(void*)
    #11 0x7f8c9a12b20b in start_thread

Release the existing allocations before assigning new ones.

Reviewed By: Pushapgl

Differential Revision: D28819752
inikep pushed a commit that referenced this pull request Apr 1, 2022
… enabled

Summary:
For secondaries, when enable_super_log_bin_read_only is on and read_only is on, installing or uninstalling a plugin at run time is currently forbidden.

Installing or uninstalling a plugin doesn't generate an event in the binlog, although the thread's thd contains the OPTION_BIN_LOG flag because log_slave_updates is on by default on secondaries. It should be safe to execute install/uninstall plugin.

The change is to call set_skip_readonly_check() before install/uninstall plugin and to call reset_skip_readonly_check() (for completeness) afterwards.

BTW, mysql always calls reset_skip_readonly_check() at the beginning of each statement, so set_skip_readonly_check() won't affect other statements.
```
#0  THD::reset_skip_readonly_check (this=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_class.h:1754
#1  0x0000000005a500e1 in THD::reset_for_next_command (this=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:5892
#2  0x0000000005a517a5 in mysql_reset_thd_for_next_command (thd=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:5817
#3  0x0000000005a50466 in mysql_parse (thd=0x7fb5c1a0b000, parser_state=0x7fb5f6bb4560, last_timer=0x7fb5f6bb39b0) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:6056
#4  0x0000000005a4c7c9 in dispatch_command (thd=0x7fb5c1a0b000, com_data=0x7fb5f6bb4d98, command=COM_QUERY) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:2222
#5  0x0000000005a4f991 in do_command (thd=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:1556
#6  0x0000000005ccd4f1 in handle_connection (arg=0x7fb5cc85b740) at /home/luqun/mysql/mysql-8.0.20/sql/conn_handler/connection_handler_per_thread.cc:330
#7  0x00000000078eb95b in pfs_spawn_thread (arg=0x7fb5f8c89720) at /home/luqun/mysql/mysql-8.0.20/storage/perfschema/pfs.cc:2884
#8  0x00007fb5f957020c in start_thread (arg=0x7fb5f6bb6700) at pthread_create.c:479
#9  0x00007fb5f971881f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
```

Reviewed By: george-reynya

Differential Revision: D27213990
inikep added a commit that referenced this pull request Apr 1, 2022
recover raft logs by removing partial trxs

Summary:
Port D24628821

mysqld removes partial trxs in the tail of trx log (named binary-logs on
primaries and apply-logs on secondaries) during startup. However, relay logs
were not of much importance since they were discarded anyway and new ones would
be created.
However, with raft, this is not ideal. Relay logs are raft logs on secondaries
and have to be kept around (and kept sane and consistent). This diff adds the
ability to remove partial trxs from raft/relay logs.
Much of the code to open the last relay log (based on relay log index) and
identify partial trxs is borrowed from existing logic in
MYSQL_BIN_LOG::open_binlog() and binlog_recover()

Reviewed By: Pushapgl

Differential Revision: D26447448

---------------------------------------------------------------------

Checking for inited bool to make sure global_init_info was successful

Summary:
Port D25584004

A master.info and relay.info file can be present
but needs to be properly inited for use. We were bypassing the inited
check which could lead to issues in Raft.
In case there is an error in global_init_info, Raft will do a
raft_reset_slave and make another attempt at it. If both recourses
fail, the init of the plugin would fail.

Reviewed By: Pushapgl

Differential Revision: D26447457

---------------------------------------------------------------------

Support for dumping raft logs to vanilla async replicas

Summary:
[Porting Notes]
We want to dump raft logs to vanilla async replicas regardless
of whether it's the relay log or binlog. Effectively after this change
we'll dump relay logs on the followers and binlogs on the leader. When
the raft role changes, the logs to be dumped are also changed.
Dump_log class is introduced as a thin wrapper/container around
mysql_bin_log or rli->relay_log and is inited with mysql_bin_log
to emulate vanilla mysql behavior. Dump threads use the global
dump_log object instead of mysql_bin_log directly. We switch the log
in dump log only when raft role changes (in binlog_change_to_binlog()
and binlog_change_to_apply_log()).
During raft role change we take all log related locks (LOCK_log,
LOCK_index, LOCK_binlog_end_pos, and dump log lock) to serialize it with
other log operations like dumping logs.

Related doc - https://fb.quip.com/oTVAAdgEi4zY

This diff contains below 7 patches:
D23013977
D24766787
D24716539
D24900223
D24955284
D25174166
D25775525

Reviewed By: luqun

Differential Revision: D26141496

---------------------------------------------------------------------

Fixed the flaky raft test suite

Summary:
First clean run of entire raft test suite :)

**Changes**
* Reset apply_logs on raft secondaries before the start of every test
* Increased the lock timeouts and changed isolation level in rpl_raft_slave_out_of_order_commit.

Reviewed By: luqun

Differential Revision: D26651257

---------------------------------------------------------------------

return error from Raft_replication_delegate when plugin is not available

Summary:
Port D23065441 (facebook@b9067f7)

The new macro is used to call into raft plugin. If plugin gets unloaded
accidentally when enable_raft_plugin is ON, then this STRICT version returns
failure. This is to be called only by raft plugin currently

Reviewed By: Pushapgl

Differential Revision: D26447523

---------------------------------------------------------------------

Adding timestamps for raft rotates which happen in the context of listener thread

Summary:
Port D25572614

The timestamp of a binlog event is picked up from the when field in
the event. In most cases of rotation, the when is left unpopulated
during rotation for the top 3 events (fd, pgtid, metadata). However
in such a situation, a normal rotate (flush binary logs) still manages
to get a valid timestamp, since the thread in which the flush binary
logs happens has a valid start time.
Now enter Raft relay log rotations. In those cases and in the case
of config change rotate, the rotations are happening in the context
of a raft listener queue thread. In that context, the when and start
time of the thread are both 0. The diff handles this case by populating
the when field appropriately.
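
A rough sketch of the fallback order described above. The helper name and the choice of "current time" as the last resort are assumptions for illustration; the diff itself only says the when field is populated appropriately.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical helper: choose the timestamp for a rotate event.
// A value of 0 means "not populated".
inline std::time_t pick_event_timestamp(std::time_t event_when,
                                        std::time_t thread_start_time,
                                        std::time_t now) {
  if (event_when != 0) return event_when;                // event already stamped
  if (thread_start_time != 0) return thread_start_time;  // normal flush case
  return now;  // raft listener thread case: both inputs are 0 (assumed fallback)
}
```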

Reviewed By: bhatvinay

Differential Revision: D26194612

---------------------------------------------------------------------

Raft abrupt stepdown and trim binlog file / gtid test

Summary: binlog file should get trimmed for abrupt stepdown

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D26169975

---------------------------------------------------------------------

Port: Fixes around raft log truncation.

Summary:
**Notes**
* New functions to block and unblock dump threads for plugin to use
during raft log truncation.

The check below is already done in the raft plugin as part of D26866429.
* Re-init rli->cur_log in Relay_log_info::rli_init_info() instead of
just seeking to the beginning to handle the case when raft log is
truncated before starting the applier

Reviewed By: luqun

Differential Revision: D26759813

---------------------------------------------------------------------

rebase due to relay log Format_description_event event size difference

Summary:
WL#3549: Binlog Compression adds 1 extra byte of data to Format_description_event

    @@ -134,6 +137,7 @@ Format_description_event::Format_description_event(uint8_t binlog_ver,
               VIEW_CHANGE_HEADER_LEN,
               XA_PREPARE_HEADER_LEN,
               ROWS_HEADER_LEN_V2,
    +          TRANSACTION_PAYLOAD_EVENT,
           };

Reviewed By: Pushapgl

Differential Revision: D27145095

---------------------------------------------------------------------

fix raft change string metadata event

Summary:
Saw a bunch of stage-1 replicaset instances fail due to a config change string failure during add/remove instance
```
I0401 00:16:10.894434 1823842 IoCacheUtils.cpp:333] getConfigChangeString eventType = ^G
E0401 00:16:10.894451 1823842 IoCacheUtils.cpp:363] Failed to read metadata event body from cache
E0401 00:16:10.894456 1823842 MysqlRaft.cpp:659] [replicate] Failed to get config change string from iocache
2021-04-01T00:16:10.894464-07:00 8307 [ERROR] [MY-010207] [Repl] Run function 'before_flush' in plugin 'RPL_RAFT' failed
2021-04-01T00:16:10.894478-07:00 8307 [ERROR] [MY-000000] [Server] Failed to rotate binary log
```

After some investigation, the cause is that the metadata event length was calculated including the config change string, but the config change string was never written into the event body.
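
To illustrate this class of bug, here is a small sketch using a made-up length-prefixed encoding (2-byte little-endian length, then payload). It is not the real metadata event layout; `encode_field` and `decode_field` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Writes the length and then the payload itself.
inline std::vector<uint8_t> encode_field(const std::string &payload) {
  std::vector<uint8_t> out;
  const uint16_t len = static_cast<uint16_t>(payload.size());
  out.push_back(static_cast<uint8_t>(len & 0xff));
  out.push_back(static_cast<uint8_t>(len >> 8));
  // Omitting this write reproduces the bug class: the length is
  // accounted for, but the body is missing.
  out.insert(out.end(), payload.begin(), payload.end());
  return out;
}

// Fails when the buffer is shorter than the advertised length.
inline bool decode_field(const std::vector<uint8_t> &buf, std::string *payload) {
  if (buf.size() < 2) return false;
  const std::size_t len = buf[0] | (static_cast<std::size_t>(buf[1]) << 8);
  if (buf.size() - 2 < len) return false;
  payload->assign(buf.begin() + 2, buf.begin() + 2 + len);
  return true;
}
```

A reader that trusts the length, as `decode_field` does, fails in the same way as the "Failed to read metadata event body" log line when the body was never written.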

Reviewed By: Pushapgl

Differential Revision: D27504157

---------------------------------------------------------------------

Tells raft mode in binlog

Summary:
Win-Win: Tell whether this binlog was generated while this host was in raft mode.

We already have the bit in the FD event, and this diff just prints it out.

Reviewed By: mpercy

Differential Revision: D28664300

---------------------------------------------------------------------

fix flaky raft testcase

Summary:
There are multiple issues for MTR:
1. In sql/rpl_binlog_sender.cc, if a secondary's IO thread receives fatal_error,
   it quits instead of reconnecting. Use an unknown error so that the
   secondary's IO thread can try to reconnect.
2. mtr.add_suppression() calls: mtr.add_suppression() executes an insert
   statement into the mtr.test_suppressions table, which has no primary key,
   so during idempotent recovery a secondary fails to execute
   mtr.add_suppression() due to the missing PK. Move all
   mtr.add_suppression() calls to the end of the testcase to work around the idempotent recovery failure.
3. For promotion, use raft_promote_to_leader.inc instead of `set rpl_raft_new_leader_uuid`,
   since raft_promote_to_leader waits until the new primary becomes writeable.
4. Pass a specific warning instead of '.*' to mtr.add_suppression.
5. etc

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D28774820

---------------------------------------------------------------------

Using locks in Dump_log methods only when raft is enabled

Summary:
We don't need to take locks in non-raft mode since the
underlying MYSQL_BIN_LOG obj will never change. To handle the race between
updating the enable_raft_plugin var and using its value in Dump_log, we
kill and block all dump threads while updating the var and unblock them
once the var is updated.
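
One way to picture the conditional locking is a guard that takes the mutex only when raft is enabled. `MaybeLock` is a hypothetical name, and this is a sketch under that assumption rather than the code in the diff.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical RAII guard: locks only when raft_enabled is true.
class MaybeLock {
 public:
  MaybeLock(std::mutex &m, bool raft_enabled) : m_(m), locked_(raft_enabled) {
    if (locked_) m_.lock();
  }
  ~MaybeLock() {
    if (locked_) m_.unlock();
  }
  MaybeLock(const MaybeLock &) = delete;
  MaybeLock &operator=(const MaybeLock &) = delete;

  bool holds_lock() const { return locked_; }

 private:
  std::mutex &m_;
  bool locked_;
};
```

The flag is read once at construction, which is why the value must not change while guards are alive; the summary handles that by blocking dump threads during the update.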

Reviewed By: bhatvinay

Differential Revision: D28905522

---------------------------------------------------------------------

leader election to be a sync point in the new leader

Summary:
Port of D27582002 (facebook@39c70ca) from 5.6.35 to 8.0

Newly elected raft leader makes sure that all trxs from the previous
leader are committed by sql appliers. It then switches the server's trx
logs from apply-log-* to binary-log-*. To other parts of the system this
looks like a rotation, but the necessary sync calls are not made here.
So, if the server (or os) restarts, then the storage engine could lose
the commit markers of the last batch of trxs. This will result in silent
data drift.
This diff fixes the problem by making an explicit call to
ha_flush_logs() before switching the server's trx logs

Reviewed By: luqun

Differential Revision: D28880407

---------------------------------------------------------------------

Error out SQL thread on out of order opids in raft logs

Summary:
OpIds should always be in order in the raft logs. Added a check
in SQL threads that throws an error and stops the applier when out of
order opids are detected.
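
A sketch of such an ordering check, assuming an OpId is a (term, index) pair where the term never decreases and the index strictly increases. Both the struct and that ordering rule are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical OpId shape; the real layout may differ.
struct OpIdSketch {
  int64_t term;
  int64_t index;
};

// True when next may legally follow prev under the assumed rule.
inline bool is_in_order(const OpIdSketch &prev, const OpIdSketch &next) {
  return next.term >= prev.term && next.index > prev.index;
}
```

An applier would stop with an error as soon as `is_in_order` returns false for two consecutive entries.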

Reviewed By: li-chi

Differential Revision: D28810840

---------------------------------------------------------------------

add GET_COMMITTED_GTIDS for raft

Summary:
During recovery, Raft Log::Init needs to check with the server
which gtids have been committed. Before doing that it finds the entire
set of trxs in the last raft log. The difference (logged minus
committed) gives the pending opids.
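
The "logged minus committed" step is a plain set difference. A sketch, with GTIDs modeled as strings purely for illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>

// Returns the entries present in logged but not in committed.
inline std::set<std::string> pending_gtids(const std::set<std::string> &logged,
                                           const std::set<std::string> &committed) {
  std::set<std::string> pending;
  std::set_difference(logged.begin(), logged.end(), committed.begin(),
                      committed.end(), std::inserter(pending, pending.end()));
  return pending;
}
```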

Reviewed By: Pushapgl

Differential Revision: D29622786

---------------------------------------------------------------------

raft: skip mts_recovery_groups during start slave

Summary:
During MySQL8+Raft DMP, some instances fail to switch to Leader or start slave

```
2021-06-24T17:56:38.627423-07:00 431 [Note] [MY-010574] [Repl] Slave: MTS group recovery relay log info group_master_log_name /data/mysql/3127/bls-unittestdb658.frc2-3305-mysql.replicaset.180021/binary-logs-3727.000033, event_master_log_pos 1129.
2021-06-24T17:56:38.627473-07:00 431 [ERROR] [MY-010575] [Repl] Error looking for file after /binlogs/binary-logs-3307.000120.
2021-06-24T17:56:38.627516-07:00 431 [ERROR] [MY-000000] [Repl] load_mi_and_rli_from_repositories: rli_init_info returned error
```

Similar to 5.6, we don't need to run mts_recovery_groups because GTID_MODE is always enabled.

Reviewed By: Pushapgl

Differential Revision: D29520066

---------------------------------------------------------------------

update RaftListenerCallbackArg struct

Summary: Due to a change in the contract between the raft plugin and mysql, update the RaftListenerCallbackArg struct to add a master_uuid field

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D29981349

---------------------------------------------------------------------

always check mi during raft_reset_slave

Summary: add a nullptr check to raft_reset_slave similar to the one in raft_stop_sql_thread

Reviewed By: bhatvinay

Differential Revision: D29967915

---------------------------------------------------------------------

raft mtr: update rpl_end_raft.inc for disable-raft

Summary:
rpl_end_raft.inc unregisters the raft plugin, which calls reset slave when online disable-raft is enabled. Move rpl_end_raft.inc after rpl_stop_slaves.inc so that slaves are stopped correctly first.

After slaves are stopped, mtr.add_suppression("") calls no longer replicate to them, so call mtr.add_suppression("") on all instances (replicas).

Reviewed By: yizhang82

Differential Revision: D30062236

---------------------------------------------------------------------

fix master_uuid

Summary: fix master_uuid in raft mode.

Reviewed By: luqun

Differential Revision: D30261465

---------------------------------------------------------------------

incorrect File_size value in show raft logs result

Summary:
In the show raft logs result, the File_size for the latest log file isn't updated correctly.

For example, show raft logs; returns

```
+-------------------------+------------
| Log_name                | File_size  |
| binary-logs-3304.000670 |       1529 |
```

in fact
```
-rw-r----- 1 mysql backup 1669180487 Aug  4 14:49 binary-logs-3304.000670
-rw-r----- 1 mysql backup 1723994154 Aug  4 14:49 binary-logs-3304.000670
```
The File_size from show raft logs is always 1529, but the real file size is 1669180487.

The issue is that when writing to IO_CACHE directly, the position of its wrapper ostream isn't updated.
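
A toy model of the mismatch, assuming a wrapper that tracks its own position on top of a shared buffer. `CacheSketch` and `StreamSketch` are hypothetical stand-ins, not IO_CACHE or the server's ostream wrapper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the underlying cache.
struct CacheSketch {
  std::string bytes;
};

// Hypothetical wrapper that keeps its own notion of position.
class StreamSketch {
 public:
  explicit StreamSketch(CacheSketch *c)
      : cache_(c), position_(c->bytes.size()) {}

  // Writes through the wrapper keep the position in step.
  void write(const std::string &data) {
    cache_->bytes += data;
    position_ += data.size();
  }

  // Re-sync after something wrote to the cache directly.
  void sync_position() { position_ = cache_->bytes.size(); }

  std::size_t position() const { return position_; }

 private:
  CacheSketch *cache_;
  std::size_t position_;
};
```

Any size reported from `position()` stays stale after a direct write to the cache until `sync_position()` runs, which matches the symptom of a File_size that never grows.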

Reviewed By: yizhang82

Differential Revision: D30116345

---------------------------------------------------------------------

Add gtid_purged_for_tailing

Summary:
Add a new global variable gtid_purged_for_tailing.

It shows the purged GTID for binlog in non-raft mode, and purged GTID for raft log in raft mode.

Reviewed By: abhinav04sharma

Differential Revision: D29372776

---------------------------------------------------------------------

disable skip_setup_replica_if_unnecessary

Summary: After syncing with the latest official raft plugin, most MTR tests failed due to the skip_setup_replica_if_unnecessary optimization.

Reviewed By: yizhang82

Differential Revision: D30821648

---------------------------------------------------------------------

latency histograms for raft trx wait

Summary: Port of the support added for the same in mysql 5.6. Should help monitor latencies in 8.0 + raft stage-1 replicasets.

Reviewed By: luqun

Differential Revision: D31064764

---------------------------------------------------------------------

handle cases where clean shutdown in raft aborts trxs

Summary: This is a port of the feature in 5.6.

Reviewed By: anirbanr-fb

Differential Revision: D31070593

---------------------------------------------------------------------

fix crash during binlog purging

Summary:
Any error returned by the plugin during binlog purging results in a
crash in mysql8 as the server tries to execute
binlog_error_action_abort. We need to differentiate explicitly between a
plugin error and other errors (such as errors related to disk IO).
In this particular case, the crash is caused by trying to purge a
file that does not exist (i.e. one that was already purged previously) and
that raft cannot find in its index chunk (so it returns an error).

Reviewed By: anirbanr-fb

Differential Revision: D31149997

---------------------------------------------------------------------

update flaky rpl_raft_purged_gtids_dump_threads

Summary:
rpl_raft_purged_gtids_dump_threads MTR failed with "Cannot replicate because the master purged required binary logs" after server4 starts tailing server2.

Try to sync server4 before switching it to tail server2.

Reviewed By: bhatvinay

Differential Revision: D30818078

---------------------------------------------------------------------

fix various porting issues in mysql8 (raft) crash recovery

Summary:
1. Trimming of the binlog is left to the raft plugin (based on the current
   leader's log). The server should skip this step as part of recovery. This
   essentially means setting 'valid_pos' to the last successfully parsed
   trx in the trx log (instead of the engine's view of the trx log) in
   MYSQL_BIN_LOG::recover()
2. executed_gtid_set should be initialized based on the engine's view of
   the trx log file coordinates. So, during startup appropriate flags need
   to be passed into MYSQL_BIN_LOG::init_gtid_sets(). init_gtid_sets() is
   already changed to handle this, but the flag was not set correctly
   during server startup
3. Another fix is in MYSQL_BIN_LOG::init_gtid_sets() to correctly set the position
   to read and calculate executed-gtid-set (based on the file name read from the
   engine)

Reviewed By: anirbanr-fb

Differential Revision: D31284902

---------------------------------------------------------------------

show_raft_status should take LOCK_status

Summary:
This is a straight port of a 5.6 patch to 8.0

SHOW RAFT STATUS and SHOW GLOBAL STATUS go to the
same functions in the plugin and access shared data structures.
These functions return internal char * pointers to plugin global
variables. They need to be serialized with LOCK_status otherwise
it leads to race conditions.

Reviewed By: li-chi

Differential Revision: D31318340

---------------------------------------------------------------------

Disallowing reset master on raft replicasets

Summary:
This is a straight port of similar patch on 5.6 raft

reset master is inherently not raft compliant.
It gets rid of all binlogs and makes an instance appear like
a fresh single instance database without consulting the ring.
We need to block it.

However under certain circumstances (in the future), e.g. during
first time replicaset build, or when we can guarantee that instance
is single node raft ring, we can potentially do a reset master
followed by rebootstrap of raft. This can also be achieved by
disabling raft, reset master and then re-enabling raft, however
to keep open the door, I have left a force option and will thread
a mechanism from the plugin to call reset_master(force=true)
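
The gating described above can be sketched as a simple check. The enum and function name are hypothetical; only the idea of a force flag comes from the summary.

```cpp
#include <cassert>

// Hypothetical result codes, not the server's real error codes.
enum class ResetMasterResult { kOk, kBlockedByRaft };

// Refuses reset master in raft mode unless the caller forces it.
inline ResetMasterResult reset_master_sketch(bool raft_enabled, bool force) {
  if (raft_enabled && !force) return ResetMasterResult::kBlockedByRaft;
  // The actual removal of binlogs would happen here.
  return ResetMasterResult::kOk;
}
```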

Reviewed By: li-chi

Differential Revision: D31299214

---------------------------------------------------------------------

Setting topology config in MTR

Summary:
Setting topology config in MTR so that features that use
topology config can be tested easily. Setting rpl_raft_skip_smc_updates
to avoid unnecessary calls to SMC (even though we supply a dummy
replicaset name).

Reviewed By: li-chi

Differential Revision: D31543877

---------------------------------------------------------------------

Do not allow sql thread start when raft is doing a stop->{}->start transition

Summary:
This is a port of an existing patch made to 5.6 for Raft.

Raft will do a stop of the SQL thread during StopAllWrites.
Then it will repoint the binlog files and during that action, the
SQL threads have to remain stopped. We block it out in this diff by
keeping an atomic bool which can be checked from other functions.

This only applies to raft mode i.e. enable_raft_plugin = true.
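
A minimal sketch of the atomic-bool gate, assuming hypothetical names (`SqlThreadGateSketch` and its methods are not from the source):

```cpp
#include <atomic>
#include <cassert>

// Hypothetical gate checked before starting the SQL thread.
class SqlThreadGateSketch {
 public:
  void begin_raft_transition() { blocked_.store(true); }
  void end_raft_transition() { blocked_.store(false); }

  // The gate only applies in raft mode; otherwise starts are always allowed.
  bool may_start_sql_thread(bool raft_enabled) const {
    return !(raft_enabled && blocked_.load());
  }

 private:
  std::atomic<bool> blocked_{false};
};
```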

Reviewed By: li-chi

Differential Revision: D31319499

---------------------------------------------------------------------

Handle printing of FD event generated by slave SQL thread

Summary:
Returning early in the FD:print() function makes mysqlbinlog unable to parse Raft logs on secondaries.

The original commit which added this is d048c0f (P173872135)

To comply with the intent of the original bug fix, we avoid printing the FD event of a relay log as a 'BINLOG'.

Reviewed By: anirbanr-fb

Differential Revision: D26359417

---------------------------------------------------------------------

Add exponential backoff for smart restart

Summary:
RAFT Instance crash and Tx Logs filling up.

rocksdb can fill up txlogs.
We should stop restarting mysqld if it has already restarted many times in a day.
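
For reference, a capped exponential backoff can be sketched as below. The doubling factor, the base, and the cap are illustrative assumptions; the summary does not state the actual schedule, and mysqld_safe itself is a shell script rather than C++.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Delay before the next restart: base * 2^restarts, capped at cap.
inline uint64_t backoff_seconds(unsigned restarts, uint64_t base, uint64_t cap) {
  uint64_t delay = base;
  // Stop doubling once the cap is reached so the value cannot run away.
  for (unsigned i = 0; i < restarts && delay < cap; ++i) delay *= 2;
  return std::min(delay, cap);
}
```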

Reviewed By: anirbanr-fb

Differential Revision: D27372750

---------------------------------------------------------------------

always release data_lock mutex to avoid deadlock

Summary: In stage-1 replicasets, killing a secondary instance sometimes leaves the instance deadlocked because the process_raft_queue thread forgot to release the mutex it acquired in raft_change_master
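
One common way to avoid this kind of leak is to hold the mutex through an RAII guard so that every return path releases it. The function below is a hypothetical sketch of that pattern, not the actual fix in raft_change_master.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical function with an early-return error path.
inline bool change_master_sketch(std::mutex &data_lock, bool fail_early,
                                 int *steps_done) {
  std::lock_guard<std::mutex> guard(data_lock);
  if (fail_early) return false;  // guard releases data_lock here as well
  ++*steps_done;
  return true;
}
```

If the early return leaked the lock, a second call on the same mutex would hang, which is the deadlock pattern described above.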

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D27602667

---------------------------------------------------------------------

Gracefully exit mysqld_safe loop during backoff

Summary:
Currently systemctl stop [email protected] can take 10 mins when mysqld_safe is in its backoff period.

D28517599 adds an interrupt to the sleep in mysql_stop, and mysqld_safe immediately breaks out of the retry loop if the sleep is interrupted.

Reviewed By: anirbanr-fb

Differential Revision: D28606439

---------------------------------------------------------------------

Fix heap overflow in group_relay_log_name handling

Summary:
We were accessing group_relay_log_name in
Query_log_event::do_apply_event_worker() but it's assigned only after
the coordinator thread encounters an end event (i.e. xid event or a
query event with "COMMIT" or "ROLLBACK" query). This was causing a race
between accessing group_relay_log_name in the worker thread and writing
it on the coordinator thread. We don't need to set transaction position
in events other than end event, so now we set transaction position in
query event only if it's an end event. The race is eliminated because
group_relay_log_name is set before enqueuing the event to the worker
thread (in both dep repl and vanilla mts).

Reviewed By: lth

Differential Revision: D28767430

---------------------------------------------------------------------

fix memory leak during MYSQL_BIN_LOG::open_existing_binlog

Summary:
asandebug complains that there are memory leaks during MYSQL_BIN_LOG open

Direct leak of 50 byte(s) in 1 object(s) allocated from:
    #0 0x67460ef in malloc
    #1 0x93f0777 in my_raw_malloc(unsigned long, int)
    #2 0x93f064a in my_malloc(unsigned int, unsigned long, int)
    #3 0x93f0eb0 in my_strdup(unsigned int, char const*, int)
    #4 0x8af01a6 in MYSQL_BIN_LOG::open(unsigned int, char const*, char const*, unsigned int)
    #5 0x8af8064 in MYSQL_BIN_LOG::open_binlog(char const*, char const*, unsigned long, bool, bool, bool, Format_description_log_event*, unsigned int, RaftRotateInfo*, bool)
    #6 0x8b00c00 in MYSQL_BIN_LOG::new_file_impl(bool, Format_description_log_event*, RaftRotateInfo*)
    #7 0x8d65e47 in rotate_relay_log(Master_info*, bool, bool, bool, RaftRotateInfo*)
    #8 0x8d661c0 in rotate_relay_log_for_raft(RaftRotateInfo*)
    #9 0x8c7696a in process_raft_queue
    #10 0xa0fa1fd in pfs_spawn_thread(void*)
    #11 0x7f8c9a12b20b in start_thread

Release the previously allocated memory before reassigning the pointers.
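
The "free before reassign" pattern, sketched with plain malloc/free. `LogNameHolder` and `set_name` are hypothetical; the server uses its own my_strdup/my_free helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical holder for a heap-allocated log name.
struct LogNameHolder {
  char *name = nullptr;
};

inline void set_name(LogNameHolder *h, const char *new_name) {
  std::free(h->name);  // release the old copy first; free(nullptr) is a no-op
  const std::size_t n = std::strlen(new_name) + 1;
  h->name = static_cast<char *>(std::malloc(n));
  if (h->name != nullptr) std::memcpy(h->name, new_name, n);
}
```

Without the `std::free` call, each repeated open would leak the previous string, which is the shape of the leak in the ASan report.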

Reviewed By: Pushapgl

Differential Revision: D28819752
inikep pushed a commit that referenced this pull request Apr 4, 2022
… enabled

Summary:
For secondaries, when enable_super_log_bin_read_only is on and read_only is on, installing/uninstalling a plugin at run time is currently forbidden.

Install/uninstall plugin doesn't generate events in the binlog, although the thread's thd contains the OPTION_BIN_LOG flag because log_slave_updates is on by default in secondaries. It should be safe to execute install/uninstall plugin.

The change is to call set_skip_readonly_check() before install/uninstall plugin and call reset_skip_readonly_check() (for completeness) after install/uninstall plugin.

BTW, mysql always calls reset_skip_readonly_check() at the beginning of each statement, so set_skip_readonly_check() won't affect other statements.
```
#0  THD::reset_skip_readonly_check (this=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_class.h:1754
#1  0x0000000005a500e1 in THD::reset_for_next_command (this=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:5892
#2  0x0000000005a517a5 in mysql_reset_thd_for_next_command (thd=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:5817
#3  0x0000000005a50466 in mysql_parse (thd=0x7fb5c1a0b000, parser_state=0x7fb5f6bb4560, last_timer=0x7fb5f6bb39b0) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:6056
#4  0x0000000005a4c7c9 in dispatch_command (thd=0x7fb5c1a0b000, com_data=0x7fb5f6bb4d98, command=COM_QUERY) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:2222
#5  0x0000000005a4f991 in do_command (thd=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:1556
#6  0x0000000005ccd4f1 in handle_connection (arg=0x7fb5cc85b740) at /home/luqun/mysql/mysql-8.0.20/sql/conn_handler/connection_handler_per_thread.cc:330
#7  0x00000000078eb95b in pfs_spawn_thread (arg=0x7fb5f8c89720) at /home/luqun/mysql/mysql-8.0.20/storage/perfschema/pfs.cc:2884
#8  0x00007fb5f957020c in start_thread (arg=0x7fb5f6bb6700) at pthread_create.c:479
#9  0x00007fb5f971881f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
```
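
The set/reset pairing could also be expressed as a scope guard. This is only a sketch with a hypothetical `ThdSketch` type; the change described above calls the two THD methods explicitly.

```cpp
#include <cassert>

// Hypothetical stand-in for THD with just the one flag.
struct ThdSketch {
  bool skip_readonly_check = false;
};

// Sets the flag for the lifetime of the scope and clears it afterwards.
class SkipReadonlyCheckScope {
 public:
  explicit SkipReadonlyCheckScope(ThdSketch *thd) : thd_(thd) {
    thd_->skip_readonly_check = true;
  }
  ~SkipReadonlyCheckScope() { thd_->skip_readonly_check = false; }
  SkipReadonlyCheckScope(const SkipReadonlyCheckScope &) = delete;
  SkipReadonlyCheckScope &operator=(const SkipReadonlyCheckScope &) = delete;

 private:
  ThdSketch *thd_;
};
```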

Reviewed By: george-reynya

Differential Revision: D27213990
inikep added a commit that referenced this pull request Apr 4, 2022
recover raft logs by removing partial trxs

Summary:
Port D24628821

mysqld removes partial trxs in the tail of trx log (named binary-logs on
primaries and apply-logs on secondaries) during startup. However, relay logs
were not of much importance since they were discarded anyway and a new one would
be created.
However, with raft, this is not ideal. Relay logs are raft logs on secondaries
and have to be kept around (and kept sane and consistent). This diff adds the
ability to remove partial trxs from raft/relay logs.
Much of the code to open the last relay log (based on relay log index) and
identify partial trxs is borrowed from existing logic in
MYSQL_BIN_LOG::open_binlog() and binlog_recover()
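
The core of finding a partial trx at the tail can be sketched as a scan that remembers where the last complete transaction ended. The event model below is a hypothetical simplification (no header events, only begin/row/commit).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class EventKind { kBegin, kRow, kCommit };

// Hypothetical event: its kind and the file offset where it ends.
struct EventSketch {
  EventKind kind;
  std::size_t end_pos;
};

// Returns the offset just past the last complete transaction.
// Anything after that offset belongs to a partial trx and can be trimmed.
inline std::size_t valid_prefix_length(const std::vector<EventSketch> &events) {
  std::size_t valid_pos = 0;
  for (const EventSketch &e : events) {
    if (e.kind == EventKind::kCommit) valid_pos = e.end_pos;
  }
  return valid_pos;
}
```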

Reviewed By: Pushapgl

Differential Revision: D26447448

---------------------------------------------------------------------

Checking for inited bool to make sure global_init_info was successful

Summary:
Port D25584004

master.info and relay.info files can be present
but need to be properly inited for use. We were bypassing the inited
check which could lead to issues in Raft.
In case there is an error in global_init_info, Raft will do a
raft_reset_slave and make another attempt at it. If both recourses
fail, the init of the plugin would fail.

Reviewed By: Pushapgl

Differential Revision: D26447457

---------------------------------------------------------------------

Support for dumping raft logs to vanilla async replicas

Summary:
[Porting Notes]
We want to dump raft logs to vanilla async replicas regardless
of whether it's the relay log or binlog. Effectively after this change
we'll dump relay logs on the followers and binlogs on the leader. When
the raft role changes, the logs to be dumped change as well.
The Dump_log class is introduced as a thin wrapper/container around
mysql_bin_log or rli->relay_log and is inited with mysql_bin_log
to emulate vanilla mysql behavior. Dump threads use the global
dump_log object instead of mysql_bin_log directly. We switch the log
in dump log only when raft role changes (in binlog_change_to_binlog()
and binlog_change_to_apply_log()).
During raft role change we take all log related locks (LOCK_log,
LOCK_index, LOCK_binlog_end_pos, and dump log lock) to serialize it with
other log operations like dumping logs.

Related doc - https://fb.quip.com/oTVAAdgEi4zY

This diff contains below 7 patches:
D23013977
D24766787
D24716539
D24900223
D24955284
D25174166
D25775525

Reviewed By: luqun

Differential Revision: D26141496

---------------------------------------------------------------------

Fixed the flaky raft test suite

Summary:
First clean run of entire raft test suite :)

**Changes**
* Reset apply_logs on raft secondaries before the start of every test
* Increased the lock timeouts and changed isolation level in rpl_raft_slave_out_of_order_commit.

Reviewed By: luqun

Differential Revision: D26651257

---------------------------------------------------------------------

return error from Raft_replication_delegate when plugin is not available

Summary:
Port D23065441 (facebook@b9067f7)

The new macro is used to call into raft plugin. If plugin gets unloaded
accidentally when enable_raft_plugin is ON, then this STRICT version returns
failure. This is to be called only by raft plugin currently

Reviewed By: Pushapgl

Differential Revision: D26447523

---------------------------------------------------------------------

Adding timestamps for raft rotates which happen in the context of listener thread

Summary:
Port D25572614

The timestamp of a binlog event is picked up from the when field in
the event. In most cases of rotation, the when is left unpopulated
during rotation for the top 3 events (fd, pgtid, metadata). However
in such a situation, a normal rotate (flush binary logs) still manages
to get a valid timestamp, since the thread in which the flush binary
logs happens has a valid start time.
Now enter Raft relay log rotations. In those cases and in the case
of config change rotate, the rotations are happening in the context
of a raft listener queue thread. In that context, the when and start
time of the thread are both 0. The diff handles this case by populating
the when field appropriately.

Reviewed By: bhatvinay

Differential Revision: D26194612

---------------------------------------------------------------------

Raft abrupt stepdown and trim binlog file / gtid test

Summary: binlog file should get trimmed for abrupt stepdown

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D26169975

---------------------------------------------------------------------

Port: Fixes around raft log truncation.

Summary:
**Notes**
* New functions to block and unblock dump threads for plugin to use
during raft log truncation.

The check below is already done in the raft plugin as part of D26866429.
* Re-init rli->cur_log in Relay_log_info::rli_init_info() instead of
just seeking to the beginning to handle the case when raft log is
truncated before starting the applier

Reviewed By: luqun

Differential Revision: D26759813

---------------------------------------------------------------------

rebase due to relay log Format_description_event event size difference

Summary:
WL#3549: Binlog Compression adds 1 extra byte of data to Format_description_event

    @@ -134,6 +137,7 @@ Format_description_event::Format_description_event(uint8_t binlog_ver,
               VIEW_CHANGE_HEADER_LEN,
               XA_PREPARE_HEADER_LEN,
               ROWS_HEADER_LEN_V2,
    +          TRANSACTION_PAYLOAD_EVENT,
           };

Reviewed By: Pushapgl

Differential Revision: D27145095

---------------------------------------------------------------------

fix raft change string metadata event

Summary:
Saw a bunch of stage-1 replicaset instances fail due to a config change string failure during add/remove instance
```
I0401 00:16:10.894434 1823842 IoCacheUtils.cpp:333] getConfigChangeString eventType = ^G
E0401 00:16:10.894451 1823842 IoCacheUtils.cpp:363] Failed to read metadata event body from cache
E0401 00:16:10.894456 1823842 MysqlRaft.cpp:659] [replicate] Failed to get config change string from iocache
2021-04-01T00:16:10.894464-07:00 8307 [ERROR] [MY-010207] [Repl] Run function 'before_flush' in plugin 'RPL_RAFT' failed
2021-04-01T00:16:10.894478-07:00 8307 [ERROR] [MY-000000] [Server] Failed to rotate binary log
```

After some investigation, the cause is that the metadata event length was calculated including the config change string, but the config change string was never written into the event body.

Reviewed By: Pushapgl

Differential Revision: D27504157

---------------------------------------------------------------------

Tells raft mode in binlog

Summary:
Win-Win: Tell whether this binlog was generated while this host was in raft mode.

We already have the bit in the FD event, and this diff just prints it out.

Reviewed By: mpercy

Differential Revision: D28664300

---------------------------------------------------------------------

fix flaky raft testcase

Summary:
There are multiple issues for MTR:
1. In sql/rpl_binlog_sender.cc, if a secondary's IO thread receives fatal_error,
   it quits instead of reconnecting. Use an unknown error so that the
   secondary's IO thread can try to reconnect.
2. mtr.add_suppression() calls: mtr.add_suppression() executes an insert
   statement into the mtr.test_suppressions table, which has no primary key,
   so during idempotent recovery a secondary fails to execute
   mtr.add_suppression() due to the missing PK. Move all
   mtr.add_suppression() calls to the end of the testcase to work around the idempotent recovery failure.
3. For promotion, use raft_promote_to_leader.inc instead of `set rpl_raft_new_leader_uuid`,
   since raft_promote_to_leader waits until the new primary becomes writeable.
4. Pass a specific warning instead of '.*' to mtr.add_suppression.
5. etc

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D28774820

---------------------------------------------------------------------

Using locks in Dump_log methods only when raft is enabled

Summary:
We don't need to take locks in non-raft mode since the
underlying MYSQL_BIN_LOG obj will never change. To handle the race between
updating the enable_raft_plugin var and using its value in Dump_log, we
kill and block all dump threads while updating the var and unblock them
once the var is updated.

Reviewed By: bhatvinay

Differential Revision: D28905522

---------------------------------------------------------------------

leader election to be a sync point in the new leader

Summary:
Port of D27582002 (facebook@39c70ca) from 5.6.35 to 8.0

Newly elected raft leader makes sure that all trxs from the previous
leader are committed by sql appliers. It then switches the server's trx
logs from apply-log-* to binary-log-*. To other parts of the system this
looks like a rotation, but the necessary sync calls are not made here.
So, if the server (or os) restarts, then the storage engine could lose
the commit markers of the last batch of trxs. This will result in silent
data drift.
This diff fixes the problem by making an explicit call to
ha_flush_logs() before switching the server's trx logs

Reviewed By: luqun

Differential Revision: D28880407

---------------------------------------------------------------------

Error out SQL thread on out of order opids in raft logs

Summary:
OpIds should always be in order in the raft logs. Added a check
in SQL threads that throws an error and stops the applier when out of
order opids are detected.

Reviewed By: li-chi

Differential Revision: D28810840

---------------------------------------------------------------------

add GET_COMMITTED_GTIDS for raft

Summary:
During recovery, Raft Log::Init needs to check with the server
which gtids have been committed. Before doing that it finds the entire
set of trxs in the last raft log. The difference (logged minus
committed) gives the pending opids.

Reviewed By: Pushapgl

Differential Revision: D29622786

---------------------------------------------------------------------

raft: skip mts_recovery_groups during start slave

Summary:
During MySQL8+Raft DMP, some instances fail to switch to Leader or start slave

```
2021-06-24T17:56:38.627423-07:00 431 [Note] [MY-010574] [Repl] Slave: MTS group recovery relay log info group_master_log_name /data/mysql/3127/bls-unittestdb658.frc2-3305-mysql.replicaset.180021/binary-logs-3727.000033, event_master_log_pos 1129.
2021-06-24T17:56:38.627473-07:00 431 [ERROR] [MY-010575] [Repl] Error looking for file after /binlogs/binary-logs-3307.000120.
2021-06-24T17:56:38.627516-07:00 431 [ERROR] [MY-000000] [Repl] load_mi_and_rli_from_repositories: rli_init_info returned error
```

Similar to 5.6, we don't need to run mts_recovery_groups because GTID_MODE is always enabled.

Reviewed By: Pushapgl

Differential Revision: D29520066

---------------------------------------------------------------------

update RaftListenerCallbackArg struct

Summary: Due to a change in the contract between the raft plugin and mysql, update the RaftListenerCallbackArg struct to add a master_uuid field

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D29981349

---------------------------------------------------------------------

always check mi during raft_reset_slave

Summary: Add to raft_reset_slave a nullptr check similar to the one in raft_stop_sql_thread.

Reviewed By: bhatvinay

Differential Revision: D29967915

---------------------------------------------------------------------

raft mtr: update rpl_end_raft.inc for disable-raft

Summary:
rpl_end_raft.inc unregisters the raft plugin, which calls reset slaves when online disable-raft is enabled. Move rpl_end_raft.inc after rpl_stop_slaves.inc so that the slaves are stopped correctly first.

After the slaves are stopped, a call to mtr.add_suppression("") no longer replicates to the slaves, so just call mtr.add_suppression("") on all instances (replicas).

Reviewed By: yizhang82

Differential Revision: D30062236

---------------------------------------------------------------------

fix master_uuid

Summary: fix master_uuid in raft mode.

Reviewed By: luqun

Differential Revision: D30261465

---------------------------------------------------------------------

incorrect File_size value in show raft logs result

Summary:
In the show raft logs result, the File_size of the latest log file isn't updated correctly.

For example, show raft logs; returns:

```
+-------------------------+------------
| Log_name                | File_size  |
| binary-logs-3304.000670 |       1529 |
```

whereas the file on disk is:
```
-rw-r----- 1 mysql backup 1669180487 Aug  4 14:49 binary-logs-3304.000670
-rw-r----- 1 mysql backup 1723994154 Aug  4 14:49 binary-logs-3304.000670
```
The file_size from show raft logs is always 1529 but the real file_size is 1669180487.

The issue arises because, when writing to IO_CACHE directly, the position of its wrapper ostream isn't updated.

Reviewed By: yizhang82

Differential Revision: D30116345

---------------------------------------------------------------------

Add gtid_purged_for_tailing

Summary:
Add a new global variable gtid_purged_for_tailing.

It shows the purged GTID for binlog in non-raft mode, and purged GTID for raft log in raft mode.

Reviewed By: abhinav04sharma

Differential Revision: D29372776

---------------------------------------------------------------------

disable skip_setup_replica_if_unnecessary

Summary: After syncing with the latest official raft plugin, most MTR tests failed because of the skip_setup_replica_if_unnecessary optimization.

Reviewed By: yizhang82

Differential Revision: D30821648

---------------------------------------------------------------------

latency histograms for raft trx wait

Summary: Port of the support added for the same in mysql 5.6. Should help monitor latencies in 8.0 + raft stage-1 replicasets.

Reviewed By: luqun

Differential Revision: D31064764

---------------------------------------------------------------------

handle cases where clean shutdown in raft aborts trxs

Summary: This is a port of the feature in 5.6.

Reviewed By: anirbanr-fb

Differential Revision: D31070593

---------------------------------------------------------------------

fix crash during binlog purging

Summary:
Any error returned by the plugin during binlog purging results in a
crash in mysql8 as the server tries to execute
binlog_error_action_abort. We need to differentiate explicitly between a
plugin error and other errors (such as errors related to doing disk IO
etc). In this particular case, the crash is because of trying to purge a
file that does not exist (i.e. which was already purged previously) and
raft cannot find it in its index chunk (so it returns an error).

Reviewed By: anirbanr-fb

Differential Revision: D31149997

---------------------------------------------------------------------

update flaky rpl_raft_purged_gtids_dump_threads

Summary:
rpl_raft_purged_gtids_dump_threads MTR failed due to "Cannot replicate because the master purged required binary logs" after server4 started tailing server2.

Try to sync server4 before switching it to tail server2.

Reviewed By: bhatvinay

Differential Revision: D30818078

---------------------------------------------------------------------

fix various porting issues in mysql8 (raft) crash recovery

Summary:
1. Trimming of the binlog is left to the raft plugin (based on the current
   leader's log). The server should skip this step as part of recovery. This
   essentially means setting 'valid_pos' to the last successfully parsed
   trx in the trx log (instead of the engine's view of the trx log) in
   MYSQL_BIN_LOG::recover().
2. executed_gtid_set should be initialized based on the engine's view of
   the trx log file coordinates. So, during startup, appropriate flags need
   to be passed into MYSQL_BIN_LOG::init_gtid_sets(). init_gtid_sets() was
   already changed to handle this, but the flag was not set correctly
   during server startup.
3. Another fix is in MYSQL_BIN_LOG::init_gtid_sets() to correctly set the position
   to read and calculate executed-gtid-set (based on the file name read from the
   engine).

Reviewed By: anirbanr-fb

Differential Revision: D31284902

---------------------------------------------------------------------

show_raft_status should take LOCK_status

Summary:
This is a straight port of a 5.6 patch to 8.0

SHOW RAFT STATUS and SHOW GLOBAL STATUS go to the
same functions in the plugin and access shared data structures.
These functions return internal char * pointers to plugin global
variables. They need to be serialized with LOCK_status otherwise
it leads to race conditions.
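
A minimal sketch of the serialization idea, using `std::mutex` as a stand-in for LOCK_status. The variable and function names are hypothetical; the point is only that both the writer and the reader of the shared `char *` hold the same lock, and that the reader copies the string before releasing it:

```cpp
#include <mutex>
#include <string>

std::mutex lock_status;                     // stand-in for LOCK_status
const char *raft_status_value = "UNKNOWN";  // hypothetical plugin-global string

// Plugin side: publish a new status string under the lock.
void plugin_update_status(const char *value) {
  std::lock_guard<std::mutex> guard(lock_status);
  raft_status_value = value;
}

// Server side: read the pointer and copy the string while holding the lock,
// so a concurrent update cannot be observed halfway.
std::string show_status_sketch() {
  std::lock_guard<std::mutex> guard(lock_status);
  return std::string(raft_status_value);
}
```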

Reviewed By: li-chi

Differential Revision: D31318340

---------------------------------------------------------------------

Disallowing reset master on raft replicasets

Summary:
This is a straight port of a similar patch on 5.6 raft.

reset master is inherently not raft compliant.
It gets rid of all binlogs and makes an instance appear like
a fresh single instance database without consulting the ring.
We need to block it.

However, under certain circumstances (in the future), e.g. during
first time replicaset build, or when we can guarantee that the instance
is a single node raft ring, we can potentially do a reset master
followed by a rebootstrap of raft. This can also be achieved by
disabling raft, doing a reset master and then re-enabling raft. However,
to keep the door open, I have left a force option and will thread
a mechanism from the plugin to call reset_master(force=true).

Reviewed By: li-chi

Differential Revision: D31299214

---------------------------------------------------------------------

Setting topology config in MTR

Summary:
Setting topology config in MTR so that features that use
topology config can be tested easily. Setting rpl_raft_skip_smc_updates
to avoid unnecessary calls to SMC (even though we supply a dummy
replicaset name).

Reviewed By: li-chi

Differential Revision: D31543877

---------------------------------------------------------------------

Do not allow sql thread start when raft is doing a stop->{}->start transition

Summary:
This is a port of an existing patch made to 5.6 for Raft.

Raft will do a stop of the SQL thread during StopAllWrites.
Then it will repoint the binlog files and during that action, the
SQL threads have to remain stopped. We block it out in this diff by
keeping an atomic bool which can be checked from other functions.

This only applies to raft mode i.e. enable_raft_plugin = true.
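
A minimal sketch of the gating described above, assuming a single process-wide atomic flag. The flag and function names are hypothetical and not the ones used in the patch:

```cpp
#include <atomic>

// Hypothetical flag: true while raft is between stopping the SQL thread
// and finishing the repoint of the binlog files.
std::atomic<bool> sql_thread_start_blocked{false};

void raft_begin_transition() { sql_thread_start_blocked.store(true); }
void raft_end_transition() { sql_thread_start_blocked.store(false); }

// Returns true if the SQL thread is allowed to start right now.
// Outside raft mode the flag is ignored.
bool can_start_sql_thread(bool enable_raft_plugin) {
  if (!enable_raft_plugin) return true;
  return !sql_thread_start_blocked.load();
}
```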

Reviewed By: li-chi

Differential Revision: D31319499

---------------------------------------------------------------------

Handle printing of FD event generated by slave SQL thread

Summary:
Returning early in the FD:print() function prevents mysqlbinlog from parsing Raft logs on secondaries.

The original commit which added this is d048c0f (P173872135)

To comply with the intent of the original bug fix, we avoid printing the FD event of a relay log as a 'BINLOG'.

Reviewed By: anirbanr-fb

Differential Revision: D26359417

---------------------------------------------------------------------

Add exponential backoff for smart restart

Summary:
RAFT instance crashes can lead to the tx logs filling up.

rocksdb can fill up the txlogs.
We should stop restarting mysqld if it has already been restarted many times in a day.
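
The commit message does not give the backoff parameters, so the following is only a sketch of the general technique: the delay doubles with each restart attempt up to a cap, and restarts stop once a daily limit is reached. The function names and all parameters are hypothetical:

```cpp
#include <algorithm>
#include <cstdint>

// Delay before restart attempt `attempt` (0-based): base, 2*base, 4*base, ...
// capped at `cap`. Both values are in seconds and are assumptions.
int64_t backoff_seconds(int attempt, int64_t base, int64_t cap) {
  int64_t delay = base;
  for (int i = 0; i < attempt && delay < cap; ++i) {
    delay *= 2;
  }
  return std::min(delay, cap);
}

// Give up restarting once the daily limit has been reached.
bool should_restart(int restarts_today, int max_restarts_per_day) {
  return restarts_today < max_restarts_per_day;
}
```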

Reviewed By: anirbanr-fb

Differential Revision: D27372750

---------------------------------------------------------------------

always release data_lock mutex to avoid deadlock

Summary: In a stage-1 replicaset, when a secondary instance is killed, the instance sometimes runs into a deadlock because the process_raft_queue thread forgot to release the mutex it acquired in raft_change_master.
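
One common way to guarantee that a mutex is released on every return path is a scoped guard. The sketch below uses `std::mutex` and `std::lock_guard` as stand-ins; it illustrates the idea and is not the code from the patch:

```cpp
#include <mutex>

std::mutex data_lock;  // stand-in for the server's data_lock

// Returns 0 on success and 1 on error. The guard unlocks data_lock when it
// goes out of scope, so the early error return cannot leave it held.
int change_master_sketch(bool fail_early) {
  std::lock_guard<std::mutex> guard(data_lock);
  if (fail_early) {
    return 1;  // mutex is still released here
  }
  return 0;
}
```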

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D27602667

---------------------------------------------------------------------

Gracefully exit mysqld_safe loop during backoff

Summary:
Currently systemctl stop [email protected] can take 10 mins when mysqld_safe is in backoff period.

D28517599 adds an interrupt to sleep in mysql_stop, and mysqld_safe immediately breaks the retry loop if sleep is interrupted.

Reviewed By: anirbanr-fb

Differential Revision: D28606439

---------------------------------------------------------------------

Fix heap overflow in group_relay_log_name handling

Summary:
We were accessing group_relay_log_name in
Query_log_event::do_apply_event_worker() but it's assigned only after
the coordinator thread encounters an end event (i.e. xid event or a
query event with "COMMIT" or "ROLLBACK" query). This was causing a race
between accessing group_relay_log_name in the worker thread and writing
it on the coordinator thread. We don't need to set transaction position
in events other than end event, so now we set transaction position in
query event only if it's an end event. The race is eliminated because
group_relay_log_name is set before enqueuing the event to the worker
thread (in both dep repl and vanilla mts).

Reviewed By: lth

Differential Revision: D28767430

---------------------------------------------------------------------

fix memory leak during MYSQL_BIN_LOG::open_existing_binlog

Summary:
asandebug complains that there are memory leaks during MYSQL_BIN_LOG open:

Direct leak of 50 byte(s) in 1 object(s) allocated from:
    #0 0x67460ef in malloc
    #1 0x93f0777 in my_raw_malloc(unsigned long, int)
    #2 0x93f064a in my_malloc(unsigned int, unsigned long, int)
    #3 0x93f0eb0 in my_strdup(unsigned int, char const*, int)
    #4 0x8af01a6 in MYSQL_BIN_LOG::open(unsigned int, char const*, char const*, unsigned int)
    #5 0x8af8064 in MYSQL_BIN_LOG::open_binlog(char const*, char const*, unsigned long, bool, bool, bool, Format_description_log_event*, unsigned int, RaftRotateInfo*, bool)
    #6 0x8b00c00 in MYSQL_BIN_LOG::new_file_impl(bool, Format_description_log_event*, RaftRotateInfo*)
    #7 0x8d65e47 in rotate_relay_log(Master_info*, bool, bool, bool, RaftRotateInfo*)
    #8 0x8d661c0 in rotate_relay_log_for_raft(RaftRotateInfo*)
    #9 0x8c7696a in process_raft_queue
    #10 0xa0fa1fd in pfs_spawn_thread(void*)
    #11 0x7f8c9a12b20b in start_thread

Release this memory before assigning new values.
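
The pattern can be sketched as follows, with plain `malloc`/`free` standing in for `my_strdup`/`my_free`. The struct and helper names are hypothetical:

```cpp
#include <cstdlib>
#include <cstring>

// Stand-in for a log object that owns a heap-allocated name.
struct LogNameSketch {
  char *name = nullptr;
  ~LogNameSketch() { std::free(name); }
};

// Small strdup replacement so the example only depends on the C library.
static char *dup_cstr(const char *s) {
  std::size_t n = std::strlen(s) + 1;
  char *p = static_cast<char *>(std::malloc(n));
  if (p != nullptr) std::memcpy(p, s, n);
  return p;
}

// Re-opening the log: free the previous buffer first so it is not leaked.
void set_name(LogNameSketch &log, const char *new_name) {
  std::free(log.name);  // free(nullptr) is a no-op
  log.name = dup_cstr(new_name);
}
```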

Reviewed By: Pushapgl

Differential Revision: D28819752
inikep added a commit that referenced this pull request Apr 4, 2022
recover raft logs by removing partial trxs

Summary:
Port D24628821

mysqld removes partial trxs in the tail of trx log (named binary-logs on
primaries and apply-logs on secondaries) during startup. However, relay logs
were not of much importance since they were discarded anyway and a new one would
be created.
However, with raft, this is not ideal. Relay logs are raft logs on secondaries
and have to be kept around (and kept sane and consistent). This diff adds the
ability to remove partial trxs from raft/relay logs.
Much of the code to open the last relay log (based on relay log index) and
identify partial trxs is borrowed from existing logic in
MYSQL_BIN_LOG::open_binlog() and binlog_recover()

Reviewed By: Pushapgl

Differential Revision: D26447448

---------------------------------------------------------------------

Checking for inited bool to make sure global_init_info was successful

Summary:
Port D25584004

A master.info and relay.info file can be present
but needs to be properly inited for use. We were bypassing the inited
check which could lead to issues in Raft.
In case there is an error in global_init_info, Raft will do a
raft_reset_slave and make another attempt at it. If both recourses
fail, the init of the plugin would fail.

Reviewed By: Pushapgl

Differential Revision: D26447457

---------------------------------------------------------------------

Support for dumping raft logs to vanilla async replicas

Summary:
[Porting Notes]
We want to dump raft logs to vanilla async replicas regardless
of whether it's the relay log or binlog. Effectively after this change
we'll dump relay logs on the followers and binlogs on the leader. When
the raft role changes, the logs to be dumped change as well.
The Dump_log class is introduced as a thin wrapper/container around
mysql_bin_log or rli->relay_log and is inited with mysql_bin_log
to emulate vanilla mysql behavior. Dump threads use the global
dump_log object instead of mysql_bin_log directly. We switch the log
in dump log only when the raft role changes (in binlog_change_to_binlog()
and binlog_change_to_apply_log()).
During a raft role change we take all log related locks (LOCK_log,
LOCK_index, LOCK_binlog_end_pos, and the dump log lock) to serialize it with
other log operations like dumping logs.
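
The wrapper idea can be sketched roughly as below. `LogStub` and `DumpLogSketch` are hypothetical stand-ins for MYSQL_BIN_LOG and Dump_log, and only the dump log lock is modelled (not LOCK_log, LOCK_index or LOCK_binlog_end_pos):

```cpp
#include <mutex>
#include <string>

// Stand-in for MYSQL_BIN_LOG; only carries a name for illustration.
struct LogStub {
  std::string name;
};

// Thin wrapper that points at whichever log should currently be dumped.
class DumpLogSketch {
 public:
  explicit DumpLogSketch(LogStub *initial) : log_(initial) {}

  // Called on raft role change to repoint dump threads at the other log.
  void switch_log(LogStub *next) {
    std::lock_guard<std::mutex> guard(lock_);
    log_ = next;
  }

  // Dump threads go through the wrapper instead of a fixed global log.
  std::string current_name() {
    std::lock_guard<std::mutex> guard(lock_);
    return log_->name;
  }

 private:
  std::mutex lock_;  // stand-in for the dump log lock
  LogStub *log_;
};
```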

Related doc - https://fb.quip.com/oTVAAdgEi4zY

This diff contains below 7 patches:
D23013977
D24766787
D24716539
D24900223
D24955284
D25174166
D25775525

Reviewed By: luqun

Differential Revision: D26141496

---------------------------------------------------------------------

Fixed the flaky raft test suite

Summary:
First clean run of entire raft test suite :)

**Changes**
* Reset apply_logs on raft secondaries before the start of every test
* Increased the lock timeouts and changed isolation level in rpl_raft_slave_out_of_order_commit.

Reviewed By: luqun

Differential Revision: D26651257

---------------------------------------------------------------------

return error from Raft_replication_delegate when plugin is not available

Summary:
Port D23065441 (facebook@b9067f7)

The new macro is used to call into raft plugin. If plugin gets unloaded
accidentally when enable_raft_plugin is ON, then this STRICT version returns
failure. This is to be called only by raft plugin currently

Reviewed By: Pushapgl

Differential Revision: D26447523

---------------------------------------------------------------------

Adding timestamps for raft rotates which happen in the context of listener thread

Summary:
Port D25572614

The timestamp of a binlog event is picked up from the when field in
the event. In most cases of rotation, the when is left unpopulated
during rotation for the top 3 events (fd, pgtid, metadata). However
in such a situation, a normal rotate (flush binary logs) still manages
to get a valid timestamp, since the thread in which the flush binary
logs happens has a valid start time.
Now enter Raft relay log rotations. In those cases and in the case
of config change rotate, the rotations are happening in the context
of a raft listener queue thread. In that context, the when and start
time of the thread are both 0. The diff handles this case by populating
the when field appropriately.
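
The commit message does not say which value is used to populate the field, so the following is only a sketch under the assumption that the current wall-clock time is the last fallback. `effective_when` is a hypothetical helper:

```cpp
#include <ctime>

// Picks the timestamp for an event written during a rotation:
// the event's own `when` if set, else the thread's start time,
// else the supplied current time (assumed fallback).
std::time_t effective_when(std::time_t event_when,
                           std::time_t thread_start_time,
                           std::time_t now) {
  if (event_when != 0) return event_when;
  if (thread_start_time != 0) return thread_start_time;
  return now;
}
```

In the raft listener queue thread both of the first two values are 0, so the last branch is the one that would apply.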

Reviewed By: bhatvinay

Differential Revision: D26194612

---------------------------------------------------------------------

Raft abrupt stepdown and trim binlog file / gtid test

Summary: binlog file should get trimmed for abrupt stepdown

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D26169975

---------------------------------------------------------------------

Port: Fixes around raft log truncation.

Summary:
**Notes**
* New functions to block and unblock dump threads for plugin to use
during raft log truncation.

The check below is already done in the raft plugin as part of D26866429.
* Re-init rli->cur_log in Relay_log_info::rli_init_info() instead of
just seeking to the beginning to handle the case when raft log is
truncated before starting the applier

Reviewed By: luqun

Differential Revision: D26759813

---------------------------------------------------------------------

rebase due to relay log Format_description_event event size difference

Summary:
WL#3549: Binlog Compression adds 1 extra byte of data to Format_description_event

@@ -134,6 +137,7 @@ Format_description_event::Format_description_event(uint8_t binlog_ver,
           VIEW_CHANGE_HEADER_LEN,
           XA_PREPARE_HEADER_LEN,
           ROWS_HEADER_LEN_V2,
+          TRANSACTION_PAYLOAD_EVENT,
       };

Reviewed By: Pushapgl

Differential Revision: D27145095

---------------------------------------------------------------------

fix raft change string metadata event

Summary:
Saw a bunch of stage-1 replicaset instances fail due to a config change string failure during add/remove instance:
```
I0401 00:16:10.894434 1823842 IoCacheUtils.cpp:333] getConfigChangeString eventType = ^G
E0401 00:16:10.894451 1823842 IoCacheUtils.cpp:363] Failed to read metadata event body from cache
E0401 00:16:10.894456 1823842 MysqlRaft.cpp:659] [replicate] Failed to get config change string from iocache
2021-04-01T00:16:10.894464-07:00 8307 [ERROR] [MY-010207] [Repl] Run function 'before_flush' in plugin 'RPL_RAFT' failed
2021-04-01T00:16:10.894478-07:00 8307 [ERROR] [MY-000000] [Server] Failed to rotate binary log
```

After some investigation, the cause is that the metadata event length was calculated including the config change string, but the config change string was never written into the event body.
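
One way to avoid this class of bug is to write the length and the payload in the same helper, so the declared length can never get ahead of the bytes that follow it. The sketch below uses a hypothetical layout (1-byte type, 2-byte little-endian length, payload) that is not the actual metadata event format:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Appends one type-length-value field. The length prefix and the payload
// are written together. Returns false if the value does not fit in the
// 16-bit length of this hypothetical layout; nothing is written then.
bool append_field(std::vector<uint8_t> &buf, uint8_t type,
                  const std::string &value) {
  if (value.size() > 0xffff) return false;
  buf.push_back(type);
  buf.push_back(static_cast<uint8_t>(value.size() & 0xff));
  buf.push_back(static_cast<uint8_t>((value.size() >> 8) & 0xff));
  buf.insert(buf.end(), value.begin(), value.end());
  return true;
}
```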

Reviewed By: Pushapgl

Differential Revision: D27504157

---------------------------------------------------------------------

Tells raft mode in binlog

Summary:
Win-Win: Tell whether this binlog was generated while this host was in raft mode.

We already have the bit in the FD event, and this diff just makes it visible.

Reviewed By: mpercy

Differential Revision: D28664300

---------------------------------------------------------------------

fix flaky raft testcase

Summary:
There are multiple issues in the MTR tests:
1. In sql/rpl_binlog_sender.cc, if the secondary's IO thread receives fatal_error,
   it quits the IO thread instead of reconnecting. Use an unknown error so that
   the secondary's IO thread can try to reconnect.
2. mtr.add_suppression() calls: mtr.add_suppression() executes an insert
   statement into the mtr.test_suppressions table, and mtr.test_suppressions
   doesn't have a primary key, so during idempotent recovery the secondary
   fails to execute mtr.add_suppression() due to the missing PK. Try to move all
   mtr.add_suppression() calls to the end of the testcase to work around the idempotent recovery failure.
3. When promoting, use raft_promote_to_leader.inc instead of `set rpl_raft_new_leader_uuid`,
   since raft_promote_to_leader waits until the new primary becomes writeable.
4. Pass a specific warning instead of '.*' to mtr.add_suppression.
5. etc.

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D28774820

---------------------------------------------------------------------

Using locks in Dump_log methods only when raft is enabled

Summary:
We don't need to take locks in non-raft mode since the
underlying MYSQL_BIN_LOG obj will never change. To handle the race between
updating the enable_raft_plugin var and using its value in Dump_log, we
kill and block all dump threads while updating the var and unblock them
once the var is updated.
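
Conditional locking of this kind can be sketched with a deferred `std::unique_lock`. The names below are hypothetical stand-ins, not the actual Dump_log members:

```cpp
#include <mutex>

std::mutex dump_log_lock;  // stand-in for the dump log lock

// Reads a shared value, taking the lock only when raft is enabled.
// With std::defer_lock the guard starts unlocked; if lock() is called,
// the destructor releases it, otherwise the destructor does nothing.
int read_position(bool raft_enabled, const int &position) {
  std::unique_lock<std::mutex> guard(dump_log_lock, std::defer_lock);
  if (raft_enabled) {
    guard.lock();
  }
  return position;
}
```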

Reviewed By: bhatvinay

Differential Revision: D28905522

---------------------------------------------------------------------

leader election to be a sync point in the new leader

Summary:
Port of D27582002 (facebook@39c70ca) from 5.6.35 to 8.0

Newly elected raft leader makes sure that all trxs from the previous
leader are committed by sql appliers. It then switches the server's trx
logs from apply-log-* to binary-log-*. To other parts of the system this
looks like a rotation, but the necessary sync calls are not made here.
So, if the server (or os) restarts, then the storage engine could lose
the commit markers of the last batch of trxs. This will result in silent
data drift.
This diff fixes the problem by making an explicit call to
ha_flush_logs() before switching the server's trx logs

Reviewed By: luqun

Differential Revision: D28880407

---------------------------------------------------------------------

Error out SQL thread on out of order opids in raft logs

Summary:
OpIds should always be in order in the raft logs. Added a check
in SQL threads that throws an error and stops the applier when out of
order opids are detected.

Reviewed By: li-chi

Differential Revision: D28810840

---------------------------------------------------------------------

add GET_COMMITTED_GTIDS for raft

Summary:
During recovery Raft Log::Init needs to check with server
what gtids have been committed. Before doing that it finds the entire
set of trxs in the last raft log. The difference (logged minus
committed) gives the pending opids.

Reviewed By: Pushapgl

Differential Revision: D29622786

---------------------------------------------------------------------

raft: skip mts_recovery_groups during start slave

Summary:
During MySQL8+Raft DMP, some instances fail to switch to Leader or to start the slave:

```
2021-06-24T17:56:38.627423-07:00 431 [Note] [MY-010574] [Repl] Slave: MTS group recovery relay log info group_master_log_name /data/mysql/3127/bls-unittestdb658.frc2-3305-mysql.replicaset.180021/binary-logs-3727.000033, event_master_log_pos 1129.
2021-06-24T17:56:38.627473-07:00 431 [ERROR] [MY-010575] [Repl] Error looking for file after /binlogs/binary-logs-3307.000120.
2021-06-24T17:56:38.627516-07:00 431 [ERROR] [MY-000000] [Repl] load_mi_and_rli_from_repositories: rli_init_info returned error
```

As in 5.6, we don't need to run mts_recovery_groups because GTID_MODE is always enabled.

Reviewed By: Pushapgl

Differential Revision: D29520066

---------------------------------------------------------------------

update RaftListenerCallbackArg struct

Summary: The contract between the raft plugin and mysql changed, so update the RaftListenerCallbackArg struct to add a master_uuid field.

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D29981349

---------------------------------------------------------------------

always check mi during raft_reset_slave

Summary: Add to raft_reset_slave a nullptr check similar to the one in raft_stop_sql_thread.

Reviewed By: bhatvinay

Differential Revision: D29967915

---------------------------------------------------------------------

raft mtr: update rpl_end_raft.inc for disable-raft

Summary:
rpl_end_raft.inc unregisters the raft plugin, which calls reset slaves when online disable-raft is enabled. Move rpl_end_raft.inc after rpl_stop_slaves.inc so that the slaves are stopped correctly first.

After the slaves are stopped, a call to mtr.add_suppression("") no longer replicates to the slaves, so just call mtr.add_suppression("") on all instances (replicas).

Reviewed By: yizhang82

Differential Revision: D30062236

---------------------------------------------------------------------

fix master_uuid

Summary: fix master_uuid in raft mode.

Reviewed By: luqun

Differential Revision: D30261465

---------------------------------------------------------------------

incorrect File_size value in show raft logs result

Summary:
In the show raft logs result, the File_size of the latest log file isn't updated correctly.

For example, show raft logs; returns:

```
+-------------------------+------------
| Log_name                | File_size  |
| binary-logs-3304.000670 |       1529 |
```

whereas the file on disk is:
```
-rw-r----- 1 mysql backup 1669180487 Aug  4 14:49 binary-logs-3304.000670
-rw-r----- 1 mysql backup 1723994154 Aug  4 14:49 binary-logs-3304.000670
```
The file_size from show raft logs is always 1529 but the real file_size is 1669180487.

The issue arises because, when writing to IO_CACHE directly, the position of its wrapper ostream isn't updated.

Reviewed By: yizhang82

Differential Revision: D30116345

---------------------------------------------------------------------

Add gtid_purged_for_tailing

Summary:
Add a new global variable gtid_purged_for_tailing.

It shows the purged GTID for binlog in non-raft mode, and purged GTID for raft log in raft mode.

Reviewed By: abhinav04sharma

Differential Revision: D29372776

---------------------------------------------------------------------

disable skip_setup_replica_if_unnecessary

Summary: After syncing with the latest official raft plugin, most MTR tests failed because of the skip_setup_replica_if_unnecessary optimization.

Reviewed By: yizhang82

Differential Revision: D30821648

---------------------------------------------------------------------

latency histograms for raft trx wait

Summary: Port of the support added for the same in mysql 5.6. Should help monitor latencies in 8.0 + raft stage-1 replicasets.

Reviewed By: luqun

Differential Revision: D31064764

---------------------------------------------------------------------

handle cases where clean shutdown in raft aborts trxs

Summary: This is a port of the feature in 5.6.

Reviewed By: anirbanr-fb

Differential Revision: D31070593

---------------------------------------------------------------------

fix crash during binlog purging

Summary:
Any error returned by the plugin during binlog purging results in a
crash in mysql8 as the server tries to execute
binlog_error_action_abort. We need to differentiate explicitly between a
plugin error and other errors (such as errors related to doing disk IO
etc). In this particular case, the crash is because of trying to purge a
file that does not exist (i.e. which was already purged previously) and
raft cannot find it in its index chunk (so it returns an error).

Reviewed By: anirbanr-fb

Differential Revision: D31149997

---------------------------------------------------------------------

update flaky rpl_raft_purged_gtids_dump_threads

Summary:
rpl_raft_purged_gtids_dump_threads MTR failed due to "Cannot replicate because the master purged required binary logs" after server4 started tailing server2.

Try to sync server4 before switching it to tail server2.

Reviewed By: bhatvinay

Differential Revision: D30818078

---------------------------------------------------------------------

fix various porting issues in mysql8 (raft) crash recovery

Summary:
1. Trimming of the binlog is left to the raft plugin (based on the current
   leader's log). The server should skip this step as part of recovery. This
   essentially means setting 'valid_pos' to the last successfully parsed
   trx in the trx log (instead of the engine's view of the trx log) in
   MYSQL_BIN_LOG::recover().
2. executed_gtid_set should be initialized based on the engine's view of
   the trx log file coordinates. So, during startup, appropriate flags need
   to be passed into MYSQL_BIN_LOG::init_gtid_sets(). init_gtid_sets() was
   already changed to handle this, but the flag was not set correctly
   during server startup.
3. Another fix is in MYSQL_BIN_LOG::init_gtid_sets() to correctly set the position
   to read and calculate executed-gtid-set (based on the file name read from the
   engine).

Reviewed By: anirbanr-fb

Differential Revision: D31284902

---------------------------------------------------------------------

show_raft_status should take LOCK_status

Summary:
This is a straight port of a 5.6 patch to 8.0

SHOW RAFT STATUS and SHOW GLOBAL STATUS go to the
same functions in the plugin and access shared data structures.
These functions return internal char * pointers to plugin global
variables. They need to be serialized with LOCK_status otherwise
it leads to race conditions.

Reviewed By: li-chi

Differential Revision: D31318340

---------------------------------------------------------------------

Disallowing reset master on raft replicasets

Summary:
This is a straight port of a similar patch on 5.6 raft.

reset master is inherently not raft compliant.
It gets rid of all binlogs and makes an instance appear like
a fresh single instance database without consulting the ring.
We need to block it.

However, under certain circumstances (in the future), e.g. during
first time replicaset build, or when we can guarantee that the instance
is a single node raft ring, we can potentially do a reset master
followed by a rebootstrap of raft. This can also be achieved by
disabling raft, doing a reset master and then re-enabling raft. However,
to keep the door open, I have left a force option and will thread
a mechanism from the plugin to call reset_master(force=true).

Reviewed By: li-chi

Differential Revision: D31299214

---------------------------------------------------------------------

Setting topology config in MTR

Summary:
Setting topology config in MTR so that features that use
topology config can be tested easily. Setting rpl_raft_skip_smc_updates
to avoid unnecessary calls to SMC (even though we supply a dummy
replicaset name).

Reviewed By: li-chi

Differential Revision: D31543877

---------------------------------------------------------------------

Do not allow sql thread start when raft is doing a stop->{}->start transition

Summary:
This is a port of an existing patch made to 5.6 for Raft.

Raft will do a stop of the SQL thread during StopAllWrites.
Then it will repoint the binlog files and during that action, the
SQL threads have to remain stopped. We block it out in this diff by
keeping an atomic bool which can be checked from other functions.

This only applies to raft mode i.e. enable_raft_plugin = true.

Reviewed By: li-chi

Differential Revision: D31319499

---------------------------------------------------------------------

Handle printing of FD event generated by slave SQL thread

Summary:
Returning early in the FD:print() function prevents mysqlbinlog from parsing Raft logs on secondaries.

The original commit which added this is d048c0f (P173872135)

To comply with the intent of the original bug fix, we avoid printing the FD event of a relay log as a 'BINLOG'.

Reviewed By: anirbanr-fb

Differential Revision: D26359417

---------------------------------------------------------------------

Add exponential backoff for smart restart

Summary:
RAFT instance crashes can lead to the tx logs filling up.

rocksdb can fill up the txlogs.
We should stop restarting mysqld if it has already been restarted many times in a day.

Reviewed By: anirbanr-fb

Differential Revision: D27372750

---------------------------------------------------------------------

always release data_lock mutex to avoid deadlock

Summary: In a stage-1 replicaset, when a secondary instance is killed, the instance sometimes runs into a deadlock because the process_raft_queue thread forgot to release the mutex it acquired in raft_change_master.

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D27602667

---------------------------------------------------------------------

Gracefully exit mysqld_safe loop during backoff

Summary:
Currently systemctl stop [email protected] can take 10 mins when mysqld_safe is in backoff period.

D28517599 adds an interrupt to sleep in mysql_stop, and mysqld_safe immediately breaks the retry loop if sleep is interrupted.

Reviewed By: anirbanr-fb

Differential Revision: D28606439

---------------------------------------------------------------------

Fix heap overflow in group_relay_log_name handling

Summary:
We were accessing group_relay_log_name in
Query_log_event::do_apply_event_worker() but it's assigned only after
the coordinator thread encounters an end event (i.e. xid event or a
query event with "COMMIT" or "ROLLBACK" query). This was causing a race
between accessing group_relay_log_name in the worker thread and writing
it on the coordinator thread. We don't need to set transaction position
in events other than end event, so now we set transaction position in
query event only if it's an end event. The race is eliminated because
group_relay_log_name is set before enqueuing the event to the worker
thread (in both dep repl and vanilla mts).

Reviewed By: lth

Differential Revision: D28767430

---------------------------------------------------------------------

fix memory leak during MYSQL_BIN_LOG::open_existing_binlog

Summary:
asandebug complains about memory leaks during MYSQL_BIN_LOG open

Direct leak of 50 byte(s) in 1 object(s) allocated from:
    #0 0x67460ef in malloc
    #1 0x93f0777 in my_raw_malloc(unsigned long, int)
    #2 0x93f064a in my_malloc(unsigned int, unsigned long, int)
    #3 0x93f0eb0 in my_strdup(unsigned int, char const*, int)
    #4 0x8af01a6 in MYSQL_BIN_LOG::open(unsigned int, char const*, char const*, unsigned int)
    #5 0x8af8064 in MYSQL_BIN_LOG::open_binlog(char const*, char const*, unsigned long, bool, bool, bool, Format_description_log_event*, unsigned int, RaftRotateInfo*, bool)
    #6 0x8b00c00 in MYSQL_BIN_LOG::new_file_impl(bool, Format_description_log_event*, RaftRotateInfo*)
    #7 0x8d65e47 in rotate_relay_log(Master_info*, bool, bool, bool, RaftRotateInfo*)
    #8 0x8d661c0 in rotate_relay_log_for_raft(RaftRotateInfo*)
    #9 0x8c7696a in process_raft_queue
    #10 0xa0fa1fd in pfs_spawn_thread(void*)
    #11 0x7f8c9a12b20b in start_thread

Release this memory before assigning new values.

Reviewed By: Pushapgl

Differential Revision: D28819752
inikep added a commit that referenced this pull request Apr 4, 2022
recover raft logs by removing partial trxs

Summary:
Port D24628821

mysqld removes partial trxs in the tail of trx log (named binary-logs on
primaries and apply-logs on secondaries) during startup. However, relay logs
were not of much importance since they were discarded anyway and new ones
would be created.
However, with raft, this is not ideal. Relay logs are raft logs on secondaries
and have to be kept around (and kept sane and consistent). This diff adds the
ability to remove partial trxs from raft/relay logs.
Much of the code to open the last relay log (based on relay log index) and
identify partial trxs is borrowed from existing logic in
MYSQL_BIN_LOG::open_binlog() and binlog_recover()

Reviewed By: Pushapgl

Differential Revision: D26447448
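
The idea of trimming partial trxs can be sketched in Python as a scan for the end position of the last complete transaction. The event kinds and the (kind, end_pos) tuples are assumptions made for this illustration; the server works on real binlog events in C++.

```python
def last_valid_pos(events):
    """Return the log position just after the last complete transaction.

    `events` is an iterable of (kind, end_pos) pairs. In this sketch a "gtid"
    event opens a transaction and an "xid" or "commit" event closes it; any
    other event seen outside a transaction (for example a format description)
    also counts as valid.
    """
    valid_pos = 0
    in_trx = False
    for kind, end_pos in events:
        if kind == "gtid":
            in_trx = True
        elif kind in ("xid", "commit"):
            in_trx = False
            valid_pos = end_pos
        elif not in_trx:
            valid_pos = end_pos
    return valid_pos
```

A log whose tail holds a partial transaction would then be truncated to the returned position.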

---------------------------------------------------------------------

Checking for inited bool to make sure global_init_info was successful

Summary:
Port D25584004

A master.info and relay.info file can be present
but needs to be properly inited for use. We were bypassing the inited
check which could lead to issues in Raft.
In case there is an error in global_init_info, Raft will do a
raft_reset_slave and make another attempt at it. If both recourses
fail, the init of the plugin would fail.

Reviewed By: Pushapgl

Differential Revision: D26447457

---------------------------------------------------------------------

Support for dumping raft logs to vanilla async replicas

Summary:
[Porting Notes]
We want to dump raft logs to vanilla async replicas regardless
of whether it's the relay log or binlog. Effectively after this change
we'll dump relay logs on the followers and binlogs on the leader. When
the raft role changes, the logs to be dumped change as well.
The Dump_log class is introduced as a thin wrapper/container around
mysql_bin_log or rli->relay_log and is inited with mysql_bin_log
to emulate vanilla mysql behavior. Dump threads use the global
dump_log object instead of mysql_bin_log directly. We switch the log
in dump log only when raft role changes (in binlog_change_to_binlog()
and binlog_change_to_apply_log()).
During a raft role change we take all log-related locks (LOCK_log,
LOCK_index, LOCK_binlog_end_pos, and dump log lock) to serialize it with
other log operations like dumping logs.

Related doc - https://fb.quip.com/oTVAAdgEi4zY

This diff contains the following 7 patches:
D23013977
D24766787
D24716539
D24900223
D24955284
D25174166
D25775525

Reviewed By: luqun

Differential Revision: D26141496
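
A simplified Python sketch of the wrapper idea: readers go through one object, and the underlying log is swapped under a lock when the role changes. The class and method names are hypothetical, and the real Dump_log takes several server locks that are not modelled here.

```python
import threading

class DumpLog:
    """Thin wrapper around whichever log the dump threads should read."""

    def __init__(self, binlog):
        self._lock = threading.Lock()
        self._log = binlog  # starts as the binlog, like vanilla mysql

    def switch_log(self, new_log):
        """Swap the underlying log, e.g. on a raft role change."""
        with self._lock:
            self._log = new_log

    def read_with(self, reader):
        """Run reader(log) while the log is guaranteed not to be swapped."""
        with self._lock:
            return reader(self._log)
```

Dump threads only ever call `read_with`, so they never hold a reference to a log that is being switched out.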

---------------------------------------------------------------------

Fixed the flaky raft test suite

Summary:
First clean run of entire raft test suite :)

**Changes**
* Reset apply_logs on raft secondaries before the start of every test
* Increased the lock timeouts and changed isolation level in rpl_raft_slave_out_of_order_commit.

Reviewed By: luqun

Differential Revision: D26651257

---------------------------------------------------------------------

return error from Raft_replication_delegate when plugin is not available

Summary:
Port D23065441 (facebook@b9067f7)

The new macro is used to call into raft plugin. If plugin gets unloaded
accidentally when enable_raft_plugin is ON, then this STRICT version returns
failure. This is to be called only by raft plugin currently

Reviewed By: Pushapgl

Differential Revision: D26447523

---------------------------------------------------------------------

Adding timestamps for raft rotates which happen in the context of listener thread

Summary:
Port D25572614

The timestamp of a binlog event is picked up from the when field in
the event. In most cases of rotation, the when is left unpopulated
during rotation for the top 3 events (fd, pgtid, metadata). However
in such a situation, a normal rotate (flush binary logs) still manages
to get a valid timestamp, since the thread in which the flush binary
logs happens has a valid start time.
Now enter Raft relay log rotations. In those cases and in the case
of config change rotate, the rotations are happening in the context
of a raft listener queue thread. In that context, the when and start
time of the thread are both 0. The diff handles this case by populating
the when field appropriately.

Reviewed By: bhatvinay

Differential Revision: D26194612

---------------------------------------------------------------------

Raft abrupt stepdown and trim binlog file / gtid test

Summary: binlog file should get trimmed for abrupt stepdown

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D26169975

---------------------------------------------------------------------

Port: Fixes around raft log truncation.

Summary:
**Notes**
* New functions to block and unblock dump threads for plugin to use
during raft log truncation.

The check below is already done in the raft plugin as part of D26866429.
* Re-init rli->cur_log in Relay_log_info::rli_init_info() instead of
just seeking to the beginning to handle the case when raft log is
truncated before starting the applier

Reviewed By: luqun

Differential Revision: D26759813

---------------------------------------------------------------------

rebase due to relay log Format_description_event event size difference

Summary:
WL#3549: Binlog Compression adds 1 extra byte of data to Format_description_event

  @@ -134,6 +137,7 @@ Format_description_event::Format_description_event(uint8_t binlog_ver,
             VIEW_CHANGE_HEADER_LEN,
             XA_PREPARE_HEADER_LEN,
             ROWS_HEADER_LEN_V2,
  +          TRANSACTION_PAYLOAD_EVENT,
         };

Reviewed By: Pushapgl

Differential Revision: D27145095

---------------------------------------------------------------------

fix raft change string metadata event

Summary:
Saw a bunch of stage-1 replicaset instances fail due to a config change string failure during add/remove instance
```
I0401 00:16:10.894434 1823842 IoCacheUtils.cpp:333] getConfigChangeString eventType = ^G
E0401 00:16:10.894451 1823842 IoCacheUtils.cpp:363] Failed to read metadata event body from cache
E0401 00:16:10.894456 1823842 MysqlRaft.cpp:659] [replicate] Failed to get config change string from iocache
2021-04-01T00:16:10.894464-07:00 8307 [ERROR] [MY-010207] [Repl] Run function 'before_flush' in plugin 'RPL_RAFT' failed
2021-04-01T00:16:10.894478-07:00 8307 [ERROR] [MY-000000] [Server] Failed to rotate binary log
```

After some investigation, the cause is that the metadata event length was calculated including the config change string, but the config change string was never written into the event body.

Reviewed By: Pushapgl

Differential Revision: D27504157
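
One way to avoid this class of bug is to derive the length from the bytes actually written, sketched below in Python. The type-length-value layout, the type codes, and the function names are invented for this illustration and do not match the real metadata event format.

```python
import struct

def encode_metadata_event(fields):
    """Encode (type_code, payload) pairs; the length prefix is taken from the body itself."""
    body = b"".join(struct.pack("<BH", t, len(p)) + p for t, p in fields)
    return struct.pack("<I", len(body)) + body

def decode_metadata_event(data):
    """Decode the event, failing if the body is shorter than the declared length."""
    (length,) = struct.unpack_from("<I", data, 0)
    body = data[4:4 + length]
    if len(body) != length:
        raise ValueError("metadata event body shorter than declared length")
    fields, off = [], 0
    while off < length:
        t, n = struct.unpack_from("<BH", body, off)
        off += 3
        fields.append((t, body[off:off + n]))
        off += n
    return fields
```

Since the declared length is computed from the serialized body, a field cannot be counted without also being written.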

---------------------------------------------------------------------

Tells raft mode in binlog

Summary:
Win-Win: Tell whether this binlog was generated while this host was in raft mode.

We already have the bit in the FD event, and this diff just spells it out.

Reviewed By: mpercy

Differential Revision: D28664300

---------------------------------------------------------------------

fix flaky raft testcase

Summary:
There are multiple issues for MTR:
1. In sql/rpl_binlog_sender.cc, if a secondary's IO thread receives fatal_error,
   it quits instead of reconnecting. Use an unknown error so that the
   secondary's IO thread can try to reconnect.
2. mtr.add_suppression() call: mtr.add_suppression() executes an insert
   statement into the mtr.test_suppressions table, which has no primary
   key, so during idempotent recovery the secondary fails to execute
   mtr.add_suppression() due to the missing PK. Move all
   mtr.add_suppression() calls to the end of the testcase to work around the
   idempotent recovery failure.
3. When promoting, use raft_promote_to_leader.inc instead of `set rpl_raft_new_leader_uuid`,
   since raft_promote_to_leader waits until the new primary becomes writable.
4. Pass a specific warning instead of '.*' to mtr.add_suppression.
5. etc

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D28774820

---------------------------------------------------------------------

Using locks in Dump_log methods only when raft is enabled

Summary:
We don't need to take locks in non-raft mode since the
underlying MYSQL_BIN_LOG obj will never change. To handle the race between
updating the enable_raft_plugin var and reading its value in Dump_log we
kill and block all dump threads while updating the var and unblock them
once the var is updated.

Reviewed By: bhatvinay

Differential Revision: D28905522

---------------------------------------------------------------------

leader election to be a sync point in the new leader

Summary:
Port of D27582002 (facebook@39c70ca) from 5.6.35 to 8.0

A newly elected raft leader makes sure that all trxs from the previous
leader are committed by the sql appliers. It then switches the server's trx
logs from apply-log-* to binary-log-*. To other parts of the system this
looks like a rotation, but the necessary sync calls are not made here.
So, if the server (or os) restarts, then the storage engine could lose
the commit markers of the last batch of trxs. This will result in silent
data drift.
This diff fixes the problem by making an explicit call to
ha_flush_logs() before switching the server's trx logs

Reviewed By: luqun

Differential Revision: D28880407

---------------------------------------------------------------------

Error out SQL thread on out of order opids in raft logs

Summary:
OpIds should always be in order in the raft logs. Added a check
in SQL threads that throws an error and stops the applier when out of
order opids are detected.

Reviewed By: li-chi

Differential Revision: D28810840
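
A small Python sketch of such an ordering check. Modelling an OpId as a (term, index) tuple is an assumption of this example, as is the function name; the source does not describe the OpId layout.

```python
def check_opid_order(opids):
    """Raise ValueError at the first OpId that does not follow its predecessor.

    OpIds are modelled as (term, index) tuples, compared lexicographically.
    """
    prev = None
    for opid in opids:
        if prev is not None and opid <= prev:
            raise ValueError(f"out of order opid {opid} after {prev}")
        prev = opid
```

In the server the corresponding failure stops the applier rather than raising a Python exception.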

---------------------------------------------------------------------

add GET_COMMITTED_GTIDS for raft

Summary:
During recovery Raft Log::Init needs to check with server
what gtids have been committed. Before doing that it finds the entire
set of trxs in the last raft log. The difference (logged -
committed) gives the pending opids.

Reviewed By: Pushapgl

Differential Revision: D29622786
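
The set difference described above can be sketched in Python. Representing each GTID as a plain string and the function name `pending_gtids` are assumptions of this example; real GTID sets are interval based.

```python
def pending_gtids(logged, committed):
    """Return the GTIDs found in the last raft log that the server has not committed."""
    return sorted(set(logged) - set(committed))
```

For example, with "uuid:1" through "uuid:4" logged and "uuid:1" and "uuid:2" committed, the pending entries are "uuid:3" and "uuid:4".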

---------------------------------------------------------------------

raft: skip mts_recovery_groups during start slave

Summary:
During MySQL8+Raft DMP, some instances fail to switch to Leader or start slave

```
2021-06-24T17:56:38.627423-07:00 431 [Note] [MY-010574] [Repl] Slave: MTS group recovery relay log info group_master_log_name /data/mysql/3127/bls-unittestdb658.frc2-3305-mysql.replicaset.180021/binary-logs-3727.000033, event_master_log_pos 1129.
2021-06-24T17:56:38.627473-07:00 431 [ERROR] [MY-010575] [Repl] Error looking for file after /binlogs/binary-logs-3307.000120.
2021-06-24T17:56:38.627516-07:00 431 [ERROR] [MY-000000] [Repl] load_mi_and_rli_from_repositories: rli_init_info returned error
```

Similar to 5.6, we don't need to run mts_recovery_groups because GTID_MODE is always enabled.

Reviewed By: Pushapgl

Differential Revision: D29520066

---------------------------------------------------------------------

update RaftListenerCallbackArg struct

Summary: Because the contract between the raft plugin and mysql changed, update the RaftListenerCallbackArg struct to add a master_uuid field

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D29981349

---------------------------------------------------------------------

always check mi during raft_reset_slave

Summary: Add a nullptr check to raft_reset_slave, similar to the one in raft_stop_sql_thread

Reviewed By: bhatvinay

Differential Revision: D29967915

---------------------------------------------------------------------

raft mtr: update rpl_end_raft.inc for disable-raft

Summary:
rpl_end_raft.inc unregisters the raft plugin, which calls reset slave when online disable-raft is enabled. Move rpl_end_raft.inc after rpl_stop_slaves.inc so that the slaves are stopped correctly first.

After the slaves are stopped, mtr.add_suppression("") calls no longer replicate to them, so call mtr.add_suppression("") on every instance (replica).

Reviewed By: yizhang82

Differential Revision: D30062236

---------------------------------------------------------------------

fix master_uuid

Summary: fix master_uuid in raft mode.

Reviewed By: luqun

Differential Revision: D30261465

---------------------------------------------------------------------

incorrect File_size value in show raft logs result

Summary:
In the show raft logs result, the File_size for the latest log file isn't updated correctly.

For example, show raft logs; returns:

```
+-------------------------+------------
| Log_name                | File_size  |
| binary-logs-3304.000670 |       1529 |
```

whereas the file on disk is in fact:
```
-rw-r----- 1 mysql backup 1669180487 Aug  4 14:49 binary-logs-3304.000670
-rw-r----- 1 mysql backup 1723994154 Aug  4 14:49 binary-logs-3304.000670
```
The file_size from show raft logs is always 1529, but the real file_size is 1669180487.

The issue is that when writing to IO_CACHE directly, its wrapper ostream's position isn't updated.

Reviewed By: yizhang82

Differential Revision: D30116345

---------------------------------------------------------------------

Add gtid_purged_for_tailing

Summary:
Add a new global variable gtid_purged_for_tailing.

It shows the purged GTID for binlog in non-raft mode, and purged GTID for raft log in raft mode.

Reviewed By: abhinav04sharma

Differential Revision: D29372776

---------------------------------------------------------------------

disable skip_setup_replica_if_unnecessary

Summary: After syncing with the latest official raft plugin, most MTR tests failed due to the skip_setup_replica_if_unnecessary optimization.

Reviewed By: yizhang82

Differential Revision: D30821648

---------------------------------------------------------------------

latency histograms for raft trx wait

Summary: Port of the support added for the same in mysql 5.6. Should help monitor latencies in 8.0 + raft stage-1 replicasets.

Reviewed By: luqun

Differential Revision: D31064764

---------------------------------------------------------------------

handle cases where clean shutdown in raft aborts trxs

Summary: This is a port of the feature in 5.6.

Reviewed By: anirbanr-fb

Differential Revision: D31070593

---------------------------------------------------------------------

fix crash during binlog purging

Summary:
Any error returned by the plugin during binlog purging results in a
crash in mysql8 as the server tries to execute
binlog_error_action_abort. We need to differentiate explicitly between a
plugin error and other errors (such as errors related to doing disk IO
etc). In this particular case, the crash is because of trying to purge a
file that does not exist (i.e. which was already purged previously) and
raft cannot find it in its index chunk (so it returns an error).

Reviewed By: anirbanr-fb

Differential Revision: D31149997

---------------------------------------------------------------------

update flaky rpl_raft_purged_gtids_dump_threads

Summary:
rpl_raft_purged_gtids_dump_threads MTR failed with "Cannot replicate because the master purged required binary logs" after server4 started tailing server2.

Try to sync server4 before switching it to tail server2.

Reviewed By: bhatvinay

Differential Revision: D30818078

---------------------------------------------------------------------

fix various porting issues in mysql8 (raft) crash recovery

Summary:
1. Trimming of the binlog is left to the raft plugin (based on the current
   leader's log). The server should skip this step as part of recovery. This
   essentially means setting 'valid_pos' to the last successfully parsed
   trx in the trx log (instead of the engine's view of the trx log) in
   MYSQL_BIN_LOG::recover()
2. executed_gtid_set should be initialized based on the engine's view of
   the trx log file coordinates. So, during startup appropriate flags need
   to be passed into MYSQL_BIN_LOG::init_gtid_sets(). init_gtid_sets() is
   already changed to handle this, but the flag was not set correctly
   during server startup
3. Another fix is in MYSQL_BIN_LOG::init_gtid_sets() to correctly set the position
   to read and calculate executed-gtid-set (based on the file name read from the
   engine)

Reviewed By: anirbanr-fb

Differential Revision: D31284902

---------------------------------------------------------------------

show_raft_status should take LOCK_status

Summary:
This is a straight port of a 5.6 patch to 8.0

SHOW RAFT STATUS and SHOW GLOBAL STATUS go to the
same functions in the plugin and access shared data structures.
These functions return internal char * pointers to plugin global
variables. They need to be serialized with LOCK_status otherwise
it leads to race conditions.

Reviewed By: li-chi

Differential Revision: D31318340

---------------------------------------------------------------------

Disallowing reset master on raft replicasets

Summary:
This is a straight port of similar patch on 5.6 raft

reset master is inherently not raft compliant.
It gets rid of all binlogs and makes an instance appear like
a fresh single instance database without consulting the ring.
We need to block it.

However under certain circumstances (in the future) e.g. during
first time replicaset build, or when we can guarantee that instance
is single node raft ring, we can potentially do a reset master
followed by rebootstrap of raft. This can also be achieved by
disabling raft, reset master and then re-enabling raft, however
to keep open the door, I have left a force option and will thread
a mechanism from the plugin to call reset_master(force=true)

Reviewed By: li-chi

Differential Revision: D31299214
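
The guard with a force escape hatch can be sketched in Python. The function signature, the exception type, and the return value are assumptions for illustration only; the summary only states that a force option exists.

```python
def reset_master(raft_enabled, force=False):
    """Refuse RESET MASTER in raft mode unless the caller explicitly forces it."""
    if raft_enabled and not force:
        raise PermissionError("reset master is not allowed on a raft replicaset")
    # Placeholder for the actual reset work, which is not modelled here.
    return "reset done"
```

Only a trusted caller such as the plugin would pass force=True, for example while bootstrapping a single node ring.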

---------------------------------------------------------------------

Setting topology config in MTR

Summary:
Setting topology config in MTR so that features that use
topology config can be tested easily. Setting rpl_raft_skip_smc_updates
to avoid unnecessary calls to SMC (even though we supply a dummy
replicaset name).

Reviewed By: li-chi

Differential Revision: D31543877

---------------------------------------------------------------------

Do not allow sql thread start when raft is doing a stop->{}->start transition

Summary:
This is a port of an existing patch made to 5.6 for Raft.

Raft will do a stop of the SQL thread during StopAllWrites.
Then it will repoint the binlog files and during that action, the
SQL threads have to remain stopped. We block it out in this diff by
keeping an atomic bool which can be checked from other functions.

This only applies to raft mode i.e. enable_raft_plugin = true.

Reviewed By: li-chi

Differential Revision: D31319499
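
A Python sketch of the gating flag, using a threading.Event as a stand-in for the atomic bool. The class and method names are hypothetical and do not correspond to identifiers in the server.

```python
import threading

class SqlThreadGate:
    """Blocks SQL thread starts while raft is between its stop and start steps."""

    def __init__(self):
        self._in_transition = threading.Event()  # stand-in for the atomic bool

    def begin_raft_transition(self):
        self._in_transition.set()

    def end_raft_transition(self):
        self._in_transition.clear()

    def may_start_sql_thread(self, raft_enabled):
        """Return False only in raft mode while a transition is in progress."""
        return not (raft_enabled and self._in_transition.is_set())
```

Outside raft mode the flag is ignored, matching the note that this applies only when enable_raft_plugin is true.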

---------------------------------------------------------------------

Handle printing of FD event generated by slave SQL thread

Summary:
Returning early in the FD:print() function prevents mysqlbinlog from parsing Raft logs on secondaries.

The original commit which added this is d048c0f (P173872135)

To comply with the intent of the original bug fix, we avoid printing the FD event of a relay log as a 'BINLOG'.

Reviewed By: anirbanr-fb

Differential Revision: D26359417

---------------------------------------------------------------------

Add exponential backoff for smart restart

Summary:
RAFT Instance crash and Tx Logs filling up.

RocksDB can fill up the tx logs.
We should stop restarting mysqld if it has already restarted many times in a day.

Reviewed By: anirbanr-fb

Differential Revision: D27372750

---------------------------------------------------------------------

always release data_lock mutex to avoid deadlock

Summary: In a stage-1 replicaset, killing a secondary instance sometimes deadlocks the instance because the process_raft_queue thread forgets to release the mutex it acquired in raft_change_master.

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D27602667

---------------------------------------------------------------------

Gracefully exit mysqld_safe loop during backoff

Summary:
Currently, systemctl stop [email protected] can take 10 mins when mysqld_safe is in its backoff period.

D28517599 adds an interrupt to the sleep in mysql_stop, and mysqld_safe immediately breaks out of the retry loop if the sleep is interrupted.

Reviewed By: anirbanr-fb

Differential Revision: D28606439

---------------------------------------------------------------------

Fix heap overflow in group_relay_log_name handling

Summary:
We were accessing group_relay_log_name in
Query_log_event::do_apply_event_worker() but it's assigned only after
the coordinator thread encounters an end event (i.e. xid event or a
query event with "COMMIT" or "ROLLBACK" query). This was causing a race
between accessing group_relay_log_name in the worker thread and writing
it on the coordinator thread. We don't need to set transaction position
in events other than end event, so now we set transaction position in
query event only if it's an end event. The race is eliminated because
group_relay_log_name is set before enqueuing the event to the worker
thread (in both dep repl and vanilla mts).

Reviewed By: lth

Differential Revision: D28767430

---------------------------------------------------------------------

fix memory leak during MYSQL_BIN_LOG::open_existing_binlog

Summary:
asandebug complains about memory leaks during MYSQL_BIN_LOG open

Direct leak of 50 byte(s) in 1 object(s) allocated from:
    #0 0x67460ef in malloc
    #1 0x93f0777 in my_raw_malloc(unsigned long, int)
    #2 0x93f064a in my_malloc(unsigned int, unsigned long, int)
    #3 0x93f0eb0 in my_strdup(unsigned int, char const*, int)
    #4 0x8af01a6 in MYSQL_BIN_LOG::open(unsigned int, char const*, char const*, unsigned int)
    #5 0x8af8064 in MYSQL_BIN_LOG::open_binlog(char const*, char const*, unsigned long, bool, bool, bool, Format_description_log_event*, unsigned int, RaftRotateInfo*, bool)
    #6 0x8b00c00 in MYSQL_BIN_LOG::new_file_impl(bool, Format_description_log_event*, RaftRotateInfo*)
    #7 0x8d65e47 in rotate_relay_log(Master_info*, bool, bool, bool, RaftRotateInfo*)
    #8 0x8d661c0 in rotate_relay_log_for_raft(RaftRotateInfo*)
    #9 0x8c7696a in process_raft_queue
    #10 0xa0fa1fd in pfs_spawn_thread(void*)
    #11 0x7f8c9a12b20b in start_thread

Release this memory before assigning new values.

Reviewed By: Pushapgl

Differential Revision: D28819752
inikep added a commit that referenced this pull request Apr 5, 2022
recover raft logs by removing partial trxs

Summary:
Port D24628821

mysqld removes partial trxs in the tail of trx log (named binary-logs on
primaries and apply-logs on secondaries) during startup. However, relay logs
were not of much importance since they were discarded anyway and new ones
would be created.
However, with raft, this is not ideal. Relay logs are raft logs on secondaries
and have to be kept around (and kept sane and consistent). This diff adds the
ability to remove partial trxs from raft/relay logs.
Much of the code to open the last relay log (based on relay log index) and
identify partial trxs is borrowed from existing logic in
MYSQL_BIN_LOG::open_binlog() and binlog_recover()

Reviewed By: Pushapgl

Differential Revision: D26447448

---------------------------------------------------------------------

Checking for inited bool to make sure global_init_info was successful

Summary:
Port D25584004

A master.info and relay.info file can be present
but needs to be properly inited for use. We were bypassing the inited
check which could lead to issues in Raft.
In case there is an error in global_init_info, Raft will do a
raft_reset_slave and make another attempt at it. If both recourses
fail, the init of the plugin would fail.

Reviewed By: Pushapgl

Differential Revision: D26447457

---------------------------------------------------------------------

Support for dumping raft logs to vanilla async replicas

Summary:
[Porting Notes]
We want to dump raft logs to vanilla async replicas regardless
of whether it's the relay log or binlog. Effectively after this change
we'll dump relay logs on the followers and binlogs on the leader. When
the raft role changes, the logs to be dumped change as well.
The Dump_log class is introduced as a thin wrapper/container around
mysql_bin_log or rli->relay_log and is inited with mysql_bin_log
to emulate vanilla mysql behavior. Dump threads use the global
dump_log object instead of mysql_bin_log directly. We switch the log
in dump log only when raft role changes (in binlog_change_to_binlog()
and binlog_change_to_apply_log()).
During a raft role change we take all log-related locks (LOCK_log,
LOCK_index, LOCK_binlog_end_pos, and dump log lock) to serialize it with
other log operations like dumping logs.

Related doc - https://fb.quip.com/oTVAAdgEi4zY

This diff contains the following 7 patches:
D23013977
D24766787
D24716539
D24900223
D24955284
D25174166
D25775525

Reviewed By: luqun

Differential Revision: D26141496

---------------------------------------------------------------------

Fixed the flaky raft test suite

Summary:
First clean run of entire raft test suite :)

**Changes**
* Reset apply_logs on raft secondaries before the start of every test
* Increased the lock timeouts and changed isolation level in rpl_raft_slave_out_of_order_commit.

Reviewed By: luqun

Differential Revision: D26651257

---------------------------------------------------------------------

return error from Raft_replication_delegate when plugin is not available

Summary:
Port D23065441 (facebook@b9067f7)

The new macro is used to call into raft plugin. If plugin gets unloaded
accidentally when enable_raft_plugin is ON, then this STRICT version returns
failure. This is to be called only by raft plugin currently

Reviewed By: Pushapgl

Differential Revision: D26447523

---------------------------------------------------------------------

Adding timestamps for raft rotates which happen in the context of listener thread

Summary:
Port D25572614

The timestamp of a binlog event is picked up from the when field in
the event. In most cases of rotation, the when is left unpopulated
during rotation for the top 3 events (fd, pgtid, metadata). However
in such a situation, a normal rotate (flush binary logs) still manages
to get a valid timestamp, since the thread in which the flush binary
logs happens has a valid start time.
Now enter Raft relay log rotations. In those cases and in the case
of config change rotate, the rotations are happening in the context
of a raft listener queue thread. In that context, the when and start
time of the thread are both 0. The diff handles this case by populating
the when field appropriately.

Reviewed By: bhatvinay

Differential Revision: D26194612

---------------------------------------------------------------------

Raft abrupt stepdown and trim binlog file / gtid test

Summary: binlog file should get trimmed for abrupt stepdown

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D26169975

---------------------------------------------------------------------

Port: Fixes around raft log truncation.

Summary:
**Notes**
* New functions to block and unblock dump threads for plugin to use
during raft log truncation.

The check below is already done in the raft plugin as part of D26866429.
* Re-init rli->cur_log in Relay_log_info::rli_init_info() instead of
just seeking to the beginning to handle the case when raft log is
truncated before starting the applier

Reviewed By: luqun

Differential Revision: D26759813

---------------------------------------------------------------------

rebase due to relay log Format_description_event event size difference

Summary:
WL#3549: Binlog Compression adds 1 extra byte of data to Format_description_event

  @@ -134,6 +137,7 @@ Format_description_event::Format_description_event(uint8_t binlog_ver,
             VIEW_CHANGE_HEADER_LEN,
             XA_PREPARE_HEADER_LEN,
             ROWS_HEADER_LEN_V2,
  +          TRANSACTION_PAYLOAD_EVENT,
         };

Reviewed By: Pushapgl

Differential Revision: D27145095

---------------------------------------------------------------------

fix raft change string metadata event

Summary:
Saw a bunch of stage-1 replicaset instances fail due to a config change string failure during add/remove instance
```
I0401 00:16:10.894434 1823842 IoCacheUtils.cpp:333] getConfigChangeString eventType = ^G
E0401 00:16:10.894451 1823842 IoCacheUtils.cpp:363] Failed to read metadata event body from cache
E0401 00:16:10.894456 1823842 MysqlRaft.cpp:659] [replicate] Failed to get config change string from iocache
2021-04-01T00:16:10.894464-07:00 8307 [ERROR] [MY-010207] [Repl] Run function 'before_flush' in plugin 'RPL_RAFT' failed
2021-04-01T00:16:10.894478-07:00 8307 [ERROR] [MY-000000] [Server] Failed to rotate binary log
```

After some investigation, the cause is that the metadata event length was calculated including the config change string, but the config change string was never written into the event body.

Reviewed By: Pushapgl

Differential Revision: D27504157

---------------------------------------------------------------------

Tells raft mode in binlog

Summary:
Win-Win: Tell whether this binlog was generated while this host was in raft mode.

We already have the bit in the FD event, and this diff just spells it out.

Reviewed By: mpercy

Differential Revision: D28664300

---------------------------------------------------------------------

fix flaky raft testcase

Summary:
There are multiple issues for MTR:
1. In sql/rpl_binlog_sender.cc, if a secondary's IO thread receives fatal_error,
   it quits instead of reconnecting. Use an unknown error so that the
   secondary's IO thread can try to reconnect.
2. mtr.add_suppression() call: mtr.add_suppression() executes an insert
   statement into the mtr.test_suppressions table, which has no primary
   key, so during idempotent recovery the secondary fails to execute
   mtr.add_suppression() due to the missing PK. Move all
   mtr.add_suppression() calls to the end of the testcase to work around the
   idempotent recovery failure.
3. When promoting, use raft_promote_to_leader.inc instead of `set rpl_raft_new_leader_uuid`,
   since raft_promote_to_leader waits until the new primary becomes writable.
4. Pass a specific warning instead of '.*' to mtr.add_suppression.
5. etc

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D28774820

---------------------------------------------------------------------

Using locks in Dump_log methods only when raft is enabled

Summary:
We don't need to take locks in non-raft mode since the
underlying MYSQL_BIN_LOG obj will never change. To handle the race between
updating the enable_raft_plugin var and reading its value in Dump_log we
kill and block all dump threads while updating the var and unblock them
once the var is updated.

Reviewed By: bhatvinay

Differential Revision: D28905522

---------------------------------------------------------------------

leader election to be a sync point in the new leader

Summary:
Port of D27582002 (facebook@39c70ca) from 5.6.35 to 8.0

A newly elected raft leader makes sure that all trxs from the previous
leader are committed by the sql appliers. It then switches the server's trx
logs from apply-log-* to binary-log-*. To other parts of the system this
looks like a rotation, but the necessary sync calls are not made here.
So, if the server (or OS) restarts, the storage engine could lose
the commit markers of the last batch of trxs. This will result in silent
data drift.
This diff fixes the problem by making an explicit call to
ha_flush_logs() before switching the server's trx logs.

Reviewed By: luqun

Differential Revision: D28880407

---------------------------------------------------------------------

Error out SQL thread on out of order opids in raft logs

Summary:
OpIds should always be in order in the raft logs. Added a check
in SQL threads that throws an error and stops the applier when out of
order opids are detected.
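
The ordering check can be sketched as below. This is only an illustration: the summary does not show the real OpId layout, so the `OpId` struct (a term/index pair, as is common in Raft logs) and the `OpIdOrderChecker` class are hypothetical names.

```cpp
#include <cstdint>
#include <tuple>

// Hypothetical OpId: a (term, index) pair. The real layout is an assumption.
struct OpId {
  int64_t term;
  int64_t index;
};

// Returns true if `a` is strictly before `b` in log order.
inline bool precedes(const OpId &a, const OpId &b) {
  return std::tie(a.term, a.index) < std::tie(b.term, b.index);
}

// Remembers the last accepted OpId and rejects anything that does not
// strictly advance it.
class OpIdOrderChecker {
 public:
  // Returns false when `next` is out of order; the caller would then raise
  // an error and stop the applier.
  bool accept(const OpId &next) {
    if (has_last_ && !precedes(last_, next)) return false;
    last_ = next;
    has_last_ = true;
    return true;
  }

 private:
  OpId last_{0, 0};
  bool has_last_ = false;
};
```

A rejected OpId leaves the checker's state unchanged, so the error reports the first offending entry.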

Reviewed By: li-chi

Differential Revision: D28810840

---------------------------------------------------------------------

add GET_COMMITTED_GTIDS for raft

Summary:
During recovery, Raft Log::Init needs to check with the server
which gtids have been committed. Before doing that, it finds the entire
set of trxs in the last raft log. The difference (logged minus
committed) gives the pending opids.
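
As a rough illustration of that set difference, assuming GTIDs are modelled as plain strings (the `GtidSet` alias and `pending_gtids` helper are hypothetical, not the server's Gtid_set API):

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>

// Hypothetical stand-in for a GTID set: an ordered set of GTID strings.
using GtidSet = std::set<std::string>;

// pending = logged minus committed
inline GtidSet pending_gtids(const GtidSet &logged, const GtidSet &committed) {
  GtidSet pending;
  std::set_difference(logged.begin(), logged.end(), committed.begin(),
                      committed.end(),
                      std::inserter(pending, pending.end()));
  return pending;
}
```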

Reviewed By: Pushapgl

Differential Revision: D29622786

---------------------------------------------------------------------

raft: skip mts_recovery_groups during start slave

Summary:
During MySQL8+Raft DMP, some instances fail to switch to Leader or start slave:

```
2021-06-24T17:56:38.627423-07:00 431 [Note] [MY-010574] [Repl] Slave: MTS group recovery relay log info group_master_log_name /data/mysql/3127/bls-unittestdb658.frc2-3305-mysql.replicaset.180021/binary-logs-3727.000033, event_master_log_pos 1129.
2021-06-24T17:56:38.627473-07:00 431 [ERROR] [MY-010575] [Repl] Error looking for file after /binlogs/binary-logs-3307.000120.
2021-06-24T17:56:38.627516-07:00 431 [ERROR] [MY-000000] [Repl] load_mi_and_rli_from_repositories: rli_init_info returned error
```

Similar to 5.6, we don't need to run mts_recovery_groups because GTID_MODE is always enabled.

Reviewed By: Pushapgl

Differential Revision: D29520066

---------------------------------------------------------------------

update RaftListenerCallbackArg struct

Summary: Because the contract between the raft plugin and mysql changed, update the RaftListenerCallbackArg struct to add a master_uuid field.

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D29981349

---------------------------------------------------------------------

always check mi during raft_reset_slave

Summary: Add a nullptr check to raft_reset_slave, similar to the one in raft_stop_sql_thread.

Reviewed By: bhatvinay

Differential Revision: D29967915

---------------------------------------------------------------------

raft mtr: update rpl_end_raft.inc for disable-raft

Summary:
rpl_end_raft.inc unregisters the raft plugin, which will reset the slaves when online disable-raft is enabled. Move rpl_end_raft.inc after rpl_stop_slaves.inc so that the slaves are stopped correctly first.

After the slaves are stopped, mtr.add_suppression("") calls no longer replicate to them, so call mtr.add_suppression("") on all instances (replicas).

Reviewed By: yizhang82

Differential Revision: D30062236

---------------------------------------------------------------------

fix master_uuid

Summary: fix master_uuid in raft mode.

Reviewed By: luqun

Differential Revision: D30261465

---------------------------------------------------------------------

incorrect File_size value in show raft logs result

Summary:
In the show raft logs result, the File_size for the latest log file isn't updated correctly.

For example, show raft logs; returns:

```
+-------------------------+------------
| Log_name                | File_size  |
| binary-logs-3304.000670 |       1529 |
```

in fact
```
-rw-r----- 1 mysql backup 1669180487 Aug  4 14:49 binary-logs-3304.000670
-rw-r----- 1 mysql backup 1723994154 Aug  4 14:49 binary-logs-3304.000670
```
The file_size from show raft logs is always 1529, but the real file_size is 1669180487.

The issue is that when writing to IO_CACHE directly, its wrapper ostream's position isn't updated.

Reviewed By: yizhang82

Differential Revision: D30116345

---------------------------------------------------------------------

Add gtid_purged_for_tailing

Summary:
Add a new global variable gtid_purged_for_tailing.

It shows the purged GTID for binlog in non-raft mode, and purged GTID for raft log in raft mode.

Reviewed By: abhinav04sharma

Differential Revision: D29372776

---------------------------------------------------------------------

disable skip_setup_replica_if_unnecessary

Summary: After syncing with the latest official raft plugin, most MTR tests failed due to the skip_setup_replica_if_unnecessary optimization.

Reviewed By: yizhang82

Differential Revision: D30821648

---------------------------------------------------------------------

latency histograms for raft trx wait

Summary: Port of the support added for the same in mysql 5.6. Should help monitor latencies in 8.0 + raft stage-1 replicasets.

Reviewed By: luqun

Differential Revision: D31064764

---------------------------------------------------------------------

handle cases where clean shutdown in raft aborts trxs

Summary: This is a port of the feature in 5.6.

Reviewed By: anirbanr-fb

Differential Revision: D31070593

---------------------------------------------------------------------

fix crash during binlog purging

Summary:
Any error returned by the plugin during binlog purging results in a
crash in mysql8 as the server tries to execute
binlog_error_action_abort. We need to differentiate explicitly between a
plugin error and other errors (such as errors related to disk IO).
In this particular case, the crash is caused by trying to purge a
file that does not exist (i.e. one that was already purged previously), so
raft cannot find it in its index chunk and returns an error.

Reviewed By: anirbanr-fb

Differential Revision: D31149997

---------------------------------------------------------------------

update flaky rpl_raft_purged_gtids_dump_threads

Summary:
The rpl_raft_purged_gtids_dump_threads MTR test failed with "Cannot replicate because the master purged required binary logs" after server4 started tailing server2.

Sync server4 before switching it to tail server2.

Reviewed By: bhatvinay

Differential Revision: D30818078

---------------------------------------------------------------------

fix various porting issues in mysql8 (raft) crash recovery

Summary:
1. Trimming of the binlog is left to the raft plugin (based on the current
   leader's log). The server should skip this step as part of recovery. This
   essentially means setting 'valid_pos' to the last successfully parsed
   trx in the trx log (instead of the engine's view of the trx log) in
   MYSQL_BIN_LOG::recover().
2. executed_gtid_set should be initialized based on the engine's view of
   the trx log file coordinates. So, during startup, appropriate flags need
   to be passed into MYSQL_BIN_LOG::init_gtid_sets(). init_gtid_sets() was
   already changed to handle this, but the flag was not set correctly
   during server startup.
3. Another fix in MYSQL_BIN_LOG::init_gtid_sets() correctly sets the position
   to read and calculate executed-gtid-set (based on the file name read from the
   engine).

Reviewed By: anirbanr-fb

Differential Revision: D31284902

---------------------------------------------------------------------

show_raft_status should take LOCK_status

Summary:
This is a straight port of a 5.6 patch to 8.0

SHOW RAFT STATUS and SHOW GLOBAL STATUS go to the
same functions in the plugin and access shared data structures.
These functions return internal char * pointers to plugin global
variables. They need to be serialized with LOCK_status otherwise
it leads to race conditions.

Reviewed By: li-chi

Differential Revision: D31318340

---------------------------------------------------------------------

Disallowing reset master on raft replicasets

Summary:
This is a straight port of a similar patch on 5.6 raft.

reset master is inherently not raft compliant.
It gets rid of all binlogs and makes an instance appear like
a fresh single-instance database without consulting the ring.
We need to block it.

However, under certain circumstances (in the future), e.g. during
first-time replicaset build, or when we can guarantee that the instance
is a single-node raft ring, we can potentially do a reset master
followed by a rebootstrap of raft. This can also be achieved by
disabling raft, running reset master and then re-enabling raft. However,
to keep the door open, I have left a force option and will thread
a mechanism from the plugin to call reset_master(force=true).

Reviewed By: li-chi

Differential Revision: D31299214

---------------------------------------------------------------------

Setting topology config in MTR

Summary:
Setting topology config in MTR so that features that use
topology config can be tested easily. Setting rpl_raft_skip_smc_updates
to avoid unnecessary calls to SMC (even though we supply a dummy
replicaset name).

Reviewed By: li-chi

Differential Revision: D31543877

---------------------------------------------------------------------

Do not allow sql thread start when raft is doing a stop->{}->start transition

Summary:
This is a port of an existing patch made to 5.6 for Raft.

Raft will do a stop of the SQL thread during StopAllWrites.
Then it will repoint the binlog files and during that action, the
SQL threads have to remain stopped. We block it out in this diff by
keeping an atomic bool which can be checked from other functions.

This only applies to raft mode i.e. enable_raft_plugin = true.
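
A minimal sketch of such a guard, assuming a single process-wide flag. The names `sql_thread_start_blocked`, `block_sql_thread_start()` and `can_start_sql_thread()` are hypothetical; the summary does not name the actual variable or functions.

```cpp
#include <atomic>

// Hypothetical flag set while raft is between stopping the SQL thread and
// restarting it (i.e. while the binlog files are being repointed).
static std::atomic<bool> sql_thread_start_blocked{false};

inline void block_sql_thread_start() { sql_thread_start_blocked.store(true); }
inline void unblock_sql_thread_start() { sql_thread_start_blocked.store(false); }

// Returns true if a start of the SQL thread may proceed. The guard only
// applies in raft mode, mirroring enable_raft_plugin = true.
inline bool can_start_sql_thread(bool enable_raft_plugin) {
  if (!enable_raft_plugin) return true;
  return !sql_thread_start_blocked.load();
}
```

An atomic bool is enough here because the flag is only read and written as a whole; it does not by itself order the repointing work relative to other threads.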

Reviewed By: li-chi

Differential Revision: D31319499

---------------------------------------------------------------------

Handle printing of FD event generated by slave SQL thread

Summary:
Returning early in the FD:print() function prevents mysqlbinlog from parsing Raft logs on secondaries.

The original commit which added this is d048c0f (P173872135)

To comply with the intent of the original bug fix, we avoid printing the FD event of a relay log as a 'BINLOG'.

Reviewed By: anirbanr-fb

Differential Revision: D26359417

---------------------------------------------------------------------

Add exponential backoff for smart restart

Summary:
Raft instances crash and tx logs fill up.

rocksdb can fill up the tx logs.
We should stop restarting mysqld if it has already been restarted many times in a day.
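
The policy can be sketched roughly as follows. All constants and names (`BackoffPolicy`, `restart_delay_sec`, `should_restart`) are illustrative assumptions, not values or functions taken from the diff, and the real logic lives in the mysqld_safe wrapper rather than in C++.

```cpp
#include <algorithm>
#include <cstdint>

// Illustrative policy; every value here is an assumption.
struct BackoffPolicy {
  uint32_t base_delay_sec = 1;
  uint32_t max_delay_sec = 600;
  uint32_t max_restarts_per_day = 10;
};

// Delay before the n-th restart (n starts at 0): base * 2^n, capped at
// max_delay_sec.
inline uint32_t restart_delay_sec(const BackoffPolicy &p, uint32_t n) {
  if (n >= 31) return p.max_delay_sec;  // keep the shift well-defined
  uint64_t delay = static_cast<uint64_t>(p.base_delay_sec) << n;
  return static_cast<uint32_t>(
      std::min<uint64_t>(delay, p.max_delay_sec));
}

// Give up restarting once the daily budget is exhausted.
inline bool should_restart(const BackoffPolicy &p, uint32_t restarts_today) {
  return restarts_today < p.max_restarts_per_day;
}
```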

Reviewed By: anirbanr-fb

Differential Revision: D27372750

---------------------------------------------------------------------

always release data_lock mutex to avoid deadlock

Summary: In a stage-1 replicaset, killing a secondary instance sometimes leads to a deadlock because the process_raft_queue thread forgets to release the mutex it acquired in raft_change_master.

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D27602667

---------------------------------------------------------------------

Gracefully exit mysqld_safe loop during backoff

Summary:
Currently, systemctl stop [email protected] can take 10 mins when mysqld_safe is in the backoff period.

D28517599 adds an interrupt to the sleep in mysql_stop, and mysqld_safe immediately breaks out of the retry loop if the sleep is interrupted.

Reviewed By: anirbanr-fb

Differential Revision: D28606439

---------------------------------------------------------------------

Fix heap overflow in group_relay_log_name handling

Summary:
We were accessing group_relay_log_name in
Query_log_event::do_apply_event_worker() but it's assigned only after
the coordinator thread encounters an end event (i.e. xid event or a
query event with "COMMIT" or "ROLLBACK" query). This was causing a race
between accessing group_relay_log_name in the worker thread and writing
it on the coordinator thread. We don't need to set transaction position
in events other than end event, so now we set transaction position in
query event only if it's an end event. The race is eliminated because
group_relay_log_name is set before enqueuing the event to the worker
thread (in both dep repl and vanilla mts).

Reviewed By: lth

Differential Revision: D28767430

---------------------------------------------------------------------

fix memory leak during MYSQL_BIN_LOG::open_existing_binlog

Summary:
asandebug complains about memory leaks during MYSQL_BIN_LOG open:

Direct leak of 50 byte(s) in 1 object(s) allocated from:
    #0 0x67460ef in malloc
    #1 0x93f0777 in my_raw_malloc(unsigned long, int)
    #2 0x93f064a in my_malloc(unsigned int, unsigned long, int)
    #3 0x93f0eb0 in my_strdup(unsigned int, char const*, int)
    #4 0x8af01a6 in MYSQL_BIN_LOG::open(unsigned int, char const*, char const*, unsigned int)
    #5 0x8af8064 in MYSQL_BIN_LOG::open_binlog(char const*, char const*, unsigned long, bool, bool, bool, Format_description_log_event*, unsigned int, RaftRotateInfo*, bool)
    #6 0x8b00c00 in MYSQL_BIN_LOG::new_file_impl(bool, Format_description_log_event*, RaftRotateInfo*)
    #7 0x8d65e47 in rotate_relay_log(Master_info*, bool, bool, bool, RaftRotateInfo*)
    #8 0x8d661c0 in rotate_relay_log_for_raft(RaftRotateInfo*)
    #9 0x8c7696a in process_raft_queue
    #10 0xa0fa1fd in pfs_spawn_thread(void*)
    #11 0x7f8c9a12b20b in start_thread

Release the previously allocated memory before assigning new values.
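
The free-before-assign pattern looks roughly like this. It is a sketch only: `LogName`, `dup_cstr()` and `set_name()` are hypothetical stand-ins, with `dup_cstr()` playing the role of my_strdup() from the trace above.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical holder for a heap-allocated name that gets reassigned each
// time a log is (re)opened.
struct LogName {
  char *name = nullptr;
};

// Plain-C stand-in for my_strdup(); returns nullptr on allocation failure.
inline char *dup_cstr(const char *s) {
  std::size_t n = std::strlen(s) + 1;
  char *copy = static_cast<char *>(std::malloc(n));
  if (copy) std::memcpy(copy, s, n);
  return copy;
}

// Replace the stored name without leaking the previous allocation.
inline void set_name(LogName &log, const char *new_name) {
  char *copy = dup_cstr(new_name);  // copy first, in case new_name aliases log.name
  std::free(log.name);              // free(nullptr) is a no-op
  log.name = copy;
}
```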

Reviewed By: Pushapgl

Differential Revision: D28819752
inikep pushed a commit that referenced this pull request Apr 5, 2022
… enabled

Summary:
For secondaries, when enable_super_log_bin_read_only is on and read_only is on, installing/uninstalling a plugin at run time is currently forbidden.

Installing/uninstalling a plugin doesn't generate an event in the binlog, although the thread's thd contains the OPTION_BIN_LOG flag because log_slave_updates is on by default on secondaries. It should be safe to execute install/uninstall plugin.

The change is to call set_skip_readonly_check() before install/uninstall plugin and call reset_skip_readonly_check() (for completeness) after install/uninstall plugin.

BTW, mysql always calls reset_skip_readonly_check() at the beginning of each statement, so set_skip_readonly_check() won't affect other statements.
```
#0  THD::reset_skip_readonly_check (this=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_class.h:1754
#1  0x0000000005a500e1 in THD::reset_for_next_command (this=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:5892
#2  0x0000000005a517a5 in mysql_reset_thd_for_next_command (thd=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:5817
#3  0x0000000005a50466 in mysql_parse (thd=0x7fb5c1a0b000, parser_state=0x7fb5f6bb4560, last_timer=0x7fb5f6bb39b0) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:6056
#4  0x0000000005a4c7c9 in dispatch_command (thd=0x7fb5c1a0b000, com_data=0x7fb5f6bb4d98, command=COM_QUERY) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:2222
#5  0x0000000005a4f991 in do_command (thd=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:1556
#6  0x0000000005ccd4f1 in handle_connection (arg=0x7fb5cc85b740) at /home/luqun/mysql/mysql-8.0.20/sql/conn_handler/connection_handler_per_thread.cc:330
#7  0x00000000078eb95b in pfs_spawn_thread (arg=0x7fb5f8c89720) at /home/luqun/mysql/mysql-8.0.20/storage/perfschema/pfs.cc:2884
#8  0x00007fb5f957020c in start_thread (arg=0x7fb5f6bb6700) at pthread_create.c:479
#9  0x00007fb5f971881f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
```

Reviewed By: george-reynya

Differential Revision: D27213990
inikep added a commit that referenced this pull request Apr 5, 2022
recover raft logs by removing partial trxs

Summary:
Port D24628821

mysqld removes partial trxs in the tail of the trx log (named binary-logs on
primaries and apply-logs on secondaries) during startup. However, relay logs
were not of much importance since they were discarded anyway and new ones would
be created.
However, with raft, this is not ideal. Relay logs are raft logs on secondaries
and have to be kept around (and kept sane and consistent). This diff adds the
ability to remove partial trxs from raft/relay logs.
Much of the code to open the last relay log (based on relay log index) and
identify partial trxs is borrowed from existing logic in
MYSQL_BIN_LOG::open_binlog() and binlog_recover()
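
Identifying the truncation point can be sketched as below, under a heavily simplified event model. `EventKind`, `Event` and `valid_end_pos()` are hypothetical; real binlog events carry more types and the server's recovery logic handles more cases.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical simplification: a trx starts with a GTID event and ends with
// a commit marker (e.g. an XID event).
enum class EventKind { kGtid, kRows, kCommit };

struct Event {
  EventKind kind;
  std::size_t end_pos;  // byte offset just past this event
};

// Returns the offset the log should be truncated to: the end of the last
// complete trx, or `start_pos` if no trx completed.
inline std::size_t valid_end_pos(const std::vector<Event> &events,
                                 std::size_t start_pos) {
  std::size_t valid = start_pos;
  bool in_trx = false;
  for (const Event &e : events) {
    if (e.kind == EventKind::kGtid) {
      in_trx = true;
    } else if (e.kind == EventKind::kCommit && in_trx) {
      in_trx = false;
      valid = e.end_pos;
    }
  }
  return valid;
}
```

Everything past the returned offset belongs to a partial trx and would be trimmed.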

Reviewed By: Pushapgl

Differential Revision: D26447448

---------------------------------------------------------------------

Checking for inited bool to make sure global_init_info was successful

Summary:
Port D25584004

The master.info and relay.info files can be present
but still need to be properly inited for use. We were bypassing the inited
check, which could lead to issues in Raft.
In case there is an error in global_init_info, Raft will do a
raft_reset_slave and make another attempt at it. If both recourses
fail, the init of the plugin would fail.

Reviewed By: Pushapgl

Differential Revision: D26447457

---------------------------------------------------------------------

Support for dumping raft logs to vanilla async replicas

Summary:
[Porting Notes]
We want to dump raft logs to vanilla async replicas regardless
of whether it's the relay log or binlog. Effectively after this change
we'll dump relay logs on the followers and binlogs on the leader. When
the raft role changes, the logs to be dumped also change.
The Dump_log class is introduced as a thin wrapper/container around
mysql_bin_log or rli->relay_log and is inited with mysql_bin_log
to emulate vanilla mysql behavior. Dump threads use the global
dump_log object instead of mysql_bin_log directly. We switch the log
in dump log only when the raft role changes (in binlog_change_to_binlog()
and binlog_change_to_apply_log()).
During a raft role change we take all log-related locks (LOCK_log,
LOCK_index, LOCK_binlog_end_pos, and dump log lock) to serialize it with
other log operations like dumping logs.

Related doc - https://fb.quip.com/oTVAAdgEi4zY

This diff contains the following 7 patches:
D23013977
D24766787
D24716539
D24900223
D24955284
D25174166
D25775525

Reviewed By: luqun

Differential Revision: D26141496

---------------------------------------------------------------------

Fixed the flaky raft test suite

Summary:
First clean run of entire raft test suite :)

**Changes**
* Reset apply_logs on raft secondaries before the start of every test
* Increased the lock timeouts and changed isolation level in rpl_raft_slave_out_of_order_commit.

Reviewed By: luqun

Differential Revision: D26651257

---------------------------------------------------------------------

return error from Raft_replication_delegate when plugin is not available

Summary:
Port D23065441 (facebook@b9067f7)

The new macro is used to call into raft plugin. If plugin gets unloaded
accidentally when enable_raft_plugin is ON, then this STRICT version returns
failure. This is to be called only by raft plugin currently

Reviewed By: Pushapgl

Differential Revision: D26447523

---------------------------------------------------------------------

Adding timestamps for raft rotates which happen in the context of listener thread

Summary:
Port D25572614

The timestamp of a binlog event is picked up from the when field in
the event. In most cases of rotation, the when is left unpopulated
during rotation for the top 3 events (fd, pgtid, metadata). However
in such a situation, a normal rotate (flush binary logs) still manages
to get a valid timestamp, since the thread in which the flush binary
logs happens has a valid start time.
Now enter Raft relay log rotations. In those cases and in the case
of config change rotate, the rotations are happening in the context
of a raft listener queue thread. In that context, the when and start
time of the thread are both 0. The diff handles this case by populating
the when field appropriately.

Reviewed By: bhatvinay

Differential Revision: D26194612

---------------------------------------------------------------------

Raft abrupt stepdown and trim binlog file / gtid test

Summary: binlog file should get trimmed for abrupt stepdown

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D26169975

---------------------------------------------------------------------

Port: Fixes around raft log truncation.

Summary:
**Notes**
* New functions to block and unblock dump threads for plugin to use
during raft log truncation.

The check below is already done in the raft plugin as part of D26866429.
* Re-init rli->cur_log in Relay_log_info::rli_init_info() instead of
just seeking to the beginning to handle the case when raft log is
truncated before starting the applier

Reviewed By: luqun

Differential Revision: D26759813

---------------------------------------------------------------------

rebase due to relay log Format_description_event event size difference

Summary:
WL#3549: Binlog Compression adds 1 extra byte of data to Format_description_event

  @@ -134,6 +137,7 @@ Format_description_event::Format_description_event(uint8_t binlog_ver,
            VIEW_CHANGE_HEADER_LEN,
            XA_PREPARE_HEADER_LEN,
            ROWS_HEADER_LEN_V2,
  +         TRANSACTION_PAYLOAD_EVENT,
        };

Reviewed By: Pushapgl

Differential Revision: D27145095

---------------------------------------------------------------------

fix raft change string metadata event

Summary:
Saw a number of stage-1 replicaset instances fail due to a config change string failure during add/remove instance:
```
I0401 00:16:10.894434 1823842 IoCacheUtils.cpp:333] getConfigChangeString eventType = ^G
E0401 00:16:10.894451 1823842 IoCacheUtils.cpp:363] Failed to read metadata event body from cache
E0401 00:16:10.894456 1823842 MysqlRaft.cpp:659] [replicate] Failed to get config change string from iocache
2021-04-01T00:16:10.894464-07:00 8307 [ERROR] [MY-010207] [Repl] Run function 'before_flush' in plugin 'RPL_RAFT' failed
2021-04-01T00:16:10.894478-07:00 8307 [ERROR] [MY-000000] [Server] Failed to rotate binary log
```

After some investigation, the cause is that the metadata event length was calculated with the config change string included, but the config change string was never written into the event body.

Reviewed By: Pushapgl

Differential Revision: D27504157

---------------------------------------------------------------------

Tells raft mode in binlog

Summary:
Win-Win: Tell whether this binlog was generated while this host was in raft mode.

We already have the bit in the FD event, and this diff just reports it.

Reviewed By: mpercy

Differential Revision: D28664300

---------------------------------------------------------------------

fix flaky raft testcase

Summary:
There are multiple issues in the MTR tests:
1. In sql/rpl_binlog_sender.cc, if the secondary's IO thread receives a fatal_error,
   it quits instead of reconnecting. Use an unknown error so that the
   secondary's IO thread can try to reconnect.
2. mtr.add_suppression() executes an insert statement into the
   mtr.test_suppressions table, which has no primary key, so during
   idempotent recovery the secondary fails to execute mtr.add_suppression()
   due to the missing PK. Move all mtr.add_suppression() calls to the end of
   the testcase to work around the idempotent recovery failure.
3. For promotion, use raft_promote_to_leader.inc instead of `set rpl_raft_new_leader_uuid`,
   since raft_promote_to_leader waits until the new primary becomes writeable.
4. Pass a specific warning instead of '.*' to mtr.add_suppression.
5. etc

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D28774820

---------------------------------------------------------------------

Using locks in Dump_log methods only when raft is enabled

Summary:
We don't need to take locks in non-raft mode since the
underlying MYSQL_BIN_LOG obj will never change. To handle the race between
updating the enable_raft_plugin var and using its value in Dump_log, we
kill and block all dump threads while updating the var and unblock them
once the var is updated.

Reviewed By: bhatvinay

Differential Revision: D28905522

---------------------------------------------------------------------

leader election to be a sync point in the new leader

Summary:
Port of D27582002 (facebook@39c70ca) from 5.6.35 to 8.0

A newly elected raft leader makes sure that all trxs from the previous
leader are committed by the sql appliers. It then switches the server's trx
logs from apply-log-* to binary-log-*. To other parts of the system this
looks like a rotation, but the necessary sync calls are not made here.
So, if the server (or OS) restarts, the storage engine could lose
the commit markers of the last batch of trxs. This will result in silent
data drift.
This diff fixes the problem by making an explicit call to
ha_flush_logs() before switching the server's trx logs.

Reviewed By: luqun

Differential Revision: D28880407

---------------------------------------------------------------------

Error out SQL thread on out of order opids in raft logs

Summary:
OpIds should always be in order in the raft logs. Added a check
in SQL threads that throws an error and stops the applier when out of
order opids are detected.

Reviewed By: li-chi

Differential Revision: D28810840

---------------------------------------------------------------------

add GET_COMMITTED_GTIDS for raft

Summary:
During recovery, Raft Log::Init needs to check with the server
which gtids have been committed. Before doing that, it finds the entire
set of trxs in the last raft log. The difference (logged minus
committed) gives the pending opids.

Reviewed By: Pushapgl

Differential Revision: D29622786

---------------------------------------------------------------------

raft: skip mts_recovery_groups during start slave

Summary:
During MySQL8+Raft DMP, some instances fail to switch to Leader or start slave:

```
2021-06-24T17:56:38.627423-07:00 431 [Note] [MY-010574] [Repl] Slave: MTS group recovery relay log info group_master_log_name /data/mysql/3127/bls-unittestdb658.frc2-3305-mysql.replicaset.180021/binary-logs-3727.000033, event_master_log_pos 1129.
2021-06-24T17:56:38.627473-07:00 431 [ERROR] [MY-010575] [Repl] Error looking for file after /binlogs/binary-logs-3307.000120.
2021-06-24T17:56:38.627516-07:00 431 [ERROR] [MY-000000] [Repl] load_mi_and_rli_from_repositories: rli_init_info returned error
```

Similar to 5.6, we don't need to run mts_recovery_groups because GTID_MODE is always enabled.

Reviewed By: Pushapgl

Differential Revision: D29520066

---------------------------------------------------------------------

update RaftListenerCallbackArg struct

Summary: Because the contract between the raft plugin and mysql changed, update the RaftListenerCallbackArg struct to add a master_uuid field.

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D29981349

---------------------------------------------------------------------

always check mi during raft_reset_slave

Summary: Add a nullptr check to raft_reset_slave, similar to the one in raft_stop_sql_thread.

Reviewed By: bhatvinay

Differential Revision: D29967915

---------------------------------------------------------------------

raft mtr: update rpl_end_raft.inc for disable-raft

Summary:
rpl_end_raft.inc unregisters the raft plugin, which will reset the slaves when online disable-raft is enabled. Move rpl_end_raft.inc after rpl_stop_slaves.inc so that the slaves are stopped correctly first.

After the slaves are stopped, mtr.add_suppression("") calls no longer replicate to them, so call mtr.add_suppression("") on all instances (replicas).

Reviewed By: yizhang82

Differential Revision: D30062236

---------------------------------------------------------------------

fix master_uuid

Summary: fix master_uuid in raft mode.

Reviewed By: luqun

Differential Revision: D30261465

---------------------------------------------------------------------

incorrect File_size value in show raft logs result

Summary:
In the show raft logs result, the File_size for the latest log file isn't updated correctly.

For example, show raft logs; returns:

```
+-------------------------+------------
| Log_name                | File_size  |
| binary-logs-3304.000670 |       1529 |
```

in fact
```
-rw-r----- 1 mysql backup 1669180487 Aug  4 14:49 binary-logs-3304.000670
-rw-r----- 1 mysql backup 1723994154 Aug  4 14:49 binary-logs-3304.000670
```
The file_size from show raft logs is always 1529, but the real file_size is 1669180487.

The issue is that when writing to IO_CACHE directly, its wrapper ostream's position isn't updated.

Reviewed By: yizhang82

Differential Revision: D30116345

---------------------------------------------------------------------

Add gtid_purged_for_tailing

Summary:
Add a new global variable gtid_purged_for_tailing.

It shows the purged GTID for binlog in non-raft mode, and purged GTID for raft log in raft mode.

Reviewed By: abhinav04sharma

Differential Revision: D29372776

---------------------------------------------------------------------

disable skip_setup_replica_if_unnecessary

Summary: After syncing with the latest official raft plugin, most MTR tests failed due to the skip_setup_replica_if_unnecessary optimization.

Reviewed By: yizhang82

Differential Revision: D30821648

---------------------------------------------------------------------

latency histograms for raft trx wait

Summary: Port of the support added for the same in mysql 5.6. Should help monitor latencies in 8.0 + raft stage-1 replicasets.

Reviewed By: luqun

Differential Revision: D31064764

---------------------------------------------------------------------

handle cases where clean shutdown in raft aborts trxs

Summary: This is a port of the feature in 5.6.

Reviewed By: anirbanr-fb

Differential Revision: D31070593

---------------------------------------------------------------------

fix crash during binlog purging

Summary:
Any error returned by the plugin during binlog purging results in a
crash in mysql8 as the server tries to execute
binlog_error_action_abort. We need to differentiate explicitly between a
plugin error and other errors (such as errors related to disk IO).
In this particular case, the crash is caused by trying to purge a
file that does not exist (i.e. one that was already purged previously), so
raft cannot find it in its index chunk and returns an error.

Reviewed By: anirbanr-fb

Differential Revision: D31149997

---------------------------------------------------------------------

update flaky rpl_raft_purged_gtids_dump_threads

Summary:
The rpl_raft_purged_gtids_dump_threads MTR test failed with "Cannot replicate because the master purged required binary logs" after server4 started tailing server2.

Sync server4 before switching it to tail server2.

Reviewed By: bhatvinay

Differential Revision: D30818078

---------------------------------------------------------------------

fix various porting issues in mysql8 (raft) crash recovery

Summary:
1. Trimming of the binlog is left to the raft plugin (based on the current
   leader's log). The server should skip this step as part of recovery. This
   essentially means setting 'valid_pos' to the last successfully parsed
   trx in the trx log (instead of the engine's view of the trx log) in
   MYSQL_BIN_LOG::recover().
2. executed_gtid_set should be initialized based on the engine's view of
   the trx log file coordinates. So, during startup, appropriate flags need
   to be passed into MYSQL_BIN_LOG::init_gtid_sets(). init_gtid_sets() was
   already changed to handle this, but the flag was not set correctly
   during server startup.
3. Another fix in MYSQL_BIN_LOG::init_gtid_sets() correctly sets the position
   to read and calculate executed-gtid-set (based on the file name read from the
   engine).

Reviewed By: anirbanr-fb

Differential Revision: D31284902

---------------------------------------------------------------------

show_raft_status should take LOCK_status

Summary:
This is a straight port of a 5.6 patch to 8.0

SHOW RAFT STATUS and SHOW GLOBAL STATUS go to the
same functions in the plugin and access shared data structures.
These functions return internal char * pointers to plugin global
variables. They need to be serialized with LOCK_status otherwise
it leads to race conditions.
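
A minimal sketch of the serialization described above, using a std::mutex as a stand-in for LOCK_status. The names `status_lock`, `plugin_status_buf`, `update_status` and `show_status` are hypothetical and are not the server's or the plugin's real symbols. The idea is that a reader copies the shared value while holding the same lock the writer takes, rather than keeping an internal pointer.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical stand-ins: a lock playing the role of LOCK_status and a
// plugin-global buffer that status readers would otherwise see via char *.
static std::mutex status_lock;
static std::string plugin_status_buf = "FOLLOWER";

// Writer side: the plugin updates its global status under the lock.
void update_status(const std::string &value) {
  assert(!value.empty());
  std::lock_guard<std::mutex> guard(status_lock);
  plugin_status_buf = value;
}

// Reader side: both SHOW commands would go through a path like this one,
// copying the value while holding the lock instead of keeping the pointer.
std::string show_status() {
  std::lock_guard<std::mutex> guard(status_lock);
  return plugin_status_buf;
}
```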

Reviewed By: li-chi

Differential Revision: D31318340

---------------------------------------------------------------------

Disallowing reset master on raft replicasets

Summary:
This is a straight port of similar patch on 5.6 raft

reset master is inherently not raft compliant.
It gets rid of all binlogs and makes an instance appear like
a fresh single instance database without consulting the ring.
We need to block it.

However, under certain circumstances (in the future), e.g. during
first-time replicaset build, or when we can guarantee that the instance
is a single-node raft ring, we can potentially do a reset master
followed by a rebootstrap of raft. This can also be achieved by
disabling raft, running reset master and then re-enabling raft. However,
to keep the door open, I have left a force option and will thread
a mechanism from the plugin to call reset_master(force=true)

Reviewed By: li-chi

Differential Revision: D31299214

---------------------------------------------------------------------

Setting topology config in MTR

Summary:
Setting topology config in MTR so that features that use
topology config can be tested easily. Setting rpl_raft_skip_smc_updates
to avoid unnecessary calls to SMC (even though we supply a dummy
replicaset name).

Reviewed By: li-chi

Differential Revision: D31543877

---------------------------------------------------------------------

Do not allow sql thread start when raft is doing a stop->{}->start transition

Summary:
This is a port of an existing patch made to 5.6 for Raft.

Raft will do a stop of the SQL thread during StopAllWrites.
Then it will repoint the binlog files and during that action, the
SQL threads have to remain stopped. We block it out in this diff by
keeping an atomic bool which can be checked from other functions.

This only applies to raft mode i.e. enable_raft_plugin = true.
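
The guard described above can be sketched as follows. This is only an illustration under assumptions: `raft_transition_in_progress`, `raft_enabled`, and the three functions are hypothetical names, and the real patch may place the check differently.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical flag: true while raft is between stopping and restarting
// the SQL thread (for example while binlog files are being repointed).
static std::atomic<bool> raft_transition_in_progress{false};

// Hypothetical stand-in for the enable_raft_plugin setting.
static bool raft_enabled = true;

void begin_raft_transition() { raft_transition_in_progress.store(true); }

void end_raft_transition() {
  // Ending a transition that never started would indicate a logic error.
  assert(raft_transition_in_progress.load());
  raft_transition_in_progress.store(false);
}

// Returns true if a request to start the SQL thread may proceed.
bool can_start_sql_thread() {
  if (!raft_enabled) return true;  // the guard only applies in raft mode
  return !raft_transition_in_progress.load();
}
```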

Reviewed By: li-chi

Differential Revision: D31319499

---------------------------------------------------------------------

Handle printing of FD event generated by slave SQL thread

Summary:
Returning early in the FD:print() function prevents mysqlbinlog from parsing Raft logs on secondaries.

The original commit which added this is d048c0f (P173872135)

To comply with the intent of the original bug fix, we avoid printing the FD event of a relay log as a 'BINLOG'.

Reviewed By: anirbanr-fb

Differential Revision: D26359417

---------------------------------------------------------------------

Add exponential backoff for smart restart

Summary:
Raft instances crash and the trx logs fill up.

RocksDB can fill up the trx logs.
We should stop restarting mysqld if it has already been restarted many times in a day.

Reviewed By: anirbanr-fb

Differential Revision: D27372750

---------------------------------------------------------------------

always release data_lock mutex to avoid deadlock

Summary: In a stage-1 replicaset, when a secondary instance is killed, the instance sometimes runs into a deadlock because the process_raft_queue thread forgot to release the mutex it acquired in raft_change_master.

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D27602667

---------------------------------------------------------------------

Gracefully exit mysqld_safe loop during backoff

Summary:
Currently, systemctl stop [email protected] can take 10 mins when mysqld_safe is in its backoff period.

D28517599 adds an interrupt to the sleep in mysql_stop, and mysqld_safe immediately breaks out of the retry loop if the sleep is interrupted.

Reviewed By: anirbanr-fb

Differential Revision: D28606439

---------------------------------------------------------------------

Fix heap overflow in group_relay_log_name handling

Summary:
We were accessing group_relay_log_name in
Query_log_event::do_apply_event_worker() but it's assigned only after
the coordinator thread encounters an end event (i.e. xid event or a
query event with "COMMIT" or "ROLLBACK" query). This was causing a race
between accessing group_relay_log_name in the worker thread and writing
it on the coordinator thread. We don't need to set transaction position
in events other than end event, so now we set transaction position in
query event only if it's an end event. The race is eliminated because
group_relay_log_name is set before enqueuing the event to the worker
thread (in both dep repl and vanilla mts).

Reviewed By: lth

Differential Revision: D28767430

---------------------------------------------------------------------

fix memory leak during MYSQL_BIN_LOG::open_existing_binlog

Summary:
The asandebug build complains about memory leaks during MYSQL_BIN_LOG open:

Direct leak of 50 byte(s) in 1 object(s) allocated from:
    #0 0x67460ef in malloc
    #1 0x93f0777 in my_raw_malloc(unsigned long, int)
    #2 0x93f064a in my_malloc(unsigned int, unsigned long, int)
    #3 0x93f0eb0 in my_strdup(unsigned int, char const*, int)
    #4 0x8af01a6 in MYSQL_BIN_LOG::open(unsigned int, char const*, char const*, unsigned int)
    #5 0x8af8064 in MYSQL_BIN_LOG::open_binlog(char const*, char const*, unsigned long, bool, bool, bool, Format_description_log_event*, unsigned int, RaftRotateInfo*, bool)
    #6 0x8b00c00 in MYSQL_BIN_LOG::new_file_impl(bool, Format_description_log_event*, RaftRotateInfo*)
    #7 0x8d65e47 in rotate_relay_log(Master_info*, bool, bool, bool, RaftRotateInfo*)
    #8 0x8d661c0 in rotate_relay_log_for_raft(RaftRotateInfo*)
    #9 0x8c7696a in process_raft_queue
    #10 0xa0fa1fd in pfs_spawn_thread(void*)
    #11 0x7f8c9a12b20b in start_thread

Release this memory before assigning to the pointers again.
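
A small sketch of the free-before-assign pattern named above. `LogName` and `set_log_name` are hypothetical and only mimic a member that is filled by a strdup-style call each time the log is opened. They are not the actual MYSQL_BIN_LOG fields.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical holder for a heap-allocated log name, standing in for a
// member that is assigned from a strdup-style call on every open.
struct LogName {
  char *name = nullptr;
};

// Replace the stored name. Freeing the previous buffer first is what keeps
// a second open from leaking the first allocation.
void set_log_name(LogName &log, const char *new_name) {
  assert(new_name != nullptr);
  std::free(log.name);  // free(nullptr) is a no-op
  const std::size_t len = std::strlen(new_name) + 1;
  log.name = static_cast<char *>(std::malloc(len));
  if (log.name != nullptr) std::memcpy(log.name, new_name, len);
}
```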

Reviewed By: Pushapgl

Differential Revision: D28819752

---------------------------------------------------------------------

Reduce max backoff time from 24h to 30mins

Summary:
During the recent SEV0, raft instances were crash looping because of clock skew. Even after the clock returned to normal, mysqld_safe failed to bring mysqld back because of the extended 24-hour backoff.

This diff reduces the max backoff time from 24h to 30mins, so mysqld_safe still keeps trying, but not so aggressively that it fills up the trx logs.
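
For illustration, a capped exponential backoff can be computed as below. Only the 30 minute cap comes from the summary. The 1 second base delay, the doubling factor, and the name `backoff_seconds` are assumptions, and mysqld_safe itself is a shell script, so this is a sketch of the arithmetic rather than the real change.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Assumed parameters: only the 30 minute cap comes from the summary above.
constexpr std::uint64_t kBaseDelaySec = 1;
constexpr std::uint64_t kMaxDelaySec = 30 * 60;

// Delay before restart number `attempt` (0-based), doubling each time and
// clamped to the cap.
std::uint64_t backoff_seconds(unsigned attempt) {
  // Stop shifting once the cap is certain to be hit, which also avoids
  // overflow for large attempt counts.
  if (attempt >= 32) return kMaxDelaySec;
  return std::min(kBaseDelaySec << attempt, kMaxDelaySec);
}
```

With these assumed values the delay grows 1, 2, 4, ... seconds and stays at 1800 seconds from the twelfth restart onward.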

Reviewed By: bart2

Differential Revision: D31661172
inikep pushed a commit that referenced this pull request Apr 6, 2022
… enabled

Summary:
For secondaries, when enable_super_log_bin_read_only is on and read_only is on, installing or uninstalling a plugin at run time is currently forbidden.

Installing or uninstalling a plugin doesn't generate an event in the binlog, although the thread's thd carries the OPTION_BIN_LOG flag because log_slave_updates is on by default on secondaries. It should therefore be safe to execute install/uninstall plugin.

The change is to call set_skip_readonly_check() before install/uninstall plugin and call reset_skip_readonly_check() (for completeness) after install/uninstall plugin.

BTW, mysql always calls reset_skip_readonly_check() at the beginning of each statement, so set_skip_readonly_check() won't affect other statements.
```
#0  THD::reset_skip_readonly_check (this=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_class.h:1754
#1  0x0000000005a500e1 in THD::reset_for_next_command (this=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:5892
#2  0x0000000005a517a5 in mysql_reset_thd_for_next_command (thd=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:5817
#3  0x0000000005a50466 in mysql_parse (thd=0x7fb5c1a0b000, parser_state=0x7fb5f6bb4560, last_timer=0x7fb5f6bb39b0) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:6056
#4  0x0000000005a4c7c9 in dispatch_command (thd=0x7fb5c1a0b000, com_data=0x7fb5f6bb4d98, command=COM_QUERY) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:2222
#5  0x0000000005a4f991 in do_command (thd=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:1556
#6  0x0000000005ccd4f1 in handle_connection (arg=0x7fb5cc85b740) at /home/luqun/mysql/mysql-8.0.20/sql/conn_handler/connection_handler_per_thread.cc:330
#7  0x00000000078eb95b in pfs_spawn_thread (arg=0x7fb5f8c89720) at /home/luqun/mysql/mysql-8.0.20/storage/perfschema/pfs.cc:2884
#8  0x00007fb5f957020c in start_thread (arg=0x7fb5f6bb6700) at pthread_create.c:479
#9  0x00007fb5f971881f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
```

Reviewed By: george-reynya

Differential Revision: D27213990
inikep added a commit that referenced this pull request Apr 6, 2022
recover raft logs by removing partial trxs

Summary:
Port D24628821

mysqld removes partial trxs in the tail of the trx log (named binary-logs on
primaries and apply-logs on secondaries) during startup. However, relay logs
were not of much importance since they were discarded anyway and a new one would
be created.
However, with raft, this is not ideal. Relay logs are raft logs on secondaries
and have to be kept around (and kept sane and consistent). This diff adds the
ability to remove partial trxs from raft/relay logs.
Much of the code to open the last relay log (based on relay log index) and
identify partial trxs is borrowed from existing logic in
MYSQL_BIN_LOG::open_binlog() and binlog_recover()

Reviewed By: Pushapgl

Differential Revision: D26447448

---------------------------------------------------------------------

Checking for inited bool to make sure global_init_info was successful

Summary:
Port D25584004

A master.info and relay.info file can be present
but needs to be properly inited for use. We were bypassing the inited
check which could lead to issues in Raft.
In case there is an error in global_init_info, Raft will do a
raft_reset_slave and make another attempt at it. If both recourses
fail, the init of the plugin would fail.

Reviewed By: Pushapgl

Differential Revision: D26447457

---------------------------------------------------------------------

Support for dumping raft logs to vanilla async replicas

Summary:
[Porting Notes]
We want to dump raft logs to vanilla async replicas regardless
of whether it's the relay log or binlog. Effectively after this change
we'll dump relay logs on the followers and binlogs on the leader. When
the raft role changes, the logs to be dumped also change.
The Dump_log class is introduced as a thin wrapper/container around
mysql_bin_log or rli->relay_log and is inited with mysql_bin_log
to emulate vanilla mysql behavior. Dump threads use the global
dump_log object instead of mysql_bin_log directly. We switch the log
in dump log only when raft role changes (in binlog_change_to_binlog()
and binlog_change_to_apply_log()).
During a raft role change we take all log-related locks (LOCK_log,
LOCK_index, LOCK_binlog_end_pos, and dump log lock) to serialize it with
other log operations like dumping logs.
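
The wrapper idea can be sketched roughly as follows. `Log` and `DumpLogSketch` are hypothetical simplifications: the real Dump_log wraps mysql_bin_log or rli->relay_log, and the role change takes several locks rather than the single one shown here.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical stand-in for a log object (binlog or relay log).
struct Log {
  std::string name;
};

// Thin wrapper in the spirit of Dump_log: readers go through the wrapper,
// and the underlying log is swapped only on a raft role change.
class DumpLogSketch {
 public:
  explicit DumpLogSketch(Log *initial) : log_(initial) {
    assert(initial != nullptr);
  }

  // Called on a role change; takes the wrapper lock so that the switch is
  // serialized with readers.
  void switch_log(Log *next) {
    assert(next != nullptr);
    std::lock_guard<std::mutex> guard(lock_);
    log_ = next;
  }

  // Called by dump threads instead of touching a global log directly.
  std::string current_name() {
    std::lock_guard<std::mutex> guard(lock_);
    return log_->name;
  }

 private:
  std::mutex lock_;
  Log *log_;
};
```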

Related doc - https://fb.quip.com/oTVAAdgEi4zY

This diff contains the following 7 patches:
D23013977
D24766787
D24716539
D24900223
D24955284
D25174166
D25775525

Reviewed By: luqun

Differential Revision: D26141496

---------------------------------------------------------------------

Fixed the flaky raft test suite

Summary:
First clean run of entire raft test suite :)

**Changes**
* Reset apply_logs on raft secondaries before the start of every test
* Increased the lock timeouts and changed isolation level in rpl_raft_slave_out_of_order_commit.

Reviewed By: luqun

Differential Revision: D26651257

---------------------------------------------------------------------

return error from Raft_replication_delegate when plugin is not available

Summary:
Port D23065441 (facebook@b9067f7)

The new macro is used to call into raft plugin. If plugin gets unloaded
accidentally when enable_raft_plugin is ON, then this STRICT version returns
failure. This is to be called only by raft plugin currently

Reviewed By: Pushapgl

Differential Revision: D26447523

---------------------------------------------------------------------

Adding timestamps for raft rotates which happen in the context of listener thread

Summary:
Port D25572614

The timestamp of a binlog event is picked up from the when field in
the event. In most cases of rotation, the when is left unpopulated
during rotation for the top 3 events (fd, pgtid, metadata). However
in such a situation, a normal rotate (flush binary logs) still manages
to get a valid timestamp, since the thread in which the flush binary
logs happens has a valid start time.
Now enter Raft relay log rotations. In those cases and in the case
of config change rotate, the rotations are happening in the context
of a raft listener queue thread. In that context, the when and start
time of the thread are both 0. The diff handles this case by populating
the when field appropriately.

Reviewed By: bhatvinay

Differential Revision: D26194612

---------------------------------------------------------------------

Raft abrupt stepdown and trim binlog file / gtid test

Summary: binlog file should get trimmed for abrupt stepdown

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D26169975

---------------------------------------------------------------------

Port: Fixes around raft log truncation.

Summary:
**Notes**
* New functions to block and unblock dump threads for plugin to use
during raft log truncation.

The check below is already done in the raft plugin as part of D26866429.
* Re-init rli->cur_log in Relay_log_info::rli_init_info() instead of
just seeking to the beginning to handle the case when raft log is
truncated before starting the applier

Reviewed By: luqun

Differential Revision: D26759813

---------------------------------------------------------------------

rebase due to relay log Format_description_event event size difference

Summary:
WL#3549: Binlog Compression adds 1 extra byte of data to Format_description_event

@@ -134,6 +137,7 @@ Format_description_event::Format_description_event(uint8_t binlog_ver,
           VIEW_CHANGE_HEADER_LEN,
           XA_PREPARE_HEADER_LEN,
           ROWS_HEADER_LEN_V2,
+          TRANSACTION_PAYLOAD_EVENT,
       };

Reviewed By: Pushapgl

Differential Revision: D27145095

---------------------------------------------------------------------

fix raft change string metadata event

Summary:
Saw a bunch of stage-1 replicaset instances fail due to a config change string failure during add/remove instance:
```
I0401 00:16:10.894434 1823842 IoCacheUtils.cpp:333] getConfigChangeString eventType = ^G
E0401 00:16:10.894451 1823842 IoCacheUtils.cpp:363] Failed to read metadata event body from cache
E0401 00:16:10.894456 1823842 MysqlRaft.cpp:659] [replicate] Failed to get config change string from iocache
2021-04-01T00:16:10.894464-07:00 8307 [ERROR] [MY-010207] [Repl] Run function 'before_flush' in plugin 'RPL_RAFT' failed
2021-04-01T00:16:10.894478-07:00 8307 [ERROR] [MY-000000] [Server] Failed to rotate binary log
```

After some investigation, the cause is that the metadata event length is calculated with the config change string included, but the config change string itself is never written into the event body.
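
To illustrate this class of bug, here is a sketch with a made-up wire layout (a 2-byte little-endian length followed by the string). The layout and the names `encode_config_change` and `decode_config_change` are assumptions for illustration and do not reflect the real metadata event format. If the writer emits the length but skips the body, the reader fails in the same way as the "Failed to read metadata event body" error above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical wire layout for illustration only: a 2-byte little-endian
// length followed by that many bytes of config change string.
std::vector<std::uint8_t> encode_config_change(const std::string &config) {
  assert(config.size() <= 0xFFFF);
  std::vector<std::uint8_t> out;
  const std::uint16_t len = static_cast<std::uint16_t>(config.size());
  out.push_back(static_cast<std::uint8_t>(len & 0xFF));
  out.push_back(static_cast<std::uint8_t>(len >> 8));
  // The step the bug skipped: the body must actually be written, otherwise
  // the declared length promises bytes that are not there.
  out.insert(out.end(), config.begin(), config.end());
  return out;
}

// Reader that fails when the declared length exceeds the available bytes.
bool decode_config_change(const std::vector<std::uint8_t> &in,
                          std::string *config) {
  if (in.size() < 2) return false;
  const std::size_t len = in[0] | (static_cast<std::size_t>(in[1]) << 8);
  if (in.size() - 2 < len) return false;
  config->assign(in.begin() + 2, in.begin() + 2 + len);
  return true;
}
```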

Reviewed By: Pushapgl

Differential Revision: D27504157

---------------------------------------------------------------------

Tells raft mode in binlog

Summary:
Win-Win: Tell whether this binlog was generated while the host was in raft mode.

We already have the bit in the FD event; this diff just spells it out.

Reviewed By: mpercy

Differential Revision: D28664300

---------------------------------------------------------------------

fix flaky raft testcase

Summary:
There are multiple issues in the MTR tests:
1. In sql/rpl_binlog_sender.cc, if a secondary's IO thread receives a fatal_error,
   it quits instead of reconnecting. Use an unknown error so that the
   secondary's IO thread can try to reconnect.
2. mtr.add_suppression() calls: mtr.add_suppression() executes an insert
   statement into the mtr.test_suppressions table, and mtr.test_suppressions
   doesn't have a primary key, so during idempotent recovery the secondary
   fails to execute mtr.add_suppression() due to the missing PK. Move all
   mtr.add_suppression() calls to the end of the testcase to work around the idempotent recovery failure.
3. For promotion, use raft_promote_to_leader.inc instead of `set rpl_raft_new_leader_uuid`,
   since raft_promote_to_leader waits until the new primary becomes writeable.
4. Pass a specific warning instead of '.*' to mtr.add_suppression.
5. etc.

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D28774820

---------------------------------------------------------------------

Using locks in Dump_log methods only when raft is enabled

Summary:
We don't need to take locks in non-raft mode since the
underlying MYSQL_BIN_LOG obj will never change. To handle the race between
updates to the enable_raft_plugin var and uses of its value in Dump_log, we
kill and block all dump threads while updating the var and unblock them
once the var is updated.

Reviewed By: bhatvinay

Differential Revision: D28905522

---------------------------------------------------------------------

leader election to be a sync point in the new leader

Summary:
Port of D27582002 (facebook@39c70ca) from 5.6.35 to 8.0

A newly elected raft leader makes sure that all trxs from the previous
leader are committed by the sql appliers. It then switches the server's trx
logs from apply-log-* to binary-log-*. To other parts of the system this
looks like a rotation, but the necessary sync calls are not made here.
So, if the server (or os) restarts, then the storage engine could lose
the commit markers of the last batch of trxs. This will result in silent
data drift.
This diff fixes the problem by making an explicit call to
ha_flush_logs() before switching the server's trx logs

Reviewed By: luqun

Differential Revision: D28880407

---------------------------------------------------------------------

Error out SQL thread on out of order opids in raft logs

Summary:
OpIds should always be in order in the raft logs. Added a check
in SQL threads that throws an error and stops the applier when out of
order opids are detected.
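
A sketch of such an ordering check is shown below. The shape of an OpId (a term and an index compared lexicographically) and the names `OpId` and `OpIdOrderChecker` are assumptions made for this example. The source only states that OpIds must be in order.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Assumed shape of an OpId: a (term, index) pair ordered lexicographically.
struct OpId {
  std::int64_t term;
  std::int64_t index;
};

bool operator<(const OpId &a, const OpId &b) {
  return std::tie(a.term, a.index) < std::tie(b.term, b.index);
}

// Tracks the last OpId seen by an applier and reports ordering violations.
class OpIdOrderChecker {
 public:
  // Returns false when `next` does not come strictly after the previous
  // OpId; a real applier would raise an error and stop at that point.
  bool accept(const OpId &next) {
    assert(next.term >= 0 && next.index >= 0);
    if (has_last_ && !(last_ < next)) return false;
    last_ = next;
    has_last_ = true;
    return true;
  }

 private:
  OpId last_{0, 0};
  bool has_last_ = false;
};
```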

Reviewed By: li-chi

Differential Revision: D28810840

---------------------------------------------------------------------

add GET_COMMITTED_GTIDS for raft

Summary:
During recovery Raft Log::Init needs to check with server
what gtids have been committed. Before doing that it finds the entire
set of trxs in the last raft log. The difference (logged minus
committed) gives the pending opids.
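
The "logged minus committed" step amounts to a set difference. The sketch below simplifies transactions to plain integers, whereas the server works with GTID sets. `TrxSet` and `pending_trxs` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <set>

// Simplification: transactions are identified by plain integers here,
// whereas the server works with GTID sets.
using TrxSet = std::set<std::uint64_t>;

// Transactions present in the last raft log but not yet committed.
TrxSet pending_trxs(const TrxSet &logged, const TrxSet &committed) {
  TrxSet pending;
  std::set_difference(logged.begin(), logged.end(), committed.begin(),
                      committed.end(),
                      std::inserter(pending, pending.end()));
  return pending;
}
```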

Reviewed By: Pushapgl

Differential Revision: D29622786

---------------------------------------------------------------------

raft: skip mts_recovery_groups during start slave

Summary:
During MySQL8+Raft DMP, some instances fail to switch to Leader or start slave

```
2021-06-24T17:56:38.627423-07:00 431 [Note] [MY-010574] [Repl] Slave: MTS group recovery relay log info group_master_log_name /data/mysql/3127/bls-unittestdb658.frc2-3305-mysql.replicaset.180021/binary-logs-3727.000033, event_master_log_pos 1129.
2021-06-24T17:56:38.627473-07:00 431 [ERROR] [MY-010575] [Repl] Error looking for file after /binlogs/binary-logs-3307.000120.
2021-06-24T17:56:38.627516-07:00 431 [ERROR] [MY-000000] [Repl] load_mi_and_rli_from_repositories: rli_init_info returned error
```

Similar to 5.6, we don't need to run mts_recovery_groups because GTID_MODE is always enabled.

Reviewed By: Pushapgl

Differential Revision: D29520066

---------------------------------------------------------------------

update RaftListenerCallbackArg struct

Summary: Because the contract between the raft plugin and mysql changed, update the RaftListenerCallbackArg struct to add a master_uuid field

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D29981349

---------------------------------------------------------------------

always check mi during raft_reset_slave

Summary: add similar nullptr check for raft_reset_slave as raft_stop_sql_thread

Reviewed By: bhatvinay

Differential Revision: D29967915

---------------------------------------------------------------------

raft mtr: update rpl_end_raft.inc for disable-raft

Summary:
rpl_end_raft.inc unregisters the raft plugin, which calls reset slaves when online disable-raft is enabled. Move rpl_end_raft.inc after rpl_stop_slaves.inc so that the slaves are stopped correctly first.

After the slaves are stopped, a call to mtr.add_suppression("") won't replicate to them, so call mtr.add_suppression("") on all instances (replicas).

Reviewed By: yizhang82

Differential Revision: D30062236

---------------------------------------------------------------------

fix master_uuid

Summary: fix master_uuid in raft mode.

Reviewed By: luqun

Differential Revision: D30261465

---------------------------------------------------------------------

incorrect File_size value in show raft logs result

Summary:
in show raft logs result, the File_size for latest logs file isn't updated correctly.

such as show raft logs;

```
+-------------------------+------------
| Log_name                | File_size  |
| binary-logs-3304.000670 |       1529 |
```

in fact
```
-rw-r----- 1 mysql backup 1669180487 Aug  4 14:49 binary-logs-3304.000670
-rw-r----- 1 mysql backup 1723994154 Aug  4 14:49 binary-logs-3304.000670
```
The file_size from show raft logs is always 1529, but the real file size is 1669180487.

The issue is that when writing to the IO_CACHE directly, the position of its wrapper ostream isn't updated.

Reviewed By: yizhang82

Differential Revision: D30116345

---------------------------------------------------------------------

Add gtid_purged_for_tailing

Summary:
Add a new global variable gtid_purged_for_tailing.

It shows the purged GTID for binlog in non-raft mode, and purged GTID for raft log in raft mode.

Reviewed By: abhinav04sharma

Differential Revision: D29372776

---------------------------------------------------------------------

disable skip_setup_replica_if_unnecessary

Summary: After syncing with the latest official raft plugin, most MTR tests failed due to the skip_setup_replica_if_unnecessary optimization.

Reviewed By: yizhang82

Differential Revision: D30821648

---------------------------------------------------------------------

latency histograms for raft trx wait

Summary: Port of the support added for the same in mysql 5.6. Should help monitor latencies in 8.0 + raft stage-1 replicasets.

Reviewed By: luqun

Differential Revision: D31064764

---------------------------------------------------------------------

handle cases where clean shutdown in raft aborts trxs

Summary: This is a port of the feature in 5.6.

Reviewed By: anirbanr-fb

Differential Revision: D31070593

---------------------------------------------------------------------

fix crash during binlog purging

Summary:
Any error returned by the plugin during binlog purging results in a
crash in mysql8 as the server tries to execute
binlog_error_action_abort. We need to differentiate explicitly between a
plugin error and other errors (such as errors related to disk IO,
etc). In this particular case, the crash is caused by trying to purge a
file that does not exist (i.e. one that was already purged previously) and
raft cannot find it in its index chunk (so it returns an error).
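
One way to picture the distinction is the sketch below. The enums and `action_for` are hypothetical. The mapping of an IO error to an abort mirrors the existing behavior described above, while the handling of a plugin error is an assumption about what "differentiate explicitly" leads to.

```cpp
#include <cassert>

// Hypothetical classification of purge failures.
enum class PurgeError { kNone, kPluginError, kIoError };

// Hypothetical outcomes; kAbortServer stands in for the path that ends in
// binlog_error_action_abort.
enum class PurgeAction { kContinue, kReportError, kAbortServer };

// A plugin-reported failure (for example, a file that raft no longer has
// in its index) is surfaced as an error instead of aborting the server.
PurgeAction action_for(PurgeError err) {
  switch (err) {
    case PurgeError::kNone:
      return PurgeAction::kContinue;
    case PurgeError::kPluginError:
      return PurgeAction::kReportError;
    case PurgeError::kIoError:
      return PurgeAction::kAbortServer;
  }
  return PurgeAction::kAbortServer;  // unreachable for valid enum values
}
```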

Reviewed By: anirbanr-fb

Differential Revision: D31149997

---------------------------------------------------------------------

update flaky rpl_raft_purged_gtids_dump_threads

Summary:
The rpl_raft_purged_gtids_dump_threads MTR test failed with "Cannot replicate because the master purged required binary logs" after server4 started tailing server2.

Try to sync server4 before switching it to tail server2.

Reviewed By: bhatvinay

Differential Revision: D30818078

---------------------------------------------------------------------

fix various porting issues in mysql8 (raft) crash recovery

Summary:
1. Trimming of the binlog is left to the raft plugin (based on the current
   leader's log). The server should skip this step as part of recovery. This
   essentially means setting 'valid_pos' to the last successfully parsed
   trx in the trx log (instead of the engine's view of the trx log) in
   MYSQL_BIN_LOG::recover().
2. executed_gtid_set should be initialized based on the engine's view of
   the trx log file coordinates. So, during startup, appropriate flags need
   to be passed into MYSQL_BIN_LOG::init_gtid_sets(). init_gtid_sets() was
   already changed to handle this, but the flag was not set correctly
   during server startup.
3. Another fix is in MYSQL_BIN_LOG::init_gtid_sets(), to correctly set the
   position to read and calculate executed-gtid-set (based on the file name
   read from the engine).

Reviewed By: anirbanr-fb

Differential Revision: D31284902

---------------------------------------------------------------------

show_raft_status should take LOCK_status

Summary:
This is a straight port of a 5.6 patch to 8.0

SHOW RAFT STATUS and SHOW GLOBAL STATUS go to the
same functions in the plugin and access shared data structures.
These functions return internal char * pointers to plugin global
variables. They need to be serialized with LOCK_status otherwise
it leads to race conditions.

Reviewed By: li-chi

Differential Revision: D31318340

---------------------------------------------------------------------

Disallowing reset master on raft replicasets

Summary:
This is a straight port of similar patch on 5.6 raft

reset master is inherently not raft compliant.
It gets rid of all binlogs and makes an instance appear like
a fresh single instance database without consulting the ring.
We need to block it.

However, under certain circumstances (in the future), e.g. during
first-time replicaset build, or when we can guarantee that the instance
is a single-node raft ring, we can potentially do a reset master
followed by a rebootstrap of raft. This can also be achieved by
disabling raft, running reset master and then re-enabling raft. However,
to keep the door open, I have left a force option and will thread
a mechanism from the plugin to call reset_master(force=true)

Reviewed By: li-chi

Differential Revision: D31299214

---------------------------------------------------------------------

Setting topology config in MTR

Summary:
Setting topology config in MTR so that features that use
topology config can be tested easily. Setting rpl_raft_skip_smc_updates
to avoid unnecessary calls to SMC (even though we supply a dummy
replicaset name).

Reviewed By: li-chi

Differential Revision: D31543877

---------------------------------------------------------------------

Do not allow sql thread start when raft is doing a stop->{}->start transition

Summary:
This is a port of an existing patch made to 5.6 for Raft.

Raft will do a stop of the SQL thread during StopAllWrites.
Then it will repoint the binlog files and during that action, the
SQL threads have to remain stopped. We block it out in this diff by
keeping an atomic bool which can be checked from other functions.

This only applies to raft mode i.e. enable_raft_plugin = true.

Reviewed By: li-chi

Differential Revision: D31319499

---------------------------------------------------------------------

Handle printing of FD event generated by slave SQL thread

Summary:
Returning early in the FD:print() function prevents mysqlbinlog from parsing Raft logs on secondaries.

The original commit which added this is d048c0f (P173872135)

To comply with the intent of the original bug fix, we avoid printing the FD event of a relay log as a 'BINLOG'.

Reviewed By: anirbanr-fb

Differential Revision: D26359417

---------------------------------------------------------------------

Add exponential backoff for smart restart

Summary:
Raft instances crash and the trx logs fill up.

RocksDB can fill up the trx logs.
We should stop restarting mysqld if it has already been restarted many times in a day.

Reviewed By: anirbanr-fb

Differential Revision: D27372750

---------------------------------------------------------------------

always release data_lock mutex to avoid deadlock

Summary: In a stage-1 replicaset, when a secondary instance is killed, the instance sometimes runs into a deadlock because the process_raft_queue thread forgot to release the mutex it acquired in raft_change_master.

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D27602667

---------------------------------------------------------------------

Gracefully exit mysqld_safe loop during backoff

Summary:
Currently, systemctl stop [email protected] can take 10 mins when mysqld_safe is in its backoff period.

D28517599 adds an interrupt to the sleep in mysql_stop, and mysqld_safe immediately breaks out of the retry loop if the sleep is interrupted.

Reviewed By: anirbanr-fb

Differential Revision: D28606439

---------------------------------------------------------------------

Fix heap overflow in group_relay_log_name handling

Summary:
We were accessing group_relay_log_name in
Query_log_event::do_apply_event_worker() but it's assigned only after
the coordinator thread encounters an end event (i.e. xid event or a
query event with "COMMIT" or "ROLLBACK" query). This was causing a race
between accessing group_relay_log_name in the worker thread and writing
it on the coordinator thread. We don't need to set transaction position
in events other than end event, so now we set transaction position in
query event only if it's an end event. The race is eliminated because
group_relay_log_name is set before enqueuing the event to the worker
thread (in both dep repl and vanilla mts).

Reviewed By: lth

Differential Revision: D28767430

---------------------------------------------------------------------

fix memory leak during MYSQL_BIN_LOG::open_existing_binlog

Summary:
The asandebug build complains about memory leaks during MYSQL_BIN_LOG open:

Direct leak of 50 byte(s) in 1 object(s) allocated from:
    #0 0x67460ef in malloc
    #1 0x93f0777 in my_raw_malloc(unsigned long, int)
    #2 0x93f064a in my_malloc(unsigned int, unsigned long, int)
    #3 0x93f0eb0 in my_strdup(unsigned int, char const*, int)
    #4 0x8af01a6 in MYSQL_BIN_LOG::open(unsigned int, char const*, char const*, unsigned int)
    #5 0x8af8064 in MYSQL_BIN_LOG::open_binlog(char const*, char const*, unsigned long, bool, bool, bool, Format_description_log_event*, unsigned int, RaftRotateInfo*, bool)
    #6 0x8b00c00 in MYSQL_BIN_LOG::new_file_impl(bool, Format_description_log_event*, RaftRotateInfo*)
    #7 0x8d65e47 in rotate_relay_log(Master_info*, bool, bool, bool, RaftRotateInfo*)
    #8 0x8d661c0 in rotate_relay_log_for_raft(RaftRotateInfo*)
    #9 0x8c7696a in process_raft_queue
    #10 0xa0fa1fd in pfs_spawn_thread(void*)
    #11 0x7f8c9a12b20b in start_thread

Release this memory before assigning to the pointers again.

Reviewed By: Pushapgl

Differential Revision: D28819752

---------------------------------------------------------------------

Reduce max backoff time from 24h to 30mins

Summary:
During the recent SEV0, raft instances were crash looping because of clock skew. Even after the clock returned to normal, mysqld_safe failed to bring mysqld back because of the extended 24-hour backoff.

This diff reduces the max backoff time from 24h to 30mins, so mysqld_safe still keeps trying, but not so aggressively that it fills up the trx logs.

Reviewed By: bart2

Differential Revision: D31661172
inikep added a commit that referenced this pull request Apr 6, 2022
recover raft logs by removing partial trxs

Summary:
Port D24628821

mysqld removes partial trxs in the tail of the trx log (named binary-logs on
primaries and apply-logs on secondaries) during startup. However, relay logs
were not of much importance since they were discarded anyway and a new one would
be created.
However, with raft, this is not ideal. Relay logs are raft logs on secondaries
and have to be kept around (and kept sane and consistent). This diff adds the
ability to remove partial trxs from raft/relay logs.
Much of the code to open the last relay log (based on relay log index) and
identify partial trxs is borrowed from existing logic in
MYSQL_BIN_LOG::open_binlog() and binlog_recover()

Reviewed By: Pushapgl

Differential Revision: D26447448

---------------------------------------------------------------------

Checking for inited bool to make sure global_init_info was successful

Summary:
Port D25584004

A master.info and relay.info file can be present
but needs to be properly inited for use. We were bypassing the inited
check which could lead to issues in Raft.
In case there is an error in global_init_info, Raft will do a
raft_reset_slave and make another attempt at it. If both recourses
fail, the init of the plugin would fail.

Reviewed By: Pushapgl

Differential Revision: D26447457

---------------------------------------------------------------------

Support for dumping raft logs to vanilla async replicas

Summary:
[Porting Notes]
We want to dump raft logs to vanilla async replicas regardless
of whether it's the relay log or binlog. Effectively after this change
we'll dump relay logs on the followers and binlogs on the leader. When
the raft role changes, the logs to be dumped also change.
The Dump_log class is introduced as a thin wrapper/container around
mysql_bin_log or rli->relay_log and is inited with mysql_bin_log
to emulate vanilla mysql behavior. Dump threads use the global
dump_log object instead of mysql_bin_log directly. We switch the log
in dump log only when raft role changes (in binlog_change_to_binlog()
and binlog_change_to_apply_log()).
During raft role change we take all log related locks (LOCK_log,
LOCK_index, LOCK_binlog_end_pos, and dump log lock) to serialize it with
other log operations like dumping logs.
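
A rough sketch of the wrapper idea, under stated assumptions: `Log` stands in for MYSQL_BIN_LOG, and `DumpLogSketch` with its methods is hypothetical rather than the real Dump_log interface. It shows only the part where readers go through the wrapper and the underlying log is swapped under a lock.

```cpp
#include <mutex>
#include <string>

// Stand-in for MYSQL_BIN_LOG; only the name matters in this sketch.
struct Log {
  std::string name;
};

// Hypothetical thin wrapper: dump threads ask the wrapper for the
// current log instead of touching a specific log object directly.
class DumpLogSketch {
 public:
  explicit DumpLogSketch(Log* initial) : log_(initial) {}

  // Called on a role change to point dump threads at the other log.
  void switch_log(Log* next) {
    std::lock_guard<std::mutex> guard(lock_);
    log_ = next;
  }

  // Called by readers; the lock serializes this with switch_log().
  std::string current_name() {
    std::lock_guard<std::mutex> guard(lock_);
    return log_->name;
  }

 private:
  std::mutex lock_;
  Log* log_;
};
```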

Related doc - https://fb.quip.com/oTVAAdgEi4zY

This diff contains the following 7 patches:
D23013977
D24766787
D24716539
D24900223
D24955284
D25174166
D25775525

Reviewed By: luqun

Differential Revision: D26141496

---------------------------------------------------------------------

Fixed the flaky raft test suite

Summary:
First clean run of entire raft test suite :)

**Changes**
* Reset apply_logs on raft secondaries before the start of every test
* Increased the lock timeouts and changed isolation level in rpl_raft_slave_out_of_order_commit.

Reviewed By: luqun

Differential Revision: D26651257

---------------------------------------------------------------------

return error from Raft_replication_delegate when plugin is not available

Summary:
Port D23065441 (facebook@b9067f7)

The new macro is used to call into raft plugin. If plugin gets unloaded
accidentally when enable_raft_plugin is ON, then this STRICT version returns
failure. This is to be called only by raft plugin currently

Reviewed By: Pushapgl

Differential Revision: D26447523

---------------------------------------------------------------------

Adding timestamps for raft rotates which happen in the context of listener thread

Summary:
Port D25572614

The timestamp of a binlog event is picked up from the when field in
the event. In most cases of rotation, the when is left unpopulated
during rotation for the top 3 events (fd, pgtid, metadata). However
in such a situation, a normal rotate (flush binary logs) still manages
to get a valid timestamp, since the thread in which the flush binary
logs happens has a valid start time.
Now enter Raft relay log rotations. In those cases and in the case
of config change rotate, the rotations are happening in the context
of a raft listener queue thread. In that context, the when and start
time of the thread are both 0. The diff handles this case by populating
the when field appropriately.
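
One plausible fallback order is sketched below. The summary does not say which value the diff actually writes into the when field, so the function `event_timestamp` and the choice of the current time as the last resort are assumptions.

```cpp
#include <ctime>

// Hypothetical helper: pick a timestamp for an event.
// Prefer the event's own `when`, then the thread start time, and
// finally the current time when both are 0 (the listener-thread case).
std::time_t event_timestamp(std::time_t when,
                            std::time_t thread_start,
                            std::time_t now) {
  if (when != 0) return when;
  if (thread_start != 0) return thread_start;
  return now;
}
```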

Reviewed By: bhatvinay

Differential Revision: D26194612

---------------------------------------------------------------------

Raft abrupt stepdown and trim binlog file / gtid test

Summary: binlog file should get trimmed for abrupt stepdown

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D26169975

---------------------------------------------------------------------

Port: Fixes around raft log truncation.

Summary:
**Notes**
* New functions to block and unblock dump threads for plugin to use
during raft log truncation.

The check below is already done in the raft plugin as part of D26866429.
* Re-init rli->cur_log in Relay_log_info::rli_init_info() instead of
just seeking to the beginning to handle the case when raft log is
truncated before starting the applier

Reviewed By: luqun

Differential Revision: D26759813

---------------------------------------------------------------------

rebase due to relay log Format_description_event event size difference

Summary:
WL#3549: Binlog Compression adds 1 extra byte of data to Format_description_event

  @@ -134,6 +137,7 @@ Format_description_event::Format_description_event(uint8_t binlog_ver,
             VIEW_CHANGE_HEADER_LEN,
             XA_PREPARE_HEADER_LEN,
             ROWS_HEADER_LEN_V2,
  +          TRANSACTION_PAYLOAD_EVENT,
         };

Reviewed By: Pushapgl

Differential Revision: D27145095

---------------------------------------------------------------------

fix raft change string metadata event

Summary:
Saw a bunch of stage-1 replicaset instances fail due to a config change string failure during add/remove instance
```
I0401 00:16:10.894434 1823842 IoCacheUtils.cpp:333] getConfigChangeString eventType = ^G
E0401 00:16:10.894451 1823842 IoCacheUtils.cpp:363] Failed to read metadata event body from cache
E0401 00:16:10.894456 1823842 MysqlRaft.cpp:659] [replicate] Failed to get config change string from iocache
2021-04-01T00:16:10.894464-07:00 8307 [ERROR] [MY-010207] [Repl] Run function 'before_flush' in plugin 'RPL_RAFT' failed
2021-04-01T00:16:10.894478-07:00 8307 [ERROR] [MY-000000] [Server] Failed to rotate binary log
```

After some investigation, the cause is that the metadata event length was calculated with the config change string included, but the config change string was never written into the event body.
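
The class of bug can be illustrated with a small length-prefixed encoder. The layout here (1-byte type, 2-byte little-endian length, payload) and the name `encode_field` are assumptions made for the sketch, not the actual metadata event format.

```cpp
#include <cstdint>
#include <string>

// Hypothetical layout: 1-byte type, 2-byte little-endian length, payload.
// Assumes the payload is shorter than 65536 bytes.
std::string encode_field(std::uint8_t type, const std::string& payload) {
  std::string out;
  out.push_back(static_cast<char>(type));
  const auto len = static_cast<std::uint16_t>(payload.size());
  out.push_back(static_cast<char>(len & 0xff));
  out.push_back(static_cast<char>(len >> 8));
  // Counting the payload in the length but skipping this append is the
  // kind of mismatch described above: the reader then runs out of body.
  out += payload;
  return out;
}
```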

Reviewed By: Pushapgl

Differential Revision: D27504157

---------------------------------------------------------------------

Tells raft mode in binlog

Summary:
Win-Win: Tell whether this binlog was generated while this host was in raft mode.

We already have the bit in the FD event, and this diff just prints it out.

Reviewed By: mpercy

Differential Revision: D28664300

---------------------------------------------------------------------

fix flaky raft testcase

Summary:
There are multiple issues for MTR:
1. In sql/rpl_binlog_sender.cc, if the secondary's IO thread receives a fatal_error,
   it quits instead of reconnecting. Use an unknown error so that the
   secondary's IO thread can try to reconnect.
2. mtr.add_suppression() executes an INSERT statement into the
   mtr.test_suppressions table, which has no primary key, so during
   idempotent recovery the secondary fails to execute mtr.add_suppression()
   due to the missing PK. Move all mtr.add_suppression() calls to the end of
   the testcase to work around the idempotent recovery failure.
3. When promoting, use raft_promote_to_leader.inc instead of `set rpl_raft_new_leader_uuid`,
   since raft_promote_to_leader waits until the new primary becomes writeable.
4. Pass a specific warning instead of '.*' to mtr.add_suppression.
5. etc

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D28774820

---------------------------------------------------------------------

Using locks in Dump_log methods only when raft is enabled

Summary:
We don't need to take locks in non-raft mode since the
underlying MYSQL_BIN_LOG obj will never change. To handle the race between
updating the enable_raft_plugin var and using its value in Dump_log, we
kill and block all dump threads while updating the var and unblock them
once the var is updated.

Reviewed By: bhatvinay

Differential Revision: D28905522

---------------------------------------------------------------------

leader election to be a sync point in the new leader

Summary:
Port of D27582002 (facebook@39c70ca) from 5.6.35 to 8.0

A newly elected raft leader makes sure that all trxs from the previous
leader are committed by the sql appliers. It then switches the server's trx
logs from apply-log-* to binary-log-*. To other parts of the system this
looks like a rotation, but the necessary sync calls are not made here.
So, if the server (or os) restarts, then the storage engine could lose
the commit markers of the last batch of trxs. This will result in silent
data drift.
This diff fixes the problem by making an explicit call to
ha_flush_logs() before switching the server's trx logs

Reviewed By: luqun

Differential Revision: D28880407

---------------------------------------------------------------------

Error out SQL thread on out of order opids in raft logs

Summary:
OpIds should always be in order in the raft logs. Added a check
in SQL threads that throws an error and stops the applier when out of
order opids are detected.
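
A minimal sketch of such an ordering check. It assumes an OpId is a (term, index) pair and that a later entry must have a strictly larger index and a term that is not smaller; both the struct and the rule are assumptions, since the summary does not spell out the comparison.

```cpp
#include <cstdint>

// Assumed shape of an OpId: raft term plus log index.
struct OpId {
  std::int64_t term;
  std::int64_t index;
};

// Returns true when `cur` may legally follow `prev` in the log.
// An applier would raise an error and stop when this returns false.
bool opid_in_order(const OpId& prev, const OpId& cur) {
  return cur.term >= prev.term && cur.index > prev.index;
}
```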

Reviewed By: li-chi

Differential Revision: D28810840

---------------------------------------------------------------------

add GET_COMMITTED_GTIDS for raft

Summary:
During recovery Raft Log::Init needs to check with server
what gtids have been committed. Before doing that it finds the entire
set of trxs in the last raft log. The difference (logged minus
committed) gives the pending opids.
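
The set arithmetic can be sketched as follows. Real GTID sets are interval sets keyed by server UUID; here plain integer trx numbers are used as a simplifying assumption, and `pending_trxs` is a hypothetical name.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <set>

// Returns the trxs that are in the raft log but not yet committed.
std::set<std::uint64_t> pending_trxs(const std::set<std::uint64_t>& logged,
                                     const std::set<std::uint64_t>& committed) {
  std::set<std::uint64_t> pending;
  // std::set iterates in sorted order, as std::set_difference requires.
  std::set_difference(logged.begin(), logged.end(),
                      committed.begin(), committed.end(),
                      std::inserter(pending, pending.end()));
  return pending;
}
```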

Reviewed By: Pushapgl

Differential Revision: D29622786

---------------------------------------------------------------------

raft: skip mts_recovery_groups during start slave

Summary:
During MySQL8+Raft DMP, some instances fail to switch to Leader or start slave

```
2021-06-24T17:56:38.627423-07:00 431 [Note] [MY-010574] [Repl] Slave: MTS group recovery relay log info group_master_log_name /data/mysql/3127/bls-unittestdb658.frc2-3305-mysql.replicaset.180021/binary-logs-3727.000033, event_master_log_pos 1129.
2021-06-24T17:56:38.627473-07:00 431 [ERROR] [MY-010575] [Repl] Error looking for file after /binlogs/binary-logs-3307.000120.
2021-06-24T17:56:38.627516-07:00 431 [ERROR] [MY-000000] [Repl] load_mi_and_rli_from_repositories: rli_init_info returned error
```

Similar to 5.6, we don't need to run mts_recovery_groups because GTID_MODE is always enabled.

Reviewed By: Pushapgl

Differential Revision: D29520066

---------------------------------------------------------------------

update RaftListenerCallbackArg struct

Summary: Due to a change in the contract between the raft plugin and mysql, update the RaftListenerCallbackArg struct to add a master_uuid field

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D29981349

---------------------------------------------------------------------

always check mi during raft_reset_slave

Summary: add similar nullptr check for raft_reset_slave as raft_stop_sql_thread

Reviewed By: bhatvinay

Differential Revision: D29967915

---------------------------------------------------------------------

raft mtr: update rpl_end_raft.inc for disable-raft

Summary:
rpl_end_raft.inc unregisters the raft plugin, which calls reset slave when online disable-raft is enabled. Move rpl_end_raft.inc after rpl_stop_slaves.inc so that the slaves are stopped correctly first.

After the slaves are stopped, mtr.add_suppression("") calls no longer replicate to them, so call mtr.add_suppression("") on all instances (replicas).

Reviewed By: yizhang82

Differential Revision: D30062236

---------------------------------------------------------------------

fix master_uuid

Summary: fix master_uuid in raft mode.

Reviewed By: luqun

Differential Revision: D30261465

---------------------------------------------------------------------

incorrect File_size value in show raft logs result

Summary:
In the show raft logs result, the File_size for the latest log file isn't updated correctly.

For example, show raft logs;

```
+-------------------------+------------
| Log_name                | File_size  |
| binary-logs-3304.000670 |       1529 |
```

in fact
```
-rw-r----- 1 mysql backup 1669180487 Aug  4 14:49 binary-logs-3304.000670
-rw-r----- 1 mysql backup 1723994154 Aug  4 14:49 binary-logs-3304.000670
```
The File_size from show raft logs is always 1529, but the real file size is 1669180487.

The issue is that when writing to IO_CACHE directly, its wrapper ostream's position isn't updated.

Reviewed By: yizhang82

Differential Revision: D30116345

---------------------------------------------------------------------

Add gtid_purged_for_tailing

Summary:
Add a new global variable gtid_purged_for_tailing.

It shows the purged GTID for binlog in non-raft mode, and purged GTID for raft log in raft mode.

Reviewed By: abhinav04sharma

Differential Revision: D29372776

---------------------------------------------------------------------

disable skip_setup_replica_if_unnecessary

Summary: After syncing with the latest official raft plugin, most MTR tests failed due to the skip_setup_replica_if_unnecessary optimization.

Reviewed By: yizhang82

Differential Revision: D30821648

---------------------------------------------------------------------

latency histograms for raft trx wait

Summary: Port of the support added for the same in mysql 5.6. Should help monitor latencies in 8.0 + raft stage-1 replicasets.

Reviewed By: luqun

Differential Revision: D31064764

---------------------------------------------------------------------

handle cases where clean shutdown in raft aborts trxs

Summary: This is a port of the feature in 5.6.

Reviewed By: anirbanr-fb

Differential Revision: D31070593

---------------------------------------------------------------------

fix crash during binlog purging

Summary:
Any error returned by the plugin during binlog purging results in a
crash in mysql8 as the server tries to execute
binlog_error_action_abort. We need to differentiate explicitly between a
plugin error and other error (such as error related to doing disk IO
etc). In this particular case, the crash is because of trying to purge a
file that does not exist (i.e. which was already purged previously) and
raft cannot find it in its index chunk (so it returns an error).
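
A sketch of what differentiating the two error kinds might look like. The enum, the function name and the policy (only non-plugin errors escalate to the abort path) are all assumptions; the summary states only that the two kinds need to be told apart.

```cpp
// Hypothetical classification of a purge failure.
enum class PurgeError { None, Plugin, Io };

// Returns true when the failure should escalate to the server's
// abort handling. In this sketch a plugin error (for example, a file
// the plugin cannot find in its index) is reported but does not abort.
bool should_escalate(PurgeError err) {
  return err == PurgeError::Io;
}
```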

Reviewed By: anirbanr-fb

Differential Revision: D31149997

---------------------------------------------------------------------

update flaky rpl_raft_purged_gtids_dump_threads

Summary:
rpl_raft_purged_gtids_dump_threads MTR failed with "Cannot replicate because the master purged required binary logs" after server4 started tailing server2.

Try to sync server4 before switching it to tail server2.

Reviewed By: bhatvinay

Differential Revision: D30818078

---------------------------------------------------------------------

fix various porting issues in mysql8 (raft) crash recovery

Summary:
1. Trimming of the binlog is left to the raft plugin (based on the current
   leader's log). The server should skip this step as part of recovery. This
   essentially means setting 'valid_pos' to the last successfully parsed
   trx in the trx log (instead of the engine's view of the trx log) in
   MYSQL_BIN_LOG::recover().
2. executed_gtid_set should be initialized based on the engine's view of
   the trx log file coordinates. So, during startup, appropriate flags need
   to be passed into MYSQL_BIN_LOG::init_gtid_sets(). init_gtid_sets() was
   already changed to handle this, but the flag was not set correctly
   during server startup.
3. Another fix is in MYSQL_BIN_LOG::init_gtid_sets() to correctly set the position
   to read and calculate executed-gtid-set (based on the file name read from the
   engine).

Reviewed By: anirbanr-fb

Differential Revision: D31284902

---------------------------------------------------------------------

show_raft_status should take LOCK_status

Summary:
This is a straight port of a 5.6 patch to 8.0

SHOW RAFT STATUS and SHOW GLOBAL STATUS go to the
same functions in the plugin and access shared data structures.
These functions return internal char * pointers to plugin global
variables. They need to be serialized with LOCK_status otherwise
it leads to race conditions.

Reviewed By: li-chi

Differential Revision: D31318340

---------------------------------------------------------------------

Disallowing reset master on raft replicasets

Summary:
This is a straight port of similar patch on 5.6 raft

reset master is inherently not raft compliant.
It gets rid of all binlogs and makes an instance appear like
a fresh single instance database without consulting the ring.
We need to block it.

However, under certain circumstances (in the future), e.g. during
first time replicaset build, or when we can guarantee that instance
is single node raft ring, we can potentially do a reset master
followed by rebootstrap of raft. This can also be achieved by
disabling raft, reset master and then re-enabling raft, however
to keep open the door, I have left a force option and will thread
a mechanism from the plugin to call reset_master(force=true)

Reviewed By: li-chi

Differential Revision: D31299214

---------------------------------------------------------------------

Setting topology config in MTR

Summary:
Setting topology config in MTR so that features that use
topology config can be tested easily. Setting rpl_raft_skip_smc_updates
to avoid unnecessary calls to SMC (even though we supply a dummy
replicaset name).

Reviewed By: li-chi

Differential Revision: D31543877

---------------------------------------------------------------------

Do not allow sql thread start when raft is doing a stop->{}->start transition

Summary:
This is a port of an existing patch made to 5.6 for Raft.

Raft will do a stop of the SQL thread during StopAllWrites.
Then it will repoint the binlog files and during that action, the
SQL threads have to remain stopped. We block it out in this diff by
keeping an atomic bool which can be checked from other functions.

This only applies to raft mode i.e. enable_raft_plugin = true.
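
A minimal sketch of the guard, with hypothetical names (`raft_transition_in_progress`, `can_start_sql_thread`); the actual flag and the functions that check it are not named in the summary.

```cpp
#include <atomic>

// Hypothetical flag: set while raft is between stopping the SQL thread
// and finishing the repoint of the binlog files.
std::atomic<bool> raft_transition_in_progress{false};

// Returns whether a SQL thread start request may proceed.
// Outside raft mode the flag is ignored entirely.
bool can_start_sql_thread(bool raft_enabled) {
  return !(raft_enabled && raft_transition_in_progress.load());
}
```

Using an atomic keeps the check cheap for callers that only need to read the state and do not otherwise take a lock.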

Reviewed By: li-chi

Differential Revision: D31319499

---------------------------------------------------------------------

Handle printing of FD event generated by slave SQL thread

Summary:
Returning early in the FD:print() function prevents mysqlbinlog from parsing Raft logs on secondaries.

The original commit which added this is d048c0f (P173872135)

To comply with the intent of the original bug fix, we avoid printing the FD event of a relay log as a 'BINLOG'.

Reviewed By: anirbanr-fb

Differential Revision: D26359417

---------------------------------------------------------------------

Add exponential backoff for smart restart

Summary:
RAFT instances crash and Tx logs fill up.

RocksDB can fill up the tx logs.
We should stop restarting mysqld if we have restarted many times in a day.

Reviewed By: anirbanr-fb

Differential Revision: D27372750

---------------------------------------------------------------------

always release data_lock mutex to avoid deadlock

Summary: In a stage-1 replicaset, killing a secondary instance sometimes causes it to deadlock because the process_raft_queue thread fails to release the mutex it acquired in raft_change_master.

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D27602667

---------------------------------------------------------------------

Gracefully exit mysqld_safe loop during backoff

Summary:
Currently, systemctl stop [email protected] can take 10 mins when mysqld_safe is in its backoff period.

D28517599 adds an interrupt to the sleep in mysql_stop, and mysqld_safe immediately breaks out of the retry loop if the sleep is interrupted.

Reviewed By: anirbanr-fb

Differential Revision: D28606439

---------------------------------------------------------------------

Fix heap overflow in group_relay_log_name handling

Summary:
We were accessing group_relay_log_name in
Query_log_event::do_apply_event_worker() but it's assigned only after
the coordinator thread encounters an end event (i.e. xid event or a
query event with "COMMIT" or "ROLLBACK" query). This was causing a race
between accessing group_relay_log_name in the worker thread and writing
it on the coordinator thread. We don't need to set transaction position
in events other than end event, so now we set transaction position in
query event only if it's an end event. The race is eliminated because
group_relay_log_name is set before enqueuing the event to the worker
thread (in both dep repl and vanilla mts).

Reviewed By: lth

Differential Revision: D28767430

---------------------------------------------------------------------

fix memory leak during MYSQL_BIN_LOG::open_existing_binlog

Summary:
asandebug complains about memory leaks during MYSQL_BIN_LOG open:

Direct leak of 50 byte(s) in 1 object(s) allocated from:
    #0 0x67460ef in malloc
    #1 0x93f0777 in my_raw_malloc(unsigned long, int)
    #2 0x93f064a in my_malloc(unsigned int, unsigned long, int)
    #3 0x93f0eb0 in my_strdup(unsigned int, char const*, int)
    #4 0x8af01a6 in MYSQL_BIN_LOG::open(unsigned int, char const*, char const*, unsigned int)
    #5 0x8af8064 in MYSQL_BIN_LOG::open_binlog(char const*, char const*, unsigned long, bool, bool, bool, Format_description_log_event*, unsigned int, RaftRotateInfo*, bool)
    #6 0x8b00c00 in MYSQL_BIN_LOG::new_file_impl(bool, Format_description_log_event*, RaftRotateInfo*)
    #7 0x8d65e47 in rotate_relay_log(Master_info*, bool, bool, bool, RaftRotateInfo*)
    #8 0x8d661c0 in rotate_relay_log_for_raft(RaftRotateInfo*)
    #9 0x8c7696a in process_raft_queue
    #10 0xa0fa1fd in pfs_spawn_thread(void*)
    #11 0x7f8c9a12b20b in start_thread

Release the old memory before reassigning the pointers.
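
The free-before-assign pattern can be sketched as below. `LogNames` and `set_name` are hypothetical stand-ins; the real fix operates on MYSQL_BIN_LOG members that hold my_strdup() results.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical holder for a heap-allocated, owned C string.
struct LogNames {
  char* name = nullptr;
};

// Replaces the stored name without leaking the previous copy.
void set_name(LogNames& log, const char* new_name) {
  std::free(log.name);  // free(nullptr) is a no-op, so the first call is safe
  log.name = static_cast<char*>(std::malloc(std::strlen(new_name) + 1));
  if (log.name != nullptr) {
    std::strcpy(log.name, new_name);
  }
}
```

Without the `std::free` call, every reopen of the log would orphan the previous allocation, which matches the per-rotation leak in the trace above.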

Reviewed By: Pushapgl

Differential Revision: D28819752

---------------------------------------------------------------------

Reduce max backoff time from 24h to 30mins

Summary:
During the recent SEV0, raft instances were crash looping because of clock skew. Even after the clock returned to normal, mysqld_safe failed to bring mysqld back because of the extended 24-hour backoff.

This diff reduces the max backoff time from 24h to 30mins, so mysqld_safe will still keep trying, but not so aggressively that it fills up the trx logs.

Reviewed By: bart2

Differential Revision: D31661172
inikep added a commit that referenced this pull request Apr 6, 2022
recover raft logs by removing partial trxs

Summary:
Port D24628821

mysqld removes partial trxs in the tail of the trx log (named binary-logs on
primaries and apply-logs on secondaries) during startup. However, relay logs
were not of much importance, since they were discarded anyway and new ones
were created.
However, with raft, this is not ideal. Relay logs are raft logs on secondaries
and have to be kept around (and kept sane and consistent). This diff adds the
ability to remove partial trxs from raft/relay logs.
Much of the code to open the last relay log (based on relay log index) and
identify partial trxs is borrowed from existing logic in
MYSQL_BIN_LOG::open_binlog() and binlog_recover()

Reviewed By: Pushapgl

Differential Revision: D26447448

---------------------------------------------------------------------

Checking for inited bool to make sure global_init_info was successful

Summary:
Port D25584004

A master.info and relay.info file can be present
but needs to be properly inited for use. We were bypassing the inited
check which could lead to issues in Raft.
In case there is an error in global_init_info, Raft will do a
raft_reset_slave and make another attempt at it. If both recourses
fail, the init of the plugin would fail.

Reviewed By: Pushapgl

Differential Revision: D26447457

---------------------------------------------------------------------

Support for dumping raft logs to vanilla async replicas

Summary:
[Porting Notes]
We want to dump raft logs to vanilla async replicas regardless
of whether it's the relay log or binlog. Effectively after this change
we'll dump relay logs on the followers and binlogs on the leader. When
the raft role changes, the logs to be dumped also change.
The Dump_log class is introduced as a thin wrapper/container around
mysql_bin_log or rli->relay_log and is inited with mysql_bin_log
to emulate vanilla mysql behavior. Dump threads use the global
dump_log object instead of mysql_bin_log directly. We switch the log
in dump log only when raft role changes (in binlog_change_to_binlog()
and binlog_change_to_apply_log()).
During raft role change we take all log related locks (LOCK_log,
LOCK_index, LOCK_binlog_end_pos, and dump log lock) to serialize it with
other log operations like dumping logs.

Related doc - https://fb.quip.com/oTVAAdgEi4zY

This diff contains the following 7 patches:
D23013977
D24766787
D24716539
D24900223
D24955284
D25174166
D25775525

Reviewed By: luqun

Differential Revision: D26141496

---------------------------------------------------------------------

Fixed the flaky raft test suite

Summary:
First clean run of entire raft test suite :)

**Changes**
* Reset apply_logs on raft secondaries before the start of every test
* Increased the lock timeouts and changed isolation level in rpl_raft_slave_out_of_order_commit.

Reviewed By: luqun

Differential Revision: D26651257

---------------------------------------------------------------------

return error from Raft_replication_delegate when plugin is not available

Summary:
Port D23065441 (facebook@b9067f7)

The new macro is used to call into raft plugin. If plugin gets unloaded
accidentally when enable_raft_plugin is ON, then this STRICT version returns
failure. This is to be called only by raft plugin currently

Reviewed By: Pushapgl

Differential Revision: D26447523

---------------------------------------------------------------------

Adding timestamps for raft rotates which happen in the context of listener thread

Summary:
Port D25572614

The timestamp of a binlog event is picked up from the when field in
the event. In most cases of rotation, the when is left unpopulated
during rotation for the top 3 events (fd, pgtid, metadata). However
in such a situation, a normal rotate (flush binary logs) still manages
to get a valid timestamp, since the thread in which the flush binary
logs happens has a valid start time.
Now enter Raft relay log rotations. In those cases and in the case
of config change rotate, the rotations are happening in the context
of a raft listener queue thread. In that context, the when and start
time of the thread are both 0. The diff handles this case by populating
the when field appropriately.

Reviewed By: bhatvinay

Differential Revision: D26194612

---------------------------------------------------------------------

Raft abrupt stepdown and trim binlog file / gtid test

Summary: binlog file should get trimmed for abrupt stepdown

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D26169975

---------------------------------------------------------------------

Port: Fixes around raft log truncation.

Summary:
**Notes**
* New functions to block and unblock dump threads for plugin to use
during raft log truncation.

The check below is already done in the raft plugin as part of D26866429.
* Re-init rli->cur_log in Relay_log_info::rli_init_info() instead of
just seeking to the beginning to handle the case when raft log is
truncated before starting the applier

Reviewed By: luqun

Differential Revision: D26759813

---------------------------------------------------------------------

rebase due to relay log Format_description_event event size difference

Summary:
WL#3549: Binlog Compression adds 1 extra byte of data to Format_description_event

  @@ -134,6 +137,7 @@ Format_description_event::Format_description_event(uint8_t binlog_ver,
             VIEW_CHANGE_HEADER_LEN,
             XA_PREPARE_HEADER_LEN,
             ROWS_HEADER_LEN_V2,
  +          TRANSACTION_PAYLOAD_EVENT,
         };

Reviewed By: Pushapgl

Differential Revision: D27145095

---------------------------------------------------------------------

fix raft change string metadata event

Summary:
Saw a bunch of stage-1 replicaset instances fail due to a config change string failure during add/remove instance
```
I0401 00:16:10.894434 1823842 IoCacheUtils.cpp:333] getConfigChangeString eventType = ^G
E0401 00:16:10.894451 1823842 IoCacheUtils.cpp:363] Failed to read metadata event body from cache
E0401 00:16:10.894456 1823842 MysqlRaft.cpp:659] [replicate] Failed to get config change string from iocache
2021-04-01T00:16:10.894464-07:00 8307 [ERROR] [MY-010207] [Repl] Run function 'before_flush' in plugin 'RPL_RAFT' failed
2021-04-01T00:16:10.894478-07:00 8307 [ERROR] [MY-000000] [Server] Failed to rotate binary log
```

After some investigation, the cause is that the metadata event length was calculated with the config change string included, but the config change string was never written into the event body.

Reviewed By: Pushapgl

Differential Revision: D27504157

---------------------------------------------------------------------

Tells raft mode in binlog

Summary:
Win-Win: Tell whether this binlog was generated while this host was in raft mode.

We already have the bit in the FD event, and this diff just prints it out.

Reviewed By: mpercy

Differential Revision: D28664300

---------------------------------------------------------------------

fix flaky raft testcase

Summary:
There are multiple issues for MTR:
1. In sql/rpl_binlog_sender.cc, if the secondary's IO thread receives a fatal_error,
   it quits instead of reconnecting. Use an unknown error so that the
   secondary's IO thread can try to reconnect.
2. mtr.add_suppression() executes an INSERT statement into the
   mtr.test_suppressions table, which has no primary key, so during
   idempotent recovery the secondary fails to execute mtr.add_suppression()
   due to the missing PK. Move all mtr.add_suppression() calls to the end of
   the testcase to work around the idempotent recovery failure.
3. When promoting, use raft_promote_to_leader.inc instead of `set rpl_raft_new_leader_uuid`,
   since raft_promote_to_leader waits until the new primary becomes writeable.
4. Pass a specific warning instead of '.*' to mtr.add_suppression.
5. etc

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D28774820

---------------------------------------------------------------------

Using locks in Dump_log methods only when raft is enabled

Summary:
We don't need to take locks in non-raft mode since the
underlying MYSQL_BIN_LOG obj will never change. To handle the race between
updating the enable_raft_plugin var and using its value in Dump_log, we
kill and block all dump threads while updating the var and unblock them
once the var is updated.

Reviewed By: bhatvinay

Differential Revision: D28905522

---------------------------------------------------------------------

leader election to be a sync point in the new leader

Summary:
Port of D27582002 (facebook@39c70ca) from 5.6.35 to 8.0

A newly elected raft leader makes sure that all trxs from the previous
leader are committed by the sql appliers. It then switches the server's trx
logs from apply-log-* to binary-log-*. To other parts of the system this
looks like a rotation, but the necessary sync calls are not made here.
So, if the server (or os) restarts, then the storage engine could lose
the commit markers of the last batch of trxs. This will result in silent
data drift.
This diff fixes the problem by making an explicit call to
ha_flush_logs() before switching the server's trx logs

Reviewed By: luqun

Differential Revision: D28880407

---------------------------------------------------------------------

Error out SQL thread on out of order opids in raft logs

Summary:
OpIds should always be in order in the raft logs. Added a check
in SQL threads that throws an error and stops the applier when out of
order opids are detected.

Reviewed By: li-chi

Differential Revision: D28810840

---------------------------------------------------------------------

add GET_COMMITTED_GTIDS for raft

Summary:
During recovery Raft Log::Init needs to check with server
what gtids have been committed. Before doing that it finds the entire
set of trxs in the last raft log. The difference (logged minus
committed) gives the pending opids.

Reviewed By: Pushapgl

Differential Revision: D29622786

---------------------------------------------------------------------

raft: skip mts_recovery_groups during start slave

Summary:
During MySQL8+Raft DMP, some instances fail to switch to Leader or start slave

```
2021-06-24T17:56:38.627423-07:00 431 [Note] [MY-010574] [Repl] Slave: MTS group recovery relay log info group_master_log_name /data/mysql/3127/bls-unittestdb658.frc2-3305-mysql.replicaset.180021/binary-logs-3727.000033, event_master_log_pos 1129.
2021-06-24T17:56:38.627473-07:00 431 [ERROR] [MY-010575] [Repl] Error looking for file after /binlogs/binary-logs-3307.000120.
2021-06-24T17:56:38.627516-07:00 431 [ERROR] [MY-000000] [Repl] load_mi_and_rli_from_repositories: rli_init_info returned error
```

Similar to 5.6, we don't need to run mts_recovery_groups because GTID_MODE is always enabled.

Reviewed By: Pushapgl

Differential Revision: D29520066

---------------------------------------------------------------------

update RaftListenerCallbackArg struct

Summary: Due to a change in the contract between the raft plugin and mysql, update the RaftListenerCallbackArg struct to add a master_uuid field

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D29981349

---------------------------------------------------------------------

always check mi during raft_reset_slave

Summary: add similar nullptr check for raft_reset_slave as raft_stop_sql_thread

Reviewed By: bhatvinay

Differential Revision: D29967915

---------------------------------------------------------------------

raft mtr: update rpl_end_raft.inc for disable-raft

Summary:
rpl_end_raft.inc unregisters the raft plugin, which will call reset slave when online disable-raft is enabled. Move rpl_end_raft.inc after rpl_stop_slaves.inc so that the slaves are stopped correctly first.

After the slaves are stopped, calls to mtr.add_suppression("") no longer replicate to them, so just call mtr.add_suppression("") on all instances (replicas).

Reviewed By: yizhang82

Differential Revision: D30062236

---------------------------------------------------------------------

fix master_uuid

Summary: fix master_uuid in raft mode.

Reviewed By: luqun

Differential Revision: D30261465

---------------------------------------------------------------------

incorrect File_size value in show raft logs result

Summary:
In the show raft logs result, the File_size for the latest log file isn't updated correctly.

For example, show raft logs returns:

```
+-------------------------+------------
| Log_name                | File_size  |
| binary-logs-3304.000670 |       1529 |
```

In fact:
```
-rw-r----- 1 mysql backup 1669180487 Aug  4 14:49 binary-logs-3304.000670
-rw-r----- 1 mysql backup 1723994154 Aug  4 14:49 binary-logs-3304.000670
```
The File_size from show raft logs is always 1529, but the real file size is 1669180487.

The issue occurs because, when writing to IO_CACHE directly, its wrapper ostream's position isn't updated.

Reviewed By: yizhang82

Differential Revision: D30116345

---------------------------------------------------------------------

Add gtid_purged_for_tailing

Summary:
Add a new global variable gtid_purged_for_tailing.

It shows the purged GTID for binlog in non-raft mode, and purged GTID for raft log in raft mode.

Reviewed By: abhinav04sharma

Differential Revision: D29372776

---------------------------------------------------------------------

disable skip_setup_replica_if_unnecessary

Summary: After syncing with the latest official raft plugin, most MTR tests failed due to the skip_setup_replica_if_unnecessary optimization.

Reviewed By: yizhang82

Differential Revision: D30821648

---------------------------------------------------------------------

latency histograms for raft trx wait

Summary: Port of the support added for the same in mysql 5.6. Should help monitor latencies in 8.0 + raft stage-1 replicasets.

Reviewed By: luqun

Differential Revision: D31064764

---------------------------------------------------------------------

handle cases where clean shutdown in raft aborts trxs

Summary: This is a port of the feature in 5.6.

Reviewed By: anirbanr-fb

Differential Revision: D31070593

---------------------------------------------------------------------

fix crash during binlog purging

Summary:
Any error returned by the plugin during binlog purging results in a
crash in mysql8 as the server tries to execute
binlog_error_action_abort. We need to differentiate explicitly between a
plugin error and other errors (such as errors related to doing disk IO
etc). In this particular case, the crash is because of trying to purge a
file that does not exist (i.e. which was already purged previously) and
raft cannot find it in its index chunk (so it returns an error).

Reviewed By: anirbanr-fb

Differential Revision: D31149997

---------------------------------------------------------------------

update flaky rpl_raft_purged_gtids_dump_threads

Summary:
rpl_raft_purged_gtids_dump_threads MTR failed with "Cannot replicate because the master purged required binary logs" after server4 switched to tailing server2.

Try to sync server4 before switching it to tail server2.

Reviewed By: bhatvinay

Differential Revision: D30818078

---------------------------------------------------------------------

fix various porting issues in mysql8 (raft) crash recovery

Summary:
1. Trimming of the binlog is left to the raft plugin (based on the current
   leader's log). The server should skip this step as part of recovery. This
   essentially means setting 'valid_pos' to the last successfully parsed
   trx in the trx log (instead of the engine's view of the trx log) in
   MYSQL_BIN_LOG::recover().
2. executed_gtid_set should be initialized based on the engine's view of
   the trx log file coordinates. So, during startup, appropriate flags need
   to be passed into MYSQL_BIN_LOG::init_gtid_sets(). init_gtid_sets() was
   already changed to handle this, but the flag was not set correctly
   during server startup.
3. Another fix is in MYSQL_BIN_LOG::init_gtid_sets() to correctly set the
   position to read and calculate executed-gtid-set (based on the file name
   read from the engine).

Reviewed By: anirbanr-fb

Differential Revision: D31284902

---------------------------------------------------------------------

show_raft_status should take LOCK_status

Summary:
This is a straight port of a 5.6 patch to 8.0

SHOW RAFT STATUS and SHOW GLOBAL STATUS go to the
same functions in the plugin and access shared data structures.
These functions return internal char * pointers to plugin global
variables. They need to be serialized with LOCK_status otherwise
it leads to race conditions.

Reviewed By: li-chi

Differential Revision: D31318340

---------------------------------------------------------------------

Disallowing reset master on raft replicasets

Summary:
This is a straight port of similar patch on 5.6 raft

reset master is inherently not raft compliant.
It gets rid of all binlogs and makes an instance appear like
a fresh single instance database without consulting the ring.
We need to block it.

However, under certain circumstances (in the future), e.g. during
first time replicaset build, or when we can guarantee that instance
is single node raft ring, we can potentially do a reset master
followed by rebootstrap of raft. This can also be achieved by
disabling raft, reset master and then re-enabling raft, however
to keep open the door, I have left a force option and will thread
a mechanism from the plugin to call reset_master(force=true)

Reviewed By: li-chi

Differential Revision: D31299214

---------------------------------------------------------------------

Setting topology config in MTR

Summary:
Setting topology config in MTR so that features that use
topology config can be tested easily. Setting rpl_raft_skip_smc_updates
to avoid unnecessary calls to SMC (even though we supply a dummy
replicaset name).

Reviewed By: li-chi

Differential Revision: D31543877

---------------------------------------------------------------------

Do not allow sql thread start when raft is doing a stop->{}->start transition

Summary:
This is a port of an existing patch made to 5.6 for Raft.

Raft will do a stop of the SQL thread during StopAllWrites.
Then it will repoint the binlog files and during that action, the
SQL threads have to remain stopped. We block it out in this diff by
keeping an atomic bool which can be checked from other functions.

This only applies to raft mode i.e. enable_raft_plugin = true.

Reviewed By: li-chi

Differential Revision: D31319499

---------------------------------------------------------------------

Handle printing of FD event generated by slave SQL thread

Summary:
Early returning in the FD:print() function makes mysqlbinlog unable to parse Raft logs on secondaries.

The original commit which added this is d048c0f (P173872135)

To comply with the intent of the original bug fix, we avoid printing the FD event of a relay log as a 'BINLOG'.

Reviewed By: anirbanr-fb

Differential Revision: D26359417

---------------------------------------------------------------------

Add exponential backoff for smart restart

Summary:
RAFT instances crash and Tx logs fill up.

rocksdb can fill up txlogs.
We should stop restarting mysqld if we have restarted many times in a day.

Reviewed By: anirbanr-fb

Differential Revision: D27372750

---------------------------------------------------------------------

always release data_lock mutex to avoid deadlock

Summary: In a stage-1 replicaset, when a secondary instance is killed, the instance will sometimes run into a deadlock because the process_raft_queue thread forgot to release the mutex it acquired in raft_change_master

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D27602667

---------------------------------------------------------------------

Gracefully exit mysqld_safe loop during backoff

Summary:
Currently, systemctl stop [email protected] can take 10 mins when mysqld_safe is in its backoff period.

D28517599 adds an interrupt to the sleep in mysql_stop, and mysqld_safe immediately breaks out of the retry loop if the sleep is interrupted.

Reviewed By: anirbanr-fb

Differential Revision: D28606439

---------------------------------------------------------------------

Fix heap overflow in group_relay_log_name handling

Summary:
We were accessing group_relay_log_name in
Query_log_event::do_apply_event_worker() but it's assigned only after
the coordinator thread encounters an end event (i.e. xid event or a
query event with "COMMIT" or "ROLLBACK" query). This was causing a race
between accessing group_relay_log_name in the worker thread and writing
it on the coordinator thread. We don't need to set transaction position
in events other than end event, so now we set transaction position in
query event only if it's an end event. The race is eliminated because
group_relay_log_name is set before enqueuing the event to the worker
thread (in both dep repl and vanilla mts).

Reviewed By: lth

Differential Revision: D28767430

---------------------------------------------------------------------

fix memory during MYSQL_BIN_LOG::open_existing_binlog

Summary:
asandebug complains that there are memory leaks during MYSQL_BIN_LOG open

Direct leak of 50 byte(s) in 1 object(s) allocated from:
    #0 0x67460ef in malloc
    #1 0x93f0777 in my_raw_malloc(unsigned long, int)
    #2 0x93f064a in my_malloc(unsigned int, unsigned long, int)
    #3 0x93f0eb0 in my_strdup(unsigned int, char const*, int)
    #4 0x8af01a6 in MYSQL_BIN_LOG::open(unsigned int, char const*, char const*, unsigned int)
    #5 0x8af8064 in MYSQL_BIN_LOG::open_binlog(char const*, char const*, unsigned long, bool, bool, bool, Format_description_log_event*, unsigned int, RaftRotateInfo*, bool)
    #6 0x8b00c00 in MYSQL_BIN_LOG::new_file_impl(bool, Format_description_log_event*, RaftRotateInfo*)
    #7 0x8d65e47 in rotate_relay_log(Master_info*, bool, bool, bool, RaftRotateInfo*)
    #8 0x8d661c0 in rotate_relay_log_for_raft(RaftRotateInfo*)
    #9 0x8c7696a in process_raft_queue
    #10 0xa0fa1fd in pfs_spawn_thread(void*)
    #11 0x7f8c9a12b20b in start_thread

Release the old memory before assigning new values to these pointers.
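
A minimal sketch of the free-before-assign pattern described above. `LogNameHolder` and `set_name` are hypothetical names for illustration; the real code uses my_strdup/my_free on MYSQL_BIN_LOG members, and the actual fix may look different.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical holder for a heap-allocated log name (not a MySQL type).
struct LogNameHolder {
  char *name = nullptr;

  LogNameHolder() = default;
  LogNameHolder(const LogNameHolder &) = delete;
  LogNameHolder &operator=(const LogNameHolder &) = delete;

  // Free the previous buffer before storing the new copy. Without the
  // free(), every reopen of the log would leak the old string.
  void set_name(const char *new_name) {
    char *copy = static_cast<char *>(std::malloc(std::strlen(new_name) + 1));
    if (copy == nullptr) return;  // keep the old name on allocation failure
    std::strcpy(copy, new_name);
    std::free(name);  // free(nullptr) is a no-op
    name = copy;
  }

  ~LogNameHolder() { std::free(name); }
};
```

Calling `set_name` twice on the same holder leaves exactly one live allocation, which is the property the ASan report above says was violated.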

Reviewed By: Pushapgl

Differential Revision: D28819752

---------------------------------------------------------------------

Reduce max backoff time from 24h to 30mins

Summary:
During the recent SEV0, raft instances were crash looping because of clock skew. Even after the clock was back to normal, mysqld_safe failed to bring mysqld back because of the extended 24-hour backoff.

This diff reduces the max backoff time from 24h to 30mins. So it will still keep trying, but not so aggressively that it fills up the trx logs.
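
For illustration only, a sketch of capped exponential backoff. mysqld_safe is a shell script, so this is not its actual logic: the function name, the 1-second base and the doubling factor are assumptions, and only the 30-minute cap comes from this diff.

```cpp
#include <algorithm>
#include <chrono>

// Hypothetical helper: delay before restart attempt `attempt` (0-based).
// Base delay and doubling are illustrative assumptions; the 30-minute
// cap is the value named in the summary above.
std::chrono::seconds restart_backoff(unsigned attempt) {
  using std::chrono::seconds;
  const seconds max_backoff{30 * 60};
  // Clamp the shift so the doubling cannot overflow.
  const unsigned shift = std::min(attempt, 20u);
  const seconds delay{seconds::rep{1} << shift};
  return std::min(delay, max_backoff);
}
```

With these assumed parameters the delay reaches the cap on the 12th attempt and stays there, so restarts continue at a bounded rate instead of stopping for a day.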

Reviewed By: bart2

Differential Revision: D31661172
inikep added a commit that referenced this pull request Apr 7, 2022
recover raft logs by removing partial trxs

Summary:
Port D24628821

mysqld removes partial trxs in the tail of the trx log (named binary-logs on
primaries and apply-logs on secondaries) during startup. However, relay logs
were not of much importance since they were discarded anyway and new ones
would be created.
However, with raft, this is not ideal. Relay logs are raft logs on secondaries
and have to be kept around (and kept sane and consistent). This diff adds the
ability to remove partial trxs from raft/relay logs.
Much of the code to open the last relay log (based on relay log index) and
identify partial trxs is borrowed from existing logic in
MYSQL_BIN_LOG::open_binlog() and binlog_recover()

Reviewed By: Pushapgl

Differential Revision: D26447448

---------------------------------------------------------------------

Checking for inited bool to make sure global_init_info was successful

Summary:
Port D25584004

The master.info and relay.info files can be present
but need to be properly inited for use. We were bypassing the inited
check which could lead to issues in Raft.
In case there is an error in global_init_info, Raft will do a
raft_reset_slave and make another attempt at it. If both recourses
fail, the init of the plugin would fail.

Reviewed By: Pushapgl

Differential Revision: D26447457

---------------------------------------------------------------------

Support for dumping raft logs to vanilla async replicas

Summary:
[Porting Notes]
We want to dump raft logs to vanilla async replicas regardless
of whether it's the relay log or binlog. Effectively after this change
we'll dump relay logs on the followers and binlogs on the leader. When
the raft role changes, the logs to be dumped are also changed.
Dump_log class is introduced as a thin wrapper/container around
mysql_bin_log or rli->relay_log and is inited with mysql_bin_log
to emulate vanilla mysql behavior. Dump threads use the global
dump_log object instead of mysql_bin_log directly. We switch the log
in dump log only when raft role changes (in binlog_change_to_binlog()
and binlog_change_to_apply_log()).
During raft role change we take all log related locks (LOCK_log,
LOCK_index, LOCK_binlog_end_pos, and dump log lock) to serialize it with
other log operations like dumping logs.

Related doc - https://fb.quip.com/oTVAAdgEi4zY

This diff contains below 7 patches:
D23013977
D24766787
D24716539
D24900223
D24955284
D25174166
D25775525

Reviewed By: luqun

Differential Revision: D26141496

---------------------------------------------------------------------

Fixed the flaky raft test suite

Summary:
First clean run of entire raft test suite :)

**Changes**
* Reset apply_logs on raft secondaries before the start of every test
* Increased the lock timeouts and changed isolation level in rpl_raft_slave_out_of_order_commit.

Reviewed By: luqun

Differential Revision: D26651257

---------------------------------------------------------------------

return error from Raft_replication_delegate when plugin is not available

Summary:
Port D23065441 (facebook@b9067f7)

The new macro is used to call into raft plugin. If plugin gets unloaded
accidentally when enable_raft_plugin is ON, then this STRICT version returns
failure. This is to be called only by raft plugin currently

Reviewed By: Pushapgl

Differential Revision: D26447523

---------------------------------------------------------------------

Adding timestamps for raft rotates which happen in the context of listener thread

Summary:
Port D25572614

The timestamp of a binlog event is picked up from the when field in
the event. In most cases of rotation, the when is left unpopulated
during rotation for the top 3 events (fd, pgtid, metadata). However
in such a situation, a normal rotate (flush binary logs) still manages
to get a valid timestamp, since the thread in which the flush binary
logs happens has a valid start time.
Now enter Raft relay log rotations. In those cases and in the case
of config change rotate, the rotations are happening in the context
of a raft listener queue thread. In that context, the when and start
time of the thread are both 0. The diff handles this case by populating
the when field appropriately.
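
A sketch of one way the fallback could work. The summary does not say which value the diff writes into `when`, so falling back to the current time is an assumption, and `effective_event_time` is a hypothetical helper rather than a server function.

```cpp
#include <ctime>

// Hypothetical helper: choose the timestamp for a rotate-time event.
// `event_when` mirrors the event's `when` field and `thread_start` the
// thread's start time; both are 0 in the raft listener queue thread.
std::time_t effective_event_time(std::time_t event_when,
                                 std::time_t thread_start,
                                 std::time_t now) {
  if (event_when != 0) return event_when;      // explicitly populated
  if (thread_start != 0) return thread_start;  // normal "flush binary logs"
  return now;  // raft listener case (assumed): use the current time
}
```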

Reviewed By: bhatvinay

Differential Revision: D26194612

---------------------------------------------------------------------

Raft abrupt stepdown and trim binlog file / gtid test

Summary: binlog file should get trimmed for abrupt stepdown

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D26169975

---------------------------------------------------------------------

Port: Fixes around raft log truncation.

Summary:
**Notes**
* New functions to block and unblock dump threads for plugin to use
during raft log truncation.

The check below is already done in the raft plugin as part of D26866429.
* Re-init rli->cur_log in Relay_log_info::rli_init_info() instead of
just seeking to the beginning to handle the case when raft log is
truncated before starting the applier

Reviewed By: luqun

Differential Revision: D26759813

---------------------------------------------------------------------

rebase due to relay log Format_description_event event size difference

Summary:
WL#3549: Binlog Compression adds 1 extra byte of data to Format_description_event

  @@ -134,6 +137,7 @@ Format_description_event::Format_description_event(uint8_t binlog_ver,
           VIEW_CHANGE_HEADER_LEN,
           XA_PREPARE_HEADER_LEN,
           ROWS_HEADER_LEN_V2,
  +        TRANSACTION_PAYLOAD_EVENT,
       };

Reviewed By: Pushapgl

Differential Revision: D27145095

---------------------------------------------------------------------

fix raft change string metadata event

Summary:
Saw a bunch of stage-1 replicaset instances fail due to a config change string failure during add/remove instance
```
I0401 00:16:10.894434 1823842 IoCacheUtils.cpp:333] getConfigChangeString eventType = ^G
E0401 00:16:10.894451 1823842 IoCacheUtils.cpp:363] Failed to read metadata event body from cache
E0401 00:16:10.894456 1823842 MysqlRaft.cpp:659] [replicate] Failed to get config change string from iocache
2021-04-01T00:16:10.894464-07:00 8307 [ERROR] [MY-010207] [Repl] Run function 'before_flush' in plugin 'RPL_RAFT' failed
2021-04-01T00:16:10.894478-07:00 8307 [ERROR] [MY-000000] [Server] Failed to rotate binary log
```

After some investigation, the cause is that the metadata event length was calculated including the config change string, but the config change string was never written into the event body.

Reviewed By: Pushapgl

Differential Revision: D27504157

---------------------------------------------------------------------

Tells raft mode in binlog

Summary:
Win-Win: Tell whether this binlog was generated while this host was in raft mode.

We already have the bit in the FD event, and this diff just exposes it.

Reviewed By: mpercy

Differential Revision: D28664300

---------------------------------------------------------------------

fix flaky raft testcase

Summary:
There are multiple issues for MTR:
1. In sql/rpl_binlog_sender.cc, if a secondary's IO thread receives fatal_error,
   it quits instead of reconnecting. Use an unknown error so that the
   secondary's IO thread can try to reconnect.
2. mtr.add_suppression() call: mtr.add_suppression() executes an insert
   statement into the mtr.test_suppressions table, and that table doesn't
   contain a primary key, so during idempotent recovery the secondary
   fails to execute mtr.add_suppression() due to the missing PK. Move all
   mtr.add_suppression() calls to the end of the testcase to work around the
   idempotent recovery failure.
3. For promotion, use raft_promote_to_leader.inc instead of `set rpl_raft_new_leader_uuid`,
   since raft_promote_to_leader waits until the new primary becomes writeable.
4. Pass a specific warning instead of '.*' to mtr.add_suppression.
5. etc

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D28774820

---------------------------------------------------------------------

Using locks in Dump_log methods only when raft is enabled

Summary:
We don't need to take locks in non-raft mode since the
underlying MYSQL_BIN_LOG obj will never change. To handle race between
updates to the enable_raft_plugin var and reads of its value in Dump_log, we
kill and block all dump threads while updating the var and unblock them
once the var is updated.
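
A rough sketch of the idea, using hypothetical stand-in types (`LogSketch`, `DumpLogSketch`) rather than the real Dump_log and MYSQL_BIN_LOG classes. In this sketch the raft flag is fixed at construction; the real server lets it change, which is why dump threads are killed and blocked around the update.

```cpp
#include <mutex>
#include <string>

struct LogSketch {
  std::string name;  // stand-in for MYSQL_BIN_LOG / relay log state
};

class DumpLogSketch {
 public:
  DumpLogSketch(LogSketch *initial, bool raft_enabled)
      : log_(initial), raft_enabled_(raft_enabled) {}

  // Called on raft role change to point dump threads at the other log.
  void switch_log(LogSketch *next) {
    std::lock_guard<std::mutex> guard(mu_);
    log_ = next;
  }

  // Readers lock only in raft mode; in non-raft mode log_ never changes,
  // so the lock would be pure overhead.
  std::string current_name() {
    std::unique_lock<std::mutex> guard(mu_, std::defer_lock);
    if (raft_enabled_) guard.lock();
    return log_->name;
  }

 private:
  std::mutex mu_;
  LogSketch *log_;
  const bool raft_enabled_;
};
```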

Reviewed By: bhatvinay

Differential Revision: D28905522

---------------------------------------------------------------------

leader election to be a sync point in the new leader

Summary:
Port of D27582002 (facebook@39c70ca) from 5.6.35 to 8.0

Newly elected raft leader makes sure that all trxs from the previous
leader are committed by sql appliers. It then switches the server's trx
logs from apply-log-* to binary-log-*. To other parts of the system this
looks like a rotation, but the necessary sync calls are not made here.
So, if the server (or OS) restarts, then the storage engine could lose
the commit markers of the last batch of trxs. This will result in silent
data drift.
This diff fixes the problem by making an explicit call to
ha_flush_logs() before switching the server's trx logs.

Reviewed By: luqun

Differential Revision: D28880407

---------------------------------------------------------------------

Error out SQL thread on out of order opids in raft logs

Summary:
OpIds should always be in order in the raft logs. Added a check
in SQL threads that throws an error and stops the applier when out of
order opids are detected.
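
A minimal sketch of such an ordering check. It assumes an OpId is a (term, index) pair compared lexicographically and that a repeated OpId also counts as out of order; the real plugin type and the exact rule may differ, and `OpIdOrderChecker` is a hypothetical name.

```cpp
#include <cstdint>
#include <tuple>

// Assumed shape of a raft OpId. The real plugin type may differ.
struct OpId {
  std::int64_t term;
  std::int64_t index;
};

inline bool operator<(const OpId &a, const OpId &b) {
  return std::tie(a.term, a.index) < std::tie(b.term, b.index);
}

// Hypothetical applier-side guard: remembers the last OpId seen and
// rejects any OpId that does not come strictly after it.
class OpIdOrderChecker {
 public:
  // Returns true if `next` is in order. A false return is where the
  // applier would raise an error and stop.
  bool check_and_advance(const OpId &next) {
    if (has_last_ && !(last_ < next)) return false;
    last_ = next;
    has_last_ = true;
    return true;
  }

 private:
  bool has_last_ = false;
  OpId last_{0, 0};
};
```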

Reviewed By: li-chi

Differential Revision: D28810840

---------------------------------------------------------------------

add GET_COMMITTED_GTIDS for raft

Summary:
During recovery Raft Log::Init needs to check with server
what gtids have been committed. Before doing that it finds the entire
set of trxs in the last raft log. The difference between logged -
committed are the pending opids.
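
The "logged - committed" computation is a plain set difference. In the sketch below a GTID set is simplified to a set of transaction numbers from a single source, which is an assumption for illustration; real GTID sets are interval lists per UUID, and `pending_trxs` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <set>

// Simplification: transaction numbers from one source UUID only.
using GtidSet = std::set<std::uint64_t>;

// Transactions present in the last raft log but not yet committed by
// the engine, i.e. logged - committed.
GtidSet pending_trxs(const GtidSet &logged, const GtidSet &committed) {
  GtidSet pending;
  std::set_difference(logged.begin(), logged.end(),
                      committed.begin(), committed.end(),
                      std::inserter(pending, pending.end()));
  return pending;
}
```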

Reviewed By: Pushapgl

Differential Revision: D29622786

---------------------------------------------------------------------

raft: skip mts_recovery_groups during start slave

Summary:
During MySQL8+Raft DMP, some instances fail to switch to Leader or start slave

```
2021-06-24T17:56:38.627423-07:00 431 [Note] [MY-010574] [Repl] Slave: MTS group recovery relay log info group_master_log_name /data/mysql/3127/bls-unittestdb658.frc2-3305-mysql.replicaset.180021/binary-logs-3727.000033, event_master_log_pos 1129.
2021-06-24T17:56:38.627473-07:00 431 [ERROR] [MY-010575] [Repl] Error looking for file after /binlogs/binary-logs-3307.000120.
2021-06-24T17:56:38.627516-07:00 431 [ERROR] [MY-000000] [Repl] load_mi_and_rli_from_repositories: rli_init_info returned error
```

Similar to 5.6, we don't need to run mts_recovery_groups because GTID_MODE is always enabled.

Reviewed By: Pushapgl

Differential Revision: D29520066

---------------------------------------------------------------------

update RaftListenerCallbackArg struct

Summary: Due to a change in the contract between the raft plugin and mysql, update the RaftListenerCallbackArg struct to add a master_uuid field

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D29981349

---------------------------------------------------------------------

always check mi during raft_reset_slave

Summary: add similar nullptr check for raft_reset_slave as raft_stop_sql_thread

Reviewed By: bhatvinay

Differential Revision: D29967915

---------------------------------------------------------------------

raft mtr: update rpl_end_raft.inc for disable-raft

Summary:
rpl_end_raft.inc unregisters the raft plugin, which will call reset slave when online disable-raft is enabled. Move rpl_end_raft.inc after rpl_stop_slaves.inc so that the slaves are stopped correctly first.

After the slaves are stopped, calls to mtr.add_suppression("") no longer replicate to them, so just call mtr.add_suppression("") on all instances (replicas).

Reviewed By: yizhang82

Differential Revision: D30062236

---------------------------------------------------------------------

fix master_uuid

Summary: fix master_uuid in raft mode.

Reviewed By: luqun

Differential Revision: D30261465

---------------------------------------------------------------------

incorrect File_size value in show raft logs result

Summary:
In the show raft logs result, the File_size for the latest log file isn't updated correctly.

For example, show raft logs returns:

```
+-------------------------+------------
| Log_name                | File_size  |
| binary-logs-3304.000670 |       1529 |
```

In fact:
```
-rw-r----- 1 mysql backup 1669180487 Aug  4 14:49 binary-logs-3304.000670
-rw-r----- 1 mysql backup 1723994154 Aug  4 14:49 binary-logs-3304.000670
```
The File_size from show raft logs is always 1529, but the real file size is 1669180487.

The issue occurs because, when writing to IO_CACHE directly, its wrapper ostream's position isn't updated.

Reviewed By: yizhang82

Differential Revision: D30116345

---------------------------------------------------------------------

Add gtid_purged_for_tailing

Summary:
Add a new global variable gtid_purged_for_tailing.

It shows the purged GTID for binlog in non-raft mode, and purged GTID for raft log in raft mode.

Reviewed By: abhinav04sharma

Differential Revision: D29372776

---------------------------------------------------------------------

disable skip_setup_replica_if_unnecessary

Summary: After syncing with the latest official raft plugin, most MTR tests failed due to the skip_setup_replica_if_unnecessary optimization.

Reviewed By: yizhang82

Differential Revision: D30821648

---------------------------------------------------------------------

latency histograms for raft trx wait

Summary: Port of the support added for the same in mysql 5.6. Should help monitor latencies in 8.0 + raft stage-1 replicasets.

Reviewed By: luqun

Differential Revision: D31064764

---------------------------------------------------------------------

handle cases where clean shutdown in raft aborts trxs

Summary: This is a port of the feature in 5.6.

Reviewed By: anirbanr-fb

Differential Revision: D31070593

---------------------------------------------------------------------

fix crash during binlog purging

Summary:
Any error returned by the plugin during binlog purging results in a
crash in mysql8 as the server tries to execute
binlog_error_action_abort. We need to differentiate explicitly between a
plugin error and other errors (such as errors related to doing disk IO
etc). In this particular case, the crash is because of trying to purge a
file that does not exist (i.e. which was already purged previously) and
raft cannot find it in its index chunk (so it returns an error).

Reviewed By: anirbanr-fb

Differential Revision: D31149997

---------------------------------------------------------------------

update flaky rpl_raft_purged_gtids_dump_threads

Summary:
rpl_raft_purged_gtids_dump_threads MTR failed with "Cannot replicate because the master purged required binary logs" after server4 switched to tailing server2.

Try to sync server4 before switching it to tail server2.

Reviewed By: bhatvinay

Differential Revision: D30818078

---------------------------------------------------------------------

fix various porting issues in mysql8 (raft) crash recovery

Summary:
1. Trimming of the binlog is left to the raft plugin (based on the current
   leader's log). The server should skip this step as part of recovery. This
   essentially means setting 'valid_pos' to the last successfully parsed
   trx in the trx log (instead of the engine's view of the trx log) in
   MYSQL_BIN_LOG::recover().
2. executed_gtid_set should be initialized based on the engine's view of
   the trx log file coordinates. So, during startup, appropriate flags need
   to be passed into MYSQL_BIN_LOG::init_gtid_sets(). init_gtid_sets() was
   already changed to handle this, but the flag was not set correctly
   during server startup.
3. Another fix is in MYSQL_BIN_LOG::init_gtid_sets() to correctly set the
   position to read and calculate executed-gtid-set (based on the file name
   read from the engine).

Reviewed By: anirbanr-fb

Differential Revision: D31284902

---------------------------------------------------------------------

show_raft_status should take LOCK_status

Summary:
This is a straight port of a 5.6 patch to 8.0

SHOW RAFT STATUS and SHOW GLOBAL STATUS go to the
same functions in the plugin and access shared data structures.
These functions return internal char * pointers to plugin global
variables. They need to be serialized with LOCK_status otherwise
it leads to race conditions.

Reviewed By: li-chi

Differential Revision: D31318340

---------------------------------------------------------------------

Disallowing reset master on raft replicasets

Summary:
This is a straight port of similar patch on 5.6 raft

reset master is inherently not raft compliant.
It gets rid of all binlogs and makes an instance appear like
a fresh single instance database without consulting the ring.
We need to block it.

However, under certain circumstances (in the future), e.g. during
first time replicaset build, or when we can guarantee that instance
is single node raft ring, we can potentially do a reset master
followed by rebootstrap of raft. This can also be achieved by
disabling raft, reset master and then re-enabling raft, however
to keep open the door, I have left a force option and will thread
a mechanism from the plugin to call reset_master(force=true)

Reviewed By: li-chi

Differential Revision: D31299214

---------------------------------------------------------------------

Setting topology config in MTR

Summary:
Setting topology config in MTR so that features that use
topology config can be tested easily. Setting rpl_raft_skip_smc_updates
to avoid unnecessary calls to SMC (even though we supply a dummy
replicaset name).

Reviewed By: li-chi

Differential Revision: D31543877

---------------------------------------------------------------------

Do not allow sql thread start when raft is doing a stop->{}->start transition

Summary:
This is a port of an existing patch made to 5.6 for Raft.

Raft will do a stop of the SQL thread during StopAllWrites.
Then it will repoint the binlog files and during that action, the
SQL threads have to remain stopped. We block it out in this diff by
keeping an atomic bool which can be checked from other functions.

This only applies to raft mode i.e. enable_raft_plugin = true.
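
A small sketch of the atomic-bool gate described above. All names here are hypothetical; `raft_enabled` stands in for enable_raft_plugin, and the real check sits inside the server's applier start path.

```cpp
#include <atomic>

// Hypothetical flag: true while raft is between stopping the applier
// and starting it again (the stop->{}->start window).
std::atomic<bool> raft_applier_transition{false};

// Called by raft around the stop -> repoint logs -> start sequence.
void begin_raft_transition() { raft_applier_transition.store(true); }
void end_raft_transition() { raft_applier_transition.store(false); }

// Gate for any other attempt to start the SQL thread.
bool may_start_sql_thread(bool raft_enabled) {
  if (!raft_enabled) return true;  // the gate applies only in raft mode
  return !raft_applier_transition.load();
}
```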

Reviewed By: li-chi

Differential Revision: D31319499

---------------------------------------------------------------------

Handle printing of FD event generated by slave SQL thread

Summary:
Early returning in the FD:print() function makes mysqlbinlog unable to parse Raft logs on secondaries.

The original commit which added this is d048c0f (P173872135)

To comply with the intent of the original bug fix, we avoid printing the FD event of a relay log as a 'BINLOG'.

Reviewed By: anirbanr-fb

Differential Revision: D26359417

---------------------------------------------------------------------

Add exponential backoff for smart restart

Summary:
RAFT instances crash and Tx logs fill up.

rocksdb can fill up txlogs.
We should stop restarting mysqld if we have restarted many times in a day.

Reviewed By: anirbanr-fb

Differential Revision: D27372750

---------------------------------------------------------------------

always release data_lock mutex to avoid deadlock

Summary: In a stage-1 replicaset, when a secondary instance is killed, the instance will sometimes run into a deadlock because the process_raft_queue thread forgot to release the mutex it acquired in raft_change_master
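
To illustrate the class of bug, here is a sketch using a scoped lock so that every exit path releases the mutex. The server uses mysql_mutex_t and the actual fix may simply add the missing unlock call; `data_lock` below is a plain std::mutex stand-in and `change_master_sketch` is a hypothetical function.

```cpp
#include <mutex>

std::mutex data_lock;  // stand-in for the real data_lock mutex

// Hypothetical function with more than one exit path. std::lock_guard
// releases the mutex on every return (and on exceptions), so no path
// can leave data_lock held.
int change_master_sketch(bool precondition_ok) {
  std::lock_guard<std::mutex> guard(data_lock);
  if (!precondition_ok) return 1;  // early return still unlocks
  // ... update replication coordinates here ...
  return 0;
}
```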

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D27602667

---------------------------------------------------------------------

Gracefully exit mysqld_safe loop during backoff

Summary:
Currently systemctl stop [email protected] can take 10 mins when mysqld_safe is in a backoff period.

D28517599 adds an interrupt to the sleep in mysql_stop, and mysqld_safe immediately breaks out of the retry loop if the sleep is interrupted.

Reviewed By: anirbanr-fb

Differential Revision: D28606439

---------------------------------------------------------------------

Fix heap overflow in group_relay_log_name handling

Summary:
We were accessing group_relay_log_name in
Query_log_event::do_apply_event_worker() but it's assigned only after
the coordinator thread encounters an end event (i.e. xid event or a
query event with "COMMIT" or "ROLLBACK" query). This was causing a race
between accessing group_relay_log_name in the worker thread and writing
it on the coordinator thread. We don't need to set transaction position
in events other than end event, so now we set transaction position in
query event only if it's an end event. The race is eliminated because
group_relay_log_name is set before enqueuing the event to the worker
thread (in both dep repl and vanilla mts).

Reviewed By: lth

Differential Revision: D28767430

---------------------------------------------------------------------

fix memory leak during MYSQL_BIN_LOG::open_existing_binlog

Summary:
The asandebug build reports memory leaks during MYSQL_BIN_LOG open.

Direct leak of 50 byte(s) in 1 object(s) allocated from:
    #0 0x67460ef in malloc
    #1 0x93f0777 in my_raw_malloc(unsigned long, int)
    #2 0x93f064a in my_malloc(unsigned int, unsigned long, int)
    #3 0x93f0eb0 in my_strdup(unsigned int, char const*, int)
    #4 0x8af01a6 in MYSQL_BIN_LOG::open(unsigned int, char const*, char const*, unsigned int)
    #5 0x8af8064 in MYSQL_BIN_LOG::open_binlog(char const*, char const*, unsigned long, bool, bool, bool, Format_description_log_event*, unsigned int, RaftRotateInfo*, bool)
    #6 0x8b00c00 in MYSQL_BIN_LOG::new_file_impl(bool, Format_description_log_event*, RaftRotateInfo*)
    #7 0x8d65e47 in rotate_relay_log(Master_info*, bool, bool, bool, RaftRotateInfo*)
    #8 0x8d661c0 in rotate_relay_log_for_raft(RaftRotateInfo*)
    #9 0x8c7696a in process_raft_queue
    #10 0xa0fa1fd in pfs_spawn_thread(void*)
    #11 0x7f8c9a12b20b in start_thread

Release this memory before the pointers are reassigned.
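
The pattern can be sketched as follows. `LogNameHolder` is a hypothetical type used only for illustration, and it uses plain `malloc`/`free` rather than the server's `my_strdup`/`my_free` helpers.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical holder for a heap-allocated log file name.
struct LogNameHolder {
  char *name = nullptr;

  // Free the previous buffer before storing the new copy, so that
  // calling set_name() repeatedly does not leak the old allocation.
  void set_name(const char *new_name) {
    std::free(name);  // free(nullptr) is a no-op
    name = static_cast<char *>(std::malloc(std::strlen(new_name) + 1));
    if (name != nullptr) std::strcpy(name, new_name);
  }

  ~LogNameHolder() { std::free(name); }
};
```

Without the `free` at the top of `set_name()`, each reopen of the log would leak the previous copy, which is the kind of leak shown in the trace above.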

Reviewed By: Pushapgl

Differential Revision: D28819752

---------------------------------------------------------------------

Reduce max backoff time from 24h to 30mins

Summary:
During the recent SEV0, raft instances were crash looping because of clock skew. Even after the clock was back to normal, mysqld_safe failed to bring mysqld back because of the extended 24-hour backoff.

This diff reduces the max backoff time from 24h to 30mins. It will still keep trying, but not so aggressively that it fills up the trx logs.
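
For illustration, a capped exponential backoff can be sketched as below. mysqld_safe is a shell script, so this C++ version only shows the arithmetic. The 1 second base delay and the doubling factor are assumptions; only the 30 minute cap comes from this summary.

```cpp
#include <algorithm>
#include <chrono>

// Assumed base delay; the 30 minute cap is the value named in the summary.
constexpr std::chrono::seconds kBaseDelay{1};
constexpr std::chrono::seconds kMaxDelay{30 * 60};

// Delay before restart attempt `attempt` (0-based): base * 2^attempt, capped.
std::chrono::seconds backoff_delay(unsigned attempt) {
  std::chrono::seconds delay = kBaseDelay;
  for (unsigned i = 0; i < attempt && delay < kMaxDelay; ++i) {
    delay *= 2;
  }
  return std::min(delay, kMaxDelay);
}
```

With a cap, the delay stops growing once it reaches 30 minutes, so restarts keep happening at a bounded rate instead of stalling for a day.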

Reviewed By: bart2

Differential Revision: D31661172
inikep added a commit that referenced this pull request Apr 7, 2022
recover raft logs by removing partial trxs

Summary:
Port D24628821

mysqld removes partial trxs in the tail of trx log (named binary-logs on
primaries and apply-logs on secondaries) during startup. However, relay logs
were not of much importance since they were discarded anyway and a new one would
be created.
However, with raft, this is not ideal. Relay logs are raft logs on secondaries
and have to be kept around (and kept sane and consistent). This diff adds the
ability to remove partial trxs from raft/relay logs.
Much of the code to open the last relay log (based on relay log index) and
identify partial trxs is borrowed from existing logic in
MYSQL_BIN_LOG::open_binlog() and binlog_recover()
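
A simplified sketch of how a partial trx at the tail can be identified. The event model (`EventKind`, `Event`) and `last_valid_pos` are hypothetical; real binlog events and the logic in MYSQL_BIN_LOG::open_binlog() and binlog_recover() carry much more state.

```cpp
#include <cstddef>
#include <vector>

// Simplified, hypothetical event model.
enum class EventKind { kBegin, kRow, kCommit };

struct Event {
  EventKind kind;
  std::size_t end_pos;  // file offset just past this event
};

// Returns the offset up to which the log holds only complete transactions.
// Anything after that offset belongs to a partial trx and can be truncated.
std::size_t last_valid_pos(const std::vector<Event> &events,
                           std::size_t header_end) {
  std::size_t valid = header_end;
  for (const Event &ev : events) {
    if (ev.kind == EventKind::kCommit) valid = ev.end_pos;
  }
  return valid;
}
```

Truncating the file at the returned offset drops the incomplete tail while keeping every committed trx.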

Reviewed By: Pushapgl

Differential Revision: D26447448

---------------------------------------------------------------------

Checking for inited bool to make sure global_init_info was successful

Summary:
Port D25584004

A master.info and relay.info file can be present
but needs to be properly inited for use. We were bypassing the inited
check which could lead to issues in Raft.
In case there is an error in global_init_info, Raft will do a
raft_reset_slave and make another attempt at it. If both recourses
fail, the init of the plugin would fail.

Reviewed By: Pushapgl

Differential Revision: D26447457

---------------------------------------------------------------------

Support for dumping raft logs to vanilla async replicas

Summary:
[Porting Notes]
We want to dump raft logs to vanilla async replicas regardless
of whether it's the relay log or binlog. Effectively after this change
we'll dump relay logs on the followers and binlogs on the leader. When
the raft role changes, the log to be dumped changes as well.
Dump_log class is introduced as a thin wrapper/container around
mysql_bin_log or rli->relay_log and is inited with mysql_bin_log
to emulate vanilla mysql behavior. Dump threads use the global
dump_log object instead of mysql_bin_log directly. We switch the log
in dump log only when raft role changes (in binlog_change_to_binlog()
and binlog_change_to_apply_log()).
During raft role change we take all log related locks (LOCK_log,
LOCK_index, LOCK_binlog_end_pos, and dump log lock) to serialize it with
other log operations like dumping logs.
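
The wrapper idea can be sketched as below. `LogSketch` and `DumpLogSketch` are hypothetical, heavily reduced stand-ins for MYSQL_BIN_LOG and Dump_log, and a single `std::mutex` stands in for the several locks listed above.

```cpp
#include <mutex>
#include <string>

// Hypothetical, heavily simplified stand-in for MYSQL_BIN_LOG.
struct LogSketch {
  std::string name;
};

// Thin wrapper: dump threads read through it, and only a raft role
// change swaps the underlying log.
class DumpLogSketch {
 public:
  explicit DumpLogSketch(LogSketch *initial) : log_(initial) {}

  // Called on raft role change.
  void switch_log(LogSketch *next) {
    std::lock_guard<std::mutex> guard(lock_);
    log_ = next;
  }

  // Called by dump threads.
  std::string current_name() {
    std::lock_guard<std::mutex> guard(lock_);
    return log_->name;
  }

 private:
  std::mutex lock_;
  LogSketch *log_;
};
```

Because readers and the role change take the same lock, a dump thread never observes the pointer halfway through a switch.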

Related doc - https://fb.quip.com/oTVAAdgEi4zY

This diff contains the 7 patches below:
D23013977
D24766787
D24716539
D24900223
D24955284
D25174166
D25775525

Reviewed By: luqun

Differential Revision: D26141496

---------------------------------------------------------------------

Fixed the flaky raft test suite

Summary:
First clean run of entire raft test suite :)

**Changes**
* Reset apply_logs on raft secondaries before the start of every test
* Increased the lock timeouts and changed isolation level in rpl_raft_slave_out_of_order_commit.

Reviewed By: luqun

Differential Revision: D26651257

---------------------------------------------------------------------

return error from Raft_replication_delegate when plugin is not available

Summary:
Port D23065441 (facebook@b9067f7)

The new macro is used to call into raft plugin. If plugin gets unloaded
accidentally when enable_raft_plugin is ON, then this STRICT version returns
failure. This is to be called only by raft plugin currently

Reviewed By: Pushapgl

Differential Revision: D26447523

---------------------------------------------------------------------

Adding timestamps for raft rotates which happen in the context of listener thread

Summary:
Port D25572614

The timestamp of a binlog event is picked up from the when field in
the event. In most cases of rotation, the when is left unpopulated
during rotation for the top 3 events (fd, pgtid, metadata). However
in such a situation, a normal rotate (flush binary logs) still manages
to get a valid timestamp, since the thread in which the flush binary
logs happens has a valid start time.
Now enter Raft relay log rotations. In those cases and in the case
of config change rotate, the rotations are happening in the context
of a raft listener queue thread. In that context, the when and start
time of the thread are both 0. The diff handles this case by populating
the when field appropriately.
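
A sketch of the fallback order described above. `pick_event_timestamp` is a hypothetical helper, and falling back to the current wall clock is an assumption; the summary only says the when field is populated appropriately.

```cpp
#include <ctime>

// Pick the timestamp for a rotate-time event. All three inputs are
// hypothetical stand-ins: the event's `when`, the thread start time,
// and the current wall clock.
std::time_t pick_event_timestamp(std::time_t when, std::time_t thread_start,
                                 std::time_t now) {
  if (when != 0) return when;
  if (thread_start != 0) return thread_start;
  return now;  // raft listener thread case: both values are 0
}
```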

Reviewed By: bhatvinay

Differential Revision: D26194612

---------------------------------------------------------------------

Raft abrupt stepdown and trim binlog file / gtid test

Summary: binlog file should get trimmed for abrupt stepdown

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D26169975

---------------------------------------------------------------------

Port: Fixes around raft log truncation.

Summary:
**Notes**
* New functions to block and unblock dump threads for plugin to use
during raft log truncation.

The check below is already done in the raft plugin as part of D26866429.
* Re-init rli->cur_log in Relay_log_info::rli_init_info() instead of
just seeking to the beginning to handle the case when raft log is
truncated before starting the applier

Reviewed By: luqun

Differential Revision: D26759813

---------------------------------------------------------------------

rebase due to relay log Format_description_event event size difference

Summary:
WL#3549: Binlog Compression adds 1 extra byte of data to Format_description_event

  @@ -134,6 +137,7 @@ Format_description_event::Format_description_event(uint8_t binlog_ver,
             VIEW_CHANGE_HEADER_LEN,
             XA_PREPARE_HEADER_LEN,
             ROWS_HEADER_LEN_V2,
  +          TRANSACTION_PAYLOAD_EVENT,
         };

Reviewed By: Pushapgl

Differential Revision: D27145095

---------------------------------------------------------------------

fix raft change string metadata event

Summary:
A number of stage-1 replicaset instances failed due to a config change string failure during add/remove instance:
```
I0401 00:16:10.894434 1823842 IoCacheUtils.cpp:333] getConfigChangeString eventType = ^G
E0401 00:16:10.894451 1823842 IoCacheUtils.cpp:363] Failed to read metadata event body from cache
E0401 00:16:10.894456 1823842 MysqlRaft.cpp:659] [replicate] Failed to get config change string from iocache
2021-04-01T00:16:10.894464-07:00 8307 [ERROR] [MY-010207] [Repl] Run function 'before_flush' in plugin 'RPL_RAFT' failed
2021-04-01T00:16:10.894478-07:00 8307 [ERROR] [MY-000000] [Server] Failed to rotate binary log
```

After some investigation, the cause turned out to be that the metadata event length was calculated with the config change string included, but the config change string was never written into the event body.

Reviewed By: Pushapgl

Differential Revision: D27504157

---------------------------------------------------------------------

Tells raft mode in binlog

Summary:
Win-Win: Tell whether this binlog was generated while this host was in raft mode.

We already have the bit in the FD event, and this diff just makes it visible.

Reviewed By: mpercy

Differential Revision: D28664300

---------------------------------------------------------------------

fix flaky raft testcase

Summary:
There are multiple issues for MTR:
1. In sql/rpl_binlog_sender.cc, if a secondary's IO thread receives a fatal_error,
   it quits instead of reconnecting. Use an unknown error so that the
   secondary's IO thread can try to reconnect.
2. mtr.add_suppression() call: mtr.add_suppression() executes an insert
   statement into the mtr.test_suppressions table, and that table
   doesn't have a primary key, so during idempotent recovery the secondary
   fails to execute mtr.add_suppression() due to the missing PK. Move all
   mtr.add_suppression() calls to the end of the testcase to work around the idempotent recovery failure.
3. For promotion, use raft_promote_to_leader.inc instead of `set rpl_raft_new_leader_uuid`,
   since raft_promote_to_leader waits until the new primary becomes writeable.
4. Pass a specific warning instead of '.*' to mtr.add_suppression.
5. etc.

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D28774820

---------------------------------------------------------------------

Using locks in Dump_log methods only when raft is enabled

Summary:
We don't need to take locks in non-raft mode since the
underlying MYSQL_BIN_LOG obj will never change. To handle the race between
updating the enable_raft_plugin var and using its value in Dump_log, we
kill and block all dump threads while updating the var and unblock them
once the var is updated.

Reviewed By: bhatvinay

Differential Revision: D28905522

---------------------------------------------------------------------

leader election to be a sync point in the new leader

Summary:
Port of D27582002 (facebook@39c70ca) from 5.6.35 to 8.0

Newly elected raft leader makes sure that all trxs from the previous
leader are committed by sql appliers. It then switches the server's trx
logs from apply-log-* to binary-log-*. To other parts of the system this
looks like a rotation, but the necessary sync calls are not made here.
So, if the server (or os) restarts, then the storage engine could lose
the commit markers of the last batch of trxs. This will result in silent
data drift.
This diff fixes the problem by making an explicit call to
ha_flush_logs() before switching the server's trx logs

Reviewed By: luqun

Differential Revision: D28880407

---------------------------------------------------------------------

Error out SQL thread on out of order opids in raft logs

Summary:
OpIds should always be in order in the raft logs. Added a check
in SQL threads that throws an error and stops the applier when out of
order opids are detected.
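
A minimal sketch of such an ordering check. Representing an OpId as a (term, index) pair compared lexicographically is an assumption made for illustration, and `OpIdOrderChecker` is a hypothetical name.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical OpId as a (term, index) pair; the real representation may differ.
using OpId = std::pair<std::int64_t, std::int64_t>;

class OpIdOrderChecker {
 public:
  // Returns true if `next` is strictly after the last accepted OpId.
  // On false the caller would raise an error and stop the applier.
  bool accept(const OpId &next) {
    if (has_last_ && !(last_ < next)) return false;
    last_ = next;
    has_last_ = true;
    return true;
  }

 private:
  OpId last_{0, 0};
  bool has_last_ = false;
};
```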

Reviewed By: li-chi

Differential Revision: D28810840

---------------------------------------------------------------------

add GET_COMMITTED_GTIDS for raft

Summary:
During recovery Raft Log::Init needs to check with server
what gtids have been committed. Before doing that it finds the entire
set of trxs in the last raft log. The difference (logged minus
committed) gives the pending opids.
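
The difference can be sketched with a plain set difference. This simplification identifies trxs by sequence numbers instead of real GTID sets, and `pending_trxs` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <set>

// Hypothetical simplification: trxs are plain sequence numbers, not GTIDs.
std::set<std::uint64_t> pending_trxs(const std::set<std::uint64_t> &logged,
                                     const std::set<std::uint64_t> &committed) {
  std::set<std::uint64_t> pending;
  std::set_difference(logged.begin(), logged.end(), committed.begin(),
                      committed.end(), std::inserter(pending, pending.end()));
  return pending;
}
```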

Reviewed By: Pushapgl

Differential Revision: D29622786

---------------------------------------------------------------------

raft: skip mts_recovery_groups during start slave

Summary:
During MySQL8+Raft DMP, some instances fail to switch to Leader or start slave

```
2021-06-24T17:56:38.627423-07:00 431 [Note] [MY-010574] [Repl] Slave: MTS group recovery relay log info group_master_log_name /data/mysql/3127/bls-unittestdb658.frc2-3305-mysql.replicaset.180021/binary-logs-3727.000033, event_master_log_pos 1129.
2021-06-24T17:56:38.627473-07:00 431 [ERROR] [MY-010575] [Repl] Error looking for file after /binlogs/binary-logs-3307.000120.
2021-06-24T17:56:38.627516-07:00 431 [ERROR] [MY-000000] [Repl] load_mi_and_rli_from_repositories: rli_init_info returned error
```

Similar to 5.6, we don't need to run mts_recovery_groups because GTID_MODE is always enabled.

Reviewed By: Pushapgl

Differential Revision: D29520066

---------------------------------------------------------------------

update RaftListenerCallbackArg struct

Summary: Because the contract between the raft plugin and mysql changed, update the RaftListenerCallbackArg struct to add a master_uuid field.

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D29981349

---------------------------------------------------------------------

always check mi during raft_reset_slave

Summary: Add a nullptr check to raft_reset_slave, similar to the one in raft_stop_sql_thread.

Reviewed By: bhatvinay

Differential Revision: D29967915

---------------------------------------------------------------------

raft mtr: update rpl_end_raft.inc for disable-raft

Summary:
rpl_end_raft.inc unregisters the raft plugin, which calls reset slave when online disable-raft is enabled. Move rpl_end_raft.inc after rpl_stop_slaves.inc so that the slaves are stopped correctly first.

After the slaves are stopped, mtr.add_suppression("") calls no longer replicate to them, so call mtr.add_suppression("") on all instances (replicas).

Reviewed By: yizhang82

Differential Revision: D30062236

---------------------------------------------------------------------

fix master_uuid

Summary: fix master_uuid in raft mode.

Reviewed By: luqun

Differential Revision: D30261465

---------------------------------------------------------------------

incorrect File_size value in show raft logs result

Summary:
In the show raft logs result, the File_size for the latest log file isn't updated correctly.

For example, show raft logs; returns:

```
+-------------------------+------------
| Log_name                | File_size  |
| binary-logs-3304.000670 |       1529 |
```

while the file on disk is:
```
-rw-r----- 1 mysql backup 1669180487 Aug  4 14:49 binary-logs-3304.000670
-rw-r----- 1 mysql backup 1723994154 Aug  4 14:49 binary-logs-3304.000670
```
The file_size from show raft logs is always 1529 but the real file_size is 1669180487.

The issue is that when writing to the IO_CACHE directly, the position of its wrapper ostream isn't updated.

Reviewed By: yizhang82

Differential Revision: D30116345

---------------------------------------------------------------------

Add gtid_purged_for_tailing

Summary:
Add a new global variable gtid_purged_for_tailing.

It shows the purged GTID for binlog in non-raft mode, and purged GTID for raft log in raft mode.

Reviewed By: abhinav04sharma

Differential Revision: D29372776

---------------------------------------------------------------------

disable skip_setup_replica_if_unnecessary

Summary: After syncing with the latest official raft plugin, most MTR tests failed due to the skip_setup_replica_if_unnecessary optimization.

Reviewed By: yizhang82

Differential Revision: D30821648

---------------------------------------------------------------------

latency histograms for raft trx wait

Summary: Port of the support added for the same in mysql 5.6. Should help monitor latencies in 8.0 + raft stage-1 replicasets.

Reviewed By: luqun

Differential Revision: D31064764

---------------------------------------------------------------------

handle cases where clean shutdown in raft aborts trxs

Summary: This is a port of the feature in 5.6.

Reviewed By: anirbanr-fb

Differential Revision: D31070593

---------------------------------------------------------------------

fix crash during binlog purging

Summary:
Any error returned by the plugin during binlog purging results in a
crash in mysql8 as the server tries to execute
binlog_error_action_abort. We need to differentiate explicitly between a
plugin error and other errors (such as errors related to doing disk IO
etc). In this particular case, the crash is because of trying to purge a
file that does not exist (i.e. which was already purged previously) and
raft cannot find it in its index chunk (so it returns an error).
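
A sketch of what differentiating the two error classes could look like. The enum, the function name, and the policy (only non-plugin errors reach the abort path) are assumptions for illustration; the summary does not spell out the exact handling.

```cpp
// Hypothetical error classes for a purge attempt.
enum class PurgeError { kNone, kPlugin, kDiskIo };

// Assumed policy: a plugin error is reported to the caller, and only
// other errors (such as disk IO) escalate to the abort action.
bool should_abort_server(PurgeError err) {
  return err == PurgeError::kDiskIo;
}
```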

Reviewed By: anirbanr-fb

Differential Revision: D31149997

---------------------------------------------------------------------

update flaky rpl_raft_purged_gtids_dump_threads

Summary:
rpl_raft_purged_gtids_dump_threads MTR failed with "Cannot replicate because the master purged required binary logs" after server4 starts tailing server2.

Sync server4 before switching it to tail server2.

Reviewed By: bhatvinay

Differential Revision: D30818078

---------------------------------------------------------------------

fix various porting issues in mysql8 (raft) crash recovery

Summary:
1. Trimming of the binlog is left to raft plugin (based on the current
   leader's log). Server should skip this step as part of recovery. This
   essentially means setting 'valid_pos' to the last successfully parsed
   trx in the trx log (instead of the engine's view of the trx log) in
   MYSQL_BIN_LOG::recover()
2. executed_gtid_set should be initialized based on the engine's view of
   the trx log file coordinates. So, during startup appropriate flags need
   to be passed into MYSQL_BIN_LOG::init_gtid_sets(). init_gtid_sets() is
   already changed to handle this, but the flag was not set correctly
   during server startup
3. Another fix is in MYSQL_BIN_LOG::init_gtid_sets() to correctly set the position
   to read and calculate executed-gtid-set (based on the file name read from the
   engine)

Reviewed By: anirbanr-fb

Differential Revision: D31284902

---------------------------------------------------------------------

show_raft_status should take LOCK_status

Summary:
This is a straight port of a 5.6 patch to 8.0

SHOW RAFT STATUS and SHOW GLOBAL STATUS go to the
same functions in the plugin and access shared data structures.
These functions return internal char * pointers to plugin global
variables. They need to be serialized with LOCK_status otherwise
it leads to race conditions.

Reviewed By: li-chi

Differential Revision: D31318340

---------------------------------------------------------------------

Disallowing reset master on raft replicasets

Summary:
This is a straight port of similar patch on 5.6 raft

reset master is inherently not raft compliant.
It gets rid of all binlogs and makes an instance appear like
a fresh single instance database without consulting the ring.
We need to block it.

However, under certain circumstances (in the future), e.g. during
first time replicaset build, or when we can guarantee that instance
is single node raft ring, we can potentially do a reset master
followed by rebootstrap of raft. This can also be achieved by
disabling raft, reset master and then re-enabling raft, however
to keep open the door, I have left a force option and will thread
a mechanism from the plugin to call reset_master(force=true)

Reviewed By: li-chi

Differential Revision: D31299214

---------------------------------------------------------------------

Setting topology config in MTR

Summary:
Setting topology config in MTR so that features that use
topology config can be tested easily. Setting rpl_raft_skip_smc_updates
to avoid unnecessary calls to SMC (even though we supply a dummy
replicaset name).

Reviewed By: li-chi

Differential Revision: D31543877

---------------------------------------------------------------------

Do not allow sql thread start when raft is doing a stop->{}->start transition

Summary:
This is a port of an existing patch made to 5.6 for Raft.

Raft will do a stop of the SQL thread during StopAllWrites.
Then it will repoint the binlog files and during that action, the
SQL threads have to remain stopped. We block it out in this diff by
keeping an atomic bool which can be checked from other functions.

This only applies to raft mode i.e. enable_raft_plugin = true.

Reviewed By: li-chi

Differential Revision: D31319499

---------------------------------------------------------------------

Handle printing of FD event generated by slave SQL thread

Summary:
Returning early in the FD:print() function prevents mysqlbinlog from parsing Raft logs on secondaries.

The original commit which added this is d048c0f (P173872135)

To comply with the intent of the original bug fix, we avoid printing the FD event of a relay log as a 'BINLOG'.

Reviewed By: anirbanr-fb

Differential Revision: D26359417

---------------------------------------------------------------------

Add exponential backoff for smart restart

Summary:
Raft instances crash and the tx logs fill up.

RocksDB can fill up the tx logs.
We should stop restarting mysqld if it has already been restarted many times in a day.

Reviewed By: anirbanr-fb

Differential Revision: D27372750

---------------------------------------------------------------------

always release data_lock mutex to avoid deadlock

Summary: In a stage-1 replicaset, killing a secondary instance sometimes leaves the instance deadlocked because the process_raft_queue thread does not release the mutex it acquired in raft_change_master.

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D27602667

---------------------------------------------------------------------

Gracefully exit mysqld_safe loop during backoff

Summary:
Currently systemctl stop [email protected] can take 10 mins when mysqld_safe is in a backoff period.

D28517599 adds an interrupt to the sleep in mysql_stop, and mysqld_safe immediately breaks out of the retry loop if the sleep is interrupted.

Reviewed By: anirbanr-fb

Differential Revision: D28606439

---------------------------------------------------------------------

Fix heap overflow in group_relay_log_name handling

Summary:
We were accessing group_relay_log_name in
Query_log_event::do_apply_event_worker() but it's assigned only after
the coordinator thread encounters an end event (i.e. xid event or a
query event with "COMMIT" or "ROLLBACK" query). This was causing a race
between accessing group_relay_log_name in the worker thread and writing
it on the coordinator thread. We don't need to set transaction position
in events other than end event, so now we set transaction position in
query event only if it's an end event. The race is eliminated because
group_relay_log_name is set before enqueuing the event to the worker
thread (in both dep repl and vanilla mts).

Reviewed By: lth

Differential Revision: D28767430

---------------------------------------------------------------------

fix memory leak during MYSQL_BIN_LOG::open_existing_binlog

Summary:
The asandebug build reports memory leaks during MYSQL_BIN_LOG open.

Direct leak of 50 byte(s) in 1 object(s) allocated from:
    #0 0x67460ef in malloc
    #1 0x93f0777 in my_raw_malloc(unsigned long, int)
    #2 0x93f064a in my_malloc(unsigned int, unsigned long, int)
    #3 0x93f0eb0 in my_strdup(unsigned int, char const*, int)
    #4 0x8af01a6 in MYSQL_BIN_LOG::open(unsigned int, char const*, char const*, unsigned int)
    #5 0x8af8064 in MYSQL_BIN_LOG::open_binlog(char const*, char const*, unsigned long, bool, bool, bool, Format_description_log_event*, unsigned int, RaftRotateInfo*, bool)
    #6 0x8b00c00 in MYSQL_BIN_LOG::new_file_impl(bool, Format_description_log_event*, RaftRotateInfo*)
    #7 0x8d65e47 in rotate_relay_log(Master_info*, bool, bool, bool, RaftRotateInfo*)
    #8 0x8d661c0 in rotate_relay_log_for_raft(RaftRotateInfo*)
    #9 0x8c7696a in process_raft_queue
    #10 0xa0fa1fd in pfs_spawn_thread(void*)
    #11 0x7f8c9a12b20b in start_thread

Release this memory before the pointers are reassigned.

Reviewed By: Pushapgl

Differential Revision: D28819752

---------------------------------------------------------------------

Reduce max backoff time from 24h to 30mins

Summary:
During the recent SEV0, raft instances were crash looping because of clock skew. Even after the clock was back to normal, mysqld_safe failed to bring mysqld back because of the extended 24-hour backoff.

This diff reduces the max backoff time from 24h to 30mins. It will still keep trying, but not so aggressively that it fills up the trx logs.

Reviewed By: bart2

Differential Revision: D31661172
inikep pushed a commit that referenced this pull request Apr 8, 2022
… enabled

Summary:
For secondaries, when enable_super_log_bin_read_only is on and read_only is on, installing/uninstalling a plugin at run time is currently forbidden.

Install/uninstall plugin doesn't generate events in the binlog, although the thread's thd contains the OPTION_BIN_LOG flag because log_slave_updates is on by default on secondaries. It should be safe to execute install/uninstall plugin.

The change is to call set_skip_readonly_check() before install/uninstall plugin and call reset_skip_readonly_check() (for completeness) after install/uninstall plugin.

BTW, mysql always calls reset_skip_readonly_check() at the beginning of each statement, so set_skip_readonly_check() won't affect other statements.
```
#0  THD::reset_skip_readonly_check (this=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_class.h:1754
#1  0x0000000005a500e1 in THD::reset_for_next_command (this=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:5892
#2  0x0000000005a517a5 in mysql_reset_thd_for_next_command (thd=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:5817
#3  0x0000000005a50466 in mysql_parse (thd=0x7fb5c1a0b000, parser_state=0x7fb5f6bb4560, last_timer=0x7fb5f6bb39b0) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:6056
#4  0x0000000005a4c7c9 in dispatch_command (thd=0x7fb5c1a0b000, com_data=0x7fb5f6bb4d98, command=COM_QUERY) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:2222
#5  0x0000000005a4f991 in do_command (thd=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:1556
#6  0x0000000005ccd4f1 in handle_connection (arg=0x7fb5cc85b740) at /home/luqun/mysql/mysql-8.0.20/sql/conn_handler/connection_handler_per_thread.cc:330
#7  0x00000000078eb95b in pfs_spawn_thread (arg=0x7fb5f8c89720) at /home/luqun/mysql/mysql-8.0.20/storage/perfschema/pfs.cc:2884
#8  0x00007fb5f957020c in start_thread (arg=0x7fb5f6bb6700) at pthread_create.c:479
#9  0x00007fb5f971881f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
```
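
The set/reset pairing can be sketched as below. `ThdSketch` is a reduced, hypothetical stand-in for THD that carries only this flag, and the scope guard is an illustrative alternative to the explicit set and reset calls the change describes.

```cpp
// Hypothetical, reduced stand-in for THD with only the flag discussed above.
struct ThdSketch {
  bool skip_readonly_check = false;
  void set_skip_readonly_check() { skip_readonly_check = true; }
  void reset_skip_readonly_check() { skip_readonly_check = false; }
};

// Scope guard: sets the flag on entry and resets it on exit.
class SkipReadonlyGuard {
 public:
  explicit SkipReadonlyGuard(ThdSketch &thd) : thd_(thd) {
    thd_.set_skip_readonly_check();
  }
  ~SkipReadonlyGuard() { thd_.reset_skip_readonly_check(); }
  SkipReadonlyGuard(const SkipReadonlyGuard &) = delete;
  SkipReadonlyGuard &operator=(const SkipReadonlyGuard &) = delete;

 private:
  ThdSketch &thd_;
};

// Returns whether the flag was set while the (elided) plugin work would run.
bool install_plugin_sketch(ThdSketch &thd) {
  SkipReadonlyGuard guard(thd);
  return thd.skip_readonly_check;
}
```

The guard resets the flag even on an early return, so the flag cannot outlive the statement that needed it.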

Reviewed By: george-reynya

Differential Revision: D27213990
inikep added a commit that referenced this pull request Apr 8, 2022
recover raft logs by removing partial trxs

Summary:
Port D24628821

mysqld removes partial trxs in the tail of trx log (named binary-logs on
primaries and apply-logs on secondaries) during startup. However, relay logs
were not of much importance since they were discarded anyway and a new one would
be created.
However, with raft, this is not ideal. Relay logs are raft logs on secondaries
and have to be kept around (and kept sane and consistent). This diff adds the
ability to remove partial trxs from raft/relay logs.
Much of the code to open the last relay log (based on relay log index) and
identify partial trxs is borrowed from existing logic in
MYSQL_BIN_LOG::open_binlog() and binlog_recover()

Reviewed By: Pushapgl

Differential Revision: D26447448

---------------------------------------------------------------------

Checking for inited bool to make sure global_init_info was successful

Summary:
Port D25584004

A master.info and relay.info file can be present
but needs to be properly inited for use. We were bypassing the inited
check which could lead to issues in Raft.
In case there is an error in global_init_info, Raft will do a
raft_reset_slave and make another attempt at it. If both recourses
fail, the init of the plugin would fail.

Reviewed By: Pushapgl

Differential Revision: D26447457

---------------------------------------------------------------------

Support for dumping raft logs to vanilla async replicas

Summary:
[Porting Notes]
We want to dump raft logs to vanilla async replicas regardless
of whether it's the relay log or binlog. Effectively after this change
we'll dump relay logs on the followers and binlogs on the leader. When
the raft role changes, the log to be dumped changes as well.
Dump_log class is introduced as a thin wrapper/container around
mysql_bin_log or rli->relay_log and is inited with mysql_bin_log
to emulate vanilla mysql behavior. Dump threads use the global
dump_log object instead of mysql_bin_log directly. We switch the log
in dump log only when raft role changes (in binlog_change_to_binlog()
and binlog_change_to_apply_log()).
During raft role change we take all log related locks (LOCK_log,
LOCK_index, LOCK_binlog_end_pos, and dump log lock) to serialize it with
other log operations like dumping logs.

Related doc - https://fb.quip.com/oTVAAdgEi4zY

This diff contains the 7 patches below:
D23013977
D24766787
D24716539
D24900223
D24955284
D25174166
D25775525

Reviewed By: luqun

Differential Revision: D26141496

---------------------------------------------------------------------

Fixed the flaky raft test suite

Summary:
First clean run of entire raft test suite :)

**Changes**
* Reset apply_logs on raft secondaries before the start of every test
* Increased the lock timeouts and changed isolation level in rpl_raft_slave_out_of_order_commit.

Reviewed By: luqun

Differential Revision: D26651257

---------------------------------------------------------------------

return error from Raft_replication_delegate when plugin is not available

Summary:
Port D23065441 (facebook@b9067f7)

The new macro is used to call into raft plugin. If plugin gets unloaded
accidentally when enable_raft_plugin is ON, then this STRICT version returns
failure. This is to be called only by raft plugin currently

Reviewed By: Pushapgl

Differential Revision: D26447523

---------------------------------------------------------------------

Adding timestamps for raft rotates which happen in the context of listener thread

Summary:
Port D25572614

The timestamp of a binlog event is picked up from the when field in
the event. In most cases of rotation, the when is left unpopulated
during rotation for the top 3 events (fd, pgtid, metadata). However
in such a situation, a normal rotate (flush binary logs) still manages
to get a valid timestamp, since the thread in which the flush binary
logs happens has a valid start time.
Now enter Raft relay log rotations. In those cases and in the case
of config change rotate, the rotations are happening in the context
of a raft listener queue thread. In that context, the when and start
time of the thread are both 0. The diff handles this case by populating
the when field appropriately.
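
A minimal sketch of the fallback described above, assuming the when field is filled from the current wall-clock time once both the when field and the thread start time are 0. The summary only says the field is populated "appropriately", so the use of `now` is an assumption, and the function and parameter names are hypothetical rather than the server's API.

```cpp
#include <ctime>

// Picks the timestamp to stamp on a rotate-time event (fd, pgtid, metadata).
// `when` is the event's own field, `thread_start_time` comes from the thread
// performing the rotation, and `now` is the assumed last-resort fallback.
std::time_t pick_event_timestamp(std::time_t when,
                                 std::time_t thread_start_time,
                                 std::time_t now) {
  if (when != 0) return when;  // event already carries a timestamp
  if (thread_start_time != 0) return thread_start_time;  // e.g. flush binary logs
  return now;  // raft listener queue thread: both inputs are 0
}
```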

Reviewed By: bhatvinay

Differential Revision: D26194612

---------------------------------------------------------------------

Raft abrupt stepdown and trim binlog file / gtid test

Summary: binlog file should get trimmed for abrupt stepdown

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D26169975

---------------------------------------------------------------------

Port: Fixes around raft log truncation.

Summary:
**Notes**
* New functions to block and unblock dump threads for plugin to use
during raft log truncation.

The check below is already done in the raft plugin as part of D26866429.
* Re-init rli->cur_log in Relay_log_info::rli_init_info() instead of
just seeking to the beginning to handle the case when raft log is
truncated before starting the applier

Reviewed By: luqun

Differential Revision: D26759813

---------------------------------------------------------------------

rebase due to relay log Format_description_event event size difference

Summary:
WL#3549: Binlog Compression adds 1 extra byte of data to Format_description_event:

  @@ -134,6 +137,7 @@ Format_description_event::Format_description_event(uint8_t binlog_ver,
             VIEW_CHANGE_HEADER_LEN,
             XA_PREPARE_HEADER_LEN,
             ROWS_HEADER_LEN_V2,
  +          TRANSACTION_PAYLOAD_EVENT,
         };

Reviewed By: Pushapgl

Differential Revision: D27145095

---------------------------------------------------------------------

fix raft change string metadata event

Summary:
Saw a bunch of stage-1 replicaset instances fail due to a config change string failure during add/remove instance:
```
I0401 00:16:10.894434 1823842 IoCacheUtils.cpp:333] getConfigChangeString eventType = ^G
E0401 00:16:10.894451 1823842 IoCacheUtils.cpp:363] Failed to read metadata event body from cache
E0401 00:16:10.894456 1823842 MysqlRaft.cpp:659] [replicate] Failed to get config change string from iocache
2021-04-01T00:16:10.894464-07:00 8307 [ERROR] [MY-010207] [Repl] Run function 'before_flush' in plugin 'RPL_RAFT' failed
2021-04-01T00:16:10.894478-07:00 8307 [ERROR] [MY-000000] [Server] Failed to rotate binary log
```

After some investigation, the cause is that the metadata event length was calculated with the config change string included, but the config change string was never written into the event body.
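
The invariant behind this fix is that every byte counted in the declared length must also be written into the body. A minimal sketch of that invariant, assuming a hypothetical layout of a 4-byte little-endian length prefix followed by the config change string (this is not the real metadata event format):

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

constexpr std::size_t kHeaderLen = 4;  // hypothetical length-prefix size

// Encodes a length prefix followed by the config change string itself.
std::string encode_config_change(const std::string &config_change) {
  const std::uint32_t len = static_cast<std::uint32_t>(config_change.size());
  std::string out;
  for (int i = 0; i < 4; ++i) {
    out.push_back(static_cast<char>((len >> (8 * i)) & 0xff));
  }
  out.append(config_change);  // the step that must not be skipped
  return out;
}

// Reads the declared body length back from the prefix.
std::uint32_t declared_length(const std::string &buf) {
  std::uint32_t len = 0;
  for (int i = 0; i < 4; ++i) {
    len |= static_cast<std::uint32_t>(static_cast<unsigned char>(buf[i]))
           << (8 * i);
  }
  return len;
}
```

A reader that trusts the prefix fails exactly as in the log above when the buffer is shorter than `kHeaderLen + declared_length(buf)`.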

Reviewed By: Pushapgl

Differential Revision: D27504157

---------------------------------------------------------------------

Tells raft mode in binlog

Summary:
Win-Win: Tell whether this binlog was generated while this host was in raft mode.

We already have the bit in the FD event, and this diff just spells it out.

Reviewed By: mpercy

Differential Revision: D28664300

---------------------------------------------------------------------

fix flaky raft testcase

Summary:
There are multiple issues for MTR:
1. In sql/rpl_binlog_sender.cc, if the secondary's IO thread receives fatal_error,
   it quits instead of reconnecting. Use an unknown error so that
   the secondary IO thread can try to reconnect.
2. mtr.add_suppression() call: mtr.add_suppression() executes an insert
   statement into the mtr.test_suppressions table, and that table
   doesn't have a primary key, so during idempotent recovery the secondary
   fails to execute mtr.add_suppression() due to the missing PK. Try to move all
   mtr.add_suppression() calls to the end of the testcase to work around the idempotent recovery failure.
3. When promoting, use raft_promote_to_leader.inc instead of `set rpl_raft_new_leader_uuid`,
   since raft_promote_to_leader waits until the new primary becomes writeable.
4. Pass a specific warning instead of '.*' to mtr.add_suppression.
5. etc.

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D28774820

---------------------------------------------------------------------

Using locks in Dump_log methods only when raft is enabled

Summary:
We don't need to take locks in non-raft mode since the
underlying MYSQL_BIN_LOG obj will never change. To handle the race between
updating the enable_raft_plugin var and using its value in Dump_log, we
kill and block all dump threads while updating the var and unblock them
once the var is updated.

Reviewed By: bhatvinay

Differential Revision: D28905522

---------------------------------------------------------------------

leader election to be a sync point in the new leader

Summary:
Port of D27582002 (facebook@39c70ca) from 5.6.35 to 8.0

Newly elected raft leader makes sure that all trxs from the previous
leader are committed by sql appliers. It then switches the server's trx
logs from apply-log-* to binary-log-*. To other parts of the system this
looks like a rotation, but the necessary sync calls are not made here.
So, if the server (or OS) restarts, the storage engine could lose
the commit markers of the last batch of trxs. This will result in silent
data drift.
This diff fixes the problem by making an explicit call to
ha_flush_logs() before switching the server's trx logs.

Reviewed By: luqun

Differential Revision: D28880407

---------------------------------------------------------------------

Error out SQL thread on out of order opids in raft logs

Summary:
OpIds should always be in order in the raft logs. Added a check
in SQL threads that throws an error and stops the applier when out of
order opids are detected.
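
A minimal sketch of such an ordering check, assuming an OpId is a (term, index) pair compared lexicographically. The summary does not define the OpId layout, so the struct, the class name and the comparison rule are all assumptions made for illustration.

```cpp
#include <cstdint>

// Hypothetical OpId shape; the source does not spell out its fields.
struct OpId {
  std::int64_t term;
  std::int64_t index;
};

// Tracks the last applied OpId and rejects anything not strictly after it.
class OpIdOrderChecker {
 public:
  // Returns true if `next` is in order; false means the applier should stop.
  bool observe(const OpId &next) {
    if (has_last_ && !comes_after(next, last_)) return false;
    last_ = next;
    has_last_ = true;
    return true;
  }

 private:
  static bool comes_after(const OpId &a, const OpId &b) {
    if (a.term != b.term) return a.term > b.term;
    return a.index > b.index;
  }

  OpId last_{0, 0};
  bool has_last_ = false;
};
```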

Reviewed By: li-chi

Differential Revision: D28810840

---------------------------------------------------------------------

add GET_COMMITTED_GTIDS for raft

Summary:
During recovery Raft Log::Init needs to check with server
what gtids have been committed. Before doing that it finds the entire
set of trxs in the last raft log. The difference (logged minus
committed) gives the pending opids.
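
The "logged minus committed" step is a plain set difference. A minimal sketch, assuming transactions are identified by simple integer ids (real GTID sets are intervals per server UUID, so this is a simplification, and the function name is hypothetical):

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <set>

// Returns the ids present in the last raft log but not yet committed.
std::set<std::uint64_t> pending_trxs(const std::set<std::uint64_t> &logged,
                                     const std::set<std::uint64_t> &committed) {
  std::set<std::uint64_t> pending;
  // std::set iterates in sorted order, which set_difference requires.
  std::set_difference(logged.begin(), logged.end(), committed.begin(),
                      committed.end(),
                      std::inserter(pending, pending.end()));
  return pending;
}
```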

Reviewed By: Pushapgl

Differential Revision: D29622786

---------------------------------------------------------------------

raft: skip mts_recovery_groups during start slave

Summary:
During MySQL8+Raft DMP, some instances fail to switch to leader or start slave:

```
2021-06-24T17:56:38.627423-07:00 431 [Note] [MY-010574] [Repl] Slave: MTS group recovery relay log info group_master_log_name /data/mysql/3127/bls-unittestdb658.frc2-3305-mysql.replicaset.180021/binary-logs-3727.000033, event_master_log_pos 1129.
2021-06-24T17:56:38.627473-07:00 431 [ERROR] [MY-010575] [Repl] Error looking for file after /binlogs/binary-logs-3307.000120.
2021-06-24T17:56:38.627516-07:00 431 [ERROR] [MY-000000] [Repl] load_mi_and_rli_from_repositories: rli_init_info returned error
```

Similar to 5.6, we don't need to run mts_recovery_groups because GTID_MODE is always enabled.

Reviewed By: Pushapgl

Differential Revision: D29520066

---------------------------------------------------------------------

update RaftListenerCallbackArg struct

Summary: Because the contract between the raft plugin and mysql changed, update the RaftListenerCallbackArg struct to add a master_uuid field.

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D29981349

---------------------------------------------------------------------

always check mi during raft_reset_slave

Summary: add similar nullptr check for raft_reset_slave as raft_stop_sql_thread

Reviewed By: bhatvinay

Differential Revision: D29967915

---------------------------------------------------------------------

raft mtr: update rpl_end_raft.inc for disable-raft

Summary:
rpl_end_raft.inc unregisters the raft plugin, which calls reset slave when online disable-raft is enabled. Move rpl_end_raft.inc after rpl_stop_slaves.inc so that the slaves are stopped correctly first.

After the slaves are stopped, calls to mtr.add_suppression("") won't replicate to the slaves, so just call mtr.add_suppression("") on all instances (replicas).

Reviewed By: yizhang82

Differential Revision: D30062236

---------------------------------------------------------------------

fix master_uuid

Summary: fix master_uuid in raft mode.

Reviewed By: luqun

Differential Revision: D30261465

---------------------------------------------------------------------

incorrect File_size value in show raft logs result

Summary:
In the show raft logs result, the File_size for the latest log file isn't updated correctly.

For example, show raft logs; returns:

```
+-------------------------+------------
| Log_name                | File_size  |
| binary-logs-3304.000670 |       1529 |
```

in fact
```
-rw-r----- 1 mysql backup 1669180487 Aug  4 14:49 binary-logs-3304.000670
-rw-r----- 1 mysql backup 1723994154 Aug  4 14:49 binary-logs-3304.000670
```
The File_size from show raft logs is always 1529, but the real file size is 1669180487.

The issue is that when writing to IO_CACHE directly, its wrapper ostream's position isn't updated.

Reviewed By: yizhang82

Differential Revision: D30116345

---------------------------------------------------------------------

Add gtid_purged_for_tailing

Summary:
Add a new global variable gtid_purged_for_tailing.

It shows the purged GTID for binlog in non-raft mode, and purged GTID for raft log in raft mode.

Reviewed By: abhinav04sharma

Differential Revision: D29372776

---------------------------------------------------------------------

disable skip_setup_replica_if_unnecessary

Summary: After syncing with the latest official raft plugin, most MTR tests failed due to the skip_setup_replica_if_unnecessary optimization.

Reviewed By: yizhang82

Differential Revision: D30821648

---------------------------------------------------------------------

latency histograms for raft trx wait

Summary: Port of the support added for the same in mysql 5.6. Should help monitor latencies in 8.0 + raft stage-1 replicasets.

Reviewed By: luqun

Differential Revision: D31064764

---------------------------------------------------------------------

handle cases where clean shutdown in raft aborts trxs

Summary: This is a port of the feature in 5.6.

Reviewed By: anirbanr-fb

Differential Revision: D31070593

---------------------------------------------------------------------

fix crash during binlog purging

Summary:
Any error returned by the plugin during binlog purging results in a
crash in mysql8 as the server tries to execute
binlog_error_action_abort. We need to differentiate explicitly between a
plugin error and other errors (such as errors related to disk IO).
In this particular case, the crash is caused by trying to purge a
file that does not exist (i.e. one that was already purged previously), so
raft cannot find it in its index chunk and returns an error.

Reviewed By: anirbanr-fb

Differential Revision: D31149997

---------------------------------------------------------------------

update flaky rpl_raft_purged_gtids_dump_threads

Summary:
The rpl_raft_purged_gtids_dump_threads MTR failed with "Cannot replicate because the master purged required binary logs" after server4 started tailing server2.

Try to sync server4 before switching it to tail server2.

Reviewed By: bhatvinay

Differential Revision: D30818078

---------------------------------------------------------------------

fix various porting issues in mysql8 (raft) crash recovery

Summary:
1. Trimming of the binlog is left to the raft plugin (based on the current
   leader's log). The server should skip this step as part of recovery. This
   essentially means setting 'valid_pos' to the last successfully parsed
   trx in the trx log (instead of the engine's view of the trx log) in
   MYSQL_BIN_LOG::recover().
2. executed_gtid_set should be initialized based on the engine's view of
   the trx log file coordinates. So, during startup, appropriate flags need
   to be passed into MYSQL_BIN_LOG::init_gtid_sets(). init_gtid_sets() is
   already changed to handle this, but the flag was not set correctly
   during server startup.
3. Another fix is in MYSQL_BIN_LOG::init_gtid_sets() to correctly set the position
   to read and calculate executed-gtid-set (based on the file name read from the
   engine).

Reviewed By: anirbanr-fb

Differential Revision: D31284902

---------------------------------------------------------------------

show_raft_status should take LOCK_status

Summary:
This is a straight port of a 5.6 patch to 8.0

SHOW RAFT STATUS and SHOW GLOBAL STATUS go to the
same functions in the plugin and access shared data structures.
These functions return internal char * pointers to plugin global
variables. They need to be serialized with LOCK_status otherwise
it leads to race conditions.

Reviewed By: li-chi

Differential Revision: D31318340

---------------------------------------------------------------------

Disallowing reset master on raft replicasets

Summary:
This is a straight port of similar patch on 5.6 raft

reset master is inherently not raft compliant.
It gets rid of all binlogs and makes an instance appear like
a fresh single instance database without consulting the ring.
We need to block it.

However, under certain circumstances (in the future), e.g. during
first time replicaset build, or when we can guarantee that the instance
is a single node raft ring, we can potentially do a reset master
followed by a rebootstrap of raft. This can also be achieved by
disabling raft, running reset master and then re-enabling raft. However,
to keep the door open, I have left a force option and will thread
a mechanism from the plugin to call reset_master(force=true).

Reviewed By: li-chi

Differential Revision: D31299214

---------------------------------------------------------------------

Setting topology config in MTR

Summary:
Setting topology config in MTR so that features that use
topology config can be tested easily. Setting rpl_raft_skip_smc_updates
to avoid unnecessary calls to SMC (even though we supply a dummy
replicaset name).

Reviewed By: li-chi

Differential Revision: D31543877

---------------------------------------------------------------------

Do not allow sql thread start when raft is doing a stop->{}->start transition

Summary:
This is a port of an existing patch made to 5.6 for Raft.

Raft will do a stop of the SQL thread during StopAllWrites.
Then it will repoint the binlog files and during that action, the
SQL threads have to remain stopped. We block it out in this diff by
keeping an atomic bool which can be checked from other functions.

This only applies to raft mode i.e. enable_raft_plugin = true.
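
A minimal sketch of the atomic-bool gate described above. The class and method names are hypothetical; only the idea of an atomic flag that other functions check, and the restriction to enable_raft_plugin = true, come from the summary.

```cpp
#include <atomic>

// Blocks SQL thread starts while raft is in a stop->{}->start transition.
class SqlThreadStartGate {
 public:
  // Called when raft stops the SQL thread (e.g. during StopAllWrites).
  void begin_transition() { in_transition_.store(true); }

  // Called once the binlog files have been repointed.
  void end_transition() { in_transition_.store(false); }

  // The gate only applies in raft mode; otherwise starts are always allowed.
  bool can_start_sql_thread(bool enable_raft_plugin) const {
    return !(enable_raft_plugin && in_transition_.load());
  }

 private:
  std::atomic<bool> in_transition_{false};
};
```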

Reviewed By: li-chi

Differential Revision: D31319499

---------------------------------------------------------------------

Handle printing of FD event generated by slave SQL thread

Summary:
Early returning in the FD:print() function prevents mysqlbinlog from parsing Raft logs on secondaries.

The original commit which added this is d048c0f (P173872135)

To comply with the intent of the original bug fix, we avoid printing the FD event of a relay log as a 'BINLOG'.

Reviewed By: anirbanr-fb

Differential Revision: D26359417

---------------------------------------------------------------------

Add exponential backoff for smart restart

Summary:
RAFT instances crash and Tx logs fill up.

rocksdb can fill up txlogs.
We should stop restarting mysqld if we have restarted many times in a day.

Reviewed By: anirbanr-fb

Differential Revision: D27372750

---------------------------------------------------------------------

always release data_lock mutex to avoid deadlock

Summary: In a stage-1 replicaset, when a secondary instance is killed, the instance sometimes runs into a deadlock because the process_raft_queue thread forgot to release the mutex it acquired in raft_change_master.

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D27602667

---------------------------------------------------------------------

Gracefully exit mysqld_safe loop during backoff

Summary:
Currently, systemctl stop [email protected] can take 10 mins when mysqld_safe is in the backoff period.

D28517599 adds an interrupt to the sleep in mysql_stop, and mysqld_safe immediately breaks out of the retry loop if the sleep is interrupted.

Reviewed By: anirbanr-fb

Differential Revision: D28606439

---------------------------------------------------------------------

Fix heap overflow in group_relay_log_name handling

Summary:
We were accessing group_relay_log_name in
Query_log_event::do_apply_event_worker() but it's assigned only after
the coordinator thread encounters an end event (i.e. xid event or a
query event with "COMMIT" or "ROLLBACK" query). This was causing a race
between accessing group_relay_log_name in the worker thread and writing
it on the coordinator thread. We don't need to set transaction position
in events other than end event, so now we set transaction position in
query event only if it's an end event. The race is eliminated because
group_relay_log_name is set before enqueuing the event to the worker
thread (in both dep repl and vanilla mts).

Reviewed By: lth

Differential Revision: D28767430

---------------------------------------------------------------------

fix memory leak during MYSQL_BIN_LOG::open_existing_binlog

Summary:
asandebug complains that there are memory leaks during MYSQL_BIN_LOG open:

Direct leak of 50 byte(s) in 1 object(s) allocated from:
    #0 0x67460ef in malloc
    #1 0x93f0777 in my_raw_malloc(unsigned long, int)
    #2 0x93f064a in my_malloc(unsigned int, unsigned long, int)
    #3 0x93f0eb0 in my_strdup(unsigned int, char const*, int)
    #4 0x8af01a6 in MYSQL_BIN_LOG::open(unsigned int, char const*, char const*, unsigned int)
    #5 0x8af8064 in MYSQL_BIN_LOG::open_binlog(char const*, char const*, unsigned long, bool, bool, bool, Format_description_log_event*, unsigned int, RaftRotateInfo*, bool)
    #6 0x8b00c00 in MYSQL_BIN_LOG::new_file_impl(bool, Format_description_log_event*, RaftRotateInfo*)
    #7 0x8d65e47 in rotate_relay_log(Master_info*, bool, bool, bool, RaftRotateInfo*)
    #8 0x8d661c0 in rotate_relay_log_for_raft(RaftRotateInfo*)
    #9 0x8c7696a in process_raft_queue
    #10 0xa0fa1fd in pfs_spawn_thread(void*)
    #11 0x7f8c9a12b20b in start_thread

Free the old allocations before assigning new values to these pointers.

Reviewed By: Pushapgl

Differential Revision: D28819752

---------------------------------------------------------------------

Reduce max backoff time from 24h to 30mins

Summary:
During the recent SEV0, raft instances were crash looping because of clock skew. Even after the clock was back to normal, mysqld_safe failed to bring mysqld back because of the extended 24-hour backoff.

This diff reduces the max backoff time from 24h to 30mins. So it will still keep trying, but not so aggressively that it fills up the trx logs.
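
A minimal sketch of a capped exponential backoff of this kind, written in C++ for illustration (mysqld_safe itself is a shell script). The 30-minute cap comes from the summary; the 1-second base delay, the doubling rule and the function name are assumptions.

```cpp
#include <algorithm>
#include <cstdint>

constexpr std::uint64_t kBaseDelaySec = 1;       // assumed base delay
constexpr std::uint64_t kMaxDelaySec = 30 * 60;  // 30-minute cap

// Delay before the next restart attempt: doubles per restart, then saturates.
std::uint64_t backoff_delay_sec(unsigned restart_count) {
  // Clamp the shift so the doubling cannot overflow 64 bits.
  const unsigned shift = std::min(restart_count, 32u);
  const std::uint64_t delay = kBaseDelaySec << shift;
  return std::min(delay, kMaxDelaySec);
}
```

With these assumed constants the delay stops growing after the eleventh restart, so retries continue at a steady 30-minute interval instead of stretching towards a day.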

Reviewed By: bart2

Differential Revision: D31661172
inikep pushed a commit that referenced this pull request Apr 11, 2022
… enabled

Summary:
For secondaries, when enable_super_log_bin_read_only is on and read_only is on, installing/uninstalling a plugin at run time is currently forbidden.

Install/uninstall plugin doesn't generate events in the binlog, although the thread's thd contains the OPTION_BIN_LOG flag because log_slave_updates is on by default in secondaries. It should be safe to execute install/uninstall plugin.

The change is to call set_skip_readonly_check() before install/uninstall plugin and call reset_skip_readonly_check() (for completeness) after install/uninstall plugin.

BTW, mysql always calls reset_skip_readonly_check() at the beginning of each statement, thus set_skip_readonly_check() won't affect other statements.
```
#0  THD::reset_skip_readonly_check (this=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_class.h:1754
#1  0x0000000005a500e1 in THD::reset_for_next_command (this=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:5892
#2  0x0000000005a517a5 in mysql_reset_thd_for_next_command (thd=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:5817
#3  0x0000000005a50466 in mysql_parse (thd=0x7fb5c1a0b000, parser_state=0x7fb5f6bb4560, last_timer=0x7fb5f6bb39b0) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:6056
#4  0x0000000005a4c7c9 in dispatch_command (thd=0x7fb5c1a0b000, com_data=0x7fb5f6bb4d98, command=COM_QUERY) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:2222
#5  0x0000000005a4f991 in do_command (thd=0x7fb5c1a0b000) at /home/luqun/mysql/mysql-8.0.20/sql/sql_parse.cc:1556
#6  0x0000000005ccd4f1 in handle_connection (arg=0x7fb5cc85b740) at /home/luqun/mysql/mysql-8.0.20/sql/conn_handler/connection_handler_per_thread.cc:330
#7  0x00000000078eb95b in pfs_spawn_thread (arg=0x7fb5f8c89720) at /home/luqun/mysql/mysql-8.0.20/storage/perfschema/pfs.cc:2884
#8  0x00007fb5f957020c in start_thread (arg=0x7fb5f6bb6700) at pthread_create.c:479
#9  0x00007fb5f971881f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
```
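
The set/reset pairing described in the summary could be expressed as a scope guard. A minimal sketch, assuming a stand-in `FakeThd` type (hypothetical); only the two method names come from the summary, and the guard class and `run_plugin_action` helper are illustrative, not part of the patch.

```cpp
// Hypothetical stand-in for THD; only the two method names come from the summary.
struct FakeThd {
  bool skip_readonly_check = false;
  void set_skip_readonly_check() { skip_readonly_check = true; }
  void reset_skip_readonly_check() { skip_readonly_check = false; }
};

// Sets the flag on construction and resets it when the scope ends.
class SkipReadonlyCheckGuard {
 public:
  explicit SkipReadonlyCheckGuard(FakeThd &thd) : thd_(thd) {
    thd_.set_skip_readonly_check();
  }
  ~SkipReadonlyCheckGuard() { thd_.reset_skip_readonly_check(); }
  SkipReadonlyCheckGuard(const SkipReadonlyCheckGuard &) = delete;
  SkipReadonlyCheckGuard &operator=(const SkipReadonlyCheckGuard &) = delete;

 private:
  FakeThd &thd_;
};

// Runs an install/uninstall action with the read-only check skipped.
template <typename Action>
void run_plugin_action(FakeThd &thd, Action action) {
  SkipReadonlyCheckGuard guard(thd);
  action();
}
```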

Reviewed By: george-reynya

Differential Revision: D27213990
inikep added a commit that referenced this pull request Apr 11, 2022
recover raft logs by removing partial trxs

Summary:
Port D24628821

mysqld removes partial trxs in the tail of trx log (named binary-logs on
primaries and apply-logs on secondaries) during startup. However, relay logs
were not of much importance since they were discarded anyway and a new one would
be created.
However, with raft, this is not ideal. Relay logs are raft logs on secondaries
and have to be kept around (and kept sane and consistent). This diff adds the
ability to remove partial trxs from raft/relay logs.
Much of the code to open the last relay log (based on relay log index) and
identify partial trxs is borrowed from existing logic in
MYSQL_BIN_LOG::open_binlog() and binlog_recover()

Reviewed By: Pushapgl

Differential Revision: D26447448

---------------------------------------------------------------------

Checking for inited bool to make sure global_init_info was successful

Summary:
Port D25584004

A master.info and relay.info file can be present
but needs to be properly inited for use. We were bypassing the inited
check which could lead to issues in Raft.
In case there is an error in global_init_info, Raft will do a
raft_reset_slave and make another attempt at it. If both recourses
fail, the init of the plugin would fail.

Reviewed By: Pushapgl

Differential Revision: D26447457

---------------------------------------------------------------------

Support for dumping raft logs to vanilla async replicas

Summary:
[Porting Notes]
We want to dump raft logs to vanilla async replicas regardless
of whether it's the relay log or binlog. Effectively after this change
we'll dump relay logs on the followers and binlogs on the leader. When
the raft role changes, the logs to be dumped are also changed.
Dump_log class is introduced as a thin wrapper/container around
mysql_bin_log or rli->relay_log and is inited with mysql_bin_log
to emulate vanilla mysql behavior. Dump threads use the global
dump_log object instead of mysql_bin_log directly. We switch the log
in dump log only when raft role changes (in binlog_change_to_binlog()
and binlog_change_to_apply_log()).
During raft role change we take all log related locks (LOCK_log,
LOCK_index, LOCK_binlog_end_pos, and dump log lock) to serialize it with
other log operations like dumping logs.
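
A minimal sketch of the wrapper idea described above, assuming a single mutex stands in for the dump log lock. `Log` is a hypothetical placeholder for MYSQL_BIN_LOG, and the class and method names are illustrative; the real role change also takes LOCK_log, LOCK_index and LOCK_binlog_end_pos, which are omitted here.

```cpp
#include <mutex>
#include <string>

// Hypothetical placeholder for MYSQL_BIN_LOG (binlog or relay log).
struct Log {
  std::string name;
};

// Thin wrapper that dump threads read from instead of touching a log directly.
class DumpLog {
 public:
  // Starts out pointing at the binlog to mimic vanilla behavior.
  explicit DumpLog(Log *initial) : log_(initial) {}

  // Called on a raft role change; the lock serializes the switch with readers.
  void switch_log(Log *next) {
    std::lock_guard<std::mutex> guard(lock_);
    log_ = next;
  }

  // Dump threads resolve the current log under the same lock.
  std::string current_name() {
    std::lock_guard<std::mutex> guard(lock_);
    return log_->name;
  }

 private:
  std::mutex lock_;
  Log *log_;
};
```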

Related doc - https://fb.quip.com/oTVAAdgEi4zY

This diff contains below 7 patches:
D23013977
D24766787
D24716539
D24900223
D24955284
D25174166
D25775525

Reviewed By: luqun

Differential Revision: D26141496

---------------------------------------------------------------------

Fixed the flaky raft test suite

Summary:
First clean run of entire raft test suite :)

**Changes**
* Reset apply_logs on raft secondaries before the start of every test
* Increased the lock timeouts and changed isolation level in rpl_raft_slave_out_of_order_commit.

Reviewed By: luqun

Differential Revision: D26651257

---------------------------------------------------------------------

return error from Raft_replication_delegate when plugin is not available

Summary:
Port D23065441 (facebook@b9067f7)

The new macro is used to call into the raft plugin. If the plugin gets unloaded
accidentally while enable_raft_plugin is ON, this STRICT version returns
failure. Currently this is to be called only by the raft plugin.

Reviewed By: Pushapgl

Differential Revision: D26447523

---------------------------------------------------------------------

Adding timestamps for raft rotates which happen in the context of listener thread

Summary:
Port D25572614

The timestamp of a binlog event is picked up from the when field in
the event. In most cases of rotation, the when is left unpopulated
during rotation for the top 3 events (fd, pgtid, metadata). However
in such a situation, a normal rotate (flush binary logs) still manages
to get a valid timestamp, since the thread in which the flush binary
logs happens has a valid start time.
Now enter Raft relay log rotations. In those cases and in the case
of config change rotate, the rotations are happening in the context
of a raft listener queue thread. In that context, the when and start
time of the thread are both 0. The diff handles this case by populating
the when field appropriately.

Reviewed By: bhatvinay

Differential Revision: D26194612

---------------------------------------------------------------------

Raft abrupt stepdown and trim binlog file / gtid test

Summary: binlog file should get trimmed for abrupt stepdown

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D26169975

---------------------------------------------------------------------

Port: Fixes around raft log truncation.

Summary:
**Notes**
* New functions to block and unblock dump threads for plugin to use
during raft log truncation.

The check below is already done in the raft plugin as part of D26866429.
* Re-init rli->cur_log in Relay_log_info::rli_init_info() instead of
just seeking to the beginning to handle the case when raft log is
truncated before starting the applier

Reviewed By: luqun

Differential Revision: D26759813

---------------------------------------------------------------------

rebase due to relay log Format_description_event event size difference

Summary:
WL#3549: Binlog Compression adds 1 extra byte of data to Format_description_event:

  @@ -134,6 +137,7 @@ Format_description_event::Format_description_event(uint8_t binlog_ver,
             VIEW_CHANGE_HEADER_LEN,
             XA_PREPARE_HEADER_LEN,
             ROWS_HEADER_LEN_V2,
  +          TRANSACTION_PAYLOAD_EVENT,
         };

Reviewed By: Pushapgl

Differential Revision: D27145095

---------------------------------------------------------------------

fix raft change string metadata event

Summary:
Saw a bunch of stage-1 replicaset instances fail due to a config change string failure during add/remove instance:
```
I0401 00:16:10.894434 1823842 IoCacheUtils.cpp:333] getConfigChangeString eventType = ^G
E0401 00:16:10.894451 1823842 IoCacheUtils.cpp:363] Failed to read metadata event body from cache
E0401 00:16:10.894456 1823842 MysqlRaft.cpp:659] [replicate] Failed to get config change string from iocache
2021-04-01T00:16:10.894464-07:00 8307 [ERROR] [MY-010207] [Repl] Run function 'before_flush' in plugin 'RPL_RAFT' failed
2021-04-01T00:16:10.894478-07:00 8307 [ERROR] [MY-000000] [Server] Failed to rotate binary log
```

After some investigation, the cause is that the metadata event length was calculated with the config change string included, but the config change string was never written into the event body.

Reviewed By: Pushapgl

Differential Revision: D27504157

---------------------------------------------------------------------

Tells raft mode in binlog

Summary:
Win-Win: Tell whether this binlog was generated while this host was in raft mode.

We already have the bit in the FD event, and this diff just spells it out.

Reviewed By: mpercy

Differential Revision: D28664300

---------------------------------------------------------------------

fix flaky raft testcase

Summary:
There are multiple issues for MTR:
1. In sql/rpl_binlog_sender.cc, if the secondary's IO thread receives fatal_error,
   it quits instead of reconnecting. Use an unknown error so that
   the secondary IO thread can try to reconnect.
2. mtr.add_suppression() call: mtr.add_suppression() executes an insert
   statement into the mtr.test_suppressions table, and that table
   doesn't have a primary key, so during idempotent recovery the secondary
   fails to execute mtr.add_suppression() due to the missing PK. Try to move all
   mtr.add_suppression() calls to the end of the testcase to work around the idempotent recovery failure.
3. When promoting, use raft_promote_to_leader.inc instead of `set rpl_raft_new_leader_uuid`,
   since raft_promote_to_leader waits until the new primary becomes writeable.
4. Pass a specific warning instead of '.*' to mtr.add_suppression.
5. etc.

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D28774820

---------------------------------------------------------------------

Using locks in Dump_log methods only when raft is enabled

Summary:
We don't need to take locks in non-raft mode since the
underlying MYSQL_BIN_LOG obj will never change. To handle the race between
updating the enable_raft_plugin var and using its value in Dump_log, we
kill and block all dump threads while updating the var and unblock them
once the var is updated.

Reviewed By: bhatvinay

Differential Revision: D28905522

---------------------------------------------------------------------

leader election to be a sync point in the new leader

Summary:
Port of D27582002 (facebook@39c70ca) from 5.6.35 to 8.0

Newly elected raft leader makes sure that all trxs from the previous
leader are committed by sql appliers. It then switches the server's trx
logs from apply-log-* to binary-log-*. To other parts of the system this
looks like a rotation, but the necessary sync calls are not made here.
So, if the server (or OS) restarts, the storage engine could lose
the commit markers of the last batch of trxs. This will result in silent
data drift.
This diff fixes the problem by making an explicit call to
ha_flush_logs() before switching the server's trx logs.

Reviewed By: luqun

Differential Revision: D28880407

---------------------------------------------------------------------

Error out SQL thread on out of order opids in raft logs

Summary:
OpIds should always be in order in the raft logs. Added a check
in SQL threads that throws an error and stops the applier when out of
order opids are detected.

Reviewed By: li-chi

Differential Revision: D28810840

---------------------------------------------------------------------

add GET_COMMITTED_GTIDS for raft

Summary:
During recovery Raft Log::Init needs to check with server
what gtids have been committed. Before doing that it finds the entire
set of trxs in the last raft log. The difference (logged minus
committed) gives the pending opids.

Reviewed By: Pushapgl

Differential Revision: D29622786

---------------------------------------------------------------------

raft: skip mts_recovery_groups during start slave

Summary:
During MySQL8+Raft DMP, some instances fail to switch to leader or start slave:

```
2021-06-24T17:56:38.627423-07:00 431 [Note] [MY-010574] [Repl] Slave: MTS group recovery relay log info group_master_log_name /data/mysql/3127/bls-unittestdb658.frc2-3305-mysql.replicaset.180021/binary-logs-3727.000033, event_master_log_pos 1129.
2021-06-24T17:56:38.627473-07:00 431 [ERROR] [MY-010575] [Repl] Error looking for file after /binlogs/binary-logs-3307.000120.
2021-06-24T17:56:38.627516-07:00 431 [ERROR] [MY-000000] [Repl] load_mi_and_rli_from_repositories: rli_init_info returned error
```

Similar to 5.6, we don't need to run mts_recovery_groups because GTID_MODE is always enabled.

Reviewed By: Pushapgl

Differential Revision: D29520066

---------------------------------------------------------------------

update RaftListenerCallbackArg struct

Summary: Because the contract between the raft plugin and mysql changed, update the RaftListenerCallbackArg struct to add a master_uuid field.

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D29981349

---------------------------------------------------------------------

always check mi during raft_reset_slave

Summary: Add a nullptr check to raft_reset_slave, similar to the one in raft_stop_sql_thread.

Reviewed By: bhatvinay

Differential Revision: D29967915

---------------------------------------------------------------------

raft mtr: update rpl_end_raft.inc for disable-raft

Summary:
rpl_end_raft.inc unregisters the raft plugin, which calls reset slave when online disable-raft is enabled. Move rpl_end_raft.inc after rpl_stop_slaves.inc so that the slaves are stopped correctly first.

After the slaves are stopped, a call to mtr.add_suppression("") no longer replicates to them, so call mtr.add_suppression("") on all instances (replicas).

Reviewed By: yizhang82

Differential Revision: D30062236

---------------------------------------------------------------------

fix master_uuid

Summary: fix master_uuid in raft mode.

Reviewed By: luqun

Differential Revision: D30261465

---------------------------------------------------------------------

incorrect File_size value in show raft logs result

Summary:
In the show raft logs result, the File_size for the latest log file isn't updated correctly.

For example, show raft logs; returns:

```
+-------------------------+------------
| Log_name                | File_size  |
| binary-logs-3304.000670 |       1529 |
```

In fact:
```
-rw-r----- 1 mysql backup 1669180487 Aug  4 14:49 binary-logs-3304.000670
-rw-r----- 1 mysql backup 1723994154 Aug  4 14:49 binary-logs-3304.000670
```
The file_size from show raft logs is always 1529, but the real file_size is 1669180487.

The issue is that when writing to IO_CACHE directly, the position of its wrapper ostream isn't updated.

Reviewed By: yizhang82

Differential Revision: D30116345

---------------------------------------------------------------------

Add gtid_purged_for_tailing

Summary:
Add a new global variable gtid_purged_for_tailing.

It shows the purged GTID for binlog in non-raft mode, and purged GTID for raft log in raft mode.

Reviewed By: abhinav04sharma

Differential Revision: D29372776

---------------------------------------------------------------------

disable skip_setup_replica_if_unnecessary

Summary: After syncing with the latest official raft plugin, most MTR tests failed due to the skip_setup_replica_if_unnecessary optimization.

Reviewed By: yizhang82

Differential Revision: D30821648

---------------------------------------------------------------------

latency histograms for raft trx wait

Summary: Port of the support added for the same in mysql 5.6. Should help monitor latencies in 8.0 + raft stage-1 replicasets.

Reviewed By: luqun

Differential Revision: D31064764

---------------------------------------------------------------------

handle cases where clean shutdown in raft aborts trxs

Summary: This is a port of the feature in 5.6.

Reviewed By: anirbanr-fb

Differential Revision: D31070593

---------------------------------------------------------------------

fix crash during binlog purging

Summary:
Any error returned by the plugin during binlog purging results in a
crash in mysql8 as the server tries to execute
binlog_error_action_abort. We need to differentiate explicitly between a
plugin error and other errors (such as errors related to doing disk IO
etc). In this particular case, the crash is because of trying to purge a
file that does not exist (i.e. which was already purged previously) and
raft cannot find it in its index chunk (so it returns an error).
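
One way to picture the distinction is to tag each purge failure with its origin and let only server-side failures reach the abort path. This is a sketch under that assumption; `PurgeErrorKind` and `should_run_error_action` are hypothetical names, and the actual patch may classify errors differently.

```cpp
#include <cassert>

// Hypothetical classification of purge failures; not the server's names.
enum class PurgeErrorKind { kNone, kPlugin, kServerIo };

// Assumption: only server-side failures (e.g. disk IO) should reach the
// abort path; a plugin error is reported to the caller instead.
inline bool should_run_error_action(PurgeErrorKind kind) {
  return kind == PurgeErrorKind::kServerIo;
}
```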

Reviewed By: anirbanr-fb

Differential Revision: D31149997

---------------------------------------------------------------------

update flaky rpl_raft_purged_gtids_dump_threads

Summary:
rpl_raft_purged_gtids_dump_threads MTR failed due to "Cannot replicate because the master purged required binary logs" after server4 starts tailing server2.

Sync server4 before switching it to tail server2.

Reviewed By: bhatvinay

Differential Revision: D30818078

---------------------------------------------------------------------

fix various porting issues in mysql8 (raft) crash recovery

Summary:
1. Trimming of the binlog is left to the raft plugin (based on the current
   leader's log). The server should skip this step as part of recovery. This
   essentially means setting 'valid_pos' to the last successfully parsed
   trx in the trx log (instead of the engine's view of the trx log) in
   MYSQL_BIN_LOG::recover().
2. executed_gtid_set should be initialized based on the engine's view of
   the trx log file coordinates. So, during startup, appropriate flags need
   to be passed into MYSQL_BIN_LOG::init_gtid_sets(). init_gtid_sets() is
   already changed to handle this, but the flag was not set correctly
   during server startup.
3. Another fix is in MYSQL_BIN_LOG::init_gtid_sets() to correctly set the position
   to read and calculate executed-gtid-set (based on the file name read from the
   engine).

Reviewed By: anirbanr-fb

Differential Revision: D31284902

---------------------------------------------------------------------

show_raft_status should take LOCK_status

Summary:
This is a straight port of a 5.6 patch to 8.0

SHOW RAFT STATUS and SHOW GLOBAL STATUS go to the
same functions in the plugin and access shared data structures.
These functions return internal char * pointers to plugin global
variables. They need to be serialized with LOCK_status otherwise
it leads to race conditions.

Reviewed By: li-chi

Differential Revision: D31318340

---------------------------------------------------------------------

Disallowing reset master on raft replicasets

Summary:
This is a straight port of similar patch on 5.6 raft

reset master is inherently not raft compliant.
It gets rid of all binlogs and makes an instance appear like
a fresh single-instance database without consulting the ring.
We need to block it.

However, under certain circumstances (in the future), e.g. during
a first-time replicaset build, or when we can guarantee that the instance
is a single-node raft ring, we can potentially do a reset master
followed by a rebootstrap of raft. This can also be achieved by
disabling raft, running reset master, and then re-enabling raft. However,
to keep the door open, I have left a force option and will thread
a mechanism from the plugin to call reset_master(force=true).

Reviewed By: li-chi

Differential Revision: D31299214

---------------------------------------------------------------------

Setting topology config in MTR

Summary:
Setting topology config in MTR so that features that use
topology config can be tested easily. Setting rpl_raft_skip_smc_updates
to avoid unnecessary calls to SMC (even though we supply a dummy
replicaset name).

Reviewed By: li-chi

Differential Revision: D31543877

---------------------------------------------------------------------

Do not allow sql thread start when raft is doing a stop->{}->start transition

Summary:
This is a port of an existing patch made to 5.6 for Raft.

Raft will do a stop of the SQL thread during StopAllWrites.
Then it will repoint the binlog files and during that action, the
SQL threads have to remain stopped. We block it out in this diff by
keeping an atomic bool which can be checked from other functions.

This only applies to raft mode i.e. enable_raft_plugin = true.
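
A minimal sketch of the guard, assuming a single process-wide flag that raft sets before stopping the applier and clears once the logs are repointed. All names below are hypothetical; only `enable_raft_plugin` comes from the description above.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical flag: true while raft is between stopping and restarting
// the SQL thread (e.g. while binlog files are being repointed).
static std::atomic<bool> raft_transition_in_progress{false};

inline void begin_raft_transition() { raft_transition_in_progress.store(true); }
inline void end_raft_transition() { raft_transition_in_progress.store(false); }

// Returns true if starting the SQL thread is allowed right now.
// The guard only applies when the raft plugin is enabled.
inline bool can_start_sql_thread(bool enable_raft_plugin) {
  if (!enable_raft_plugin) return true;
  return !raft_transition_in_progress.load();
}
```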

Reviewed By: li-chi

Differential Revision: D31319499

---------------------------------------------------------------------

Handle printing of FD event generated by slave SQL thread

Summary:
Returning early in the FD:print() function leaves mysqlbinlog unable to parse Raft logs on secondaries.

The original commit which added this is d048c0f (P173872135)

To comply with the intent of the original bug fix, we avoid printing the FD event of a relay log as a 'BINLOG'.

Reviewed By: anirbanr-fb

Differential Revision: D26359417

---------------------------------------------------------------------

Add exponential backoff for smart restart

Summary:
Raft instances crash and the Tx logs fill up.

RocksDB can fill up the Tx logs.
We should stop restarting mysqld if it has already been restarted many times in a day.

Reviewed By: anirbanr-fb

Differential Revision: D27372750

---------------------------------------------------------------------

always release data_lock mutex to avoid deadlock

Summary: In a stage-1 replicaset, when a secondary instance is killed, the instance sometimes runs into a deadlock because the process_raft_queue thread forgot to release the mutex it acquired in raft_change_master.
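
One common way to rule out this class of bug is a scoped lock guard, so that every return path releases the mutex. The sketch below uses `std::mutex` as a stand-in for data_lock and a hypothetical function name; the actual patch may simply add the missing unlock call.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for the data_lock mutex.
static std::mutex data_lock;

// Every return path releases data_lock, because the guard unlocks
// in its destructor.
inline int raft_change_master_sketch(bool fail_early) {
  std::lock_guard<std::mutex> guard(data_lock);
  if (fail_early) return 1;  // early return: the guard still unlocks
  return 0;
}
```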

Reviewed By: Pushapgl, bhatvinay

Differential Revision: D27602667

---------------------------------------------------------------------

Gracefully exit mysqld_safe loop during backoff

Summary:
Currently, systemctl stop [email protected] can take 10 mins when mysqld_safe is in its backoff period.

D28517599 adds an interrupt to the sleep in mysql_stop, and mysqld_safe immediately breaks the retry loop if the sleep is interrupted.

Reviewed By: anirbanr-fb

Differential Revision: D28606439

---------------------------------------------------------------------

Fix heap overflow in group_relay_log_name handling

Summary:
We were accessing group_relay_log_name in
Query_log_event::do_apply_event_worker() but it's assigned only after
the coordinator thread encounters an end event (i.e. xid event or a
query event with "COMMIT" or "ROLLBACK" query). This was causing a race
between accessing group_relay_log_name in the worker thread and writing
it on the coordinator thread. We don't need to set transaction position
in events other than end event, so now we set transaction position in
query event only if it's an end event. The race is eliminated because
group_relay_log_name is set before enqueuing the event to the worker
thread (in both dep repl and vanilla mts).

Reviewed By: lth

Differential Revision: D28767430

---------------------------------------------------------------------

fix memory during MYSQL_BIN_LOG::open_existing_binlog

Summary:
asandebug complains that there are memory leaks during MYSQL_BIN_LOG open:

Direct leak of 50 byte(s) in 1 object(s) allocated from:
    #0 0x67460ef in malloc
    #1 0x93f0777 in my_raw_malloc(unsigned long, int)
    #2 0x93f064a in my_malloc(unsigned int, unsigned long, int)
    #3 0x93f0eb0 in my_strdup(unsigned int, char const*, int)
    #4 0x8af01a6 in MYSQL_BIN_LOG::open(unsigned int, char const*, char const*, unsigned int)
    #5 0x8af8064 in MYSQL_BIN_LOG::open_binlog(char const*, char const*, unsigned long, bool, bool, bool, Format_description_log_event*, unsigned int, RaftRotateInfo*, bool)
    #6 0x8b00c00 in MYSQL_BIN_LOG::new_file_impl(bool, Format_description_log_event*, RaftRotateInfo*)
    #7 0x8d65e47 in rotate_relay_log(Master_info*, bool, bool, bool, RaftRotateInfo*)
    #8 0x8d661c0 in rotate_relay_log_for_raft(RaftRotateInfo*)
    #9 0x8c7696a in process_raft_queue
    #10 0xa0fa1fd in pfs_spawn_thread(void*)
    #11 0x7f8c9a12b20b in start_thread

Release the previously allocated memory before assigning new values.
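
The pattern behind this kind of fix can be sketched as "free the old buffer before storing the new pointer". `LogSketch` and `set_name` are hypothetical; the server uses my_strdup/my_free rather than the C allocator shown here.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical log object that owns a heap-allocated name.
struct LogSketch {
  char *name = nullptr;
};

// Replaces log.name with a copy of new_name, releasing the old buffer
// first so that repeated opens do not leak.
inline bool set_name(LogSketch &log, const char *new_name) {
  char *copy = static_cast<char *>(std::malloc(std::strlen(new_name) + 1));
  if (copy == nullptr) return false;
  std::strcpy(copy, new_name);
  std::free(log.name);  // free(nullptr) is a no-op on the first call
  log.name = copy;
  return true;
}
```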

Reviewed By: Pushapgl

Differential Revision: D28819752

---------------------------------------------------------------------

Reduce max backoff time from 24h to 30mins

Summary:
During the recent SEV0, raft instances were crash looping because of clock skew. Even after the clock returned to normal, mysqld_safe failed to bring mysqld back because of the extended 24-hour backoff.

This diff reduces the max backoff time from 24h to 30mins. It will still keep trying, but not so aggressively that it fills up the trx logs.
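
A capped exponential backoff can be sketched as follows. The doubling factor and the 1-second base are assumptions made for illustration; only the 30-minute cap comes from the description above, and mysqld_safe itself is a shell script rather than C++.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Hypothetical helper: the delay doubles with every consecutive restart
// and is capped at max_delay (30 minutes after this change).
inline std::chrono::seconds backoff_delay(
    unsigned restarts,
    std::chrono::seconds base = std::chrono::seconds(1),
    std::chrono::seconds max_delay = std::chrono::minutes(30)) {
  std::chrono::seconds delay = base;
  // Stop doubling once the cap is reached to avoid overflow.
  for (unsigned i = 0; i < restarts && delay < max_delay; ++i) delay *= 2;
  return std::min(delay, max_delay);
}
```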

Reviewed By: bart2

Differential Revision: D31661172
inikep pushed a commit that referenced this pull request Jun 1, 2023
Summary:
Rebaseline type-related tests:

1. SHOW INDEX format changes
2. No more LSM_TREE in Index_type as the corresponding
3. More rows in EXPLAIN; the rows column is now column #10 (instead of #9), and there is an additional filtered column that needs to be masked
4. Using utf8 results in warnings, but I decided to keep the warnings as a reminder for the future. TODO: we need to decide what to do with binary unpacking with regard to utf8/utf8mb3/utf8mb4.
5. YEAR(2) is deprecated
6. GROUP BY now needs ORDER BY for ordered output
7. MySQL 8 no longer allows specifying NULL for PK, so I'm adding an extra_pk_col_opts to the col_opt_*.tests to only specify supported flags for PK
8. Binary output is now hex instead of characters (for example, 0 => 0x30)

Reviewed By: lloyd

Differential Revision: D17503619
inikep pushed a commit that referenced this pull request Jun 1, 2023
Summary:
1. Disable group_min_max: we need to port over infinite loop fix
2. Disable blind_delete_rc/blind_delete_rr - missing rocksdb_read_free_rpl patch
3. Disable rocksdb.persistent_cache - this is a strange crash that only happens when other tests are running and requires further investigation
4. Lower number of rows in deadlock/drop_table as 8.0 debug can lead to timeout. Filed a task to investigate why the performance is different between 8.0 and 5.6. Also turn off binlog to make the test slightly faster and align with 5.6
5. rocksdb.information_schema: need to move create table before select * from information_schema.rocksdb_global_info to write out max_index. Also due to missing a few tables in mysql database (such as slave_gtid_info, etc) the index ids are also different.
6. rocksdb.rocksdb: account for the difference in rocksdb_number_keys_written when binlog is enabled by default in 8.0. Disabled partition-related tests with disable_testcase (tracked in a separate task). Also disabled a packing-related test that is tracked by a separate task.
7. rocksdb.drop_table: reduce max to avoid test timeout (filed a separate task for investigation). Also fix a bug in myrocksdb shutdown to properly abort the compaction at shutdown when rocksdb_Debug_manual_compaction_delay is set.
8. Rebaseline rpl_gtid_crash_safe because the .inc file on the mysql side has changed

Reviewed By: lloyd

Differential Revision: D17802248
inikep pushed a commit that referenced this pull request Jun 1, 2023
Summary:
[Porting Notes]
We want to dump raft logs to vanilla async replicas regardless
of whether it's the relay log or binlog. Effectively, after this change
we'll dump relay logs on the followers and binlogs on the leader. When
the raft role changes, the logs to be dumped change as well.
The Dump_log class is introduced as a thin wrapper/container around
mysql_bin_log or rli->relay_log and is inited with mysql_bin_log
to emulate vanilla mysql behavior. Dump threads use the global
dump_log object instead of mysql_bin_log directly. We switch the log
in dump log only when the raft role changes (in binlog_change_to_binlog()
and binlog_change_to_apply_log()).
During a raft role change we take all log-related locks (LOCK_log,
LOCK_index, LOCK_binlog_end_pos, and dump log lock) to serialize it with
other log operations like dumping logs.

Related doc - https://fb.quip.com/oTVAAdgEi4zY

This diff contains the 7 patches below:
D23013977
D24766787
D24716539
D24900223
D24955284
D25174166
D25775525

Reviewed By: luqun

Differential Revision: D26141496

-------------------------------------------------------------------------------

Passing raw_log pointer to wait_with_heartbeat() and wait_without_heartbeat()

Summary:
When enable_raft_plugin is OFF, Dump_log::lock() is a no-op,
which means that there can be a race
between log switching and dump threads. This could lead to a scenario
where the raw_log that wait_next_event() is working on might be
different from the one that wait_with_heartbeat()/wait_without_heartbeat() is
working on. This can cause deadlocks because
wait_with_heartbeat()/wait_without_heartbeat()'s mysql_cond_wait would
unlock and then lock a different log's LOCK_binlog_end_pos mutex, which
would then never be unlocked by wait_next_event().
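
The idea of the fix can be sketched as: read the current log pointer once, and hand that same pointer to the wait helper instead of letting the helper look it up again. The types and function names below are hypothetical stand-ins, and the `between` callback only simulates a concurrent log switch.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <mutex>

// Hypothetical stand-in for a raw log with its own end-position mutex.
struct RawLogSketch {
  std::mutex end_pos_lock;
  int id = 0;
};

// Hypothetical global pointing at the log currently being dumped.
static std::atomic<RawLogSketch *> current_log{nullptr};

// The wait helper works on the log it is given; it never re-reads
// current_log, so it cannot end up on a different log's mutex.
inline int wait_helper_sketch(RawLogSketch *raw_log) { return raw_log->id; }

inline int wait_next_event_sketch(const std::function<void()> &between) {
  RawLogSketch *raw_log = current_log.load();  // read the pointer once
  std::lock_guard<std::mutex> guard(raw_log->end_pos_lock);
  between();  // simulates a log switch happening concurrently
  return wait_helper_sketch(raw_log);  // same log whose mutex we hold
}
```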

Reviewed By: anirbanr-fb

Differential Revision: D32152658

fbshipit-source-id: d96ebcef966

-----------------------------------------------------------------------------------------

Fix rpl_raft_dump_raft_logs

Summary:
This test completes but fails because the following warning exists:
```
2022-08-30T16:28:00.159525Z 11 [ERROR] [MY-013114] [Repl] Slave I/O for channel '': Got fatal error 1236 from master when reading data from binary log: 'Slave has more GTIDs than the master has, using the master's SERVER_UUID. This may indicate that the end of the binary log was truncated or that the last binary log file was lost, e.g., after a power or disk failure when sync_binlog != 1. The master may or may not have rolled back transactions that were already replicated to the slave. Suggest to replicate any transactions that master has rolled back from slave to master, and/or commit empty transactions on master to account for transactions that have been', Error_code: MY-013114
```
Since the MTR result file is valid, we can suppress this error.

Reviewed By: yichenshen

Differential Revision: D39141846

fbshipit-source-id: 8e7fdb8

-------------------------------------------------------------------------------

Fix heap overflow in group_relay_log_name handling

Summary:
We were accessing group_relay_log_name in
Query_log_event::do_apply_event_worker() but it's assigned only after
the coordinator thread encounters an end event (i.e. xid event or a
query event with "COMMIT" or "ROLLBACK" query). This was causing a race
between accessing group_relay_log_name in the worker thread and writing
it on the coordinator thread. We don't need to set transaction position
in events other than end event, so now we set transaction position in
query event only if it's an end event. The race is eliminated because
group_relay_log_name is set before enqueuing the event to the worker
thread (in both dep repl and vanilla mts).

Reviewed By: lth

Differential Revision: D28767430

-------------------------------------------------------------------------------

fix memory during MYSQL_BIN_LOG::open_existing_binlog

Summary:
asandebug complains that there are memory leaks during MYSQL_BIN_LOG open:

Direct leak of 50 byte(s) in 1 object(s) allocated from:
    #0 0x67460ef in malloc
    #1 0x93f0777 in my_raw_malloc(unsigned long, int)
    #2 0x93f064a in my_malloc(unsigned int, unsigned long, int)
    #3 0x93f0eb0 in my_strdup(unsigned int, char const*, int)
    #4 0x8af01a6 in MYSQL_BIN_LOG::open(unsigned int, char const*, char const*, unsigned int)
    #5 0x8af8064 in MYSQL_BIN_LOG::open_binlog(char const*, char const*, unsigned long, bool, bool, bool, Format_description_log_event*, unsigned int, RaftRotateInfo*, bool)
    #6 0x8b00c00 in MYSQL_BIN_LOG::new_file_impl(bool, Format_description_log_event*, RaftRotateInfo*)
    #7 0x8d65e47 in rotate_relay_log(Master_info*, bool, bool, bool, RaftRotateInfo*)
    #8 0x8d661c0 in rotate_relay_log_for_raft(RaftRotateInfo*)
    #9 0x8c7696a in process_raft_queue
    #10 0xa0fa1fd in pfs_spawn_thread(void*)
    #11 0x7f8c9a12b20b in start_thread

Release the previously allocated memory before assigning new values.

Reviewed By: Pushapgl

Differential Revision: D28819752
inikep pushed a commit that referenced this pull request Jun 14, 2023
inikep pushed a commit that referenced this pull request Jun 14, 2023
inikep pushed a commit that referenced this pull request Jun 14, 2023
inikep pushed a commit that referenced this pull request Jun 19, 2023
inikep pushed a commit that referenced this pull request Jun 23, 2023
Summary:
[Porting Notes]
We want to dump raft logs to vanilla async replicas regardless
of whether it's the relay log or binlog. Effectively after this change
we'll dump relay logs on the followers and binlogs on the leader. When
the raft role changes, the logs to the dumped are also changed.
Dump_log class is introduced as a thin wrapper/continer around
mysql_bin_log or rli->relay_log and is inited with mysql_bin_log
to emulate vanilla mysql behavior. Dump threads use the global
dump_log object instead of mysql_bin_log directly. We switch the log
in dump log only when raft role changes (in binlog_change_to_binlog()
and binlog_change_to_apply_log()).
During raft role change we take all log releated locks (LOCK_log,
LOCK_index, LOCK_binlog_end_pos, and dump log lock) to serialize it with
other log operations like dumping logs.
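
A minimal sketch of the wrapper idea described above, not the actual Dump_log class. The names `Log`, `DumpLog`, `switch_log`, `current_name` and `demo_role_change` are hypothetical, and the real code also takes LOCK_log, LOCK_index and LOCK_binlog_end_pos during a role change, which is omitted here.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical stand-in for MYSQL_BIN_LOG; only a name is kept for illustration.
struct Log {
  std::string name;
};

// Thin wrapper that points at either the binlog or the relay log.
class DumpLog {
 public:
  explicit DumpLog(Log *binlog) : log_(binlog) {}

  // Assumed to be called only on a raft role change.
  void switch_log(Log *new_log) {
    std::lock_guard<std::mutex> guard(mutex_);
    log_ = new_log;
  }

  // Dump threads read the current log through the wrapper.
  std::string current_name() {
    std::lock_guard<std::mutex> guard(mutex_);
    return log_->name;
  }

 private:
  std::mutex mutex_;  // stands in for the "dump log lock"
  Log *log_;
};

// Walks through leader -> follower -> leader and records what a dump thread sees.
inline std::string demo_role_change() {
  Log binlog{"binlog"};
  Log relay_log{"relay_log"};
  DumpLog dump_log(&binlog);  // initialised with the binlog, like vanilla MySQL
  std::string seen = dump_log.current_name();
  dump_log.switch_log(&relay_log);  // role changes to follower
  seen += "," + dump_log.current_name();
  dump_log.switch_log(&binlog);  // role changes back to leader
  seen += "," + dump_log.current_name();
  return seen;
}
```

Dump threads only ever call through the wrapper, so the role change is the single place where the underlying log pointer moves.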

Related doc - https://fb.quip.com/oTVAAdgEi4zY

This diff contains the following 7 patches:
D23013977
D24766787
D24716539
D24900223
D24955284
D25174166
D25775525

Reviewed By: luqun

Differential Revision: D26141496

-------------------------------------------------------------------------------

Passing raw_log pointer to wait_with_heartbeat() and wait_without_heartbeat()

Summary:
When enable_raft_plugin is OFF, Dump_log::lock() is a no-op.
This means that when enable_raft_plugin is OFF there can be a race
between log switching and dump threads. This could lead to a scenario
where the raw_log that wait_next_event() is working on might be
different than what wait_with_heartbeat()/wait_without_heartbeat() is
working on. This can cause deadlocks because
wait_with_heartbeat()/wait_without_heartbeat()'s mysql_cond_wait would
unlock and then lock a different log's LOCK_binlog_end_pos mutex which
would then never be unlocked by wait_next_event().
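
The failure mode and the fix can be modelled in a single thread. This is an illustrative sketch only: `ModelMutex`, `RawLog`, `cond_wait_model`, `g_dump_log` and `wait_next_event_demo` are hypothetical names, and `ModelMutex` merely records whether it is held; it is not real synchronization.

```cpp
#include <cassert>

// Single-threaded model of a mutex: it only records whether it is held.
struct ModelMutex {
  bool held = false;
  void lock() { assert(!held); held = true; }
  void unlock() { assert(held); held = false; }
};

struct RawLog {
  ModelMutex lock_binlog_end_pos;  // stands in for LOCK_binlog_end_pos
};

RawLog g_binlog, g_relay_log;
RawLog *g_dump_log = &g_binlog;  // may be switched while a dump thread waits

// Models what a condition wait does to its mutex: release, then re-acquire.
inline void cond_wait_model(ModelMutex &m) {
  m.unlock();
  m.lock();
}

// Fixed shape: the helper waits on the log its caller locked. Re-reading
// g_dump_log here instead would unlock/lock a different log's mutex.
inline void wait_without_heartbeat(RawLog *raw_log) {
  cond_wait_model(raw_log->lock_binlog_end_pos);
}

inline bool wait_next_event_demo() {
  RawLog *raw_log = g_dump_log;  // snapshot the log once
  raw_log->lock_binlog_end_pos.lock();
  g_dump_log = &g_relay_log;  // simulated log switch during the wait
  wait_without_heartbeat(raw_log);
  raw_log->lock_binlog_end_pos.unlock();
  // Neither mutex may be left held, otherwise a later locker would deadlock.
  bool clean = !g_binlog.lock_binlog_end_pos.held &&
               !g_relay_log.lock_binlog_end_pos.held;
  g_dump_log = &g_binlog;  // restore for repeated calls
  return clean;
}
```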

Reviewed By: anirbanr-fb

Differential Revision: D32152658

fbshipit-source-id: d96ebcef966

-----------------------------------------------------------------------------------------

Fix rpl_raft_dump_raft_logs

Summary:
This test completes but fails because of the following warning:
```
2022-08-30T16:28:00.159525Z 11 [ERROR] [MY-013114] [Repl] Slave I/O for channel '': Got fatal error 1236 from master when reading data from binary log: 'Slave has more GTIDs than the master has, using the master's SERVER_UUID. This may indicate that the end of the binary log was truncated or that the last binary log file was lost, e.g., after a power or disk failure when sync_binlog != 1. The master may or may not have rolled back transactions that were already replicated to the slave. Suggest to replicate any transactions that master has rolled back from slave to master, and/or commit empty transactions on master to account for transactions that have been', Error_code: MY-013114
```
Since the MTR result file is valid, we can suppress this error.

Reviewed By: yichenshen

Differential Revision: D39141846

fbshipit-source-id: 8e7fdb8

-------------------------------------------------------------------------------

Fix heap overflow in group_relay_log_name handling

Summary:
We were accessing group_relay_log_name in
Query_log_event::do_apply_event_worker() but it's assigned only after
the coordinator thread encounters an end event (i.e. xid event or a
query event with "COMMIT" or "ROLLBACK" query). This was causing a race
between accessing group_relay_log_name in the worker thread and writing
it on the coordinator thread. We don't need to set transaction position
in events other than end event, so now we set transaction position in
query event only if it's an end event. The race is eliminated because
group_relay_log_name is set before enqueuing the event to the worker
thread (in both dep repl and vanilla mts).
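
As a rough illustration of the "end event" test described above: the sketch below assumes a simplified event model (`EventType`, `Event` and `is_end_event` are hypothetical names) and an exact string match on the query text, which is likely cruder than the server's actual check.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified event model for illustration only.
enum class EventType { Query, Xid, Other };

struct Event {
  EventType type;
  std::string query;  // only meaningful for Query events
};

// An end event is an xid event, or a query event whose query is
// "COMMIT" or "ROLLBACK".
inline bool is_end_event(const Event &ev) {
  if (ev.type == EventType::Xid) return true;
  return ev.type == EventType::Query &&
         (ev.query == "COMMIT" || ev.query == "ROLLBACK");
}
```

Under this model a worker would set the transaction position only when `is_end_event` returns true.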

Reviewed By: lth

Differential Revision: D28767430

-------------------------------------------------------------------------------

Fix memory leak during MYSQL_BIN_LOG::open_existing_binlog

Summary:
The asandebug build reports a memory leak during MYSQL_BIN_LOG open:

Direct leak of 50 byte(s) in 1 object(s) allocated from:
    #0 0x67460ef in malloc
    #1 0x93f0777 in my_raw_malloc(unsigned long, int)
    #2 0x93f064a in my_malloc(unsigned int, unsigned long, int)
    #3 0x93f0eb0 in my_strdup(unsigned int, char const*, int)
    #4 0x8af01a6 in MYSQL_BIN_LOG::open(unsigned int, char const*, char const*, unsigned int)
    #5 0x8af8064 in MYSQL_BIN_LOG::open_binlog(char const*, char const*, unsigned long, bool, bool, bool, Format_description_log_event*, unsigned int, RaftRotateInfo*, bool)
    #6 0x8b00c00 in MYSQL_BIN_LOG::new_file_impl(bool, Format_description_log_event*, RaftRotateInfo*)
    #7 0x8d65e47 in rotate_relay_log(Master_info*, bool, bool, bool, RaftRotateInfo*)
    #8 0x8d661c0 in rotate_relay_log_for_raft(RaftRotateInfo*)
    #9 0x8c7696a in process_raft_queue
    #10 0xa0fa1fd in pfs_spawn_thread(void*)
    #11 0x7f8c9a12b20b in start_thread

Free the previously allocated memory before assigning new values.
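
The pattern behind this kind of leak, and the fix, can be sketched as follows. This is not the MYSQL_BIN_LOG code: `dup_string` is a hypothetical stand-in for my_strdup, and `LogNames`, `open_log`, `free_string` and the `g_live` counter are invented for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

static int g_live = 0;  // number of string copies currently allocated

// Hypothetical stand-in for my_strdup: heap-allocates a copy of s.
inline char *dup_string(const char *s) {
  char *p = static_cast<char *>(std::malloc(std::strlen(s) + 1));
  if (p != nullptr) {
    std::strcpy(p, s);
    ++g_live;
  }
  return p;
}

inline void free_string(char *p) {
  if (p != nullptr) {
    std::free(p);
    --g_live;
  }
}

struct LogNames {
  char *name = nullptr;
};

// Opening twice without the free_string call would orphan the first copy.
inline void open_log(LogNames &log, const char *new_name) {
  free_string(log.name);  // release the old copy before overwriting the pointer
  log.name = dup_string(new_name);
}

// Returns how many copies are live after opening the same log object twice.
inline int live_after_two_opens() {
  LogNames log;
  open_log(log, "relay.000001");
  open_log(log, "relay.000002");
  int live = g_live;
  free_string(log.name);  // final cleanup
  log.name = nullptr;
  return live;
}
```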

Reviewed By: Pushapgl

Differential Revision: D28819752
inikep pushed a commit that referenced this pull request Apr 25, 2024
inikep pushed a commit that referenced this pull request May 7, 2024
inikep pushed a commit that referenced this pull request May 8, 2024
inikep pushed a commit that referenced this pull request May 9, 2024
inikep pushed a commit that referenced this pull request May 10, 2024
inikep pushed a commit that referenced this pull request May 13, 2024
Summary:
[Porting Notes]
We want to dump raft logs to vanilla async replicas regardless
of whether it's the relay log or binlog. Effectively after this change
we'll dump relay logs on the followers and binlogs on the leader. When
the raft role changes, the logs to be dumped change as well.
Dump_log class is introduced as a thin wrapper/container around
mysql_bin_log or rli->relay_log and is inited with mysql_bin_log
to emulate vanilla mysql behavior. Dump threads use the global
dump_log object instead of mysql_bin_log directly. We switch the log
in dump log only when raft role changes (in binlog_change_to_binlog()
and binlog_change_to_apply_log()).
During raft role change we take all log related locks (LOCK_log,
LOCK_index, LOCK_binlog_end_pos, and dump log lock) to serialize it with
other log operations like dumping logs.

Related doc - https://fb.quip.com/oTVAAdgEi4zY

This diff contains the following 7 patches:
D23013977
D24766787
D24716539
D24900223
D24955284
D25174166
D25775525

Reviewed By: luqun

Differential Revision: D26141496

-------------------------------------------------------------------------------

Passing raw_log pointer to wait_with_heartbeat() and wait_without_heartbeat()

Summary:
When enable_raft_plugin is OFF, Dump_log::lock() is a no-op.
This means that when enable_raft_plugin is OFF there can be a race
between log switching and dump threads. This could lead to a scenario
where the raw_log that wait_next_event() is working on might be
different than what wait_with_heartbeat()/wait_without_heartbeat() is
working on. This can cause deadlocks because
wait_with_heartbeat()/wait_without_heartbeat()'s mysql_cond_wait would
unlock and then lock a different log's LOCK_binlog_end_pos mutex which
would then never be unlocked by wait_next_event().

Reviewed By: anirbanr-fb

Differential Revision: D32152658

-----------------------------------------------------------------------------------------

Fix rpl_raft_dump_raft_logs

Summary:
This test completes but fails because of the following warning:
```
2022-08-30T16:28:00.159525Z 11 [ERROR] [MY-013114] [Repl] Slave I/O for channel '': Got fatal error 1236 from master when reading data from binary log: 'Slave has more GTIDs than the master has, using the master's SERVER_UUID. This may indicate that the end of the binary log was truncated or that the last binary log file was lost, e.g., after a power or disk failure when sync_binlog != 1. The master may or may not have rolled back transactions that were already replicated to the slave. Suggest to replicate any transactions that master has rolled back from slave to master, and/or commit empty transactions on master to account for transactions that have been', Error_code: MY-013114
```
Since the MTR result file is valid, we can suppress this error.

Reviewed By: yichenshen

Differential Revision: D39141846

-------------------------------------------------------------------------------

Fix heap overflow in group_relay_log_name handling

Summary:
We were accessing group_relay_log_name in
Query_log_event::do_apply_event_worker(), but it is assigned only after
the coordinator thread encounters an end event (i.e. an xid event or a
query event with a "COMMIT" or "ROLLBACK" query). This caused a race
between reading group_relay_log_name in the worker thread and writing
it on the coordinator thread. We don't need to set the transaction
position in events other than the end event, so now we set the
transaction position in a query event only if it's an end event. The
race is eliminated because group_relay_log_name is set before enqueuing
the event to the worker thread (in both dep repl and vanilla mts).
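
A minimal sketch of the "only touch the name on an end event" rule described
above. The Event and WorkerState types and the apply_on_worker function are
hypothetical simplifications for illustration; they do not mirror the real
Query_log_event code.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified replication event.
struct Event {
  bool is_xid;        // true for an xid event
  std::string query;  // query text; empty for non-query events
};

// An end event is an xid event or a query event carrying COMMIT or ROLLBACK.
bool is_end_event(const Event &ev) {
  return ev.is_xid || ev.query == "COMMIT" || ev.query == "ROLLBACK";
}

// Hypothetical per-worker bookkeeping.
struct WorkerState {
  std::string saved_relay_log_name;
  int position_updates = 0;
};

// Worker-side apply: the group relay log name is read only for end events,
// which is the only point at which the coordinator has already assigned it.
void apply_on_worker(const Event &ev, const std::string &group_relay_log_name,
                     WorkerState &w) {
  if (!is_end_event(ev)) return;  // other events never read the name
  w.saved_relay_log_name = group_relay_log_name;
  ++w.position_updates;
}
```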

Reviewed By: lth

Differential Revision: D28767430

-------------------------------------------------------------------------------

Fix memory leak during MYSQL_BIN_LOG::open_existing_binlog

Summary:
asandebug reports memory leaks during MYSQL_BIN_LOG open:

Direct leak of 50 byte(s) in 1 object(s) allocated from:
    #0 0x67460ef in malloc
    #1 0x93f0777 in my_raw_malloc(unsigned long, int)
    #2 0x93f064a in my_malloc(unsigned int, unsigned long, int)
    #3 0x93f0eb0 in my_strdup(unsigned int, char const*, int)
    #4 0x8af01a6 in MYSQL_BIN_LOG::open(unsigned int, char const*, char const*, unsigned int)
    #5 0x8af8064 in MYSQL_BIN_LOG::open_binlog(char const*, char const*, unsigned long, bool, bool, bool, Format_description_log_event*, unsigned int, RaftRotateInfo*, bool)
    #6 0x8b00c00 in MYSQL_BIN_LOG::new_file_impl(bool, Format_description_log_event*, RaftRotateInfo*)
    #7 0x8d65e47 in rotate_relay_log(Master_info*, bool, bool, bool, RaftRotateInfo*)
    #8 0x8d661c0 in rotate_relay_log_for_raft(RaftRotateInfo*)
    #9 0x8c7696a in process_raft_queue
    #10 0xa0fa1fd in pfs_spawn_thread(void*)
    #11 0x7f8c9a12b20b in start_thread

Release this memory before assigning new values to the pointers.
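
The free-before-assign pattern can be sketched as follows. This is an
illustration only: LogHandle, dup_string, set_name, and close_log are
hypothetical names, and plain malloc/free stand in for my_strdup/my_free.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for a log object that owns a heap-allocated name.
struct LogHandle {
  char *name = nullptr;
};

// malloc-based string copy; stands in for my_strdup in this sketch.
char *dup_string(const char *src) {
  std::size_t len = std::strlen(src) + 1;
  char *copy = static_cast<char *>(std::malloc(len));
  if (copy != nullptr) std::memcpy(copy, src, len);
  return copy;
}

// Reopening a log overwrites the name; the previous allocation has to be
// released first, otherwise it leaks on every reopen.
bool set_name(LogHandle &log, const char *new_name) {
  char *copy = dup_string(new_name);
  if (copy == nullptr) return false;
  std::free(log.name);  // free(nullptr) is a no-op on the first call
  log.name = copy;
  return true;
}

void close_log(LogHandle &log) {
  std::free(log.name);
  log.name = nullptr;
}
```

Copying before freeing keeps the old name intact if the allocation fails and
also stays correct if new_name aliases the current name.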

Reviewed By: Pushapgl

Differential Revision: D28819752
inikep pushed a commit that referenced this pull request May 15, 2024
Summary:
[Porting Notes]
We want to dump raft logs to vanilla async replicas regardless
of whether it's the relay log or binlog. Effectively after this change
we'll dump relay logs on the followers and binlogs on the leader. When
the raft role changes, the logs to be dumped are also changed.
Dump_log class is introduced as a thin wrapper/container around
mysql_bin_log or rli->relay_log and is initialized with mysql_bin_log
to emulate vanilla mysql behavior. Dump threads use the global
dump_log object instead of mysql_bin_log directly. We switch the log
in dump log only when raft role changes (in binlog_change_to_binlog()
and binlog_change_to_apply_log()).
During raft role change we take all log related locks (LOCK_log,
LOCK_index, LOCK_binlog_end_pos, and dump log lock) to serialize it with
other log operations like dumping logs.
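
As a rough illustration of the wrapper idea, here is a minimal standalone
sketch. It is not the real Dump_log class: LogFile and DumpLogSketch are
hypothetical names, a single std::mutex stands in for the set of log locks
taken on a role change, and only the log's name is modelled.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical stand-in for MYSQL_BIN_LOG; only a name is modelled here.
struct LogFile {
  std::string name;
};

// Thin wrapper: dump threads ask this object for the log instead of
// referring to one specific global log directly.
class DumpLogSketch {
 public:
  explicit DumpLogSketch(LogFile *initial) : log_(initial) {}

  // Called by dump threads; the lock keeps the pointer stable for the read.
  std::string current_name() {
    std::lock_guard<std::mutex> guard(lock_);
    return log_->name;
  }

  // Called only when the raft role changes (leader <-> follower).
  void switch_log(LogFile *next) {
    std::lock_guard<std::mutex> guard(lock_);
    log_ = next;
  }

 private:
  std::mutex lock_;
  LogFile *log_;
};
```

Initializing the wrapper with the binlog matches the vanilla behavior
described above; only a role change redirects dump threads to the relay log.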

Related doc - https://fb.quip.com/oTVAAdgEi4zY

This diff contains the following 7 patches:
D23013977
D24766787
D24716539
D24900223
D24955284
D25174166
D25775525

Reviewed By: luqun

Differential Revision: D26141496

-------------------------------------------------------------------------------

Passing raw_log pointer to wait_with_heartbeat() and wait_without_heartbeat()

Summary:
When enable_raft_plugin is OFF, Dump_log::lock() is a no-op, which
means there can be a race between log switching and dump threads. This
could lead to a scenario where the raw_log that wait_next_event() is
working on differs from the one that
wait_with_heartbeat()/wait_without_heartbeat() is working on. This can
cause deadlocks because the mysql_cond_wait in
wait_with_heartbeat()/wait_without_heartbeat() would unlock and then
lock a different log's LOCK_binlog_end_pos mutex, which would then never
be unlocked by wait_next_event().

Reviewed By: anirbanr-fb

Differential Revision: D32152658

-----------------------------------------------------------------------------------------

Fix rpl_raft_dump_raft_logs

Summary:
This test completes but fails because the following error is logged:
```
2022-08-30T16:28:00.159525Z 11 [ERROR] [MY-013114] [Repl] Slave I/O for channel '': Got fatal error 1236 from master when reading data from binary log: 'Slave has more GTIDs than the master has, using the master's SERVER_UUID. This may indicate that the end of the binary log was truncated or that the last binary log file was lost, e.g., after a power or disk failure when sync_binlog != 1. The master may or may not have rolled back transactions that were already replicated to the slave. Suggest to replicate any transactions that master has rolled back from slave to master, and/or commit empty transactions on master to account for transactions that have been', Error_code: MY-013114
```
Since the MTR result file is valid, we can suppress this error.

Reviewed By: yichenshen

Differential Revision: D39141846

-------------------------------------------------------------------------------

Fix heap overflow in group_relay_log_name handling

Summary:
We were accessing group_relay_log_name in
Query_log_event::do_apply_event_worker(), but it is assigned only after
the coordinator thread encounters an end event (i.e. an xid event or a
query event with a "COMMIT" or "ROLLBACK" query). This caused a race
between reading group_relay_log_name in the worker thread and writing
it on the coordinator thread. We don't need to set the transaction
position in events other than the end event, so now we set the
transaction position in a query event only if it's an end event. The
race is eliminated because group_relay_log_name is set before enqueuing
the event to the worker thread (in both dep repl and vanilla mts).

Reviewed By: lth

Differential Revision: D28767430

-------------------------------------------------------------------------------

Fix memory leak during MYSQL_BIN_LOG::open_existing_binlog

Summary:
asandebug reports memory leaks during MYSQL_BIN_LOG open:

Direct leak of 50 byte(s) in 1 object(s) allocated from:
    #0 0x67460ef in malloc
    #1 0x93f0777 in my_raw_malloc(unsigned long, int)
    #2 0x93f064a in my_malloc(unsigned int, unsigned long, int)
    #3 0x93f0eb0 in my_strdup(unsigned int, char const*, int)
    #4 0x8af01a6 in MYSQL_BIN_LOG::open(unsigned int, char const*, char const*, unsigned int)
    #5 0x8af8064 in MYSQL_BIN_LOG::open_binlog(char const*, char const*, unsigned long, bool, bool, bool, Format_description_log_event*, unsigned int, RaftRotateInfo*, bool)
    #6 0x8b00c00 in MYSQL_BIN_LOG::new_file_impl(bool, Format_description_log_event*, RaftRotateInfo*)
    #7 0x8d65e47 in rotate_relay_log(Master_info*, bool, bool, bool, RaftRotateInfo*)
    #8 0x8d661c0 in rotate_relay_log_for_raft(RaftRotateInfo*)
    #9 0x8c7696a in process_raft_queue
    #10 0xa0fa1fd in pfs_spawn_thread(void*)
    #11 0x7f8c9a12b20b in start_thread

Release this memory before assigning new values to the pointers.

Reviewed By: Pushapgl

Differential Revision: D28819752
inikep pushed a commit that referenced this pull request May 16, 2024
inikep pushed a commit that referenced this pull request May 17, 2024
inikep pushed a commit that referenced this pull request May 17, 2024
inikep pushed a commit that referenced this pull request May 21, 2024
inikep pushed a commit that referenced this pull request May 21, 2024
Summary:
[Porting Notes]
We want to dump raft logs to vanilla async replicas regardless
of whether it's the relay log or binlog. Effectively after this change
we'll dump relay logs on the followers and binlogs on the leader. When
the raft role changes, the logs to the dumped are also changed.
Dump_log class is introduced as a thin wrapper/continer around
mysql_bin_log or rli->relay_log and is inited with mysql_bin_log
to emulate vanilla mysql behavior. Dump threads use the global
dump_log object instead of mysql_bin_log directly. We switch the log
in dump log only when raft role changes (in binlog_change_to_binlog()
and binlog_change_to_apply_log()).
During raft role change we take all log releated locks (LOCK_log,
LOCK_index, LOCK_binlog_end_pos, and dump log lock) to serialize it with
other log operations like dumping logs.

Related doc - https://fb.quip.com/oTVAAdgEi4zY

This diff contains below 7 patches:
D23013977
D24766787
D24716539
D24900223
D24955284
D25174166
D25775525

Reviewed By: luqun

Differential Revision: D26141496

-------------------------------------------------------------------------------

Passing raw_log pointer to wait_with_heartbeat() and wait_without_heartbeat()

Summary:
When enable_raft_plugin is OFF Dump_log::lock() is a no-op.
Which means that when enable_raft_plugin is OFF there can be a race
between log switching and dump threads. This could lead to a scenario
where the raw_log that wait_next_event() is working on might be
different than what wait_with_heartbeat()/wait_without_heartbeat() is
working on. This can cause deadlocks because
wait_with_heartbeat()/wait_without_heartbeat()'s mysql_cond_wait would
unlock and then lock a different log's LOCK_binlog_end_pos mutex which
would then never be unlocked by wait_next_event().

Reviewed By: anirbanr-fb

Differential Revision: D32152658

-----------------------------------------------------------------------------------------

Fix rpl_raft_dump_raft_logs

Summary:
This tests completes but fails because the following warning exists:
```
2022-08-30T16:28:00.159525Z 11 [ERROR] [MY-013114] [Repl] Slave I/O for channel '': Got fatal error 1236 from master when reading data from binary log: 'Slave has more GTIDs than the master has, using the master's SERVER_UUID. This may indicate that the end of the binary log was truncated or that the last binary log file was lost, e.g., after a power or disk failure when sync_binlog != 1. The master may or may not have rolled back transactions that were already replicated to the slave. Suggest to replicate any transactions that master has rolled back from slave to master, and/or commit empty transactions on master to account for transactions that have been', Error_code: MY-013114
```
Since the MTR result file is valid, we can suppress this error.

Reviewed By: yichenshen

Differential Revision: D39141846

-------------------------------------------------------------------------------

Fix heap overflow in group_relay_log_name handling

Summary:
We were accessing group_relay_log_name in
Query_log_event::do_apply_event_worker() but it's assigned only after
the coordinator thread encounters an end event (i.e. xid event or a
query event with "COMMIT" or "ROLLBACK" query). This was causing a race
between accessing group_relay_log_name in the worker thread and writing
it on the coordinator thread. We don't need to set transaction position
in events other than end event, so now we set transaction position in
query event only if it's an end event. The race is eliminated because
group_relay_log_name is set before enqueuing the event to the worker
thread (in both dep repl and vanilla mts).

Reviewed By: lth

Differential Revision: D28767430

-------------------------------------------------------------------------------

fix memory during MYSQL_BIN_LOG::open_existing_binlog

Summary:
asandebug complain there are memory leaks during MYSQL_BIN_LOG open

Direct leak of 50 byte(s) in 1 object(s) allocated from:
    #0 0x67460ef in malloc
    #1 0x93f0777 in my_raw_malloc(unsigned long, int)
    #2 0x93f064a in my_malloc(unsigned int, unsigned long, int)
    #3 0x93f0eb0 in my_strdup(unsigned int, char const*, int)
    #4 0x8af01a6 in MYSQL_BIN_LOG::open(unsigned int, char const*, char const*, unsigned int)
    #5 0x8af8064 in MYSQL_BIN_LOG::open_binlog(char const*, char const*, unsigned long, bool, bool, bool, Format_description_log_event*, unsigned int, RaftRotateInfo*, bool)
    #6 0x8b00c00 in MYSQL_BIN_LOG::new_file_impl(bool, Format_description_log_event*, RaftRotateInfo*)
    #7 0x8d65e47 in rotate_relay_log(Master_info*, bool, bool, bool, RaftRotateInfo*)
    #8 0x8d661c0 in rotate_relay_log_for_raft(RaftRotateInfo*)
    #9 0x8c7696a in process_raft_queue
    #10 0xa0fa1fd in pfs_spawn_thread(void*)
    #11 0x7f8c9a12b20b in start_thread

Release the previously allocated memory before assigning new values.
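
A minimal sketch of the release-before-assign pattern, assuming plain `std::malloc`/`std::free` in place of the server's own allocator wrappers. `LogState`, `dup_string`, `open_log` and the allocation counter are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical holder for a heap-allocated log name.
struct LogState {
  char *name = nullptr;
  int live_allocations = 0;  // demo-only counter, not part of any real API
};

// Plain strdup-style copy; stands in for the server's allocator wrappers.
inline char *dup_string(const char *s) {
  const std::size_t len = std::strlen(s) + 1;
  char *copy = static_cast<char *>(std::malloc(len));
  assert(copy != nullptr);
  std::memcpy(copy, s, len);
  return copy;
}

// Free the current copy, if any, and reset the pointer.
inline void release_name(LogState &state) {
  if (state.name != nullptr) {
    std::free(state.name);
    state.name = nullptr;
    --state.live_allocations;
  }
}

// Free the previous copy first, so reopening the log does not leak it.
inline void open_log(LogState &state, const char *new_name) {
  assert(new_name != nullptr);
  release_name(state);
  state.name = dup_string(new_name);
  ++state.live_allocations;
}

// Opens twice, then reports how many copies were alive before cleanup.
inline int demo_reopen() {
  LogState state;
  open_log(state, "relay-log.000001");
  open_log(state, "relay-log.000002");  // would leak the first copy without the release
  const int live = state.live_allocations;
  release_name(state);
  return live;
}
```

Without the `release_name` call inside `open_log`, the second open would overwrite the pointer and the first copy would be reported as a direct leak.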

Reviewed By: Pushapgl

Differential Revision: D28819752
inikep pushed a commit that referenced this pull request May 30, 2024
Summary:
[Porting Notes]
We want to dump raft logs to vanilla async replicas regardless
of whether it's the relay log or binlog. Effectively after this change
we'll dump relay logs on the followers and binlogs on the leader. When
the raft role changes, the log being dumped changes as well.
The Dump_log class is introduced as a thin wrapper/container around
mysql_bin_log or rli->relay_log and is inited with mysql_bin_log
to emulate vanilla mysql behavior. Dump threads use the global
dump_log object instead of mysql_bin_log directly. We switch the log
in dump log only when raft role changes (in binlog_change_to_binlog()
and binlog_change_to_apply_log()).
During raft role change we take all log-related locks (LOCK_log,
LOCK_index, LOCK_binlog_end_pos, and dump log lock) to serialize it with
other log operations like dumping logs.
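
The wrapper idea can be sketched roughly like this. It is not the actual Dump_log class: `Log`, `DumpLog`, `switch_log` and `current_name` are hypothetical names, and a single `std::mutex` stands in for the full set of log locks mentioned above.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical stand-in for a binlog or relay log object; only a name is modeled.
struct Log {
  std::string name;
};

// Thin wrapper that points at either the binlog or the relay log.
class DumpLog {
 public:
  explicit DumpLog(Log *initial) : log_(initial) {}

  // Called on raft role change; the real code also takes the other log locks.
  void switch_log(Log *next) {
    assert(next != nullptr);
    std::lock_guard<std::mutex> guard(mutex_);
    log_ = next;
  }

  // Dump threads read the current log through the wrapper, under the lock.
  std::string current_name() {
    std::lock_guard<std::mutex> guard(mutex_);
    return log_->name;
  }

 private:
  std::mutex mutex_;
  Log *log_;
};

// Walks through leader -> follower -> leader and records which log is dumped.
inline std::string demo_role_changes() {
  Log binlog{"binlog"};
  Log relay_log{"relay_log"};
  DumpLog dump_log(&binlog);  // starts on the binlog, like vanilla MySQL
  std::string seen = dump_log.current_name();
  dump_log.switch_log(&relay_log);  // became a follower
  seen += "," + dump_log.current_name();
  dump_log.switch_log(&binlog);  // became the leader again
  seen += "," + dump_log.current_name();
  return seen;
}
```

Because dump threads only ever go through the wrapper, the role change has a single place to swap the underlying log.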

Related doc - https://fb.quip.com/oTVAAdgEi4zY

This diff contains the following 7 patches:
D23013977
D24766787
D24716539
D24900223
D24955284
D25174166
D25775525

Reviewed By: luqun

Differential Revision: D26141496

-------------------------------------------------------------------------------

Passing raw_log pointer to wait_with_heartbeat() and wait_without_heartbeat()

Summary:
When enable_raft_plugin is OFF, Dump_log::lock() is a no-op.
This means that with enable_raft_plugin OFF there can be a race
between log switching and dump threads. This could lead to a scenario
where the raw_log that wait_next_event() is working on might be
different than what wait_with_heartbeat()/wait_without_heartbeat() is
working on. This can cause deadlocks because
wait_with_heartbeat()/wait_without_heartbeat()'s mysql_cond_wait would
unlock and then lock a different log's LOCK_binlog_end_pos mutex which
would then never be unlocked by wait_next_event().
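
A rough, single-threaded sketch of why passing the pointer helps. `RawLog`, `DumpLog`, `wait_helper` and the `lock_depth` counter are hypothetical, and the unlock/lock pair merely stands in for what mysql_cond_wait does with the mutex. The caller captures the log once and hands the same pointer to the wait helper, so a log switch in between cannot make the helper touch a different log's mutex.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a binlog or relay log with its end-position mutex.
struct RawLog {
  explicit RawLog(int log_id) : id(log_id) {}
  void lock() {
    end_pos_mutex.lock();
    ++lock_depth;
  }
  void unlock() {
    --lock_depth;
    end_pos_mutex.unlock();
  }
  int id;
  int lock_depth = 0;  // demo-only bookkeeping
  std::mutex end_pos_mutex;
};

// Wrapper whose current log may be switched at any time.
struct DumpLog {
  RawLog *current;
};

// The helper receives the log the caller already locked instead of
// re-reading the wrapper's current log.
inline void wait_helper(RawLog *raw_log) {
  assert(raw_log->lock_depth > 0);  // caller must hold the mutex
  // Stand-in for a condition wait: release and re-acquire the same mutex.
  raw_log->unlock();
  raw_log->lock();
}

// Simulates a log switch between capturing the log and waiting on it.
inline bool demo_wait_is_balanced() {
  RawLog binlog(1);
  RawLog relay_log(2);
  DumpLog dump_log{&binlog};

  RawLog *raw_log = dump_log.current;  // captured once by the caller
  raw_log->lock();
  dump_log.current = &relay_log;  // simulated log switch while waiting
  wait_helper(raw_log);           // still operates on the captured log
  raw_log->unlock();

  // Every lock was released on the log it was taken on.
  return binlog.lock_depth == 0 && relay_log.lock_depth == 0;
}
```

If the helper re-read `dump_log.current` instead, it would unlock and relock the relay log's mutex while the caller later unlocks the binlog's, which is the mismatch described above.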

Reviewed By: anirbanr-fb

Differential Revision: D32152658

-----------------------------------------------------------------------------------------

Fix rpl_raft_dump_raft_logs

Summary:
This test completes but fails because the following warning exists:
```
2022-08-30T16:28:00.159525Z 11 [ERROR] [MY-013114] [Repl] Slave I/O for channel '': Got fatal error 1236 from master when reading data from binary log: 'Slave has more GTIDs than the master has, using the master's SERVER_UUID. This may indicate that the end of the binary log was truncated or that the last binary log file was lost, e.g., after a power or disk failure when sync_binlog != 1. The master may or may not have rolled back transactions that were already replicated to the slave. Suggest to replicate any transactions that master has rolled back from slave to master, and/or commit empty transactions on master to account for transactions that have been', Error_code: MY-013114
```
Since the MTR result file is valid, we can suppress this error.

Reviewed By: yichenshen

Differential Revision: D39141846

-------------------------------------------------------------------------------

Fix heap overflow in group_relay_log_name handling

Summary:
We were accessing group_relay_log_name in
Query_log_event::do_apply_event_worker() but it's assigned only after
the coordinator thread encounters an end event (i.e. xid event or a
query event with "COMMIT" or "ROLLBACK" query). This was causing a race
between accessing group_relay_log_name in the worker thread and writing
it on the coordinator thread. We don't need to set transaction position
in events other than end event, so now we set transaction position in
query event only if it's an end event. The race is eliminated because
group_relay_log_name is set before enqueuing the event to the worker
thread (in both dep repl and vanilla mts).

Reviewed By: lth

Differential Revision: D28767430

-------------------------------------------------------------------------------

Fix memory leak during MYSQL_BIN_LOG::open_existing_binlog

Summary:
asandebug complains that there are memory leaks during MYSQL_BIN_LOG open:

Direct leak of 50 byte(s) in 1 object(s) allocated from:
    #0 0x67460ef in malloc
    #1 0x93f0777 in my_raw_malloc(unsigned long, int)
    #2 0x93f064a in my_malloc(unsigned int, unsigned long, int)
    #3 0x93f0eb0 in my_strdup(unsigned int, char const*, int)
    #4 0x8af01a6 in MYSQL_BIN_LOG::open(unsigned int, char const*, char const*, unsigned int)
    #5 0x8af8064 in MYSQL_BIN_LOG::open_binlog(char const*, char const*, unsigned long, bool, bool, bool, Format_description_log_event*, unsigned int, RaftRotateInfo*, bool)
    #6 0x8b00c00 in MYSQL_BIN_LOG::new_file_impl(bool, Format_description_log_event*, RaftRotateInfo*)
    #7 0x8d65e47 in rotate_relay_log(Master_info*, bool, bool, bool, RaftRotateInfo*)
    #8 0x8d661c0 in rotate_relay_log_for_raft(RaftRotateInfo*)
    #9 0x8c7696a in process_raft_queue
    #10 0xa0fa1fd in pfs_spawn_thread(void*)
    #11 0x7f8c9a12b20b in start_thread

Release the previously allocated memory before assigning new values.

Reviewed By: Pushapgl

Differential Revision: D28819752
inikep pushed a commit that referenced this pull request Jun 28, 2024
inikep pushed a commit that referenced this pull request Jul 2, 2024
inikep pushed a commit that referenced this pull request Jul 19, 2024
inikep pushed a commit that referenced this pull request Jul 19, 2024
inikep pushed a commit that referenced this pull request Jul 30, 2024
inikep pushed a commit that referenced this pull request Jul 31, 2024
Summary:
[Porting Notes]
We want to dump raft logs to vanilla async replicas regardless
of whether it's the relay log or binlog. Effectively after this change
we'll dump relay logs on the followers and binlogs on the leader. When
the raft role changes, the log being dumped changes as well.
The Dump_log class is introduced as a thin wrapper/container around
mysql_bin_log or rli->relay_log and is inited with mysql_bin_log
to emulate vanilla mysql behavior. Dump threads use the global
dump_log object instead of mysql_bin_log directly. We switch the log
in dump log only when raft role changes (in binlog_change_to_binlog()
and binlog_change_to_apply_log()).
During raft role change we take all log-related locks (LOCK_log,
LOCK_index, LOCK_binlog_end_pos, and dump log lock) to serialize it with
other log operations like dumping logs.

Related doc - https://fb.quip.com/oTVAAdgEi4zY

This diff contains the following 7 patches:
D23013977
D24766787
D24716539
D24900223
D24955284
D25174166
D25775525

Reviewed By: luqun

Differential Revision: D26141496

-------------------------------------------------------------------------------

Passing raw_log pointer to wait_with_heartbeat() and wait_without_heartbeat()

Summary:
When enable_raft_plugin is OFF, Dump_log::lock() is a no-op.
This means that with enable_raft_plugin OFF there can be a race
between log switching and dump threads. This could lead to a scenario
where the raw_log that wait_next_event() is working on might be
different than what wait_with_heartbeat()/wait_without_heartbeat() is
working on. This can cause deadlocks because
wait_with_heartbeat()/wait_without_heartbeat()'s mysql_cond_wait would
unlock and then lock a different log's LOCK_binlog_end_pos mutex which
would then never be unlocked by wait_next_event().

Reviewed By: anirbanr-fb

Differential Revision: D32152658

-----------------------------------------------------------------------------------------

Fix rpl_raft_dump_raft_logs

Summary:
This test completes but fails because the following error is logged:
```
2022-08-30T16:28:00.159525Z 11 [ERROR] [MY-013114] [Repl] Slave I/O for channel '': Got fatal error 1236 from master when reading data from binary log: 'Slave has more GTIDs than the master has, using the master's SERVER_UUID. This may indicate that the end of the binary log was truncated or that the last binary log file was lost, e.g., after a power or disk failure when sync_binlog != 1. The master may or may not have rolled back transactions that were already replicated to the slave. Suggest to replicate any transactions that master has rolled back from slave to master, and/or commit empty transactions on master to account for transactions that have been', Error_code: MY-013114
```
Since the MTR result file is valid, we can suppress this error.

Reviewed By: yichenshen

Differential Revision: D39141846

-------------------------------------------------------------------------------

Fix heap overflow in group_relay_log_name handling

Summary:
We were accessing group_relay_log_name in
Query_log_event::do_apply_event_worker() but it's assigned only after
the coordinator thread encounters an end event (i.e. xid event or a
query event with "COMMIT" or "ROLLBACK" query). This was causing a race
between accessing group_relay_log_name in the worker thread and writing
it on the coordinator thread. We don't need to set transaction position
in events other than end event, so now we set transaction position in
query event only if it's an end event. The race is eliminated because
group_relay_log_name is set before enqueuing the event to the worker
thread (in both dep repl and vanilla mts).
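
A rough sketch of the ordering this relies on, under simplifying assumptions: `EventSketch`, `enqueue_sketch()` and `apply_sketch()` are hypothetical, and copying the name into each event is an illustration device, not how the server stores it.

```cpp
#include <cassert>
#include <deque>
#include <string>

// Hypothetical event; the name field is written by the coordinator only.
struct EventSketch {
  bool is_xid;
  std::string query;
  std::string group_relay_log_name;
};

// An end event is an xid event or a COMMIT/ROLLBACK query event.
bool is_end_event(const EventSketch &ev) {
  return ev.is_xid || ev.query == "COMMIT" || ev.query == "ROLLBACK";
}

// Coordinator side: set the name first, then publish to the worker queue.
void enqueue_sketch(std::deque<EventSketch> &queue, EventSketch ev,
                    const std::string &relay_log) {
  if (is_end_event(ev)) ev.group_relay_log_name = relay_log;
  queue.push_back(ev);
}

// Worker side: records a transaction position only for end events.
// Returns true if a position was recorded.
bool apply_sketch(const EventSketch &ev, std::string &recorded_position) {
  if (!is_end_event(ev)) return false;
  // The coordinator set the name before enqueuing, so it is never empty here.
  assert(!ev.group_relay_log_name.empty());
  recorded_position = ev.group_relay_log_name;
  return true;
}
```

Because the worker reads the name only for end events, and those are always stamped before they become visible in the queue, there is no window in which it can observe an unset value.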

Reviewed By: lth

Differential Revision: D28767430

-------------------------------------------------------------------------------

Fix memory leak during MYSQL_BIN_LOG::open_existing_binlog

Summary:
asandebug complains about memory leaks during MYSQL_BIN_LOG open:

Direct leak of 50 byte(s) in 1 object(s) allocated from:
    #0 0x67460ef in malloc
    #1 0x93f0777 in my_raw_malloc(unsigned long, int)
    #2 0x93f064a in my_malloc(unsigned int, unsigned long, int)
    #3 0x93f0eb0 in my_strdup(unsigned int, char const*, int)
    #4 0x8af01a6 in MYSQL_BIN_LOG::open(unsigned int, char const*, char const*, unsigned int)
    #5 0x8af8064 in MYSQL_BIN_LOG::open_binlog(char const*, char const*, unsigned long, bool, bool, bool, Format_description_log_event*, unsigned int, RaftRotateInfo*, bool)
    #6 0x8b00c00 in MYSQL_BIN_LOG::new_file_impl(bool, Format_description_log_event*, RaftRotateInfo*)
    #7 0x8d65e47 in rotate_relay_log(Master_info*, bool, bool, bool, RaftRotateInfo*)
    #8 0x8d661c0 in rotate_relay_log_for_raft(RaftRotateInfo*)
    #9 0x8c7696a in process_raft_queue
    #10 0xa0fa1fd in pfs_spawn_thread(void*)
    #11 0x7f8c9a12b20b in start_thread

Free the previously allocated memory before reassigning the pointers.
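
The general pattern, sketched with the standard C allocator rather than my_strdup()/my_free(); `LogNameSketch` and the helper functions are hypothetical and only illustrate freeing the old string when a new one is assigned:

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical holder; `name` owns a heap string or is nullptr.
struct LogNameSketch {
  char *name = nullptr;
};

// Duplicates a C string with malloc; returns nullptr on allocation failure.
static char *dup_sketch(const char *s) {
  char *copy = static_cast<char *>(std::malloc(std::strlen(s) + 1));
  if (copy != nullptr) std::strcpy(copy, s);
  return copy;
}

// Replaces the owned string without leaking the previous one.
void set_name_sketch(LogNameSketch &log, const char *new_name) {
  assert(new_name != nullptr);
  char *copy = dup_sketch(new_name);  // copy first, in case new_name aliases log.name
  std::free(log.name);                // free(nullptr) is a no-op
  log.name = copy;
}

// Releases the owned string and resets the pointer.
void clear_name_sketch(LogNameSketch &log) {
  std::free(log.name);
  log.name = nullptr;
}
```

Without the `std::free()` call in `set_name_sketch()`, every repeated open would drop the previous allocation, which is the kind of direct leak the report above shows.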

Reviewed By: Pushapgl

Differential Revision: D28819752
inikep pushed a commit that referenced this pull request Aug 2, 2024
inikep pushed a commit that referenced this pull request Aug 6, 2024