PS-9104 Release tasks for PS-8.0.36 #5245

Merged
merged 329 commits into from
Mar 14, 2024

Conversation

adivinho
Contributor

@adivinho adivinho commented Mar 4, 2024

No description provided.

Ole John Aske and others added 30 commits September 26, 2023 16:01
As the subject says: the patch ensures that NDBT_SKIPPED is returned from
test cases that are skipped because their configuration needs are not met.

Change-Id: Ia037604701cd6a39bc40682cfe532bd65dad9b22
Problem:

On a distributed setup where, on each site, every mysqld replicates
changes from the others but is configured not to log these updates in its
own binlog, a Table_map entry was eventually logged. This caused the
replication applier of a third server to fail with ER_NO_SUCH_TABLE.

Analysis:

1. The Table_map events belonging to changes that were applied by a
replica were being injected without looking at the log_replica_updates
option. The code was only guarding against the set of events within the
epoch that had the NDB_ANYVALUE_NOLOGGING_CODE flag set in the
anyValue. However, this flag is only set for the specific cases in which
the binlog is disabled.

2. The cumulative_any_value of an event operation, a bitwise AND
operation over the any-value of all events for a given table belonging
to the same epoch, was not being used correctly for a number of
reasons: i) It was being relied on to give an overview of who had made
changes on the table over the course of the epoch; ii) It could
contain reserved values, whose bits are the inverse of the bits used
to record server ids; iii) It was being reduced to a bitwise AND of
all recorded values on each event, thus causing cancellation between
inverse bits.

Solution:

1. The injection of a Table_map entry for changes that occurred in the
table is now guarded by the log_replica_updates option.

2. The cumulative_any_value is now created with a bitwise OR operation
so that information is not lost. Additionally, it is filtered by the
plugin event operation on every processing of the event data so that
important data is retrieved, such as if the update was made by a
replica or if it had a no-logging flag set. This data is then used for
a correct decision on whether to inject a Table_map event.

Changed the rpl_conflict tests' setup since mysqld.1.1 must log replica
updates and the default turns this off.

Change-Id: I978325fe01435a7d0e0bd4249979b749731f8fbb
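
A minimal sketch of the accumulation change described above, using hypothetical names modeled on the text (the real ndbcluster binlog injector differs in detail):

```
#include <cstdint>

// Assumed flag value for illustration only; see ndb_anyvalue.h for the
// real encoding.
constexpr uint32_t NDB_ANYVALUE_NOLOGGING_CODE = 0x8000007f;

struct EventOpState {
  uint32_t cumulative_any_value = 0;

  // Before: a bitwise AND let inverse bit patterns (reserved values vs.
  // recorded server ids) cancel each other out, losing information.
  //   void accumulate(uint32_t any_value) { cumulative_any_value &= any_value; }

  // After: accumulate with OR so that no recorded bit is ever cleared.
  void accumulate(uint32_t any_value) { cumulative_any_value |= any_value; }
};

// The accumulated value is then filtered per event so the injector can
// decide whether a Table_map entry should be logged at all.
bool should_inject_table_map(bool change_made_by_replica, bool no_logging,
                             bool log_replica_updates) {
  if (no_logging) return false;                     // anyValue no-logging flag
  if (change_made_by_replica && !log_replica_updates) return false;  // fix 1
  return true;
}
```
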
The root cause of the problem where ssl_write() may get a smaller
send buffer on retries is that consolidate() calculated how many *full*
iovec buffers could fit into buff[MaxTlsRecord].

Thus if the last iovec[] buffer size had increased on a write retry,
consolidate would calculate a smaller buff[] size on the retry
if the last (now larger) iovec[] buffer no longer fit.

That was likely based on a misconception that iovec_data_sent(), which is
used to report the buffer size consumed by 'writev()', only handled
the consumption of a number of full iovec[] buffers. However, that is
not the case: any number of bytes may be reported as consumed, and the
iovec[] will be correctly updated accordingly.

The patch allows the full size of buff[MaxTlsRecord] to be used for
packing, with a partial copy from the last iovec[] in order to completely
fill buff[]. Thus the case where we could no longer fit data from
the last iovec[] buffer is avoided.

Change-Id: I4d0d7d124421d510ab80e0a2566f6565785ad6d3
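
A self-contained sketch of the new packing strategy, assuming the names used above (MaxTlsRecord, consolidate); this is not the exact transporter source:

```
#include <sys/uio.h>
#include <algorithm>
#include <cstring>

static constexpr size_t MaxTlsRecord = 16384;

// Pack as many bytes as possible into buff, copying *partially* from the
// last iovec if needed, so a retry with a grown last buffer can never
// produce a smaller record than the previous attempt.
size_t consolidate(char *buff, const struct iovec *iov, int iovcnt) {
  size_t used = 0;
  for (int i = 0; i < iovcnt && used < MaxTlsRecord; i++) {
    const size_t chunk = std::min(iov[i].iov_len, MaxTlsRecord - used);
    memcpy(buff + used, iov[i].iov_base, chunk);
    used += chunk;
  }
  // iovec_data_sent() can report any byte count, so a partial copy of the
  // last iovec is safe: the iovec[] is updated accordingly on completion.
  return used;
}
```
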
…restart process

The management server maintains a cache mapping nodeids to host address
presentation format strings (e.g. 33 -> "47.184.12.55")

These cached strings are returned to MGM protocol requests for information
about connected nodes.

The cache is currently:
 - Populated when an MGM protocol request is made for a destination
   which is connected at the transporter level, and has no cached
   address string
 - Cleared when processing a NODE_FAILREP or NF_COMPLETEREP signal

This is problematic when the disconnection of a remote node is slow, as the
following sequence is possible:
 1  Data node 5's address is cached 'host A'
 2  Data node 5 vanishes (network loss)
 3  Data node 5's failure is detected by MGMD and a disconnect is requested
 4  Other data nodes detect node 5's failure, send NODE_FAILREP
    + NF_COMPLETEREPs to MGMD
 5  MGMD processes NODE_FAILREP + NF_COMPLETEREP, clears node 5's cached address
 6  MGMD receives MGM protocol request for data node 5's address
 7  Data Node 5 is *still connected*, and so its address is looked up and cached
 8  Eventually data node 5 disconnect is processed
 9  Later data node 5 connects from a new address 'host B'
 10 MGMD receives MGM protocol request for data node 5's address -
    returns cached 'host A' address

The problematic events are:
  - Slow disconnect processing (3 .... 8)
  - MGM protocol request arrives during slow disconnect (6,7), after node
    failure signal handling (5)

This situation is avoided by adding a cached node address clear step
as part of connecting to a node (e.g. at step 9) so that any previously
cached address is discarded and a new address can be used.

Change-Id: I88841acd44e654ebbdb9ce4c013f26324df45d30
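
A rough sketch of the added step, assuming a simple nodeid-to-string map; the real MGMD cache differs in detail:

```
#include <map>
#include <string>

class NodeAddressCache {
  std::map<int, std::string> m_cached;  // nodeid -> "a.b.c.d" string

 public:
  // Existing behaviour (not shown): populate on MGM protocol lookups,
  // clear on NODE_FAILREP / NF_COMPLETEREP.

  // New step, run when a node connects (e.g. at step 9 above): discard any
  // previously cached address so a slow disconnect cannot resurrect it.
  void on_node_connect(int nodeid) { m_cached.erase(nodeid); }
};
```
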
Change-Id: I6bacf52751d564267b419607e35f8755e8f12eed
Post push patch

Fix the printf specifier for printing a size_t data type.
The bit length of size_t depends on the underlying platform / compiler.
According to the documentation, the correct specifier for such a type
is '%zu'.

Change-Id: I2135e00aa6e69df94300970b5578b3c839b63e6d
Change-Id: I12e5e75ba31e8fd8af1eb79835148ea599a33114
Fix the alloc-size-larger-than warning in Qmgr by explicitly allocating
the array only when it has a non-zero size. The compiler cannot deduce
that this is actually always the case.

The confusing warning fixed is:

storage/ndb/src/kernel/blocks/qmgr/QmgrInit.cpp: In member function 'initData':
storage/ndb/src/kernel/blocks/qmgr/QmgrInit.cpp:204:60: warning: argument 1 value '18446744073709551615' exceeds maximum
object size 9223372036854775807 [-Walloc-size-larger-than=]
  204 |   receivedProcessInfo = new ProcessInfo[numOfApiAndMgmNodes];
      |                                                            ^
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/new:128:26: note: in a call to allocation function 'operator new []' declared here
  128 | _GLIBCXX_NODISCARD void* operator new[](std::size_t) _GLIBCXX_THROW (std::bad_alloc)
      |                          ^

Change-Id: Ie7aea5e8681db1c420d2a0e80d1cda8319f74a2b
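
A sketch of the shape of the fix (not the verbatim QmgrInit.cpp change): make the non-zero size explicit so the compiler's allocation-size analysis cannot assume a wrapped-around value:

```
// numOfApiAndMgmNodes is in practice always > 0, but the compiler cannot
// prove it; the explicit guard silences -Walloc-size-larger-than=.
if (numOfApiAndMgmNodes > 0)
  receivedProcessInfo = new ProcessInfo[numOfApiAndMgmNodes];
```
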
Handling of ENOMEM errors returned from writev() and recv() in
TCP_Transporter::doSend() and TCP_Transporter::doReceive(), respectively,
was introduced
by the patch:

.......................

    commit 367d9278a182676d8ba6a4d768ae9e3eae05c440
    Author: Mikael Ronström <[email protected]>
    Date:   Fri Jun 28 14:44:26 2019 +0200

        BUG#29887068
    ....
    Handling ENOMEM with resends of smaller size when send attempted.
......................

The ENOMEM handling attempted a resend with a smaller size by directly
reducing the iov_len of the first iovec[] buffer.

At best this seems to have solved a corner case where we would have
run out of memory anyway very soon.

The test case which reportedly was solved was a manually run attempt
to test many-nodes functionality on a small set of machines, perhaps
explaining the memory pressure issues. Lots of other overload issues
were also observed in similar tests, which could be said to use
a configuration we really do not recommend nor support.

In the absence of evidence that the ENOMEM handling had any value, and as
we would like to keep the code 'clean', this patch reverts the ENOMEM
handling introduced by the patch referred to above.

Change-Id: Ib5a269f0b4a41a6d683d27f49b575616d82d3822
The thd memory usage guard around injection of an incident
is too strict, as this code path invokes file rotation
etc. and may increase thd memory usage.

Change-Id: Iff21da1f7d61276b3ffbea3b8f7441088031342f
Fix a compiler warning by adding a check to ensure the array element access is within range.

Reviewed by: Mauritz Sundell <[email protected]>

Change-Id: I55a0a72b5f7733e069bcd229ff49e45ed3bf38cb
(cherry picked from commit f21d8a05d5a8180cd0d2e8b49833532aaedf8f6a)
sql-common/oci/signing_key.cc:87:42: error:
ignoring attributes on template argument int (*)(FILE*)
[-Werror=ignored-attributes]
std::unique_ptr<FILE, decltype(&fclose)> fp(fopen(file_name.c_str(), "rb"),

This is the declaration on Ubuntu 23.10:
extern int fclose (FILE *__stream) __nonnull ((1));

Use a scope guard closure, rather than a unique_ptr with custom deleter.

Change-Id: Iecb41b238dc9b72d87f33f54aea272391f3b5977
(cherry picked from commit 5d2e8c1aadb23c9d9acba692d2dd7afb24a0897a)
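
A self-contained sketch of the fix. decltype(&fclose) drags the __nonnull attribute into a template argument, which -Werror=ignored-attributes rejects; a scope-guard closure sidesteps the issue (a minimal stand-in guard is shown here, not the server's actual helper):

```
#include <cstdio>
#include <string>
#include <utility>

template <typename F>
class ScopeGuard {
  F m_f;

 public:
  explicit ScopeGuard(F f) : m_f(std::move(f)) {}
  ~ScopeGuard() { m_f(); }
};

bool read_signing_key(const std::string &file_name) {
  // Old: std::unique_ptr<FILE, decltype(&fclose)> fp(...);  // warns
  FILE *fp = fopen(file_name.c_str(), "rb");
  if (fp == nullptr) return false;
  ScopeGuard guard([fp] { fclose(fp); });  // closes fp on every exit path
  // ... read and parse the signing key ...
  return true;
}
```
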
For a Debug build, with -O2, we get:

In file included from /usr/include/string.h:548,
 from extra/googletest/googletest-release-1.12.0/googletest/include/gtest/xxxxxx/gtest-port.h:262,
 from extra/googletest/googletest-release-1.12.0/googletest/include/gtest/gtest-message.h:55,
 from extra/googletest/googletest-release-1.12.0/googletest/include/gtest/gtest-assertion-result.h:46,
 from extra/googletest/googletest-release-1.12.0/googletest/include/gtest/gtest.h:59,
 from unittest/gunit/item_json_func-t.cc:23:
In function void* memset(void*, int, size_t),
    inlined from void TRASH(void*, size_t) at include/memory_debugging.h:72:9,
    inlined from static void Item::operator delete(void*, size_t) at sql/item.h:962:10,
    inlined from void item_json_func_unittest::BM_JsonSearch(size_t) at unittest/gunit/item_json_func-t.cc:721:78:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:59:33:
error: <anonymous> may be used uninitialized [-Werror=maybe-uninitialized]
   59 |   return __builtin___memset_chk (__dest, __ch, __len,
      |          ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
   60 |                                  __glibc_objsize0 (__dest));
      |                                  ~~~~~~~~~~~~~~~~~~~~~~~~~~
unittest/gunit/item_json_func-t.cc: In function void item_json_func_unittest::BM_JsonSearch(size_t):
unittest/gunit/item_json_func-t.cc:721:78: note: <anonymous> was declared here
  721 |   auto search = new Item_func_json_search(table.in_use, new Item_field(&field),
      |                                                                              ^

gcc somehow thinks we may call this:
  static void operator delete(void *ptr [[maybe_unused]],
                              size_t size [[maybe_unused]]) {
    TRASH(ptr, size);
  }

The fix is to evaluate 'new Item_field(&field)' separately.

Change-Id: I5d73561f900043092bcd982fffc8462707627f1a
(cherry picked from commit e536c8a446b97ac9c90ce957ba1c34547b798993)
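
A sketch of the workaround: hoist the inner allocation into its own statement so the compiler no longer sees an anonymous temporary flowing into Item's sized operator delete (remaining constructor arguments elided):

```
// Before (triggers -Werror=maybe-uninitialized at -O2 in Debug builds):
//   auto search = new Item_func_json_search(table.in_use,
//                                           new Item_field(&field), ...);

// After: evaluate the argument separately.
Item_field *arg = new Item_field(&field);
auto search = new Item_func_json_search(table.in_use, arg /* , ... */);
```
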
Backport the -Wno-alloc-size-larger-than suppressions from trunk,
from the patch for:
    Bug#34099162 Add support for mold linker

Change-Id: I3996c472b423daaf3002c477ccb757d2e55846f9
Fix a compiler warning by adding a check to ensure the array element access is within range.

Reviewed by: Mauritz Sundell <[email protected]>
(backported from commit f21d8a05d5a8180cd0d2e8b49833532aaedf8f6a)

Change-Id: I5c56b5ef1fa73908a3bd5c070ab7106fee4cc9ad
Change-Id: I9548896ffdf40c8dc3f40a0571cd461d49c0e7e7
…ODE RESTART

Post-push fix for 7.6:
Methods
- bool first(Ptr<T>& p)
- bool next(Ptr<T>& p)
added to DLHashTable. Both methods are already present in 8.0+ version.

Fixed the way report_subscription_set iterates over the subscription
list.
Iteration now uses the newly added versions of the first/next methods.
This way iteration is the same as in 8.0+.

Change-Id: Ic60f359a4f18e4fa95be73fcb8d72343ad113af3
Change-Id: If43c680347087182cb12b9eddf7d332f9482281d
Move client_authenticate() out of SocketClient::connect() (which
returns void) into a separate SocketClient::authenticate() method
which can return a value.

In SocketAuthenticator, change the signature of the authentication
routines to return an int (which can represent a result code) rather
than a bool. Results less than AuthOk represent failure, and results
greater than or equal to AuthOk represent success.

Remove the username and password variables from SocketAuthSimple;
make them constant strings in the implementation.

There are no functional changes.

Change-Id: I4c25e99f1b9b692db39213dfa63352da8993a8fb
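
A sketch of the new result convention, assuming an AuthOk constant as described; the exact declarations in the NDB sources may differ:

```
class NdbSocket;  // forward declaration

class SocketAuthenticator {
 public:
  static constexpr int AuthOk = 0;
  // < AuthOk  : failure (distinct negative codes can identify the cause)
  // >= AuthOk : success (codes above AuthOk can carry extra information)
  virtual int client_authenticate(const NdbSocket &sock) = 0;
  virtual int server_authenticate(const NdbSocket &sock) = 0;
  virtual ~SocketAuthenticator() = default;
};

// Callers can now branch on the code instead of a bare bool:
//   if (auth->client_authenticate(sock) < SocketAuthenticator::AuthOk) ...
```
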
This changes TransporterRegistry::connect_ndb_mgmd() to return
NdbSocket rather than ndb_socket_t.

Back-ported from mysql-trunk.

Change-Id: Ic3b9ccf39ec78ed25705a4bbbdc5ac2953a35611
…OSE]

NdbSocket::close should always invalidate the socket file descriptor.
Renamed invalidate to the private method invalidate_socket_handle and
used it in close and other methods.

Change-Id: Ic02406a72de3e452287337dcc87f8a0d361ecf8a
In preparation for the next patch, which passes NdbSocket either by const
reference (const NdbSocket&) for temporary use or by rvalue reference
(NdbSocket&&) to transfer ownership, calls to close must move, since only
the owner may call close (or any other non-const method).

Calls to NdbSocket::close are moved from caller to callee for functions
that take ownership using NdbSocket&&, since the caller must not use the
moved socket after the call.

And since the callee will do the close, there is no need to pass a
close-with-reset flag back to the caller; that flag is removed from the
transporter_connect and connect_server methods.

Calls to NdbSocket::close are moved from callee to caller for functions
that are changed to "borrow" the socket using const NdbSocket&, since the
callee cannot call non-const methods.

To keep the semantics of the function making the socket unusable on
failure, a new method NdbSocket::shutdown is introduced that only stops
further communication on the socket but still leaves it open (until the
caller calls close).

Change-Id: I9666ba76e03e268406677c1e32d76d7d219e9a56
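
Illustrative signatures for the two conventions (not the exact transporter API):

```
class NdbSocket;  // move-only socket wrapper

// Borrow: callee may use the socket but cannot call close (or any other
// non-const method); the caller keeps ownership.
bool send_greeting(const NdbSocket &sock);

// Transfer ownership: the caller must std::move the socket in, must not
// touch it afterwards, and the callee is now the one that calls close.
void start_session(NdbSocket &&sock);
```
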
Back-ported for 8.0: Some parts of "WL#15524 Patch #16 Use TLS
for Event Listener" have been merged into this patch.

Use rvalue references (NdbSocket&&) when passing ownership of NdbSocket to
function, use const references (const NdbSocket&) otherwise.

When function creates a new instance of NdbSocket return it as a return
value rather than via an out parameter.

The implementation of struct ndb_logevent_handle is rewritten from being a
plain C struct to having proper constructors both for taking ownership of
the socket and for only borrowing it, and new and delete are used instead
of the malloc and free functions.

Refactoring made SecureSocketOutputStream, BufferedSecureOutputStream,
and SocketInputStream redundant and they are removed.

Change-Id: I60caae52f2b0997b78ae179e5e9b47b6c99073b8
Post push fix.

Make NdbSocket::ssl_readln return 0 on timeout.

Change-Id: I4cad95abd319883c16f2c28eff5cf2b6761731d6
Bug#35692840 Disable NDB SSL socket table

Remove NdbSocket::From enum and init_from_native and init_from_new
methods.

Remove the SSL socket table.

Change-Id: I6c0023ec1e3b06e519988d9388440d577e6f2e84
Post push fix.

Add dummy implementation for NdbSocket::ssl_shutdown if built against
older than OpenSSL 1.1.1.

Change-Id: I0209c58aa5c565a5d959b30bb782d14f4950d4ce
…issues

The disconnect of transporters was not 'clean':

disconnectImpl() is used to take the connection to DISCONNECTING, while other
threads might still send/recv on the same connection. The implementation of
disconnectImpl() used to call NdbSocket::close() directly, such that:

 - close() was called on the underlying socket, which also released the
   descriptor for reuse. As the descriptor was still referred to by other
   threads, they could end up referring to a descriptor that had been
   reused to refer to e.g. a file.

 - With the introduction of TLS in NdbSockets, close() released the ssl
   object as well. The ssl object may also be referred to by other threads
   attempting send/recv.

The patch avoids releasing these objects as part of the DISCONNECTING phase:

When we disconnect (-> ::disconnectImpl()) a transporter, we now only
'shutdown()' the NdbSocket, which keeps the file descriptor and ssl object
in a 'shutdown' state.

When the transporter reaches the DISCONNECTED state (-> report_disconnect())
we call the new Transporter method releaseAfterDisconnect(), which releases
these objects as well.

The lifecycle of the NdbSocket::mutex is also changed, such that it is no
longer released by a 'close'. It will exist until the NdbSocket is
destructed. The Guard2's used with the mutex are changed to use Guard
instead, which requires the mutex to exist. -> Guard2 just skipped the
locking if there was no mutex.

-> The NdbSocket::enable_locking() and ::disable_locking() methods became obsolete.

Other related and required changes:

    - Fixed testSecureSocket.cpp:
      -- build break.
      -- Added missing stopSession(), needed to close the NdbSocket.
      -- Removed the 'locking' argument from several c'tors and methods
         (Locking is now implicit if SSL is used)
      -- Had to disable a test case using SSL (locking) and 'blocking' communication.
         -> Needs discussion; do not think this combination can ever work.

    - Introduced missing 'lifecycle handling' in Loopback and SHM_Transporter.
      -- Introduced Transporter::disconnectImpl() and ::releaseAfterDisconnect()
         implementing the 'lifecycle' of 'theSocket'
      -- Let the TCP, SHM and Loopback_Transporter implementations use the above methods.

    - Removed some more Transporter locks:
      -- SHM and Loopback will not need a lock in disconnectImpl() when implemented
         as only a 'shutdown', ... assuming that the disconnect protocol is followed
         (Multi-transporters break it ...!)
      -- During connection setup the transporter is not yet publicly available,
         thus locks should not be needed

   - EPOLL handling can now receive an 'EPOLLHUP' event from a socket
     being shut down. Handling of that event needs to 'unlisten' to events on
     that socket.

Note that the disconnect of a MultiTransporter does not follow the disconnect
'protocol'. The transporters being 'part of' the MultiTransporter were closed
directly when the node communication itself had reached the DISCONNECTED state.

At this stage the MultiTransporter has attempted to stop communication on the
underlying transporters. However this is probably not guaranteed, and the two-step
disconnecting-and-shutdown -> disconnected-and-close protocol is *not* followed.

As a temporary workaround this patch introduces Transporter::forceUnsafeDisconnect()
in order to make it clear where MultiTransporters are disconnected without following
this protocol. The implementation of that method is similar to the 'old' doDisconnect(),
where we closed and released the transporter resources without waiting for the
DISCONNECTED state to be reached.

Bug 35750394 'MultiTransporter should follow the DISCONNECT protocol' is intended to
fix this longer term.

Change-Id: I19d201ddd42596da6eac071f444bdae367586b65
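
A condensed sketch of the resulting two-step lifecycle (illustrative, not the full implementation):

```
void Transporter::disconnectImpl() {
  // DISCONNECTING: stop further I/O but keep the file descriptor and ssl
  // object alive, since other threads may still hold references to them.
  theSocket.shutdown();
}

void Transporter::releaseAfterDisconnect() {
  // DISCONNECTED (via report_disconnect()): no thread references the
  // socket any longer, so the fd and ssl object can finally be released.
  theSocket.close();
}
```
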
In transporter socket authentication, suppress the "negotiation failed"
error message that can occur as a normal part of node startup. The
remaining conditions that might be logged are all unexpected conditions.

The bug raises some issues that are not addressed here, including the
general question of where these error messages are directed to, and the
fact that the message does not provide any context about the identity
of the peer node.

Change-Id: Ie99348110e80b7c1a663f61da3c0c3e862b07765
…onnected mgm handle

Add missing calls to ndb_mgm_disconnect and remove a redundant call to
ndb_mgm_connect. Fail early to avoid reusing an mgm handle without
disconnecting. Add a missing call to NdbSocket::close.

Impacted test cases are:

  testMgm -n ApiTimeoutBasic
  testMgm -n ApiGetConfigTimeout
  testMgm -n ApiMgmStructEventTimeout
  test_event -n getEventBufferUsage3
  test_event -n getEventBufferHighUsage

And MTR test ndb.test_mgm since it includes testMgm.

Change-Id: I3ab4604a80251ae01004fc69ffec11f42d6f1cbe
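
The handle lifecycle the tests now follow, sketched with the public mgmapi calls (error handling abbreviated; NDBT_FAILED stands for the test framework's failure code):

```
NdbMgmHandle h = ndb_mgm_create_handle();
if (ndb_mgm_connect(h, 0, 0, 0) != 0) {
  ndb_mgm_destroy_handle(&h);  // fail early; never reuse without disconnect
  return NDBT_FAILED;
}
// ... run the timeout scenario against the management server ...
ndb_mgm_disconnect(h);   // previously missing in the listed tests
ndb_mgm_destroy_handle(&h);
```
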
Replace "FIXME" comment with descriptive comment.

Change-Id: If55cf98f02e0448736f2295a1195223b19315ba1
inikep and others added 24 commits January 22, 2024 16:56
…ull`

Fix the following crash:
```
mysqld-debug: /data/mysql-server/8.x/sql/sql_prepare.cc:2957: bool Prepared_statement::check_parameter_types(): Assertion `false' failed.
2024-01-19T12:14:02Z UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
BuildID[sha1]=5373427ef47506bdab783dcd1f57d3f399fb6e64
Server Version: 8.0.36-28-debug Source distribution
```
Fix the following issue:
```
273: /data/mysql-server/8.x/router/tests/helpers/process_manager.cc:732: Failure
273: Expected equality of these values:
273:   expected_exit_status
273:     Which is: Exit(0)
273:   result
273:     Which is: Exit(128)
273: Process 1325111 exited with Exit(128)
273: /data/mysql-server/8.x/router/tests/helpers/process_manager.cc:669: Failure
273: Failed
273: # Process: (pid=1325111)
273: /data/mysql-server/8.x-deb-gcc13-rocks/runtime_output_directory/mysqld --no-defaults --lc-messages-dir=/data/mysql-server/8.x-deb-gcc13-rocks/share --datadir=/tmp/mysqld-debug-0Pfsrx --plugin_dir=/data/mysql-server/8.x-deb-gcc13-rocks/plugin_output_directory --log-error=/tmp/mysqld-debug-0Pfsrx/mysqld-0.err --port=11010 --socket=/tmp/mysqld-debug-0Pfsrx/mysql.sock --mysqlx-port=11011 --mysqlx-socket=/tmp/mysqld-debug-0Pfsrx/mysqlx.sock --secure-file-priv=NULL --innodb_redo_log_capacity=8M --innodb_autoextend_increment=1 --innodb_buffer_pool_size=5M --innodb_use_native_aio=0 --gtid_mode=ON --enforce_gtid_consistency=ON --relay-log=relay-log
273:
273: ## Console output:
273:
273: /data/mysql-server/8.x-deb-gcc13-rocks/runtime_output_directory/mysqld could not be executed: No such file or directory (errno 2)
273:
273: ## Log content:
273:
273: <THIS ERROR COMES FROM TEST FRAMEWORK'S get_file_output(), IT IS NOT PART OF PROCESS OUTPUT: Could not open file '/tmp/mysqld-debug-0Pfsrx/mysqld-0.err' for reading: basic_ios::clear: iostream error>
273:
273: /data/mysql-server/8.x/router/tests/helpers/process_manager.cc:732: Failure
273: Expected equality of these values:
273:   expected_exit_status
273:     Which is: Exit(0)
273:   result
273:     Which is: Exit(128)
273: Process 1325111 exited with Exit(128)
```
…ocal DDL executed

https://perconadev.atlassian.net/browse/PS-9018

Problem
-------
In high-concurrency scenarios, a MySQL replica can enter a deadlock due to a
race condition between the replica applier thread and the client thread
performing a binlog group commit.

Analysis
--------
At least 3 threads are needed for this deadlock to happen:

1. One client thread
2. Two replica applier threads

How does this deadlock happen?
--------------------------
0. Binlog is enabled on replica, but log_replica_updates is disabled.

1. Initially, both "Commit Order" and "Binlog Flush" queues are empty.

2. Replica applier thread 1 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier
   thread 1

   3.1. Becomes leader (In Commit_stage_manager::enroll_for()).

   3.2. Registers in the commit order queue.

   3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.

   3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is
        not yet released.

   NOTE: SE commit for applier thread is already done by the time it reaches
         here.

4. Replica applier thread 2 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the
   applier thread 2

   5.1. Becomes leader (In Commit_stage_manager::enroll_for())

   5.2. Registers in the commit order queue.

   5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier
        thread 1 it will wait until the lock is released.

6. Client thread enters the group commit pipeline to register in the
   "Binlog Flush" queue.

7. Since "Commit Order" queue is not empty (there is applier thread 2 in the
   queue), it enters the conditional wait `m_stage_cond_leader` with an
   intention to become the leader for both the "Binlog Flush" and
   "Commit Order" queues.

8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update
   the GTID by calling gtid_state->update_commit_group() from
   Commit_order_manager::flush_engine_and_signal_threads().

9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log.

   9.1. It checks if there is any thread waiting in the "Binlog Flush" queue
        to become the leader. Here it finds the client thread waiting to be
        the leader.

   9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals on the
        cond_var `m_stage_cond_leader` and enters a conditional wait until the
        thread's `tx_commit_pending` is set to false by the client thread
       (will be done in the
       Commit_stage_manager::process_final_stage_for_ordered_commit_group()
       called by client thread from fetch_and_process_flush_stage_queue()).

10. The client thread wakes up from the cond_var `m_stage_cond_leader`.  The
    thread has now become a leader and it is its responsibility to update GTID
    of applier thread 2.

    10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log.

    10.2. Returns from `enroll_for()` and proceeds to process the
          "Commit Order" and "Binlog Flush" queues.

    10.3. Fetches the "Commit Order" and "Binlog Flush" queues.

    10.4. Performs the storage engine flush by calling ha_flush_logs() from
          fetch_and_process_flush_stage_queue().

    10.5. Proceeds to update the GTID of threads in "Commit Order" queue by
          calling gtid_state->update_commit_group() from
          Commit_stage_manager::process_final_stage_for_ordered_commit_group().

11. At this point, we will have

    - Client thread performing GTID update on behalf of applier thread 2 (from step 10.5), and
    - Applier thread 1 performing GTID update for itself (from step 8).

    Due to the lack of proper synchronization between the above two threads,
    there exists a time window where both threads can call
    gtid_state->update_commit_group() concurrently.

    In subsequent steps, both threads simultaneously try to modify the contents
    of the array `commit_group_sidnos` which is used to track the lock status of
    sidnos. This concurrent access to `update_commit_group()` can cause a
    lock-leak resulting in one thread acquiring the sidno lock and not
    releasing at all.

-----------------------------------------------------------------------------------------------------------
Client thread                                           Applier Thread 1
-----------------------------------------------------------------------------------------------------------
update_commit_group() => global_sid_lock->rdlock();     update_commit_group() => global_sid_lock->rdlock();

calls update_gtids_impl_lock_sidnos()                   calls update_gtids_impl_lock_sidnos()

set commit_group_sidno[2] = true                        set commit_group_sidno[2] = true

                                                        lock_sidno(2) -> successful

lock_sidno(2) -> waits

                                                        update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

                                                        if (commit_group_sidnos[2]) {
                                                          unlock_sidno(2);
                                                          commit_group_sidnos[2] = false;
                                                        }

                                                        Applier thread continues..

lock_sidno(2) -> successful

update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

if (commit_group_sidnos[2]) { <=== this check fails and lock is not released.
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}

Client thread continues without releasing the lock
-----------------------------------------------------------------------------------------------------------

12. As the above lock-leak can also happen the other way around, i.e., the
    applier thread fails to unlock, there can be different consequences hereafter.

13. If the client thread continues without releasing the lock, then at a later
    stage, it can enter into a deadlock with the applier thread performing a
    GTID update, as the stack traces below show.

    Client_thread
    -------------
    #1  __GI___lll_lock_wait
    #2  ___pthread_mutex_lock
    #3  native_mutex_lock                                       <= waits for commit lock while holding sidno lock
    #4  Commit_stage_manager::enroll_for
    #5  MYSQL_BIN_LOG::change_stage
    #6  MYSQL_BIN_LOG::ordered_commit
    #7  MYSQL_BIN_LOG::commit
    #8  ha_commit_trans
    #9  trans_commit_implicit
    #10 mysql_create_like_table
    #11 Sql_cmd_create_table::execute
    #12 mysql_execute_command
    #13 dispatch_sql_command

    Applier thread
    --------------
    #1  ___pthread_mutex_lock
    #2  native_mutex_lock
    #3  safe_mutex_lock
    #4  Gtid_state::update_gtids_impl_lock_sidnos               <= waits for sidno lock
    #5  Gtid_state::update_commit_group
    #6  Commit_order_manager::flush_engine_and_signal_threads   <= acquires commit lock here
    #7  Commit_order_manager::finish
    #8  Commit_order_manager::wait_and_finish
    #9  ha_commit_low
    #10 trx_coordinator::commit_in_engines
    #11 MYSQL_BIN_LOG::commit
    #12 ha_commit_trans
    #13 trans_commit
    #14 Xid_log_event::do_commit
    #15 Xid_apply_log_event::do_apply_event_worker
    #16 Slave_worker::slave_worker_exec_event
    #17 slave_worker_exec_job_group
    #18 handle_slave_worker

14. If the applier thread continues without releasing the lock, then at a later
    stage, it can perform recursive locking while setting the GTID for the next
    transaction (in set_gtid_next()).

    In debug builds the above case hits the assertion
    `safe_mutex_assert_not_owner()` meaning the lock is already acquired by the
    replica applier thread when it tries to re-acquire the lock.

Solution
--------
In the above problematic example, when seen from each thread
individually, we can conclude that there is no problem in the order of lock
acquisition; thus there is no need to change the lock order.

However, the root cause of this problem is that multiple threads can
concurrently access the array `Gtid_state::commit_group_sidnos`.

In its initial implementation, it was expected that threads would hold
`MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. But this was not
taken into account when upstream implemented WL#7846 (MTS:
slave-preserve-commit-order when log-slave-updates/binlog is disabled).

With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired
by the client thread (the binlog flush leader) when it performs the GTID
update on behalf of the threads waiting in the "Commit Order" queue, thus
guaranteeing that the `Gtid_state::commit_group_sidnos` array is never
accessed without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
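
A sketch of the shape of the fix in the flush-leader path (illustrative, not the exact Commit_stage_manager code):

```
// The binlog-flush leader now serializes with appliers doing their own
// commit-order GTID updates, so Gtid_state::commit_group_sidnos is only
// ever touched under MYSQL_BIN_LOG::LOCK_commit.
mysql_mutex_lock(&LOCK_commit);
gtid_state->update_commit_group(first_seen);  // modifies commit_group_sidnos
mysql_mutex_unlock(&LOCK_commit);
```
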
… post fix

Fix RelWithDebInfo builds that have DEBUG_EXTNAME=ON set by default.
…itches-my.cnf

PS-9062 update alternatives switches my.cnf
…8.0-failing-upgrade-on-debian-11

PS-8873 percona-server-server failing upgrade on debian 11
…-action-stop-failed-pre-installation-fails-during-re-install

PS-9064 initscript mysqlrouter action stop failed
…bles

https://perconadev.atlassian.net/browse/PS-9083

When the slow log was enabled with the --long_query_time option set to 0 (or
a close value) and with the --log_slow_verbosity option including the
"query_info" mode, the server might have crashed when a protocol command
other than normal statement execution (COM_QUERY, or mysql_query() in C API
parlance) was executed in a connection which had earlier been used to run
some statement. For example, this sometimes happened for the COM_QUIT
command issued during a graceful disconnect.

The crash was caused by an attempt to write the list of query tables to the
slow log for such a non-COM_QUERY command. To do so we tried to iterate
through the LEX::query_tables list, which in the case of such commands might
contain garbage left over from the execution of earlier statements in the
connection.

This fix solves the problem by not writing the list of query tables to the
slow log for server commands other than COM_QUERY (for most of them the
list makes no sense or is not easily available).
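
A sketch of the added guard, assuming the THD::get_command() accessor; the helper name is hypothetical:

```
if (thd->get_command() == COM_QUERY) {
  // Only COM_QUERY leaves LEX::query_tables in a well-defined state here;
  // for other commands the list may contain stale pointers.
  write_query_tables_to_slow_log(thd);  // hypothetical helper name
}
```
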
PS-9018: Replica stalls completely when log_replica_updates=0 and a local DDL executed
https://perconadev.atlassian.net/browse/PS-9069

The test add_index_inplace_sstfilewriter fails with the following error
when run with --repeat=N --parallel=M where N > M, even though the primary
key columns are indeed sorted.

CURRENT_TEST: rocksdb.add_index_inplace_sstfilewriter
mysqltest: At line 48: Query 'LOAD DATA INFILE '$file' INTO TABLE t1' failed.
ERROR 8000 (HY000): Rows must be inserted in primary key order during bulk load operation

This happens because once the test_loadfile.txt file is generated, it is
not cleared/deleted. So, subsequent runs of the test case use the same
existing file to fill in the data, and the 1500000 rows are duplicated and
appended to the contents of the file. So, when the test case is executed in
subsequent runs, the duplicated rows cause the contents of the file to NOT
be sorted.

The issue is fixed by deleting the test_loadfile.txt file once the data
is loaded into the database tables.
PS-9069 Merge MySQL 8.0.36 - Fix rocksdb.add_index_inplace_sstfilewriter
https://perconadev.atlassian.net/browse/PS-9048

Fixed a problem with the 'fts_index_fetch_nodes()' function not being able to
properly handle situations where the 'word' parameter contained special
characters used by 'LIKE' clauses ('%' and '_'). Introduced an additional
boolean parameter 'exact_match' that instructs this function to use either a
'WHERE word = :word' or a 'WHERE word LIKE :word' clause when selecting
records from the internal FTS tables.
only from 'fts_optimize_words()' (when we perform 'OPTIMIZE TABLE' under
'innodb_optimize_fulltext_only' enabled). In every other place
* fts_query_difference()
* fts_query_intersect()
* fts_query_union()
* fts_query_phrase_search()
where we indeed need pattern matching we call this function with 'exact_match'
set to 'false' (instructing the function to use the 'LIKE' clause).

Added six new MTR test cases:
* 'innodb_fts.percona_ft_special_chars_default_ewc_on'
* 'innodb_fts.percona_ft_special_chars_default_ewc_off'
* 'innodb_fts.percona_ft_special_chars_ngram_ewc_on'
* 'innodb_fts.percona_ft_special_chars_ngram_ewc_off'
* 'innodb_fts.percona_ft_special_chars_mecab_ewc_on'
* 'innodb_fts.percona_ft_special_chars_mecab_ewc_off'
which reproduce the original crash scenario using the default / ngram / mecab
parsers, with 'ft_query_extra_word_chars' set to both 'ON' and 'OFF'. These
test cases also check parsing / querying strings containing various special
characters. Please note that they have "result mismatch" status strings
recorded in the '.result' files for certain combinations of special
characters; this is expected behavior.

Introduced two new MTR helper '.inc' files:
* percona_install_mecab_plugin.inc
* percona_uninstall_mecab_plugin.inc
which help install / uninstall the mecab plugin along with creating / removing
its settings file.
The 'innodb_fts.percona_mecab_null_character' MTR test case was reworked to
use these two include files.
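
A sketch of how the new flag could select the SQL run against the internal FTS tables (illustrative only; the real function builds the statement through the InnoDB FTS query machinery):

```
// exact_match == true  : used by fts_optimize_words() so that '%' and '_'
//                         in a token are matched literally.
// exact_match == false : used by the fts_query_*() paths that rely on
//                         pattern matching.
const char *where_clause =
    exact_match ? "WHERE word = :word" : "WHERE word LIKE :word";
```
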
…t_ngram

PS-9048 fix 8.0: Fixed problem with percent character in n-grams
@adivinho adivinho requested review from a team, percona-ysorokin and VarunNagaraju March 4, 2024 11:36
Collaborator

@percona-ysorokin percona-ysorokin left a comment

LGTM

@adivinho adivinho merged commit 6af1faf into 8.0 Mar 14, 2024
21 of 23 checks passed