PS-9104 Release tasks for PS-8.0.36 #5245
Merged
Conversation
As the subject says: the patch ensures that NDBT_SKIPPED is returned from test cases that are skipped because their configuration needs are not met. Change-Id: Ia037604701cd6a39bc40682cfe532bd65dad9b22
Problem: On a distributed setup where, on each site, every mysqld replicates changes from the others but is configured not to log these updates in its own binlog, a Table_map entry was eventually logged. This causes the replication applier of a third server to fail with ER_NO_SUCH_TABLE.

Analysis:
1. The Table_map events belonging to changes applied by a replica were being injected without consulting the log_replica_updates option. The code only guarded against the set of events within the epoch that had the NDB_ANYVALUE_NOLOGGING_CODE flag set in the anyValue; however, this flag is only set for the specific cases where the binlog is disabled.
2. The cumulative_any_value of an event operation, a bitwise AND over the any-values of all events for a given table belonging to the same epoch, was not being used correctly, for a number of reasons:
   i) it was relied on to give an overview of who had made changes to the table over the course of the epoch;
   ii) it could contain reserved values, whose bits are the inverse of the bits used to record server ids;
   iii) it was reduced to a bitwise AND of all recorded values on each event, causing cancellation between inverse bits.

Solution:
1. The injection of a Table_map entry for changes that occurred in the table is now guarded by the log_replica_updates option.
2. The cumulative_any_value is now built with a bitwise OR so that information is not lost. Additionally, it is filtered by the plugin event operation on every processing of the event data so that the important information is retrieved, such as whether the update was made by a replica or whether it had a no-logging flag set. This data is then used to make a correct decision on whether to inject a Table_map event.

Changed the rpl_conflict tests' setup, since mysqld.1.1 must log replica updates and the default turns that off.

Change-Id: I978325fe01435a7d0e0bd4249979b749731f8fbb
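To illustrate the cumulative_any_value part of the fix, here is a minimal standalone sketch, with hypothetical flag values (the real NDB_ANYVALUE_* encoding lives in the NDB headers), of why a bitwise AND over any-values loses information while a bitwise OR preserves it:

```
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical flag layout, for illustration only.
constexpr uint32_t kNoLoggingFlag = 0x80000000;  // assumed "reserved" bit
constexpr uint32_t kServerIdMask  = 0x7fffffff;  // assumed server-id bits

int main() {
  // Two events in the same epoch: one tagged no-logging, one carrying a
  // plain server id. Bitwise AND cancels both pieces of information:
  std::vector<uint32_t> any_values = {kNoLoggingFlag, 42 & kServerIdMask};

  uint32_t cumulative_and = ~0u;
  uint32_t cumulative_or = 0u;
  for (uint32_t v : any_values) {
    cumulative_and &= v;  // old behaviour: inverse bits cancel out
    cumulative_or |= v;   // fixed behaviour: nothing is lost
  }
  std::printf("AND: 0x%08x  OR: 0x%08x\n", (unsigned)cumulative_and,
              (unsigned)cumulative_or);
  // With OR, the no-logging flag stays visible and can be filtered per
  // event operation before the Table_map decision is taken.
  return 0;
}
```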
The root cause of the problem where ssl_write() may get a smaller send buffer on retries is that consolidate() calculated how many *full* iovec buffers could fit into buff[MaxTlsRecord]. Thus, if the size of the last iovec[] buffer had increased on a write retry, consolidate() would calculate a smaller buff[] size on the retry if the last (now larger) iovec[] buffer no longer fit. That was likely based on a misconception that iovec_data_sent(), which is used to report the buffer size consumed by writev(), only handled the consumption of a number of full iovec[] buffers. However, that is not the case: any number of bytes may be reported as consumed, and the iovec[]s will be correctly updated accordingly. The patch allows the full size of buff[MaxTlsRecord] to be used for packing, with a partial copy from the last iovec[] in order to completely fill buff[]. Thus the case where we could no longer fit data from the last iovec[] buffer is avoided. Change-Id: I4d0d7d124421d510ab80e0a2566f6565785ad6d3
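A minimal sketch of the fixed packing strategy, under assumed names (consolidate, the buff/cap pair standing in for buff[MaxTlsRecord]): whole iovec buffers are copied while they fit, and the last one is copied partially so the full capacity is always used regardless of how the iovec sizes change between retries:

```
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <sys/uio.h>

// Fill `buff` with up to `cap` bytes from the iovec array; the last
// copy may be partial, so the full capacity is always usable.
static size_t consolidate(char *buff, size_t cap,
                          const struct iovec *iov, int iovcnt) {
  size_t used = 0;
  for (int i = 0; i < iovcnt && used < cap; i++) {
    const size_t chunk = std::min(iov[i].iov_len, cap - used);  // may be partial
    std::memcpy(buff + used, iov[i].iov_base, chunk);
    used += chunk;
  }
  return used;  // any byte count is fine: iovec_data_sent() handles partials
}
```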
…restart process

The management server maintains a cache mapping nodeids to host address presentation strings (e.g. 33 -> "47.184.12.55"). These cached strings are returned to MGM protocol requests for information about connected nodes. The cache is currently:
- Populated when an MGM protocol request is made for a destination which is connected at the transporter level and has no cached address string
- Cleared when processing a NODE_FAILREP or NF_COMPLETEREP signal

This is problematic when the disconnection of a remote node is slow, as the following sequence is possible:
1. Data node 5's address is cached as 'host A'
2. Data node 5 vanishes (network loss)
3. Data node 5's failure is detected by MGMD and a disconnect is requested
4. Other data nodes detect node 5's failure and send NODE_FAILREP + NF_COMPLETEREPs to MGMD
5. MGMD processes NODE_FAILREP + NF_COMPLETEREP, clearing node 5's cached address
6. MGMD receives an MGM protocol request for data node 5's address
7. Data node 5 is *still connected*, so its address is looked up and cached
8. Eventually data node 5's disconnect is processed
9. Later, data node 5 connects from a new address 'host B'
10. MGMD receives an MGM protocol request for data node 5's address and returns the cached 'host A' address

The problematic events are:
- Slow disconnect processing (3 ... 8)
- An MGM protocol request arriving during the slow disconnect (6, 7), after node failure signal handling (5)

This situation is avoided by adding a cached-address clear step as part of connecting to a node (e.g. at step 9) so that any previously cached address is discarded and a new address can be used. Change-Id: I88841acd44e654ebbdb9ce4c013f26324df45d30
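A sketch of the fix under hypothetical names (AddressCache, on_node_connect): clearing the entry on connect means a stale string left behind by slow disconnect processing can never be served for a reconnected node:

```
#include <map>
#include <string>

class AddressCache {
  std::map<int, std::string> m_cache;  // nodeid -> presentation string
 public:
  void on_node_connect(int nodeid) { m_cache.erase(nodeid); }  // new clear step
  void on_node_failure(int nodeid) { m_cache.erase(nodeid); }  // existing step
  // Cache and return the resolved address for a connected node.
  const std::string &lookup(int nodeid, const std::string &resolved) {
    return m_cache.try_emplace(nodeid, resolved).first->second;
  }
};
```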
Change-Id: I6bacf52751d564267b419607e35f8755e8f12eed
Post-push patch. Fix the printf specifier used for printing a size_t value. The bit width of size_t depends on the underlying platform / compiler; the correct specifier for such a type is '%zu' according to the documentation. Change-Id: I2135e00aa6e69df94300970b5578b3c839b63e6d
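For example (a trivial standalone illustration, not the patched code):

```
#include <cstddef>
#include <cstdio>

int main() {
  std::size_t n = sizeof(long double);
  // '%zu' is the portable conversion specifier for size_t; '%u' or
  // '%lu' would be wrong on platforms where the integer widths differ.
  std::printf("size is %zu bytes\n", n);
  return 0;
}
```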
Change-Id: I12e5e75ba31e8fd8af1eb79835148ea599a33114
Fix an alloc-size-larger-than warning in Qmgr by explicitly allocating the array only when it has a non-zero size. The compiler cannot deduce that this is in fact always the case. The confusing warning being fixed:

storage/ndb/src/kernel/blocks/qmgr/QmgrInit.cpp: In member function 'initData':
storage/ndb/src/kernel/blocks/qmgr/QmgrInit.cpp:204:60: warning: argument 1 value '18446744073709551615' exceeds maximum object size 9223372036854775807 [-Walloc-size-larger-than=]
  204 |     receivedProcessInfo = new ProcessInfo[numOfApiAndMgmNodes];
      |                                                            ^
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/new:128:26: note: in a call to allocation function 'operator new []' declared here
  128 |   _GLIBCXX_NODISCARD void* operator new[](std::size_t) _GLIBCXX_THROW (std::bad_alloc)
      |                          ^

Change-Id: Ie7aea5e8681db1c420d2a0e80d1cda8319f74a2b
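A minimal sketch of the fix (ProcessInfo reduced to a stub): allocating only on a provably non-zero count keeps the compiler from assuming a wrapped-around, enormous allocation size:

```
#include <cstddef>

struct ProcessInfo { int node_id; };  // stand-in for the real class

ProcessInfo *alloc_process_info(std::size_t numOfApiAndMgmNodes) {
  ProcessInfo *receivedProcessInfo = nullptr;
  if (numOfApiAndMgmNodes > 0)  // guard silences -Walloc-size-larger-than=
    receivedProcessInfo = new ProcessInfo[numOfApiAndMgmNodes];
  return receivedProcessInfo;
}
```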
Handling of ENOMEM errors returned from writev() and recv() in TCP_Transporter::doSend() and TCP_Transporter::doReceive() respectively was introduced by the patch:

.......................
commit 367d9278a182676d8ba6a4d768ae9e3eae05c440
Author: Mikael Ronström <[email protected]>
Date: Fri Jun 28 14:44:26 2019 +0200
BUG#29887068 .... Handling ENOMEM with resends of smaller size when send attempted.
......................

The ENOMEM handling attempted a resend with a smaller size by directly reducing the iov_len of the first iovec[] buffer. At best this seems to have solved a corner case where we would have run out of memory anyway real soon. The test case that was reportedly solved was a manually run attempt to test many-nodes functionality on a small set of machines, perhaps explaining the memory pressure issues. Lots of other overload issues were also observed in similar tests, which arguably use a configuration we really do not recommend nor support. In the absence of evidence that the ENOMEM handling had any value, and as we would like to keep the code 'clean', this patch reverts the ENOMEM handling introduced by the patch referred to above. Change-Id: Ib5a269f0b4a41a6d683d27f49b575616d82d3822
The thd memory usage guard around the injection of an incident is too strict, as this code path invokes file rotation etc. and may increase thd memory usage. Change-Id: Iff21da1f7d61276b3ffbea3b8f7441088031342f
Fixing a compiler warning by adding a check to ensure that the array element access is within range. Reviewed by: Mauritz Sundell <[email protected]> Change-Id: I55a0a72b5f7733e069bcd229ff49e45ed3bf38cb (cherry picked from commit f21d8a05d5a8180cd0d2e8b49833532aaedf8f6a)
sql-common/oci/signing_key.cc:87:42: error: ignoring attributes on template argument 'int (*)(FILE*)' [-Werror=ignored-attributes]
   std::unique_ptr<FILE, decltype(&fclose)> fp(fopen(file_name.c_str(), "rb"),

This is the declaration for Ubuntu 23.10:

   extern int fclose (FILE *__stream) __nonnull ((1));

Use a scope guard closure rather than a unique_ptr with a custom deleter. Change-Id: Iecb41b238dc9b72d87f33f54aea272391f3b5977 (cherry picked from commit 5d2e8c1aadb23c9d9acba692d2dd7afb24a0897a)
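A sketch of the scope-guard approach, with hypothetical surrounding code; only the guard pattern itself is what the commit describes. It avoids spelling decltype(&fclose) as a template argument, which is what trips -Wignored-attributes when glibc marks fclose with __nonnull:

```
#include <cstdio>
#include <string>

// Minimal scope guard: runs the stored closure at scope exit.
template <class F>
struct ScopeGuard {
  F f;
  ~ScopeGuard() { f(); }
};
template <class F>
ScopeGuard(F) -> ScopeGuard<F>;

int main() {
  const std::string file_name = "signing_key.pem";  // hypothetical input
  FILE *fp = std::fopen(file_name.c_str(), "rb");
  if (fp == nullptr) return 1;
  ScopeGuard close_fp{[fp] { std::fclose(fp); }};  // no deleter type named
  // ... read the key material from fp ...
  return 0;
}
```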
For a Debug build, with -O2, we get:

In file included from /usr/include/string.h:548,
                 from extra/googletest/googletest-release-1.12.0/googletest/include/gtest/xxxxxx/gtest-port.h:262,
                 from extra/googletest/googletest-release-1.12.0/googletest/include/gtest/gtest-message.h:55,
                 from extra/googletest/googletest-release-1.12.0/googletest/include/gtest/gtest-assertion-result.h:46,
                 from extra/googletest/googletest-release-1.12.0/googletest/include/gtest/gtest.h:59,
                 from unittest/gunit/item_json_func-t.cc:23:
In function void* memset(void*, int, size_t),
    inlined from void TRASH(void*, size_t) at include/memory_debugging.h:72:9,
    inlined from static void Item::operator delete(void*, size_t) at sql/item.h:962:10,
    inlined from void item_json_func_unittest::BM_JsonSearch(size_t) at unittest/gunit/item_json_func-t.cc:721:78:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:59:33: error: <anonymous> may be used uninitialized [-Werror=maybe-uninitialized]
   59 |   return __builtin___memset_chk (__dest, __ch, __len,
      |          ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
   60 |                                  __glibc_objsize0 (__dest));
      |                                  ~~~~~~~~~~~~~~~~~~~~~~~~~~
unittest/gunit/item_json_func-t.cc: In function void item_json_func_unittest::BM_JsonSearch(size_t):
unittest/gunit/item_json_func-t.cc:721:78: note: <anonymous> was declared here
  721 |   auto search = new Item_func_json_search(table.in_use, new Item_field(&field),
      |                                                                              ^

gcc somehow thinks we may call this:

  static void operator delete(void *ptr [[maybe_unused]],
                              size_t size [[maybe_unused]]) {
    TRASH(ptr, size);
  }

The fix is to evaluate 'new Item_field(&field)' separately. Change-Id: I5d73561f900043092bcd982fffc8462707627f1a (cherry picked from commit e536c8a446b97ac9c90ce957ba1c34547b798993)
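A toy reconstruction of the workaround, using stand-in types rather than the real ones from sql/item.h:

```
struct Field {};
struct Item { virtual ~Item() = default; };
struct Item_field : Item { explicit Item_field(Field *) {} };
struct Item_func_json_search : Item { explicit Item_func_json_search(Item *) {} };

int main() {
  Field field;
  // Before (triggered the bogus warning):
  //   auto *search = new Item_func_json_search(new Item_field(&field));
  // After: evaluate 'new Item_field(&field)' separately.
  Item *field_arg = new Item_field(&field);
  auto *search = new Item_func_json_search(field_arg);
  delete search;  // ownership details are left out of this sketch
  delete field_arg;
  return 0;
}
```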
Backport the -Wno-alloc-size-larger-than suppressions from trunk, from the patch for: Bug#34099162 Add support for mold linker Change-Id: I3996c472b423daaf3002c477ccb757d2e55846f9
Fixing a compiler warning by adding a check to ensure that the array element access is within range. Reviewed by: Mauritz Sundell <[email protected]> (backported from commit f21d8a05d5a8180cd0d2e8b49833532aaedf8f6a) Change-Id: I5c56b5ef1fa73908a3bd5c070ab7106fee4cc9ad
Change-Id: I9548896ffdf40c8dc3f40a0571cd461d49c0e7e7
…ODE RESTART

Post-push fix for 7.6: The methods
- bool first(Ptr<T>& p)
- bool next(Ptr<T>& p)
were added to DLHashTable. Both methods are already present in the 8.0+ version. Fixed the way report_subscription_set iterates over the subscription list: iteration now uses the newly added first/next methods, so it works the same way as in 8.0+. Change-Id: Ic60f359a4f18e4fa95be73fcb8d72343ad113af3
Change-Id: If43c680347087182cb12b9eddf7d332f9482281d
Move client_authenticate() out of SocketClient::connect() (which returns void) into a separate SocketClient::authenticate() method which can return a value. In SocketAuthenticator, change the signature of the authentication routines to return an int (which can represent a result code) rather than a bool. Results less than AuthOk represent failure, and results greater than or equal to AuthOk represent success. Remove the username and password variables from SocketAuthSimple; make them constant strings in the implementation. There are no functional changes. Change-Id: I4c25e99f1b9b692db39213dfa63352da8993a8fb
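A sketch of the resulting convention, with an assumed set of codes (only AuthOk and the "less than / greater than or equal" rule are described by the commit; the other enumerators are hypothetical):

```
// Results below AuthOk are failures; AuthOk and above are successes.
// An int leaves room for distinct outcome codes that a bool cannot carry.
enum AuthResult : int {
  AuthNegotiationFailed = -1,  // hypothetical failure code
  AuthFailure = 0,             // hypothetical failure code
  AuthOk = 1,
  AuthOkExtra = 2,             // hypothetical richer success code
};

inline bool auth_succeeded(int result) { return result >= AuthOk; }
```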
This changes TransporterRegistry::connect_ndb_mgmd() to return NdbSocket rather than ndb_socket_t. Back-ported from mysql-trunk. Change-Id: Ic3b9ccf39ec78ed25705a4bbbdc5ac2953a35611
…OSE] NdbSocket::close should always invalidate the socket file descriptor. Renamed invalidate to a private method invalidate_socket_handle and used it in close and other methods. Change-Id: Ic02406a72de3e452287337dcc87f8a0d361ecf8a
In preparation for the next patch, which passes NdbSocket either by const reference (const NdbSocket&) for temporary use or by rvalue reference (NdbSocket&&) to transfer ownership, calls to close must move, since only the owner may call close (or any other non-const method). Calls to NdbSocket::close are moved from caller to callee for functions that take ownership using NdbSocket&&, since the caller must not use the moved-from socket after the call. And since the callee will do the close, there is no need to pass a close-with-reset flag back to the caller; that is removed from the transporter_connect and connect_server methods. Calls to NdbSocket::close are moved from callee to caller for functions that are changed to "borrow" the socket using const NdbSocket&, since the callee cannot call non-const methods. To keep the semantics of the function making the socket unusable on failure, a new method NdbSocket::shutdown is introduced that only stops further communication on the socket but still leaves it open (until the caller calls close). Change-Id: I9666ba76e03e268406677c1e32d76d7d219e9a56
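A minimal sketch of the two conventions, with stubbed methods; shutdown() is assumed to be const-callable here, as the borrowing case implies:

```
#include <utility>

struct NdbSocket {
  void close() {}                                      // owner-only
  void shutdown() const {}                             // stops traffic, keeps socket open
  long send(const char *, long) const { return -1; }   // stub
};

// Takes ownership: the caller must std::move() and not touch the socket after.
void take_ownership(NdbSocket &&sock) {
  // ... use the socket ...
  sock.close();  // callee closes; no close-with-reset flag passed back
}

// Borrows: only const methods; on failure the callee can still stop
// traffic, but the owner remains responsible for close().
void borrow(const NdbSocket &sock) {
  if (sock.send("x", 1) < 0) sock.shutdown();
}

int main() {
  NdbSocket s;
  borrow(s);
  take_ownership(std::move(s));  // s must not be used after this call
  return 0;
}
```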
Back-ported for 8.0: Some parts of "WL#15524 Patch #16 Use TLS for Event Listener" have been merged into this patch. Use rvalue references (NdbSocket&&) when passing ownership of an NdbSocket to a function, and const references (const NdbSocket&) otherwise. When a function creates a new instance of NdbSocket, it is returned as a return value rather than via an out parameter. The implementation of struct ndb_logevent_handle is rewritten from a plain C struct into one with proper constructors both for taking ownership of a socket and for only borrowing it, and new and delete are used instead of malloc and free. The refactoring made SecureSocketOutputStream, BufferedSecureOutputStream, and SocketInputStream redundant, and they are removed. Change-Id: I60caae52f2b0997b78ae179e5e9b47b6c99073b8
Post push fix. Make NdbSocket::ssl_readln return 0 on timeout. Change-Id: I4cad95abd319883c16f2c28eff5cf2b6761731d6
Bug#35692840 Disable NDB SSL socket table Remove NdbSocket::From enum and init_from_native and init_from_new methods. Remove the SSL socket table. Change-Id: I6c0023ec1e3b06e519988d9388440d577e6f2e84
Post push fix. Add dummy implementation for NdbSocket::ssl_shutdown if built against older than OpenSSL 1.1.1. Change-Id: I0209c58aa5c565a5d959b30bb782d14f4950d4ce
…issues

Disconnects of transporters were not 'clean': disconnectImpl() is used to move the connection to DISCONNECTING while other threads might still send/recv on the same connection. The implementation of disconnectImpl() used to call NdbSocket::close() directly, such that:
- close() was called on the underlying socket, which also released the descriptor for reuse. As the descriptor was still referred to by other threads, they could end up referring to a descriptor reused for, e.g., a file.
- With the introduction of TLS in NdbSocket, close() released the ssl object as well. The ssl object may also be referred to by other threads attempting send/recv.

The patch avoids releasing these objects as part of the DISCONNECTING phase: when we disconnect (-> ::disconnectImpl()) a transporter, we now only 'shutdown()' the NdbSocket, which keeps the file descriptor and ssl object in a 'shutdown' state. When the transporter reaches the DISCONNECTED state (-> report_disconnect()) we call the new Transporter method releaseAfterDisconnect(), which releases these objects as well.

The lifecycle of NdbSocket::mutex is also changed, such that it is no longer released by a 'close'. It exists until the NdbSocket is destructed. The Guard2's used with the mutex are changed to use Guard instead, which requires the mutex to exist.
-> Guard2 just skipped the locking if there was no mutex.
-> The NdbSocket::enable_locking() and ::disable_locking() methods became obsolete.

Other related and required changes:
- Fixed testSecureSocket.cpp:
  -- Build break.
  -- Added missing stopSession(), needed to close the NdbSocket.
  -- Removed the 'locking' argument from several c'tors and methods (locking is now implicit if SSL is used).
  -- Had to disable a test case using SSL (locking) and 'blocking' communication. -> Needs discussion; do not think this combination can ever work.
- Introduced missing 'lifecycle handling' in the Loopback and SHM_Transporter:
  -- Introduced Transporter::disconnectImpl() and ::releaseAfterDisconnect() implementing the 'lifecycle' of 'theSocket'.
  -- Let the TCP, SHM, and Loopback_Transporter implementations use the above methods.
- Removed some more Transporter locks:
  -- SHM and Loopback do not need the lock in disconnectImpl() when implemented as only a 'shutdown', ... assuming that the disconnecting protocol is followed (multi-transporters break it ...!)
  -- During connection setup the transporter is not yet publicly available, thus locks should not be needed.
- EPOLL handling can now receive an 'EPOLLHUP' event from a socket being shut down. Handling that event needs to 'unlisten' to events on that socket.

Note that the disconnect of a MultiTransporter does not follow the disconnect 'protocol'. The transporters being 'part of' the MultiTransporter were closed directly when the node communication itself reached the DISCONNECTED state. At this stage the MultiTransporter has attempted to stop communication on the underlying transporters. However, this is probably not guaranteed, and the two-step disconnecting-and-shutdown -> disconnected-and-close protocol *is* not followed. As a temporary workaround this patch introduces Transporter::forceUnsafeDisconnect() in order to make it clear where MultiTransporters are disconnected without following this protocol. The implementation of that method is similar to the 'old' doDisconnect(), where we closed and released the transporter resources without waiting for the DISCONNECTED state to be reached.

Bug 35750394 'MultiTransporter should follow the DISCONNECT protocol' is intended to fix this longer term. Change-Id: I19d201ddd42596da6eac071f444bdae367586b65
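A sketch of the resulting two-phase teardown, with assumed method bodies; the point is only the ordering invariant, not the real Transporter code:

```
// Phase 1 (DISCONNECTING) only shuts traffic down, so threads still
// holding the socket never see a reused descriptor or a freed ssl
// object; phase 2 releases the resources once DISCONNECTED is reported.
struct SketchTransporter {
  void disconnectImpl() {
    socket_shutdown();  // fd + ssl object stay alive, traffic stops
  }
  void releaseAfterDisconnect() {
    socket_close();     // no concurrent users remain, safe to release
  }
 private:
  void socket_shutdown() { /* e.g. shutdown(fd, SHUT_RDWR); keep fd + ssl */ }
  void socket_close() { /* close(fd); free the ssl object */ }
};
```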
In transporter socket authentication, suppress the "negotiation failed" error message that can occur as a normal part of node startup. The remaining conditions that might be logged are all unexpected conditions. The bug raises some issues that are not addressed here, including the general question of where these error messages are directed to, and the fact that the message does not provide any context about the identity of the peer node. Change-Id: Ie99348110e80b7c1a663f61da3c0c3e862b07765
…onnected mgm handle

Add missing calls to ndb_mgm_disconnect and remove a redundant call to ndb_mgm_connect. Fail early to avoid reusing an mgm handle without disconnecting. Add a missing call to NdbSocket::close. The impacted test cases are:
testMgm -n ApiTimeoutBasic
testMgm -n ApiGetConfigTimeout
testMgm -n ApiMgmStructEventTimeout
test_event -n getEventBufferUsage3
test_event -n getEventBufferHighUsage
And the MTR test ndb.test_mgm, since it includes testMgm. Change-Id: I3ab4604a80251ae01004fc69ffec11f42d6f1cbe
Replace "FIXME" comment with descriptive comment. Change-Id: If55cf98f02e0448736f2295a1195223b19315ba1
…ull` Fix the following crash:
```
mysqld-debug: /data/mysql-server/8.x/sql/sql_prepare.cc:2957: bool Prepared_statement::check_parameter_types(): Assertion `false' failed.
2024-01-19T12:14:02Z UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
BuildID[sha1]=5373427ef47506bdab783dcd1f57d3f399fb6e64
Server Version: 8.0.36-28-debug Source distribution
```
Fix the following issue:
```
273: /data/mysql-server/8.x/router/tests/helpers/process_manager.cc:732: Failure
273: Expected equality of these values:
273:   expected_exit_status
273:     Which is: Exit(0)
273:   result
273:     Which is: Exit(128)
273: Process 1325111 exited with Exit(128)
273: /data/mysql-server/8.x/router/tests/helpers/process_manager.cc:669: Failure
273: Failed
273: # Process: (pid=1325111)
273: /data/mysql-server/8.x-deb-gcc13-rocks/runtime_output_directory/mysqld --no-defaults --lc-messages-dir=/data/mysql-server/8.x-deb-gcc13-rocks/share --datadir=/tmp/mysqld-debug-0Pfsrx --plugin_dir=/data/mysql-server/8.x-deb-gcc13-rocks/plugin_output_directory --log-error=/tmp/mysqld-debug-0Pfsrx/mysqld-0.err --port=11010 --socket=/tmp/mysqld-debug-0Pfsrx/mysql.sock --mysqlx-port=11011 --mysqlx-socket=/tmp/mysqld-debug-0Pfsrx/mysqlx.sock --secure-file-priv=NULL --innodb_redo_log_capacity=8M --innodb_autoextend_increment=1 --innodb_buffer_pool_size=5M --innodb_use_native_aio=0 --gtid_mode=ON --enforce_gtid_consistency=ON --relay-log=relay-log
273:
273: ## Console output:
273:
273: /data/mysql-server/8.x-deb-gcc13-rocks/runtime_output_directory/mysqld could not be executed: No such file or directory (errno 2)
273:
273: ## Log content:
273:
273: <THIS ERROR COMES FROM TEST FRAMEWORK'S get_file_output(), IT IS NOT PART OF PROCESS OUTPUT: Could not open file '/tmp/mysqld-debug-0Pfsrx/mysqld-0.err' for reading: basic_ios::clear: iostream error>
273:
273: /data/mysql-server/8.x/router/tests/helpers/process_manager.cc:732: Failure
273: Expected equality of these values:
273:   expected_exit_status
273:     Which is: Exit(0)
273:   result
273:     Which is: Exit(128)
273: Process 1325111 exited with Exit(128)
```
…ocal DDL executed

https://perconadev.atlassian.net/browse/PS-9018

Problem
-------
In high concurrency scenarios, a MySQL replica can enter a deadlock due to a race condition between the replica applier thread and the client thread performing a binlog group commit.

Analysis
--------
At least 3 threads are needed for this deadlock to happen:
1. One client thread
2. Two replica applier threads

How does this deadlock happen?
------------------------------
0. Binlog is enabled on the replica, but log_replica_updates is disabled.
1. Initially, both the "Commit Order" and "Binlog Flush" queues are empty.
2. Replica applier thread 1 enters the group commit pipeline to register in the "Commit Order" queue, since `log-replica-updates` is disabled on the replica node.
3. Since both the "Commit Order" and "Binlog Flush" queues are empty, applier thread 1
   3.1. Becomes leader (in Commit_stage_manager::enroll_for()).
   3.2. Registers in the commit order queue.
   3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.
   3.4. The Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is not yet released.
   NOTE: The SE commit for the applier thread is already done by the time it reaches here.
4. Replica applier thread 2 enters the group commit pipeline to register in the "Commit Order" queue, since `log-replica-updates` is disabled on the replica node.
5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), applier thread 2
   5.1. Becomes leader (in Commit_stage_manager::enroll_for()).
   5.2. Registers in the commit order queue.
   5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier thread 1, it will wait until the lock is released.
6. The client thread enters the group commit pipeline to register in the "Binlog Flush" queue.
7. Since the "Commit Order" queue is not empty (applier thread 2 is in the queue), it enters the conditional wait `m_stage_cond_leader` with the intention of becoming the leader for both the "Binlog Flush" and "Commit Order" queues.
8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update the GTID by calling gtid_state->update_commit_group() from Commit_order_manager::flush_engine_and_signal_threads().
9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log.
   9.1. It checks if there is any thread waiting in the "Binlog Flush" queue to become the leader. Here it finds the client thread waiting.
   9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log, signals on the cond_var `m_stage_cond_leader`, and enters a conditional wait until the thread's `tx_commit_pending` is set to false by the client thread (done in Commit_stage_manager::process_final_stage_for_ordered_commit_group(), called by the client thread from fetch_and_process_flush_stage_queue()).
10. The client thread wakes up from the cond_var `m_stage_cond_leader`. The thread has now become a leader, and it is its responsibility to update the GTID of applier thread 2.
    10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log.
    10.2. Returns from `enroll_for()` and proceeds to process the "Commit Order" and "Binlog Flush" queues.
    10.3. Fetches the "Commit Order" and "Binlog Flush" queues.
    10.4. Performs the storage engine flush by calling ha_flush_logs() from fetch_and_process_flush_stage_queue().
    10.5. Proceeds to update the GTIDs of the threads in the "Commit Order" queue by calling gtid_state->update_commit_group() from Commit_stage_manager::process_final_stage_for_ordered_commit_group().
11. At this point, we have
    - the client thread performing the GTID update on behalf of applier thread 2 (from step 10.5), and
    - applier thread 1 performing the GTID update for itself (from step 8).
    Due to the lack of proper synchronization between these two threads, there exists a time window where both can call gtid_state->update_commit_group() concurrently. In subsequent steps, both threads simultaneously try to modify the contents of the array `commit_group_sidnos`, which is used to track the lock status of sidnos. This concurrent access to `update_commit_group()` can cause a lock leak, with one thread acquiring a sidno lock and never releasing it:

    Applier thread 1                          Client thread
    ---------------------------------------------------------------------------
    update_commit_group() =>                  update_commit_group() =>
      global_sid_lock->rdlock();                global_sid_lock->rdlock();
    calls update_gtids_impl_lock_sidnos()     calls update_gtids_impl_lock_sidnos()
    set commit_group_sidno[2] = true          set commit_group_sidno[2] = true
    lock_sidno(2) -> successful               lock_sidno(2) -> waits

    update_gtids_impl_own_gtid() ->
      Add the thd->owned_gtid in
      `executed_gtids()`

    if (commit_group_sidnos[2]) {
      unlock_sidno(2);
      commit_group_sidnos[2] = false;
    }
                                              Client thread continues..
                                              lock_sidno(2) -> successful
                                              update_gtids_impl_own_gtid() ->
                                                Add the thd->owned_gtid in
                                                `executed_gtids()`
                                              if (commit_group_sidnos[2]) {  <=== this check fails and
                                                unlock_sidno(2);                  the lock is not released.
                                                commit_group_sidnos[2] = false;
                                              }
                                              Client thread continues without
                                              releasing the lock.

12. As the above lock leak can also happen the other way around, i.e. with the applier thread failing to unlock, there can be different consequences hereafter.
13. If the client thread continues without releasing the lock, then at a later stage it can enter into a deadlock with the applier thread performing a GTID update, with these stack traces:

    Client thread
    -------------
    #1  __GI___lll_lock_wait
    #2  ___pthread_mutex_lock
    #3  native_mutex_lock  <= waits for commit lock while holding sidno lock
    #4  Commit_stage_manager::enroll_for
    #5  MYSQL_BIN_LOG::change_stage
    #6  MYSQL_BIN_LOG::ordered_commit
    #7  MYSQL_BIN_LOG::commit
    #8  ha_commit_trans
    #9  trans_commit_implicit
    #10 mysql_create_like_table
    #11 Sql_cmd_create_table::execute
    #12 mysql_execute_command
    #13 dispatch_sql_command

    Applier thread
    --------------
    #1  ___pthread_mutex_lock
    #2  native_mutex_lock
    #3  safe_mutex_lock
    #4  Gtid_state::update_gtids_impl_lock_sidnos  <= waits for sidno lock
    #5  Gtid_state::update_commit_group
    #6  Commit_order_manager::flush_engine_and_signal_threads  <= acquires commit lock here
    #7  Commit_order_manager::finish
    #8  Commit_order_manager::wait_and_finish
    #9  ha_commit_low
    #10 trx_coordinator::commit_in_engines
    #11 MYSQL_BIN_LOG::commit
    #12 ha_commit_trans
    #13 trans_commit
    #14 Xid_log_event::do_commit
    #15 Xid_apply_log_event::do_apply_event_worker
    #16 Slave_worker::slave_worker_exec_event
    #17 slave_worker_exec_job_group
    #18 handle_slave_worker

14. If the applier thread continues without releasing the lock, then at a later stage it can perform recursive locking while setting the GTID for the next transaction (in set_gtid_next()). In debug builds, this case hits the assertion `safe_mutex_assert_not_owner()`, meaning the lock is already acquired by the replica applier thread when it tries to re-acquire it.

Solution
--------
In the above problematic example, when seen from each thread individually, we can conclude that there is no problem in the order of lock acquisition; thus there is no need to change the lock order. However, the root cause of this problem is that multiple threads can concurrently access the array `Gtid_state::commit_group_sidnos`. In its initial implementation, it was expected that threads would hold `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents, but this was not considered when upstream implemented WL#7846 (MTS: slave-preserve-commit-order when log-slave-updates/binlog is disabled). With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired by the client thread (the binlog flush leader) when it tries to perform the GTID update on behalf of the threads waiting in the "Commit Order" queue, thus providing a guarantee that the `Gtid_state::commit_group_sidnos` array is never accessed without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
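A minimal sketch of the solution's locking rule, using std::mutex as a stand-in for MYSQL_BIN_LOG::LOCK_commit; this illustrates the invariant only, not the server code:

```
#include <mutex>
#include <vector>

// Both the flush leader and the applier path now hold the commit lock
// around the commit_group_sidnos bookkeeping, so the "did I lock this
// sidno?" flag can no longer be reset underneath a concurrent update.
static std::mutex LOCK_commit;  // stand-in for MYSQL_BIN_LOG::LOCK_commit
static std::vector<bool> commit_group_sidnos(64, false);

void update_commit_group_sketch(int sidno) {
  std::lock_guard<std::mutex> guard(LOCK_commit);
  commit_group_sidnos[sidno] = true;
  // ... lock_sidno(sidno); update owned GTIDs ...
  if (commit_group_sidnos[sidno]) {  // guaranteed to see our own write
    // ... unlock_sidno(sidno) ...
    commit_group_sidnos[sidno] = false;
  }
}
```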
… post fix Fix RelWithDebInfo builds that have DEBUG_EXTNAME=ON set by default.
…itches-my.cnf PS-9062 update alternatives switches my.cnf
…8.0-failing-upgrade-on-debian-11 PS-8873 percona-server-server failing upgrade on debian 11
…-action-stop-failed-pre-installation-fails-during-re-install PS-9064 initscript mysqlrouter action stop failed
…bles

https://perconadev.atlassian.net/browse/PS-9083

When the slow log was enabled with the --long_query_time option set to 0 (or a close value) and with the --log_slow_verbosity option including the "query_info" mode, the server might have crashed when a protocol command other than normal statement execution (COM_QUERY, or mysql_query() in C API parlance) was executed in a connection which had earlier been used to run some statement. For example, this sometimes happened for the COM_QUIT command issued during a graceful disconnect. The crash was caused by an attempt to write the list of query tables to the slow log for such a non-COM_QUERY command. To do so, we tried to iterate through the LEX::query_tables list, which in the case of such commands might contain garbage left over from the execution of earlier statements in the connection. This fix solves the problem by not writing the list of query tables to the slow log for server commands other than COM_QUERY (for most of them the list makes no sense or is not easily available).
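A sketch of the guard described above, under assumed names (the real command enum and slow-log writer live in the server sources):

```
enum enum_server_command { COM_QUERY, COM_QUIT /* ... */ };

// The query-table list is written to the slow log only for COM_QUERY,
// since for other protocol commands LEX::query_tables may hold garbage
// from an earlier statement executed in the same connection.
void write_query_tables_to_slow_log(enum_server_command cmd) {
  if (cmd != COM_QUERY) return;  // stale list: skip instead of crashing
  // ... iterate the query-table list and write each table name ...
}
```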
PS-9018: Replica stalls completely when log_replica_updates=0 and a local DDL executed
https://perconadev.atlassian.net/browse/PS-9069

The test add_index_inplace_sstfilewriter fails with the following error when run with --repeat=N --parallel=M where N > M, even though the primary key columns are indeed sorted:

CURRENT_TEST: rocksdb.add_index_inplace_sstfilewriter
mysqltest: At line 48: Query 'LOAD DATA INFILE '$file' INTO TABLE t1' failed. ERROR 8000 (HY000): Rows must be inserted in primary key order during bulk load operation

This happens because once the test_loadfile.txt file is generated, it is not cleared/deleted, so subsequent runs of the test case use the same existing file to fill in the data, and 1500000 duplicated rows get appended to the contents of the file. When the test case is then executed again, the duplicated rows cause the contents of the file to no longer be sorted. The issue is fixed by deleting the test_loadfile.txt file once the data is loaded into the database tables.
PS-9069 Merge MySQL 8.0.36 - Fix rocksdb.add_index_inplace_sstfilewriter
https://perconadev.atlassian.net/browse/PS-9048

Fixed a problem with the 'fts_index_fetch_nodes()' function not being able to properly handle situations where the 'word' parameter contained special characters used by 'LIKE' clauses ('%' and '_'). Introduced an additional boolean parameter 'exact_match' that instructs this function to use either a 'WHERE word = :word' or a 'WHERE word LIKE :word' clause when selecting records from the internal FTS tables. We call 'fts_index_fetch_nodes()' with 'exact_match' set to 'true' only from 'fts_optimize_words()' (when we perform 'OPTIMIZE TABLE' with 'innodb_optimize_fulltext_only' enabled). In every other place
* fts_query_difference()
* fts_query_intersect()
* fts_query_union()
* fts_query_phrase_search()
where we indeed need pattern matching, we call this function with 'exact_match' set to 'false' (instructing the function to use the 'LIKE' clause).

Added six new MTR test cases:
* 'innodb_fts.percona_ft_special_chars_default_ewc_on'
* 'innodb_fts.percona_ft_special_chars_default_ewc_off'
* 'innodb_fts.percona_ft_special_chars_ngram_ewc_on'
* 'innodb_fts.percona_ft_special_chars_ngram_ewc_off'
* 'innodb_fts.percona_ft_special_chars_mecab_ewc_on'
* 'innodb_fts.percona_ft_special_chars_mecab_ewc_off'
which reproduce the original crash scenario using the default / ngram / mecab parsers, with 'ft_query_extra_word_chars' set to both 'ON' and 'OFF'. These test cases also check parsing / querying strings containing various special characters. Please note that they have "result mismatch" status strings recorded in the '.result' files for certain combinations of special characters, and this is expected behavior.

Introduced two new MTR helper '.inc' files:
* percona_install_mecab_plugin.inc
* percona_uninstall_mecab_plugin.inc
which help install / uninstall the mecab plugin along with creating / removing its settings file. The 'innodb_fts.percona_mecab_null_character' MTR test case was reworked with these two include files.
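A minimal sketch of the clause selection driven by the new 'exact_match' parameter (the clause strings are as described above; the surrounding query construction is omitted):

```
#include <string>

// OPTIMIZE TABLE under innodb_optimize_fulltext_only needs an exact
// word lookup with no pattern semantics for '%' and '_', while the
// fts_query_* callers keep prefix matching via LIKE.
std::string build_word_clause(bool exact_match) {
  return exact_match ? "WHERE word = :word"      // fts_optimize_words()
                     : "WHERE word LIKE :word";  // fts_query_* paths
}
```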
…t_ngram PS-9048 fix 8.0: Fixed problem with percent character in n-grams
PS-9103 Add glibc.34 debug tarball
EvgeniyPatlan approved these changes Mar 4, 2024
percona-ysorokin approved these changes Mar 14, 2024
LGTM