Null merge PS 8.0.36-28 at https://github.com/percona/percona-server/commit/068b575514b into trunk #5290
Merged
oleksandr-kachan merged 127 commits into percona:trunk from oleksandr-kachan:null-merge-mysql-8.0.36-28 on Apr 25, 2024
Conversation
Problem: When running an ASAN build with DBUG_TRACE calls (--debug), a heap buffer overflow is detected in NDB_SHARE::create_key(). Analysis: In NDB_SHARE::create_key(), the DBUG_DUMP of the `m_buffer` field of NDB_SHARE_KEY is given the size `size`, which amounts to sizeof(NDB_SHARE_KEY) + buffer_size. Hence DBUG_DUMP is called with sizeof(NDB_SHARE_KEY) extra bytes, which the ASAN build catches. Solution: The DBUG_DUMP is unnecessary, since a DBUG_PRINT of the buffer is already present a few lines earlier, so it is removed. Change-Id: I92c17fb16b39d19a896a500b8ab1956addc16015
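As a rough illustration of the overflow described above, a minimal standalone C++ sketch; `ShareKey` and `dump()` are hypothetical stand-ins for NDB_SHARE_KEY and DBUG_DUMP, not the real NDB definitions:

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for NDB_SHARE_KEY: a header followed by a
// variable-length buffer allocated in one chunk (not the real layout).
struct ShareKey {
  size_t m_buffer_size;
  char m_buffer[1];
};

// Hypothetical stand-in for DBUG_DUMP: hex-dumps 'len' bytes.
static void dump(const void *data, size_t len) {
  const unsigned char *p = static_cast<const unsigned char *>(data);
  for (size_t i = 0; i < len; i++) std::printf("%02x ", p[i]);
  std::printf("\n");
}

int main() {
  const size_t buffer_size = 8;
  const size_t size = sizeof(ShareKey) + buffer_size;  // total allocation
  ShareKey *key = static_cast<ShareKey *>(std::malloc(size));
  key->m_buffer_size = buffer_size;
  std::memset(key->m_buffer, 'x', buffer_size);

  // The buggy pattern: dumping m_buffer with the *total* size reads
  // bytes past the end of the allocation; ASAN reports this as a
  // heap-buffer-overflow.
  //   dump(key->m_buffer, size);

  dump(key->m_buffer, key->m_buffer_size);  // in-bounds dump, for contrast
  std::free(key);
  return 0;
}
```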
Some innodb tests don't run with nonstandard page sizes. Fix: skip the tests when run with page sizes that don't work. Change-Id: I4ccdddb981d93621cd259ca6cb9b1ede536b6987
A transporter may be set in an 'overloaded' state if a send attempt failed to send anything on it. That could e.g. be due to the OS-internal send buffers being filled up, possibly because the receiver side is not consuming fast enough / has full buffers as well. The handling of such overload is to let the send-thread take a short 200us yield from the CPU if there is no other send work to be found. Right before a send-thread yields the CPU, the yield method checks the callback check_available_send_data() for any late-arriving transporters with more data to send. That is effectively implemented by the send-threads having the flag 'bool m_more_trps', which is set every time a new transporter is entered into the queue with insert_trp(). It is also cleared if get_trp() could not find a TrpId to be returned. The special case where get_trp() found only a delayed transporter is handled in handle_send_trp(). Here we cleared 'm_more_trps' if we found a delay and the handling was part of a send-thread (as opposed to an assist-send). Note that failing to clear m_more_trps would result in the send-thread later not being able to yield the CPU. For the 'delayed-send with overload' case, however, handle_send_trp() did an early 'return false' without ever entering the code section where m_more_trps was cleared for a send-thread. This prevented the CPU yield in the overload situation. The patch combines the check for overload with the general check for a delay to be canceled or m_more_trps to be cleared. Change-Id: Ied5cca628e10715510d88350140c7f4ce1a17f37
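For readers following along, a minimal sketch of the control-flow change described above; the class and the boolean parameters are simplified assumptions, and only the names `m_more_trps` and `handle_send_trp` come from the commit message:

```cpp
// Simplified model of the send-thread bookkeeping; not the real
// TransporterRegistry code.
struct SendThreadSketch {
  bool m_more_trps = false;  // set by insert_trp(), checked before yielding

  bool handle_send_trp(bool found_delay, bool overloaded, bool is_send_thread) {
    // Before the fix: the overload case returned early above this block,
    // so a send-thread never cleared m_more_trps and could not yield.
    // After the fix: overload is folded into the same block that handles
    // a delayed transporter.
    if (found_delay || overloaded) {
      if (is_send_thread) m_more_trps = false;  // allow the 200us CPU yield
      return false;                             // nothing sent this round
    }
    return true;  // proceed with sending
  }
};

int main() {
  SendThreadSketch s;
  s.m_more_trps = true;
  s.handle_send_trp(false /*delay*/, true /*overloaded*/, true /*send-thread*/);
  return s.m_more_trps ? 1 : 0;  // 0: flag cleared, the thread may yield
}
```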
The "Bug45495" test in testMgmd fails under gcov testing. The test results show hundreds of lines of DEBUG log messages being written while a fiile_exists() call times out. This patch starts the mgmd servers without the --verbose flag, so that the DEBUG level log messages are suppressed. This patch is intended for both mysql-8.0 and mysql-trunk. A separate patch for this bug addresses failures in other tests and is intended for mysql-trunk only. Change-Id: I904de0a1926f3a151ee21c194659b61b2d19aed2
The Qmgr GSN_ISOLATE_ORD handling was modified to handle the larger node bitmap size necessary for supporting up to 144 data nodes. For compatibility, QMGR uses the node version of the sending node to determine whether the incoming signal has its node bitmap in a long section or inline in the signal. The senderRef member of the incoming signal is used, which is set by the signal originator. However, in the context of ISOLATE_ORD, the original sender may be shut down by the time ISOLATE_ORD is processed, in which case its node version may have been reset to zero, causing the inline bitmap path to be taken and resulting in incorrect processing. The signal handler is changed to decide based on the presence of a long section whether the incoming signal uses a long section to represent the nodes to isolate. Change-Id: I1efe6c98c3f342770462d81ae00395b5a2eec7d7
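A tiny sketch of the decision change, under the assumption that the handler can simply count attached long sections; the types here are illustrative, not the actual NDB kernel signal classes:

```cpp
// Illustrative stand-in for an incoming signal with optional long sections.
struct SignalSketch {
  unsigned num_long_sections;  // sections attached to the signal
};

// Before: the handler keyed off the sender's node version (via senderRef),
// which can read as 0 for an already-shut-down node and mis-select the
// inline-bitmap path. After: key off the signal itself.
bool bitmap_is_in_long_section(const SignalSketch &sig) {
  return sig.num_long_sections > 0;
}

int main() {
  SignalSketch sig{1};  // bitmap carried in a long section
  return bitmap_is_in_long_section(sig) ? 0 : 1;
}
```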
…r.cc Issue: ha_index_end() is to be called only if the handler has been initialized; this check was missing, which led to the assertion failure. Fixed by adding the necessary condition. This is a backport of a change that was done as part of WL#15257 in 8.2.0. Change-Id: I2ec6717d081c872577aa731a40170d287bad8ad2
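The shape of the missing guard, as a minimal hedged sketch; the handler state member is modeled loosely on the storage engine handler API, and the names are stand-ins:

```cpp
// Minimal stand-in for a storage engine handler with scan state.
struct HandlerSketch {
  enum State { NONE, INDEX } inited = NONE;
  void ha_index_start() { inited = INDEX; }
  void ha_index_end() { /* tear down the index scan */ inited = NONE; }
};

void cleanup(HandlerSketch *h) {
  // The previously missing condition: only end an index read if the
  // handler actually started one, otherwise an assertion fires.
  if (h->inited == HandlerSketch::INDEX) h->ha_index_end();
}

int main() {
  HandlerSketch h;  // no index scan was started
  cleanup(&h);      // safe: ha_index_end() is not called
  return 0;
}
```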
Change-Id: I9b80ed5ed08ebe8bf1b6835cfabec6695666c599
Problem: When running an ASAN build with DBUG_TRACE calls (--debug), a heap buffer overflow is detected in NDB_SHARE::create_key(). Analysis: In NDB_SHARE::create_key(), the DBUG_DUMP of the `m_buffer` field of NDB_SHARE_KEY is given the size `size`, which amounts to sizeof(NDB_SHARE_KEY) + buffer_size. Hence DBUG_DUMP is called with sizeof(NDB_SHARE_KEY) extra bytes, which the ASAN build catches. Solution: The DBUG_DUMP is unnecessary, since a DBUG_PRINT of the buffer is already present a few lines earlier, so it is removed. Change-Id: I649d66e2aa87951ada0483208b95e9757a97ebae
Change-Id: I8ce2a40889148c20f2bdf4a3cfd8808de5c2007f
Another attempt to fix this test failure. We need to also know which tableId we should crash on when committing a ZUPDATE. The previous (failed!) patch is extended by using insertError2InNode() to insert the error code as well as the tableId to crash on as 'ERROR_INSERT_EXTRA'. Dblqh::execCOMMIT() is enhanced to also check that tableId == ERROR_INSERT_EXTRA before CRASH_INSERTION(5048). Change-Id: If4e453bdf7400a4439465e3e0c0ad9ef7f91a46e
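A sketch of the extended crash-insertion check; the surrounding structure is hypothetical, and only the names ERROR_INSERT_EXTRA and execCOMMIT and error code 5048 come from the message above:

```cpp
#include <cstdlib>

// Loose model of the NDB error-insert framework used by the test.
struct ErrorInsertSketch {
  unsigned error_code = 0;   // set via insertError2InNode()
  unsigned error_extra = 0;  // ERROR_INSERT_EXTRA: the tableId to crash on

  void execCOMMIT(unsigned table_id, bool is_zupdate) {
    // Crash only when the error code matches AND the commit is a ZUPDATE
    // on the specific table the test asked to crash on.
    if (error_code == 5048 && is_zupdate && table_id == error_extra)
      std::abort();  // stands in for CRASH_INSERTION(5048)
  }
};

int main() {
  ErrorInsertSketch e;
  e.error_code = 5048;
  e.error_extra = 7;      // crash only on table 7
  e.execCOMMIT(3, true);  // different table: no crash
  return 0;
}
```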
Post push patch, removing Multi_Transporter as a 'friend' Change-Id: I92a810dd04ebad203832332b6da685cfd3660dcb
There was a possible race between start_clients_thread() and update_connections(), both seeing the same transporter in state DISCONNECTING. The design intention was that start_clients_thread() should shutdown() the transporter, such that Transporter::isConnected() became 'false'. Upon seeing the !isConnected() state on a DISCONNECTING transporter, update_connections() should 'close' the transporter. However, as the start_clients_thread() -> doDisconnect() code path sets 'm_connected = false' before doing the disconnectImpl() -> shutdown, it opened up a race where update_connections() could see the transporter as 'not-isConnected()' and thus close the transporter before it was even shut down. With SSL enabled, that resulted in the SSL_context object being released by ::ssl_close() and later referenced by ssl_shutdown(). Without SSL enabled we still broke the intended call order of socket close vs shutdown. The patch ensures that disconnectImpl() is completed before we set the m_connected state to 'false'. Thus update_connections() will not be able to close the NdbSocket before it has been completely shutdown(). Note that this raises the theoretical possibility that the TransporterRegistry code paths ::poll*() and ::performReceive(), where we check 'if (is_connected(trp_id) && t->isConnected()) {', now see a Transporter as 'isConnected()' after disconnectImpl() has been performed but before 'm_connected' has been set to 'false'. However, it seems that the above check is too strict: when the transporter 'protocol' for doing disconnect is followed, there will not exist cases where 'performState[] == CONNECTED' and 'not isConnected()'. Note that doDisconnect(), which sets 'm_connected = false', is only called from start_clients_thread() when the transporter is in state DISCONNECTING or DISCONNECTED. Thus the above assumption should be safe. Change-Id: I9444ae74b93149deed8f4197809f6dd8dce44158
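The ordering fix can be pictured with a small sketch, assuming a single atomic flag stands in for the transporter's connection state (the real code involves sockets and SSL state, omitted here):

```cpp
#include <atomic>

struct TransporterSketch {
  std::atomic<bool> m_connected{true};

  void disconnectImpl() { /* shutdown() the socket, release SSL state */ }

  void doDisconnect() {
    // Before the fix: 'm_connected = false' came first, opening a window
    // where update_connections() saw !isConnected() and closed a socket
    // that had not yet been shut down.
    disconnectImpl();                                     // 1) finish shutdown
    m_connected.store(false, std::memory_order_release);  // 2) then publish
  }

  bool isConnected() const {
    return m_connected.load(std::memory_order_acquire);
  }
};

int main() {
  TransporterSketch t;
  t.doDisconnect();
  return t.isConnected() ? 1 : 0;  // 0: closed only after full shutdown
}
```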
…when input some sql statement. When a shared materialized table for a CTE is marked for re-materialization because it is a lateral derived table and the counters for the invalidators do not match for the second reference to the CTE, it clears all references to the CTE, which resets the cursors positioned for the table that materialized the CTE table, leading to problems when reading from that table. More detailed analysis is in the bug page and the test file (w.r.t. the failing query). The fix is to not re-materialize the CTE table when we are using the shared materialized derived table. We also update the invalidator counters while at it. Change-Id: I5f9daeab03fee93354c0253fd55865f1a6a1e32c
… Multi_Transporter. Post push patch fixing a regression where TransporterRegistry::get_bytes_received() returned the number of 'sent' bytes instead. Change-Id: Icac3ed5d4014ef6f36b31ee4e4c8795cc2ecc601
post push patch to add comments Change-Id: I99ad8a65682eefec6b49f46b022e0e98f64442ba
The problem is in the function JOIN::destroy(), where we call update_used_tables() to reinstate used-tables information after execution of a statement that has one or more const tables and has thus been subject to const table elimination. During this, we may encounter a table, such as a special information schema table, that has already been closed and subsequently has had its table pointer (in class Table_ref) set to nullptr. The solution is to delay the refreshing of used-tables information to the start of the next execution, just after tables have been opened and we know that all table objects are in a proper state. The performance impact of this seems to be negligible. Also added some necessary code to prevent uninitialized command properties from being copied in the case of as-yet-unprepared statements. Change-Id: Ib44ab2cf1e61c3b188e7ceb000adb1445c48979a
Context: Some NDBAPI tests use the ConfigValues::openSection() function to get/set config parameters. openSection() takes as arguments the section 'type' (SYSTEM, NODE, CONNECTION ...) and the index of the section we are interested in. Section indexes are sequential, starting from zero, and are ordered by nodeId, BUT nodeId and index can differ. Problem: In some tests openSection() is used with wrong arguments (a nodeId is used instead of an index). Depending on the configuration, this can lead openSection() to open a wrong/invalid section. Solution: Tests changed to always pass an index instead of a nodeId to openSection(). In particular, when iterating over all sections looking for a parameter, iteration now starts from index 0, preventing any section from being ignored. Change-Id: I10468675b1ef07cf01a89961f5d5a84a498b4568
Context: Some NDBAPI tests use the ConfigValues::openSection() function to get/set config parameters. openSection() takes as arguments the section 'type' (SYSTEM, NODE, CONNECTION ...) and the index of the section we are interested in. Section indexes are sequential, starting from zero, and are ordered by nodeId, BUT nodeId and index can differ. Problem: In some tests openSection() is used with wrong arguments (a nodeId is used instead of an index). Depending on the configuration, this can lead openSection() to open a wrong/invalid section. Solution: Tests changed to always pass an index instead of a nodeId to openSection(). In particular, when iterating over all sections looking for a parameter, iteration now starts from index 0, preventing any section from being ignored. Change-Id: Ib783490b651de9d4ae2627b5b3a3661530495318
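To make the index-vs-nodeId distinction concrete, a small self-contained sketch; the ConfigSketch type and its members are invented for illustration, and only openSection()'s (type, index) calling convention follows the description above:

```cpp
#include <cstdio>

// Three sections of one type, with nodeIds deliberately different from
// their indexes to show why passing a nodeId can open the wrong section.
struct ConfigSketch {
  int node_ids[3] = {2, 5, 9};  // indexes 0..2 map to nodeIds 2, 5, 9
  bool openSection(int /*type*/, int index) {
    return index >= 0 && index < 3;  // only sequential indexes are valid
  }
  void closeSection() {}
};

int main() {
  ConfigSketch cfg;
  // Correct: iterate sequential indexes from 0 so no section is ignored.
  for (int idx = 0; cfg.openSection(0, idx); idx++) {
    std::printf("index=%d nodeId=%d\n", idx, cfg.node_ids[idx]);
    cfg.closeSection();
  }
  // Buggy pattern: openSection(0, nodeId) would fail for nodeIds 5 and 9
  // here, silently skipping those sections.
  return 0;
}
```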
Change-Id: I629dac790cf573232d3dedd62f9554c16644c744
Bug#35529968 - Hide the "CPACK_COMPONENT_GROUP_INFO_DISPLAY_NAME" option in the MSI installer. The INFO_SRC file was part of two CMake components, "Readme" and "Documentation", which caused it to be included twice in the TAR and ZIP archives. With some unpacking tools, this caused an unnecessary prompt about whether or not to overwrite. Also, the "Info" component is always to be installed. Change-Id: I046ffab11add83b095f786a45112816b8bb75d55
…word' When the Router is bootstrapped against the Cluster without an existing account given via the --account parameter, it auto-creates a metadata user. If the --force-password-verification parameter is not given, the auto-created user uses the mysql_native_password auth plugin. This plugin is deprecated in 8.1 and about to be removed in the future. This patch changes the behavior so that the Router uses the default authentication plugin of the Cluster, as if --force-password-validation was given. Change-Id: I41a235daeb4bba67fd7640b0865ce93f68083346
…sts. When the Router is bootstrapped over an existing configuration, it will fetch the existing user's credentials from the keyring and reuse this user account. This happens even if the existing user uses the deprecated mysql_native_password authentication plugin. This patch changes the behavior so that the Router checks what the authentication plugin for the Router user is. If it is mysql_native_password, it will try to change it to the Cluster's default authentication plugin. It will only do that if there is a single user/host entry for the Router's user in the mysql.user table. If there is more than one host pattern, it will only give a warning, advising the user to change the authentication plugin manually. Change-Id: I9477cca244076488cb4da78307c6a10db646642b
Part of Bug#34199339 - Replica fail for keyless table using BIT and HASH_SCAN Refactor unpack_record to make code more streamlined, avoiding cascading and cluttered conditions. Change-Id: I16554ddae807573c285a539e42c6bae221a29e0d
Problem: When replicating an NDB table with BIT columns and with no explicit primary key (NDB hidden PK), and the search algorithm for replica rows is HASH_SCAN, replication can stop with a failure on the SQL applier thread. Analysis: The HASH_SCAN algorithm computes a hash value using the `null_flags` of a table (info on nullable columns and the first high bit of BIT columns) and the BI field data. This hash value is then used to populate a hash table, after which the same computation takes place using the data retrieved from NDB. If the hashes match, a row is found and the event is applied. It was observed that the first hash never matched the hash of the data from NDB. The value giving the hash the wrong result was the `null_flags` of the table. The intricacy of this value is that it stores the null bits of the columns as well as the first 0-6 uneven bits of BIT columns (the extra unaligned bits). Solution: It was found that the unpack of the record, at the ndbcluster plugin level, trusted that the null bits were already set, so it did nothing, i.e., it did not reset the high bits in the destination buffer. Now, on unpack of the NDB record, the ndbcluster plugin writes the value 0 if a field is NULL, forcing the uneven high bits to be 0. Change-Id: I25e7ba2df8420a1495f317eb2e2c472ad98dd91e
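A minimal sketch of the unpack fix described above, assuming a flat destination buffer; the function name and layout are illustrative, not the actual ndbcluster unpack_record code:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// On NULL fields, explicitly zero the destination instead of trusting
// that stale bits (including the uneven high bits of BIT columns) are
// already clear; those stale bits corrupted the HASH_SCAN hash input.
void unpack_field(uint8_t *dst, const uint8_t *src, size_t len, bool is_null) {
  if (is_null)
    std::memset(dst, 0, len);  // force deterministic bytes into null_flags
  else
    std::memcpy(dst, src, len);
}

int main() {
  uint8_t dst[4] = {0xff, 0xff, 0xff, 0xff};  // stale bits from a prior row
  unpack_field(dst, nullptr, sizeof dst, /*is_null=*/true);
  return dst[0];  // 0: the high bits are cleared, the hash is stable
}
```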
Group Replication (GR) relies on the Group Communication Service (GCS) to exchange messages with the group and be informed of the group membership. When there is an event from GCS, for example a message or a membership update, the GCS delivery thread calls the respective handler on GR so that GR pursues the required actions. The GCS delivery thread is not initialized with a MySQL context. Some operations done in the GR event handlers do require a MySQL context on debug builds; the context is required for the debug information provided by the performance schema instrumentation framework. The operations that require a MySQL context on debug builds are:
* GR member actions load and store;
* asynchronous replication channels failover configuration load and store;
* debug flags for conditional execution.
To allow these operations, a temporary context was created, that is, they were surrounded by:
```
my_thread_init();
...
my_thread_end();
```
These operations only happen in single-primary mode. On 8.X.0 it was noticed that this temporary MySQL context was causing side effects, more precisely it was disabling the memory consumption tracking inside GCS. This was fixed on 8.X.0. On 8.0 the temporary MySQL context does not cause side effects. Still, this showed that this scenario was not being tested, hence we are adding it so that it runs in regression testing. Change-Id: Id042d4f207a88195f45d9e4ac87dbc2a5009672a
…ences Elide cost numbers, which were 3.02 rather than 3.03 for Xcode 15.0.1 Change-Id: I72aac3823bec4c07a05561b0aa2c7d6d7f13b9ea (cherry picked from commit 40ad3de93b2443f10d77f169270ea376ac6da234)
Fix a test in gis_bugs_crashes that was returning a different result on macOS after pushing wl9582. The input geometry of that test is (geometrically) invalid, so the result is unexpected. The test now checks whether the input geometries are valid instead of performing operations on them. This also fixes: Bug#36017208 MTR test gis.gis_bugs_crashes fails with a result difference. Change-Id: Ie66aefca95ba724eafa83751e40c3a5991e329b6 (cherry picked from commit 6a0528b6e858bd1c28d2e8b3ba4f0acf0b605758)
Problem: Right-nested UNIONs are not supported for recursive query blocks in CTEs. A check for this was missing. In WL#11350, we started allowing right-nested set operations. The missing check resulted in a crash. Solution: If there are right-nested UNIONs of recursive query blocks that cannot be safely flattened, we throw an error indicating that the operation is not yet supported. Change-Id: I8ac01f91bf63e81fe7393179a3d01f9ba1eaa4e2
Approved-by: Balasubramanian Kandasamy <[email protected]> Change-Id: I8e86c48217b407d7690af86c212fc2f95fd78ffe
…ot found from error It was discovered that some methods of the `Group_member_info_manager` class returned the same value in distinct situations, which could be masking errors. One case is
```
Group_member_info *
Group_member_info_manager::get_group_member_info_by_member_id(
    const Gcs_member_identifier &id);
```
which returns an allocated copy of the `member_info` when the member exists in the `Group_member_info_manager`, or returns `nullptr` when the member does not exist. Though `nullptr` can also be returned when the memory allocation of the copy fails, which means that the method caller is unable to distinguish:
* member not found;
* no memory available to construct the object.
In both situations the method caller will interpret `nullptr` as member not found, which can lead to incorrect decisions. The same pattern exists in the methods:
* Group_member_info_manager::get_group_member_info()
* Group_member_info_manager::get_group_member_info_by_index()
* Group_member_info_manager::get_group_member_info_by_member_id()
* Group_member_info_manager::get_primary_member_info()
To solve the above issue, the four methods were refactored to: 1) return a boolean value stating whether the member was found; 2) receive an out parameter, a reference local to the method caller, that is updated when the member is found. The use of the reference eliminates the memory allocation. Change-Id: Ic10263943fc63f40cdda506a59f10962aa208d92
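The refactored contract might look roughly like this sketch; the simplified types stand in for Group_member_info / Gcs_member_identifier, and only the method name and the bool-plus-out-parameter shape follow the commit message:

```cpp
#include <map>
#include <string>

struct MemberInfoSketch {
  std::string uuid;
};

class MemberManagerSketch {
  std::map<std::string, MemberInfoSketch> m_members;

 public:
  // Returns true and fills 'out' (no heap allocation) when the member is
  // found; false now unambiguously means "member not found", since there
  // is no copy whose allocation could fail and also yield nullptr.
  bool get_group_member_info_by_member_id(const std::string &id,
                                          MemberInfoSketch &out) const {
    auto it = m_members.find(id);
    if (it == m_members.end()) return false;
    out = it->second;
    return true;
  }
};

int main() {
  MemberManagerSketch mgr;
  MemberInfoSketch info;
  // Empty manager: 'false' means not found, never an allocation failure.
  return mgr.get_group_member_info_by_member_id("uuid-1", info) ? 1 : 0;
}
```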
https://perconadev.atlassian.net/browse/PS-9040 Add top-level .clang-tidy file, and extra files in strings/, mysys/, sql/ and unittest/gunit/.
https://perconadev.atlassian.net/browse/PS-9040 Creates a workflow triggered on a pull request to perform clang-tidy checks on the changed code and post the violations as comments on the pull request to be addressed. The workflow triggers a run that generates compilation commands and uses them to perform a clang-tidy check on the modified files, then reports the warnings in the changed code as review comments on the pull request through a GitHub Actions bot.
PS-9104 Release tasks for PS-8.0.36
PS-9040 Integrate clang-tidy checks for Percona server
https://perconadev.atlassian.net/browse/PS-9146 Exclude the following tests from the ASAN test run:
- binlog_mysqlbinlog_4g_start_position is a big test and requires even more time with ASAN.
- buffered_error_log and processlist_tid report internal lib issues.
Release 5.7.44-49
PS-9146: Exclude some tests from ASAN tests run
PS-9129 Deb12 service is not running after non-pro to pro update
…c:3833:ib::fatal triggered thread (percona#5257) https://perconadev.atlassian.net/browse/PS-9107
Problem:
--------
Two threads are trying to delete the same ibuf rec. The first one succeeds with an optimistic delete, which means the record is completely gone (added to page garbage). Now the second thread cannot do an optimistic delete, so it tries to do a pessimistic one and wants to position the cursor on the record deleted by the first thread. It cannot. The record is gone. And ibuf asserts.
In one of the innodb master thread active tasks (srv_master_do_active_tasks), innodb tries to merge/contract the ibuf in the background. It does this by reading actual pages from disk, so it leaves the job to the IO threads. On an IO read, we apply pending change buffer entries. This thread (under certain conditions) can decide to do a sync read vs an async read. In our case, it is an async read. The master thread submits a request to read the page, let's say space_id:page_no (978:34).
At around the same time, an insert/delete into secondary indexes wants to buffer a change, but the ibuf has reached its max size, so it initiates an ibuf_contract(). See ibuf_insert() calling ibuf_contract()->ibuf_merge_pages(). This opens a cursor at a random record, and it can happen that it sees the same space_id:page_no as the IO thread is processing. And just when this tablespace's records are about to be processed, the tablespace is dropped. So ibuf_merge_pages() decides it is no longer necessary to do the actual read of the secondary index pages. Hey, the tablespace is gone! What's the point in bringing those pages to the buffer pool? Hence it decides to delete all ibuf records belonging to space_id (978 in our example). This leads to the case where two threads can simultaneously process the ibuf records of the same page:
- the IO thread reading a secondary index page and contracting the ibuf belonging to this space_id:page_no (this read is initiated by the innodb master ibuf merge task);
- user threads doing ibuf_contract(), which process entries belonging to a different tablespace and, when they see that the tablespace is dropped, try to delete its ibuf entries.
Fix:
----
ibuf_restore_pos() is designed to handle the "dropped tablespace" case already, but in 8.0 the way tablespaces are dropped is a bit different. fil_space_get_flags(space) == ULINT32_UNDEFINED happens when the space object is freed, and on nullptr, ULINT32_UNDEFINED is returned by fil_space_get_flags(). Technically the problem can happen in 5.7 too, but it is more obvious in 8.0 because the window between the actual deletion of the tablespace object and the user's drop tablespace command is bigger. In 8.0, the space object is still available, so the flags are NOT ULINT32_UNDEFINED. At the time of the crash, only stop_new_ops is set to true. If fil_space_t->stop_new_ops is true, fil_space_acquire() returns nullptr. We will use this call to determine whether the tablespace is dropped or not. The ibuf code saw the tablespace as deleted by using fil_space_acquire() in ibuf_merge_or_delete_for_page(). We will use the same call in ibuf_restore_pos() to determine whether the tablespace is being deleted.
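A condensed sketch of the dropped-tablespace test described in the fix; FilSpaceSketch and the acquire() helper are greatly simplified stand-ins for fil_space_t / fil_space_acquire():

```cpp
// Simplified stand-in for fil_space_t during a DROP TABLESPACE.
struct FilSpaceSketch {
  bool stop_new_ops = false;  // set once the drop is underway
};

// Mirrors the documented behavior: fil_space_acquire() returns nullptr
// once stop_new_ops is set on the space.
FilSpaceSketch *acquire(FilSpaceSketch *space) {
  if (space == nullptr || space->stop_new_ops) return nullptr;
  return space;
}

// The restore-position path now uses the same test as
// ibuf_merge_or_delete_for_page(), instead of relying on
// fil_space_get_flags() == ULINT32_UNDEFINED (which in 8.0 stays
// defined while the drop is still in progress).
bool ibuf_restore_pos_sketch(FilSpaceSketch *space) {
  if (acquire(space) == nullptr)
    return false;  // tablespace is being dropped: bail out, don't assert
  return true;     // safe to reposition the cursor
}

int main() {
  FilSpaceSketch space;
  space.stop_new_ops = true;  // a DROP TABLESPACE is in progress
  return ibuf_restore_pos_sketch(&space) ? 1 : 0;  // 0: skip gracefully
}
```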
PS-9129 Deb12 service is not running after non-pro to pro update
PS-9151: Fix build with WITH_SSL=openssl11 [8.0]
….inc https://perconadev.atlassian.net/browse/PS-9145 There is an issue with running the component_encryption_udf MTR tests on the el7 platform with the alternative system openssl11 lib. The test script is not able to find the correct openssl binary, which is needed to identify the openssl version. To fix the issue, encryption_udf_digest_table.inc is modified to read the openssl version from the Tls_library_version variable.
https://perconadev.atlassian.net/browse/PS-9145 Remove unused openssl_binary_version_do_not_skip option.
PS-9145: Fix openssl version detection in encryption_udf_digest_table.inc
On branch ps-9170 modified: README.md
PS-9170 Update Percona Server README link
PS-9155: Crash in row_sel_convert_mysql_key_to_innobase
PS-9188 [8.0]: Add clang-18 to Azure Pipelines
…l-5.7.44-49 Null merge PS 5.7.44-49 at percona@1c4f3827625 into 8.0
PS-9213: Fix typo in condition in mysqld.cc::fix_secure_path()
percona-ysorokin approved these changes on Apr 25, 2024
LGTM