
Null merge PS 8.0.36-28 at https://github.com/percona/percona-server/commit/068b575514b into trunk #5290

Merged

Conversation

oleksandr-kachan
Contributor

No description provided.

dtcpatricio and others added 30 commits November 10, 2023 17:13
Problem:

Running an ASAN build with DBUG_TRACE calls (--debug), a heap buffer
overflow is detected on NDB_SHARE::create_key().

Analysis:

On NDB_SHARE::create_key(), the DBUG_DUMP of the `m_buffer` field of
NDB_SHARE_KEY is given the size `size`, which amounts to
sizeof(NDB_SHARE_KEY) + buffer_size. Hence, DBUG_DUMP is called with
sizeof(NDB_SHARE_KEY) extra bytes, which is caught by the ASAN build.

Solution:

Since the DBUG_DUMP is unnecessary (a DBUG_PRINT of the buffer is
already present a few lines earlier), it is removed.

Change-Id: I92c17fb16b39d19a896a500b8ab1956addc16015
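The over-read can be illustrated with a toy layout (hypothetical names and a fixed-size buffer for simplicity; the real NDB_SHARE_KEY is variable-length and lives in the ndbcluster plugin):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for NDB_SHARE_KEY: a header followed by an
// inline buffer of m_buffer_size bytes.
struct ShareKey {
  std::size_t m_buffer_size;
  char m_buffer[8];  // illustration only; the real buffer is variable-length
};

// Buggy length: the size of the whole allocation. Dumping m_buffer with
// this length reads sizeof(ShareKey) bytes past the end of the buffer,
// which is exactly what ASAN reported.
std::size_t buggy_dump_len(const ShareKey &k) {
  return sizeof(ShareKey) + k.m_buffer_size;
}

// Correct length: only the bytes that actually belong to m_buffer.
std::size_t correct_dump_len(const ShareKey &k) {
  return k.m_buffer_size;
}
```

The patch sidesteps the question entirely by dropping the DBUG_DUMP, since the buffer is already traced via DBUG_PRINT.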
Some innodb tests don't run with nonstandard page sizes.

Fix: skip the tests when run with page sizes that don't work.

Change-Id: I4ccdddb981d93621cd259ca6cb9b1ede536b6987
A transporter may be set in an 'overloaded' state if a send attempt
failed to send anything on it. That could e.g. be due to the OS-internal
send buffers being filled up, possibly because the receiver side is not
consuming fast enough / has full buffers as well.

The handling of such overload is to let the send-thread take a short
200 us yield from the CPU if there is no other send work to be found.

Right before the send-threads yield the CPU, the yield method checks
the callback method check_available_send_data() for any late-arriving
transporters having more data to send. That is effectively implemented
by the send-threads having the flag 'bool m_more_trps', which is set
every time a new transporter is entered into the queue with
insert_trp(). It is also cleared if get_trp() could not find a TrpId
to be returned.

The special case where get_trp() found only a delayed transporter is
handled in handle_send_trp(). Here we cleared 'm_more_trps' if we found
a delay and the handling was done by a send-thread (as opposed to an
assist-send). Note that failing to clear m_more_trps would result in
the send-thread later not being able to yield the CPU.

For the 'delayed-send with overload' case, however, handle_send did an
early 'return false' without ever entering the code section where
m_more_trps was cleared for a send-thread. This prevented the CPU yield
in the overload situation.

The patch combines the check for overload with the general check for the
delay to be canceled or m_more_trps to be cleared.

Change-Id: Ied5cca628e10715510d88350140c7f4ce1a17f37
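The control-flow shape of the bug and the fix can be sketched as follows (a minimal model with invented names; the real logic lives in the NDB send-thread code and involves much more state):

```cpp
#include <cassert>

struct SendThreadState {
  bool m_more_trps = true;  // set by insert_trp(), checked before yielding
};

// Buggy shape: the early return on 'delayed + overloaded' skips the
// bookkeeping, so a send-thread never clears m_more_trps and can never
// yield the CPU.
bool handle_send_buggy(SendThreadState &st, bool overloaded, bool delayed,
                       bool is_send_thread) {
  if (delayed && overloaded) return false;  // early return: flag never cleared
  if (delayed && is_send_thread) st.m_more_trps = false;
  return true;
}

// Fixed shape: the overload check is folded into the general delay
// handling, so the flag is cleared in both cases.
bool handle_send_fixed(SendThreadState &st, bool overloaded, bool delayed,
                       bool is_send_thread) {
  if (delayed || overloaded) {
    if (is_send_thread) st.m_more_trps = false;  // allows the CPU yield
    return false;
  }
  return true;
}
```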
The "Bug45495" test in testMgmd fails under gcov testing. The
test results show hundreds of lines of DEBUG log messages
being written while a file_exists() call times out.
This patch starts the mgmd servers without the --verbose flag,
so that the DEBUG-level log messages are suppressed.

This patch is intended for both mysql-8.0 and mysql-trunk.
A separate patch for this bug addresses failures in other tests
and is intended for mysql-trunk only.

Change-Id: I904de0a1926f3a151ee21c194659b61b2d19aed2
The Qmgr GSN_ISOLATE_ORD handling was modified to handle the larger
node bitmap size necessary for supporting up to 144 data nodes.

For compatibility QMGR uses the node version of the sending node to
determine whether the incoming signal has its node bitmap in a long
section or inline in the signal.

The senderRef member of the incoming signal is used which is set
by the signal originator.

However, in the context of ISOLATE_ORD, the original sender may be
shut down when ISOLATE_ORD is processed, in which case its node
version may have been reset to zero, causing the inline-bitmap
path to be taken and resulting in incorrect processing.

The signal handler is changed to use the presence of a long section
to decide whether the incoming signal uses a long section to
represent nodes to isolate or not.

Change-Id: I1efe6c98c3f342770462d81ae00395b5a2eec7d7
…r.cc

Issue: ha_index_end() is to be called only if the handler has been
initialized. This check was missing, which led to the assertion failure.
Fixed by adding the necessary condition.

This is a backport of a change that was done as part of WL#15257
in 8.2.0

Change-Id: I2ec6717d081c872577aa731a40170d287bad8ad2
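The shape of the fix, as a hedged sketch (simplified names; the real check sits in the server's handler cleanup code):

```cpp
#include <cassert>

// Simplified handler mock: ha_index_end() must only run after
// ha_index_init() succeeded, otherwise an assertion fires.
struct Handler {
  bool inited = false;
  int end_calls = 0;
  void ha_index_init() { inited = true; }
  void ha_index_end() {
    assert(inited);  // the assertion the bug was hitting
    inited = false;
    ++end_calls;
  }
};

// Fixed cleanup path: guard the call with the init check that was missing.
void cleanup(Handler &h) {
  if (h.inited) h.ha_index_end();
}
```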
Change-Id: I9b80ed5ed08ebe8bf1b6835cfabec6695666c599
Problem:

Running an ASAN build with DBUG_TRACE calls (--debug), a heap buffer
overflow is detected on NDB_SHARE::create_key().

Analysis:

On NDB_SHARE::create_key(), the DBUG_DUMP of the `m_buffer` field of
NDB_SHARE_KEY is given with the size `size` which amounts to
sizeof(NDB_SHARE_KEY) + buffer_size. Hence, DBUG_DUMP is called
with sizeof(NDB_SHARE_KEY) extra bytes, caught by ASAN build.

Solution:

Since DBUG_DUMP is not necessary, because a DBUG_PRINT of the buffer is
already present in the lines before, then it is removed.

Change-Id: I649d66e2aa87951ada0483208b95e9757a97ebae
Change-Id: I8ce2a40889148c20f2bdf4a3cfd8808de5c2007f
Another attempt to fix this test failure.

We need to also know which tableId we should crash on when
committing a ZUPDATE. The previous (failed!) patch is extended by
using insertError2InNode() to insert the error code as well
as the tableId to crash on as 'ERROR_INSERT_EXTRA'.

Dblqh::execCOMMIT() is enhanced to also check that
tableId == ERROR_INSERT_EXTRA before CRASH_INSERTION(5048)

Change-Id: If4e453bdf7400a4439465e3e0c0ad9ef7f91a46e
Post push patch, removing Multi_Transporter as a 'friend'

Change-Id: I92a810dd04ebad203832332b6da685cfd3660dcb
There was a possible race between the start_clients_thread() and
update_connections(), both seeing the same transporter in state
DISCONNECTING. The design intention was that the
start_clients_thread() should shutdown() the
transporter, such that Transporter::isConnected() became 'false'.
Upon seeing the !isConnected() state on a DISCONNECTING transporter,
update_connections() should 'close' the transporter.

However, as the start_clients_thread() -> doDisconnect() code path
sets 'm_connected = false' before doing the disconnectImpl() -> shutdown,
that opened up a race where update_connections() could see the transporter
as 'not-isConnected()' and thus close the transporter before it was even
shut down.

With SSL enabled, that resulted in the SSL_context object being
released by ::ssl_close() and later referenced by ssl_shutdown().
Without SSL enabled we still broke the intended call order of socket
close vs. shutdown.

The patch ensures that disconnectImpl() is now completed before we set
the m_connected state to 'false'. Thus update_connections() will not
be able to close the NdbSocket before it has been completely shut down.

Note that this raises a theoretical possibility in the TransporterRegistry
code paths ::poll*() and ::performReceive(), where we check:

  'if (is_connected(trp_id) && t->isConnected()) {'

... of now seeing a Transporter as 'isConnected()' when
disconnectImpl() has already been performed but 'm_connected' has
not been set to 'false' yet.

However, it seems that the above check is too strict: when the
transporter 'protocol' for doing disconnect is followed, there will
not exist cases where 'performState[] == CONNECTED' and
'not isConnected()'.

Note that doDisconnect(), which sets 'm_connected = false',
is only called from start_clients_thread() when the transporter is
in state DISCONNECTING or DISCONNECTED. Thus the above
assumption should be safe.

Change-Id: I9444ae74b93149deed8f4197809f6dd8dce44158
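The ordering change can be sketched in miniature (invented member names; the real code is in Transporter/TransporterRegistry and involves sockets and SSL):

```cpp
#include <atomic>
#include <cassert>

// Simplified transporter: the observable point is *when* m_connected
// flips to false relative to the shutdown work.
struct Transporter {
  std::atomic<bool> m_connected{true};
  bool connected_during_shutdown = false;

  // Stands in for disconnectImpl() -> shutdown(). Records what a
  // concurrent update_connections() could have observed while the
  // shutdown was still in progress.
  void disconnectImpl() { connected_during_shutdown = m_connected.load(); }

  // Buggy order: m_connected cleared first, so update_connections()
  // may see !isConnected() and close the socket before shutdown ran.
  void doDisconnect_buggy() {
    m_connected = false;
    disconnectImpl();
  }

  // Fixed order: complete the shutdown before publishing the state change.
  void doDisconnect_fixed() {
    disconnectImpl();
    m_connected = false;
  }
};
```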
              when input some sql statement

When a shared materialized table for a CTE is marked
for re-materialization because it is a lateral derived
table and the counters for the invalidators do not
match for the second reference to the CTE, it clears
all references to the CTE. This resets the cursors
positioned for the table which materialized the CTE
table, leading to problems when reading from that table.
More detailed analysis is in the bug page and the test
file (w.r.t. the failing query).
The fix is to not re-materialize the CTE table when
we are using the shared materialized derived table.
We also update the invalidator counters while at
it.

Change-Id: I5f9daeab03fee93354c0253fd55865f1a6a1e32c
… Multi_Transporter.

Post push patch fixing a regression where TransporterRegistry::get_bytes_received()
returned the number of 'sent' bytes instead.

Change-Id: Icac3ed5d4014ef6f36b31ee4e4c8795cc2ecc601
post push patch to add comments

Change-Id: I99ad8a65682eefec6b49f46b022e0e98f64442ba
The problem is in the function JOIN::destroy(), where we call
update_used_tables() to reinstate used tables information after
execution of a statement that has one or more const tables, and has thus
been subject to const table elimination. During this, we may encounter
a table, such as a special information schema, that has already been
closed and subsequently has had its table pointer (in class Table_ref)
set to nullptr.

The solution to this is to delay the refreshing of used table
information to the start of the next execution, just after tables have
been opened and we know that all table objects are in a proper state.

The performance impact of this seems to be negligible.

Also added some necessary code to prevent uninitialized command
properties from being copied in the case of as yet unprepared statements.

Change-Id: Ib44ab2cf1e61c3b188e7ceb000adb1445c48979a
Context:
Some NDBAPI tests use the ConfigValues::openSection()
function in order to get/set config parameters.
openSection() takes as arguments the section 'type'
(SYSTEM, NODE, CONNECTION, ...) and the index of the
section we are interested in.
Section indexes are sequential, starting from zero,
and are ordered by nodeId, BUT the nodeId and the
index can be different.

Problem:
In some tests openSection() is used with wrong
arguments (the nodeId is used instead of the index).
Depending on the configuration this can lead
openSection() to open a wrong/invalid section.

Solution:
Tests changed to always use the index instead of the
nodeId with openSection().
In particular, when iterating over all sections
looking for a parameter, iteration now starts from
index 0, preventing any section from being ignored.

Change-Id: I10468675b1ef07cf01a89961f5d5a84a498b4568
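The nodeId-vs-index confusion can be shown with a toy model (invented data; the real API is ConfigValues::openSection(type, index) over the NDB configuration):

```cpp
#include <cassert>
#include <vector>

// Toy model: sections of one type are stored sequentially; each carries
// the nodeId it describes. Indexes start at 0; nodeIds need not.
struct Section {
  int nodeId;
  int value;
};

// Mimics openSection(type, index): the argument is positional, not a
// nodeId. Passing a nodeId opens the wrong section or none at all.
const Section *openSection(const std::vector<Section> &sections, int index) {
  if (index < 0 || index >= static_cast<int>(sections.size())) return nullptr;
  return &sections[index];
}
```

To find the section for a given nodeId, the corrected tests iterate indexes from 0 and compare the nodeId stored in each section.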
Context:
Some NDBAPI tests use the ConfigValues::openSection()
function in order to get/set config parameters.
openSection() takes as arguments the section 'type'
(SYSTEM, NODE, CONNECTION, ...) and the index of the
section we are interested in.
Section indexes are sequential, starting from zero,
and are ordered by nodeId, BUT the nodeId and the
index can be different.

Problem:
In some tests openSection() is used with wrong
arguments (the nodeId is used instead of the index).
Depending on the configuration this can lead
openSection() to open a wrong/invalid section.

Solution:
Tests changed to always use the index instead of the
nodeId with openSection().
In particular, when iterating over all sections
looking for a parameter, iteration now starts from
index 0, preventing any section from being ignored.

Change-Id: Ib783490b651de9d4ae2627b5b3a3661530495318
Change-Id: I629dac790cf573232d3dedd62f9554c16644c744
Bug#35529968 - Hide the "CPACK_COMPONENT_GROUP_INFO_DISPLAY_NAME" option in the MSI installer

The INFO_SRC file was part of two CMake components, "Readme" and
"Documentation", which caused it to be included twice in the TAR and ZIP.
With some unpacking tools, this caused an unnecessary prompt asking
whether to overwrite or not. Also, the "Info" component is always to be
installed.

Change-Id: I046ffab11add83b095f786a45112816b8bb75d55
…word'

When the Router is bootstrapped against the Cluster without an
existing account given via the --account parameter, it auto-creates
a metadata user. If the --force-password-verification parameter
is not given, the auto-created user uses the mysql_native_password
auth plugin. This plugin is deprecated in 8.1 and about to be
removed in the future.

This patch changes the behavior so that the Router uses the default
authentication plugin in the Cluster, as if --force-password-validation
was given.

Change-Id: I41a235daeb4bba67fd7640b0865ce93f68083346
…sts.

When the Router is bootstrapped over an existing configuration it
fetches the existing user's credentials from the keyring and reuses
this user account. This happens even if the existing user is using
the deprecated mysql_native_password authentication plugin.

This patch changes the behavior so that the Router checks what the
authentication plugin for the Router user is. If it is
mysql_native_password it will try to change it to the Cluster's
default authentication plugin. It will only do that if there is
a single user/host entry for the Router's user in the mysql.user
table. If there is more than one host pattern it will only give
a warning, advising the user to change the authentication plugin
manually.

Change-Id: I9477cca244076488cb4da78307c6a10db646642b
Part of Bug#34199339 - Replica fail for keyless table using BIT and
HASH_SCAN

Refactor unpack_record to make code more streamlined, avoiding
cascading and cluttered conditions.

Change-Id: I16554ddae807573c285a539e42c6bae221a29e0d
Problem:

When replicating an NDB table with BIT columns and with no explicit
primary key (NDB hidden PK), and the search algorithm for replica rows
is HASH_SCAN, replication can stop with a failure on the SQL applier
thread.

Analysis:

The HASH_SCAN algorithm computes a hash value using the
`null_flags` of a table (info on nullable columns and the first
high bit of BIT columns) and the BI field data. This hash value
is then used to populate a hash table, after which the same computation
takes place using the data retrieved from NDB. If the hash
matches, a row is found and the event is applied.

It was observed that the first hash never matched the hash of the data
from NDB. The value giving the hash a wrong result was the
`null_flags` of the table. The intricacy of this value is that it
stores the null bits of the columns as well as the first 0-6 uneven
bits of BIT columns (the extra unaligned bits).

Solution:

It was found that the unpack of the record, at the ndbcluster
plugin level, trusted that the null bits were already set, so it did
nothing, i.e., it did not reset the high bits in the destination buffer.
Now, on unpack of the NDB record, the ndbcluster plugin writes the
value 0 if a field is NULL, forcing the uneven high bits to be 0.

Change-Id: I25e7ba2df8420a1495f317eb2e2c472ad98dd91e
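The effect on the hash input can be sketched with a single toy null_flags byte (invented bit layout; the real null_flags encoding is per-table and handled inside the server's record format code):

```cpp
#include <cassert>
#include <cstdint>

// Toy null_flags byte: bit 0 is the column's NULL bit; bits 1..6 stand
// in for the "uneven" high bits of a BIT column described above.

// Buggy unpack: trusts that the destination buffer already holds zeroed
// bits, so stale high bits survive and corrupt the HASH_SCAN hash input.
uint8_t unpack_buggy(uint8_t dest_null_flags, bool field_is_null) {
  if (field_is_null) dest_null_flags |= 0x01;  // sets NULL bit only
  return dest_null_flags;                      // stale bits 1..6 remain
}

// Fixed unpack: when the field is NULL, write 0 over the byte so the
// uneven high bits take a deterministic value.
uint8_t unpack_fixed(uint8_t dest_null_flags, bool field_is_null) {
  if (field_is_null) dest_null_flags = 0x01;  // NULL bit set, rest zeroed
  return dest_null_flags;
}
```

With a clean destination buffer both versions agree, which is why the bug only surfaced when the buffer held leftover data.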
Group Replication (GR) relies on the Group Communication Service (GCS)
to exchange messages with the group and to be informed of the group
membership.
When there is an event from GCS, for example a message or a membership
update, the GCS delivery thread calls the respective handler on
GR so that GR pursues the required actions. The GCS delivery thread
is not initialized with a MySQL context. Some operations done in the
GR event handlers do require a MySQL context on debug builds; the
context is required for the debug information provided by the
performance schema instrumentation framework.
The operations that require a MySQL context on debug builds are:
 * GR member actions load and store;
 * asynchronous replication channels failover configuration load and
   store;
 * debug flags for conditional execution.
To allow these operations, a temporary context was created, that is,
they were surrounded by:
```
  my_thread_init();
  ...
  my_thread_end();
```
These operations only happen on single-primary mode.

In 8.X.0 it was noticed that this temporary MySQL context was
causing side effects, more precisely it was disabling the memory
consumption tracking inside GCS. This was fixed in 8.X.0.

In 8.0 the temporary MySQL context does not cause side effects.
Still, this showed that the scenario was not being tested, so we are
adding it to regression testing.

Change-Id: Id042d4f207a88195f45d9e4ac87dbc2a5009672a
…ences

Elide cost numbers, which were 3.02 rather than 3.03 for Xcode 15.0.1

Change-Id: I72aac3823bec4c07a05561b0aa2c7d6d7f13b9ea
(cherry picked from commit 40ad3de93b2443f10d77f169270ea376ac6da234)
Fix a test in gis_bugs_crashes that was returning a different result
on macOS after pushing wl9582. The input geometry of that test is
(geometrically) invalid, so the result is unexpected. The test now
checks whether the input geometries are valid instead of performing
operations on them.

This also fixes:
Bug#36017208 MTR test gis.gis_bugs_crashes fails with a result difference

Change-Id: Ie66aefca95ba724eafa83751e40c3a5991e329b6
(cherry picked from commit 6a0528b6e858bd1c28d2e8b3ba4f0acf0b605758)
Problem: Right-nested UNIONs are not supported for recursive query
blocks in CTEs. A check for this is missing. In WL#11350, we started
allowing right-nested set operations. The missing check resulted in
a crash.

Solution: If there are right-nested UNIONs of recursive query blocks
that cannot be safely flattened, we throw an error indicating that
the operation is not yet supported.

Change-Id: I8ac01f91bf63e81fe7393179a3d01f9ba1eaa4e2
Approved-by: Balasubramanian Kandasamy <[email protected]>
Change-Id: I8e86c48217b407d7690af86c212fc2f95fd78ffe
…ot found from error

It was discovered that some methods of the `Group_member_info_manager`
class returned the same value for distinct situations, which could be
masking errors.
One case is
```
Group_member_info *
Group_member_info_manager::get_group_member_info_by_member_id(
    const Gcs_member_identifier &id);
```
which returns an allocated copy of the `member_info` when the
member exists in the `Group_member_info_manager`, or returns
`nullptr` when the member does not exist.
However, `nullptr` can also be returned when the memory allocation of
the copy fails, which means that the method caller is unable to
distinguish:
 * member not found;
 * no memory available to construct the object.
In both situations the method caller will interpret `nullptr` as
'member not found', which can lead to incorrect decisions.

The same pattern exists on the methods:
 * Group_member_info_manager::get_group_member_info()
 * Group_member_info_manager::get_group_member_info_by_index()
 * Group_member_info_manager::get_group_member_info_by_member_id()
 * Group_member_info_manager::get_primary_member_info()

To solve the above issue, the four methods were refactored to:
 1) return a boolean value stating whether the member was found;
 2) receive an out parameter, a reference local to the method
    caller, that is updated when the member is found. The use of the
    reference eliminates the allocation of the memory.

Change-Id: Ic10263943fc63f40cdda506a59f10962aa208d92
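The refactored lookup shape can be sketched as follows (a toy registry with invented types; the real methods operate on `Group_member_info` and `Gcs_member_identifier`):

```cpp
#include <cassert>
#include <map>
#include <string>

struct MemberInfo {
  std::string host;
};

struct MemberManager {
  std::map<std::string, MemberInfo> members;

  // Refactored shape: the boolean result states whether the member was
  // found; the out parameter is caller-owned storage, so no allocation
  // can fail inside the lookup and nullptr ambiguity disappears.
  bool get_group_member_info(const std::string &id, MemberInfo &out) const {
    auto it = members.find(id);
    if (it == members.end()) return false;
    out = it->second;  // copy into the caller's local reference
    return true;
  }
};
```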
VarunNagaraju and others added 26 commits March 13, 2024 16:13
https://perconadev.atlassian.net/browse/PS-9040

Add top-level .clang-tidy file, and extra files in
strings/, mysys/, sql/ and unittest/gunit/.
https://perconadev.atlassian.net/browse/PS-9040

Creates a workflow to be triggered on a pull request
to perform clang-tidy checks on the changed code and post
the violations as comments on the pull request to be
addressed.

The workflow triggers a run that generates compilation commands
and uses them to perform a clang-tidy check on the modified files,
generating a report of the warnings in the changed code through a
GitHub Actions bot in the form of review comments on the pull request.
PS-9040 Integrate clang-tidy checks for Percona server
PS-9040 Integrate clang-tidy checks for Percona server
https://perconadev.atlassian.net/browse/PS-9146

Exclude the following tests from the ASAN tests run:
- binlog_mysqlbinlog_4g_start_position is a big test and requires even
  more time with ASAN.
- buffered_error_log, processlist_tid - report internal lib issues.
PS-9146: Exclude some tests from ASAN tests run
PS-9129 Deb12 service is not running after non-pro to pro update
…c:3833:ib::fatal triggered thread (percona#5257)

https://perconadev.atlassian.net/browse/PS-9107

Problem:
--------
Two threads are trying to delete the same ibuf rec. The first one succeeds with
optimistic delete and this means the record is completely gone (added to page garbage).

Now second thread, cannot do optimistic. So it tries to do pessmistic and wants
to position the cursor on the record deleted by the first thread. It cannot.
The record is gone. And ibuf asserts.

In one of the innodb master thread's active tasks
(srv_master_do_active_tasks), innodb tries to merge/contract the ibuf
in the background. It does this by reading actual pages from disk,
leaving the job to the IO threads. On IO read, we apply pending change
buffer entries. This thread (under certain conditions) can decide to do
a sync read vs. an async read. In our case, it is an async read. The
master thread submits a request to read the page; let's say
space_id:page_no (978:34).

At around the same time, an insert/delete into secondary indexes wants
to buffer a change, but the ibuf has reached its max size, so it
initiates an ibuf_contract(). See ibuf_insert() -> ibuf_contract() ->
ibuf_merge_pages(). This opens a cursor at a random record, and it can
happen that it sees the same space_id:page_no as the IO thread is
processing.

And just when this tablespace's records are about to be processed, the
tablespace is dropped. So ibuf_merge_pages() decides it is no longer
necessary to do the actual read of secondary index pages. Hey, the
tablespace is gone! What's the point in bringing those pages into the
buffer pool? Hence it decides to delete all ibuf records belonging to
the space_id (978 in our example).

This leads to the case where two threads can simultaneously process the
ibuf records of the same page: the IO thread reading a secondary index
page and contracting the ibuf belonging to this space_id:page_no (this
read was initiated by the innodb master ibuf merge task), and the user
threads doing ibuf_contract(), which process entries belonging to a
different tablespace and, when they see that the tablespace is dropped,
try to delete its ibuf entries.

Fix:
----
ibuf restore pos is designed to handle the "dropped tablespace" case
already, but in 8.0 the way tablespaces are dropped is a bit different.

fil_space_get_flags(space) == ULINT32_UNDEFINED happens when the space
object is freed; on nullptr, ULINT32_UNDEFINED is returned by
fil_space_get_flags().

Technically the problem can happen in 5.7 too, but it is more obvious
in 8.0 because the window between the actual deletion of the tablespace
object and the user's drop tablespace command is bigger. In 8.0, the
space object is still available, so the flags are NOT ULINT32_UNDEFINED.
At the time of the crash, only stop_new_ops is set to true. If
fil_space_t->stop_new_ops is true, fil_space_acquire() returns nullptr.
We will use this call to determine whether the tablespace is dropped or
not.

The ibuf code saw the tablespace as deleted by using fil_space_acquire()
in ibuf_merge_or_delete_for_page().

We will use the same call in ibuf_restore_pos() to determine whether
the tablespace is being deleted.
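The detection logic can be sketched with a toy fil_space_t (simplified fields; the real fil_space_acquire() also manages reference counts and the fil system mutex):

```cpp
#include <cassert>

// Toy fil_space_t: in 8.0 the space object may still exist after DROP,
// with only stop_new_ops set, so checking the flags is not enough.
struct fil_space_t {
  bool stop_new_ops = false;
};

// Mimics fil_space_acquire(): returns nullptr once new operations are
// stopped, which is the reliable "being dropped" signal used by the fix.
fil_space_t *fil_space_acquire(fil_space_t *space) {
  if (space == nullptr || space->stop_new_ops) return nullptr;
  return space;
}

// Sketch of the check added to ibuf_restore_pos(): bail out when the
// tablespace is being dropped instead of asserting.
bool tablespace_usable(fil_space_t *space) {
  return fil_space_acquire(space) != nullptr;
}
```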
PS-9129 Deb12 service is not running after non-pro to pro update
PS-9151: Fix build with WITH_SSL=openssl11 [8.0]
….inc

https://perconadev.atlassian.net/browse/PS-9145

There is an issue with running the component_encryption_udf MTR tests
on the el7 platform with the alternative system openssl11 lib. The test
script is not able to find the correct openssl binary, which is needed
to identify the openssl version.

To fix the issue, encryption_udf_digest_table.inc was modified to read
the openssl version from the Tls_library_version variable.
PS-9145: Fix openssl version detection in encryption_udf_digest_table.inc
 On branch ps-9170
	modified:   README.md
PS-9170 Update Percona Server README link
PS-9155: Crash in row_sel_convert_mysql_key_to_innobase
PS-9188 [8.0]: Add clang-18 to Azure Pipelines
PS-9213: Fix typo in condition in mysqld.cc::fix_secure_path()
Collaborator

@percona-ysorokin percona-ysorokin left a comment

LGTM

@oleksandr-kachan oleksandr-kachan merged commit 8866762 into percona:trunk Apr 25, 2024
5 of 7 checks passed
@oleksandr-kachan oleksandr-kachan deleted the null-merge-mysql-8.0.36-28 branch April 25, 2024 16:39