Vats 5.7 #2

sensssz · 2016-10-17T16:32:30Z

Hi, Laurynas,

I forked your branch and fixed all known test failures based on it.

I removed the assertions you added because I don't think they should hold. Previously I used lock_sys->rec_hash directly because it was the only hash list in 5.6, but more were added in 5.7. I've changed the implementation so that it uses the correct hash list.

…ns jump the queue.

…into vats-5.7

…ble name.

…preference.

laurynas-biveinis · 2016-10-18T05:38:31Z

Thank you. Removing the asserts is the right thing to do, as they have been added only to catch missing implementation bits

…page and lock_grant_and_move_on_rec.

…Use higher resolution trx start time.

Summary: Inside index_next_same() call, we should 1. first check whether the record matches the index lookup prefix, 2. then check pushed index condition. If we try to check #2 without checking #1 first, we may walk off the index lookup prefix and scan till the end of the index. Test Plan: Run mtr Reviewers: hermanlee4, maykov, jtolmer, yoshinorim Reviewed By: yoshinorim Differential Revision: https://reviews.facebook.net/D38769

Summary: MyRocks had two bugs when calculating index scan cost. 1. block_size was not considered. This made covering index scan cost (both full index scan and range scan) much higher 2. ha_rocksdb::records_in_range() may have estimated more rows than the estimated number of rows in the table. This was wrong, and MySQL optimizer decided to use full index scan even though range scan was more efficient. This diff fixes #1 by setting stats.block_size at ha_rocksdb::open(), and fixes #2 by reducing the number of estimated rows if it was larger than stats.records. Test Plan: mtr, updating some affected test cases, and new test case rocksdb_range2 Reviewers: hermanlee4, jkedgar, spetrunia Reviewed By: spetrunia Subscribers: MarkCallaghan, webscalesql-eng Differential Revision: https://reviews.facebook.net/D55869

…hutdown) On several testcases (i.e. rpl_gtid_mode), LeakSanitizer diagnoses missed memory deallocation: ================================================================= ==16675==ERROR: LeakSanitizer: detected memory leaks Direct leak of 21 byte(s) in 1 object(s) allocated from: #0 0x7f17748fa54a in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x9854a) laurynas-biveinis#1 0xff7f7f in my_malloc /mnt/workspace/percona-server-5.6-asan-param/BUILD_TYPE/debug-asan/Host/ubuntu-xenial-64bit/mysys/my_malloc.c:38 laurynas-biveinis#2 0x1634b83 in add_pfs_instr_to_array(char const*, char const*) /mnt/workspace/percona-server-5.6-asan-param/BUILD_TYPE/debug-asan/Host/ubuntu-xenial-64bit/storage/perfschema/pfs_server.cc:251 laurynas-biveinis#3 0x58cccf in mysqld_get_one_option /mnt/workspace/percona-server-5.6-asan-param/BUILD_TYPE/debug-asan/Host/ubuntu-xenial-64bit/sql/mysqld.cc:9198 laurynas-biveinis#4 0x10256c6 in my_handle_options /mnt/workspace/percona-server-5.6-asan-param/BUILD_TYPE/debug-asan/Host/ubuntu-xenial-64bit/mysys_ssl/my_getopt.cc:817 laurynas-biveinis#5 0x1025c63 in handle_options /mnt/workspace/percona-server-5.6-asan-param/BUILD_TYPE/debug-asan/Host/ubuntu-xenial-64bit/mysys_ssl/my_getopt.cc:308 laurynas-biveinis#6 0x5963e5 in handle_early_options() /mnt/workspace/percona-server-5.6-asan-param/BUILD_TYPE/debug-asan/Host/ubuntu-xenial-64bit/sql/mysqld.cc:7263 laurynas-biveinis#7 0x5a35a3 in mysqld_main(int, char**) /mnt/workspace/percona-server-5.6-asan-param/BUILD_TYPE/debug-asan/Host/ubuntu-xenial-64bit/sql/mysqld.cc:5613 laurynas-biveinis#8 0x586aae in main /mnt/workspace/percona-server-5.6-asan-param/BUILD_TYPE/debug-asan/Host/ubuntu-xenial-64bit/sql/main.cc:25 laurynas-biveinis#9 0x7f17726cc82f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f) This class of errors is already attempted to suppress in valgrind.supp. But these suppressions have been added to work around a bug of racy PFS shutdown, which is not required anymore as pfs_instr_config_array is deallocated exactly once since [1]. Thus, free the elements of this array and remove related suppressions instead.

In WL-included builds ASAN run witnessed missed ~Query_log_event invocation. The destruct-or was not called due to the WL's changes in the error propagation that specifically affect LC MTS. The failure is exposed in particular by rpl_trigger as the following stack: #0 0x9ecd98 in __interceptor_malloc (/export/home/pb2/test/sb_2-22611026-1489061390.32/mysql-commercial-8.0.1-dmr-linux-x86_64-asan/bin/mysqld+0x9ecd98) #1 0x2b1a245 in my_raw_malloc(unsigned long, int) obj/mysys/../../mysqlcom-pro-8.0.1-dmr/mysys/my_malloc.cc:209:12 #2 0x2b1a245 in my_malloc obj/mysys/../../mysqlcom-pro-8.0.1-dmr/mysys/my_malloc.cc:72 #3 0x2940590 in Query_log_event::Query_log_event(char const*, unsigned int, binary_log::Format_description_event const*, binary_log::Log_event_type) obj/sql/../../mysqlcom-pro-8.0.1-dmr/sql/log_event.cc:4343:46 #4 0x293d235 in Log_event::read_log_event(char const*, unsigned int, char const**, Format_description_log_event const*, bool) obj/sql/../../mysqlcom-pro-8.0.1-dmr/sql/log_event.cc:1686:17 #5 0x293b96f in Log_event::read_log_event() #6 0x2a2a1c9 in next_event(Relay_log_info*) Previously before the WL Mts_submode_logical_clock::wait_for_workers_to_finish() had not returned any error even when Coordinator thread is killed. The WL patch needed to refine such behavior, but at doing so it also had to attend log_event.cc::schedule_next_event() to register an error to follow an existing pattern. While my_error() does not take place the killed Coordinator continued scheduling, ineffectively though - no Worker gets engaged (legal case of deferred scheduling), and without noticing its killed status up to a point when it resets the event pointer in apply_event_and_update_pos(): *ptr_ev= NULL; // announcing the event is passed to w-worker The reset was intended for an assigned Worker to perform the event destruction or by Coordinator itself when the event is deferred. As neither is the current case the event gets unattended for its termination. In contrast in the pre-WL sources the killed Coordinator does find a Worker. However such Worker could be already down (errored out and exited), in which case apply_event_and_update_pos() reasonably returns an error and executes delete ev in exec_relay_log_event() error branch. **Fixed** with deploying my_error() call in log_event.cc::schedule_next_event() error branch which fits to the existing pattern. THD::is_error() has been always checked by Coordinator before any attempt to reset *ptr_ev= NULL. In the errored case Coordinator does not reset and destroys the event itself in the exec_relay_log_event() error branch pretty similarly to how the pre-WL sources do. Tested against rpl_trigger and rpl suites to pass. Approved on rb#15667.

… WITH DEVELOPER STUDIO When we do a release type build of the server (with both optimized and debug enabled server/plugins) with Developer Studio, some MTR tests when run with --debug-server will fail in one of two ways: 1. Tests which try to load a plugin into the mysql client fail with missing symbols. This is caused by the plugin having references to functions which do not exist in the non-debug client. 2. Some tests on sparc fail with Thread stack overrun. Fix for issue #1: mtr will have appended /debug to the plugin dir part when running with --debug-server and if there actually is such a directory. The fix is to remove any trailing /debug from the env. variable within the test. This will affect the client only, not the server. Developer builds will not have put the plugins in a subdirectory /debug so it makes no different to those. Fix for issue #2: apparently this thread stack overrun is not feasible to avoid, so just skip the test if running with debug server on sparc; there is already an include file to do that. Also added not_sparc_debug.inc to the "white list" so the tests are skipped even when running mtr --no-skip. (cherry picked from commit 9c79e477261ab252e38def436bca3336ef597603)

Some character sets are designated as MY_CS_STRNXFRM, meaning that sorting needs to go through my_strnxfrm() (implemented by the charset), and some are not, meaning that a client can do the strnxfrm itself based on cs->sort_order. However, most of the logic related to the latter has been removed already (e.g. filesort always uses my_strnxfrm() since 2003), and now it's mostly in the way. The three main uses left are: 1. A microoptimization for constructing sort keys in filesort. 2. A home-grown implementation of Boyer-Moore for accelerating certain LIKE patterns that should probably be handled through FTS. 3. Some optimizations to MyISAM prefix keys. Given that our default collation (utf8mb4_0900_ai_ci) now is a strnxfrm-based collation, the benefits of keeping these around for a narrow range of single-byte locales (like latin1_swedish_ci, cp850 and a bunch of more obscure locales) seems dubious. We seemingly can't remove the flag entirely due to #3 seemingly affecting the on-disk MyISAM structure, but we can remove the code for #1 and #2. Change-Id: If974e490d451b7278355e33ab1fca993f446b792

(A) move ndb_basename into utility library (B) new portability function ndb_getsockname()

Patch #2: find_child_doms() checks for duplicates each time it adds a result to the result vector. As explained in the header comment for Json_wrapper::seek(), the duplicate elimination is needed for paths which contain multiple ellipses, so in most cases it is unnecessary work. This patch makes find_child_doms() only check for duplicates in the case where the path contains multiple ellipses, and only when inspecting an ellipsis path leg which is not the first one. Additionally: Remove checks for empty vector after a successful call to push_back(). That is, replace checks for is_seek_done(result, only_need_one) with a simple check for only_need_one when we know the result vector cannot be empty. Call is_seek_done() from the loop in Json_dom::seek() instead of at the top of find_child_doms(), so that we can break out of the loop earlier if we find a match and only need one. Microbenchmarks (64-bit, Intel Core i7-4770 3.4 GHz, GCC 6.3): BM_JsonDomSearchEllipsis 25693 ns/iter [+210.9%] BM_JsonDomSearchEllipsis_OnlyOne 17881 ns/iter [+324.3%] BM_JsonDomSearchKey 128 ns/iter [ +0.8%] BM_JsonBinarySearchEllipsis 231319 ns/iter [ +38.7%] BM_JsonBinarySearchEllipsis_OnlyOne 222726 ns/iter [ +41.6%] BM_JsonBinarySearchKey 86 ns/iter [ 0.0%] Change-Id: I0ee624830680247ec5aed302c0408db00240d441

PLUGIN INSTALLS Split the UDF initialization/deinitialization into two: 1. Initialization/deinitialization of the global structures 2. Loading of the UDF definitions from the table and removing them from the global Then kept the #2 at the place of the current initialization/deinitalization routines and added #2 initialization very early (before component/plugin initialization) and #2 deinitialization very late (after the plugin/compononent deinitialization. Added a test plugin and a regression test.

Fix misc UBSAN warnings in unit tests. To repeat: export UBSAN_OPTIONS="print_stacktrace=1" ./runtime_output_directory/merge_large_tests-t --gtest_filter='-*DeathTest*' > /dev/null unittest/gunit/gis_algos-t.cc:78:70: runtime error: downcast of address 0x000012dc0be8 which does not point to an object of type 'Gis_polygon_ring' include/sql_string.h:683:35: runtime error: null pointer passed as argument 2, which is declared to never be null #1 0x373e7af in histograms::Value_map<String>::add_values(String const&, unsigned long long) sql/histograms/value_map.cc:149 #2 0x294fcf2 in dd_column_statistics_unittest::add_values(histograms::Value_map<String>&) unittest/gunit/dd_column_statistics-t.cc:62 runtime_output_directory/merge_keyring_file_tests-t --gtest_filter='-*DeathTest*' > /dev/null plugin/keyring/common/keyring_key.cc:82:57: runtime error: null pointer passed as argument 2, which is declared to never be null Change-Id: I2651362e3373244b72e6893f0e22e67402b49a52 (cherry picked from commit 1fe3f72561994da1d912a257689e1b18106f8828)

…TABLE_UPGRADE_GUARD To repeat: cmake -DWITH_ASAN=1 -DWITH_ASAN_SCOPE=1 ./mtr --mem --sanitize main.dd_upgrade_error A few dd tests fail with: ==26861==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7000063bf5e8 at pc 0x00010d4dbe8b bp 0x7000063bda40 sp 0x7000063bda38 READ of size 8 at 0x7000063bf5e8 thread T2 #0 0x10d4dbe8a in Prealloced_array<st_plugin_int**, 16ul>::empty() const prealloced_array.h:186 #1 0x10d406a8b in lex_end(LEX*) sql_lex.cc:560 #2 0x10dae4b6d in dd::upgrade::Table_upgrade_guard::~Table_upgrade_guard() (mysqld:x86_64+0x100f87b6d) #3 0x10dadc557 in dd::upgrade::migrate_table_to_dd(THD*, std::__1::basic_string<char, std::__1::char_traits<char>, Stateless_allocator<char, dd::String_type_alloc, My_free_functor> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, Stateless_allocator<char, dd::String_type_alloc, My_free_functor> > const&, bool) (mysqld:x86_64+0x100f7f557) #4 0x10dad7e85 in dd::upgrade::migrate_plugin_table_to_dd(THD*) (mysqld:x86_64+0x100f7ae85) #5 0x10daec6a1 in dd::upgrade::do_pre_checks_and_initialize_dd(THD*) upgrade.cc:1216 #6 0x10cd0a5c0 in bootstrap::handle_bootstrap(void*) bootstrap.cc:336 Change-Id: I265ec6dd97ee8076aaf03763840c0cdf9e20325b Fix: increase lifetime of 'LEX lex;' which is used by 'table_guard'

as a component * The validate password plugin is converted to component. * For mysql-8.0 we will have both component and plugin(But plugin will be installed/uninstalled with below deprecate warning) "validate password plugin' is deprecated and will be removed in a future release. Please use validate_password component instead" * In the next major release we remove the plugin .so file. * With the component enabled, we see the system/status variables preceded with "validate_password." instead of "validate_password_"(for example, if we take 'length' system variable we see it as "validate_password.length" instead of "validate_password_length") * The packaging script should be installing the component by default for new installs The way it currently works with deprecations and removals is the following: 1. We add a deprecation warning to using the old way and we switch the server to use the new way by default for new installations. 2. We expect that people will upgrade their existing installations too to avoid the warning. 3. In the next major release we remove the old way and hope that people did #2.

…/BINDINGS/XCOM/XCOM/TASK.C Problem ---------------------------- A Segmentation fault was encountered on a daily-trunk pushbuild run for the test 'group_replication.gr_ssl_mode_verify_identity_error' when starting group replication with the below symptoms - group_replication.gr_ssl_mode_verify_identity_error w5 [ fail ] Test ended at 2017-09-27 17:37:18 CURRENT_TEST: group_replication.gr_ssl_mode_verify_identity_error mysqltest: At line 58: query 'START GROUP_REPLICATION' failed with wrong errno 2013: 'Lost connection to MySQL server during query', instead of 3092 Analysis ----------------------------- The test that fails sporadically is Step #2 of gr_ssl_mode_verify_identity_error. That step causes a global SSL comunication failure, in which not even the local connections can be established, thus causing the join operation to fail. When that happens, GCS will request the XCom thread termination. Since it does not have socket access, it will contact it directly calling xcom_fsm(exit), but this is a well-known non-thread safe operation, hence causing this crash. Fix ----------------------------- Create a new callback for GCS called should_exit, which is tested within task_loop, in order to check if we should exit the XCom thread.

A subset of binlog encryption tests was crashing with: * thread percona#39, stop reason = signal SIGSTOP frame #0: 0x00007fff56063b66 libsystem_kernel.dylib`__pthread_kill + 10 frame #1: 0x00007fff5622e080 libsystem_pthread.dylib`pthread_kill + 333 frame #2: 0x000000010657442b mysqld-debug`my_write_core(sig=11) at stacktrace.cc:278 frame #3: 0x0000000104d84334 mysqld-debug`::handle_fatal_signal(sig=11) at signal_handler.cc:254 frame #4: 0x00007fff56221f5a libsystem_platform.dylib`_sigtramp + 26 frame #5: 0x00007fff5622934d libsystem_pthread.dylib`pthread_mutex_lock + 1 frame #6: 0x0000000106578d05 mysqld-debug`native_mutex_lock(mutex=0x0000000000000000) at thr_mutex.h:93 frame #7: 0x0000000106578a57 mysqld-debug`safe_mutex_lock(mp=0x0000000000000000, try_lock=false, file="/Users/laurynas/percona/mysql-server/mysys/mf_iocache2.cc", line=113) at thr_mutex.cc:70 frame #8: 0x000000010653cd3a mysqld-debug`my_mutex_lock(mp=0x00007ffb6b215038, file="/Users/laurynas/percona/mysql-server/mysys/mf_iocache2.cc", line=113) at thr_mutex.h:180 frame #9: 0x000000010653b2cc mysqld-debug`inline_mysql_mutex_lock(that=0x00007ffb6b215038, src_file="/Users/laurynas/percona/mysql-server/mysys/mf_iocache2.cc", src_line=113) at mysql_mutex.h:267 * frame #10: 0x000000010653b0d8 mysqld-debug`my_b_append_tell(info=0x00007ffb6b214fd8) at mf_iocache2.cc:113 frame #11: 0x0000000105ed6a96 mysqld-debug`MYSQL_BIN_LOG::write_buffer(this=0x00007ffb6b214cb8, buf="", len=47, mi=0x00007ffb6b1f6a00) at binlog.cc:7128 frame #12: 0x0000000105f4d54b mysqld-debug`queue_event(mi=0x00007ffb6b1f6a00, buf="", event_len=47, do_flush_mi=true) at rpl_slave.cc:7756 frame #13: 0x0000000105f3a243 mysqld-debug`::handle_slave_io(arg=0x00007ffb6b1f6a00) at rpl_slave.cc:5382 frame #14: 0x00000001065b87a5 mysqld-debug`pfs_spawn_thread(arg=0x00007ffb6a543af0) at pfs.cc:2836 frame #15: 0x00007fff5622b661 libsystem_pthread.dylib`_pthread_body + 340 frame #16: 0x00007fff5622b50d libsystem_pthread.dylib`_pthread_start + 377 frame #17: 0x00007fff5622abf9 libsystem_pthread.dylib`thread_start + 13 This was caused by my_b_append_tell trying to lock a nullptr IO_CACHE::append_buffer_lock. The lock was nullptr, because it's only initialized for SEQ_READ_APPEND IO_CACHEs, whereas we have WRITE_CACHE. This mismatch was introduced by WL#8599 [1] changing the IO_CACHE type from the former to the latter. Fix by using the correct API for the new IO_CACHE type: my_b_tell instead of my_b_append_tell. [1]: commit dbd2ca2 Author: Joao Gramacho <[email protected]> Date: Tue Nov 1 06:45:39 2016 +0000 WL#8599: Reduce contention in IO and SQL threads (...)

create_table_info_t::create_table_def leaked memory in the case enable_encryption(table) call failed: worker[5] Sanitizer report from /tmp/results/PS/mysql-test/var/5/log/mysqld.2.err after tests: binlog_encryption.binlog_encryption_without_keyring group_replication.gr_change_master_hidden group_replication.gr_server_uuid_matches_group_name group_replication.gr_perfschema_connect_status group_replication.gr_single_primary_and_leader_election_on_error group_replication.gr_without_perfschema rpl.rpl_key_rotation -------------------------------------------------------------------------- ==14131==ERROR: LeakSanitizer: detected memory leaks Direct leak of 1136 byte(s) in 1 object(s) allocated from: #0 0x7fe9233f1602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602) #1 0xc692483 in ut_allocator<unsigned char>::allocate(unsigned long, unsigned char const*, unsigned int, bool, bool) storage/innobase/include/ut0new.h:608 #2 0xc692483 in mem_heap_create_block_func(mem_block_info_t*, unsigned long, unsigned long) storage/innobase/mem/memory.cc:281 #3 0xb99ff96 in mem_heap_create_func storage/innobase/include/mem0mem.ic:464 #4 0xbae8604 in create_table_info_t::create_table_def(dd::Table const*) storage/innobase/handler/ha_innodb.cc:10349 #5 0xbaee018 in create_table_info_t::create_table(dd::Table const*) storage/innobase/handler/ha_innodb.cc:12420 #6 0xbaf1aba in int innobase_basic_ddl::create_impl<dd::Table>(THD*, char const*, TABLE*, HA_CREATE_INFO*, dd::Table*, bool, bool, bool, unsigned long, unsigned long) storage/innobase/handler/ha_innodb.cc:12805 #7 0xbaf7e6a in ha_innobase::create(char const*, TABLE*, HA_CREATE_INFO*, dd::Table*) storage/innobase/handler/ha_innodb.cc:13756 #8 0x2857f7a in ha_create_table(THD*, char const*, char const*, char const*, HA_CREATE_INFO*, List<Create_field> const*, bool, bool, dd::Table*) sql/handler.cc:5156 #9 0x19d0d9f in rea_create_base_table sql/sql_table.cc:991 #10 0x19d0d9f in create_table_impl sql/sql_table.cc:7118 #11 0x19d37cf in mysql_create_table_no_lock(THD*, char const*, char const*, HA_CREATE_INFO*, Alter_info*, unsigned int, bool, bool*, handlerton**) sql/sql_table.cc:7200 #12 0x19dffb2 in mysql_create_table(THD*, TABLE_LIST*, HA_CREATE_INFO*, Alter_info*) sql/sql_table.cc:7950 #13 0x3b58b9b in Sql_cmd_create_table::execute(THD*) sql/sql_cmd_ddl_table.cc:319 #14 0x15917c1 in mysql_execute_command(THD*, bool) sql/sql_parse.cc:4417 #15 0x15b086e in mysql_parse(THD*, Parser_state*, bool) sql/sql_parse.cc:5139 #16 0x8efc7fd in Query_log_event::do_apply_event(Relay_log_info const*, char const*, unsigned long) sql/log_event.cc:5295 #17 0x8f7ea48 in Log_event::apply_event(Relay_log_info*) sql/log_event.cc:3882 #18 0x91cb682 in apply_event_and_update_pos sql/rpl_slave.cc:4352 #19 0x9215e69 in exec_relay_log_event sql/rpl_slave.cc:4812 #20 0x9254685 in handle_slave_sql sql/rpl_slave.cc:6912 #21 0xb1913a3 in pfs_spawn_thread storage/perfschema/pfs.cc:2836 #22 0x7fe9231436b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9) Fix by adding the missing mem_heap_free(heap) call.

Avoid undefined behavior in audit_log_update_thd_local by avoiding passing NULL as source pointer to memcpy, even with zero length. The UBSan report fixed is /usr/include/x86_64-linux-gnu/bits/string3.h:53:71: runtime error: null pointer passed as argument 2, which is declared to never be null #0 0x7fe5aad56fb1 in memcpy /usr/include/x86_64-linux-gnu/bits/string3.h:53 #1 0x7fe5aad56fb1 in audit_log_update_thd_local plugin/audit_log/audit_log.cc:987 #2 0x7fe5aad56fb1 in audit_log_notify plugin/audit_log/audit_log.cc:1105 #3 0x1ecac37 in plugins_dispatch sql/sql_audit.cc:1284 #4 0x1ecac37 in event_class_dispatch sql/sql_audit.cc:1322 #5 0x1ecb311 in event_class_dispatch_error sql/sql_audit.cc:1340 #6 0x1ed21b1 in mysql_audit_notify(THD*, mysql_event_connection_subclass_t, char const*, int) sql/sql_audit.cc:438 #7 0x1350071 in check_connection sql/sql_connect.cc:868 #8 0x1350071 in login_connection sql/sql_connect.cc:929 #9 0x1357881 in thd_prepare_connection(THD*, bool) sql/sql_connect.cc:1084 #10 0x1e66347 in handle_connection sql/conn_handler/connection_handler_per_thread.cc:313 #11 0xb1913a3 in pfs_spawn_thread storage/perfschema/pfs.cc:2836 #12 0x7fe5d352f6b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9) #13 0x7fe5d0bd741c in clone (/lib/x86_64-linux-gnu/libc.so.6+0x10741c)

… keyring Issue: when the server is started without a keyring, the redo log encryption checks were never performed, redo log encryption wasn't initialized, but the setting was left on, misleading the user. Later operations which triggered this check could possibly fail. Issue #2: redo log encryption didn't work properly when it was turned on using the command line: in some cases, the redo log remained unencrypted (until another operation triggered the redo log encryption setup methods to be called again) Both issues were caused by the refactoring of the periodic, once every second encryption checks in upstream, in PS-5189. This commit: * changes the code so redo log encryption routines are called at least twice during every startup with the encryption settings on. This fixes issue #2. * changes the code so redo log encryption routines are called at least once during every startup, even without a keyring, or a read-only server: this fixes issues #1 * adds a test ensuring that in an invalid configuration (without a keyring) the the server remains running, but the redo log encryption setting correctly displays the OFF status * adds a test ensuring that when the redo log is correctly configured (using the dynamic variable or a command line parameter) the log is encrypted, and no unencrypted data is present in it. * backports additional redo log related bugfixes/refactorings from 8.0 (92198f4). The original commit contains fixes for both the redo and the undo log, the backport only includes the redo log related parts refactored to match our changes. (cherry picked from commit c9f151e)

Upstream commit ID : fb-mysql-5.6.35/8cb1dc836b68f1f13e8b2655b2b8cb2d57f400b3 PS-5217 : Merge fb-prod201803 Summary: Original report: https://jira.mariadb.org/browse/MDEV-15816 To reproduce this bug just following below steps, client 1: USE test; CREATE TABLE t1 (i INT) ENGINE=MyISAM; HANDLER t1 OPEN h; CREATE TABLE t2 (i INT) ENGINE=RocksDB; LOCK TABLES t2 WRITE; client 2: FLUSH TABLES WITH READ LOCK; client 1: INSERT INTO t2 VALUES (1); So client 1 acquired the lock and set m_lock_rows = RDB_LOCK_WRITE. Then client 2 calls store_lock(TL_IGNORE) and m_lock_rows was wrongly set to RDB_LOCK_NONE, as below ``` #0 myrocks::ha_rocksdb::store_lock (this=0x7fffbc03c7c8, thd=0x7fffc0000ba0, to=0x7fffc0011220, lock_type=TL_IGNORE) #1 get_lock_data (thd=0x7fffc0000ba0, table_ptr=0x7fffe84b7d20, count=1, flags=2) #2 mysql_lock_abort_for_thread (thd=0x7fffc0000ba0, table=0x7fffbc03bbc0) #3 THD::notify_shared_lock (this=0x7fffc0000ba0, ctx_in_use=0x7fffbc000bd8, needs_thr_lock_abort=true) #4 MDL_lock::notify_conflicting_locks (this=0x555557a82380, ctx=0x7fffc0000cc8) #5 MDL_context::acquire_lock (this=0x7fffc0000cc8, mdl_request=0x7fffe84b8350, lock_wait_timeout=2) #6 Global_read_lock::lock_global_read_lock (this=0x7fffc0003fe0, thd=0x7fffc0000ba0) ``` Finally, client 1 "INSERT INTO..." hits the Assertion 'm_lock_rows == RDB_LOCK_WRITE' failed in myrocks::ha_rocksdb::write_row() Fix this bug by not setting m_locks_rows if lock_type == TL_IGNORE. Closes facebook/mysql-5.6#838 Pull Request resolved: facebook/mysql-5.6#871 Differential Revision: D9417382 Pulled By: lth fbshipit-source-id: c36c164e06c

…E TO A SERVER Problem ======================================================================== Running the GCS tests with ASAN seldomly reports a user-after-free of the server reference that the acceptor_learner_task uses. Here is an excerpt of ASAN's output: ==43936==ERROR: AddressSanitizer: heap-use-after-free on address 0x63100021c840 at pc 0x000000530ff8 bp 0x7fc0427e8530 sp 0x7fc0427e8520 WRITE of size 8 at 0x63100021c840 thread T3 #0 0x530ff7 in server_detected /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:962 #1 0x533814 in buffered_read_bytes /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:1249 #2 0x5481af in buffered_read_msg /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:1399 #3 0x51e171 in acceptor_learner_task /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:4690 #4 0x562357 in task_loop /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/task.c:1140 #5 0x5003b2 in xcom_taskmain2 /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:1324 #6 0x6a278a in Gcs_xcom_proxy_impl::xcom_init(unsigned short, node_address*) /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/gcs_xcom_proxy.cc:164 #7 0x59b3c1 in xcom_taskmain_startup /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/gcs_xcom_control_interface.cc:107 #8 0x7fc04a2e4dd4 in start_thread (/lib64/libpthread.so.0+0x7dd4) #9 0x7fc047ff2bfc in __clone (/lib64/libc.so.6+0xfebfc) 0x63100021c840 is located 64 bytes inside of 65688-byte region [0x63100021c800,0x63100022c898) freed by thread T3 here: #0 0x7fc04a5d7508 in __interceptor_free (/lib64/libasan.so.4+0xde508) #1 0x52cf86 in freesrv /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:836 #2 0x52ea78 in srv_unref /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:868 #3 0x524c30 in reply_handler_task /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:4914 #4 0x562357 in task_loop /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/task.c:1140 #5 0x5003b2 in xcom_taskmain2 /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:1324 #6 0x6a278a in Gcs_xcom_proxy_impl::xcom_init(unsigned short, node_address*) /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/gcs_xcom_proxy.cc:164 #7 0x59b3c1 in xcom_taskmain_startup /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/gcs_xcom_control_interface.cc:107 #8 0x7fc04a2e4dd4 in start_thread (/lib64/libpthread.so.0+0x7dd4) previously allocated by thread T3 here: #0 0x7fc04a5d7a88 in __interceptor_calloc (/lib64/libasan.so.4+0xdea88) #1 0x543604 in mksrv /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:721 #2 0x543b4c in addsrv /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:755 #3 0x54af61 in update_servers /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:1747 #4 0x501082 in site_install_action /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:1572 #5 0x55447c in import_config /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/site_def.c:486 #6 0x506dfc in handle_x_snapshot /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:5257 #7 0x50c444 in xcom_fsm /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:5325 #8 0x516c36 in dispatch_op /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:4510 #9 0x521997 in acceptor_learner_task /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:4772 #10 0x562357 in task_loop /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/task.c:1140 #11 0x5003b2 in xcom_taskmain2 /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:1324 #12 0x6a278a in Gcs_xcom_proxy_impl::xcom_init(unsigned short, node_address*) /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/gcs_xcom_proxy.cc:164 #13 0x59b3c1 in xcom_taskmain_startup /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/gcs_xcom_control_interface.cc:107 #14 0x7fc04a2e4dd4 in start_thread (/lib64/libpthread.so.0+0x7dd4) Analysis ======================================================================== The server structure is reference counted by the associated sender_task and reply_handler_task. When they finish, they unreference the server, which leads to its memory being freed. However, the acceptor_learner_task keeps a "naked" reference to the server structure. Under the right ordering of operations, i.e. the sender_task and reply_handler_task terminating after the acceptor_learner_task acquires, but before it uses, the reference to the server structure, leads to the acceptor_learner_task accessing the server structure after it has been freed. Solution ======================================================================== Let the acceptor_learner_task also reference count the server structure so it is not freed while still in use. Reviewed-by: André Negrão <[email protected]> Reviewed-by: Venkatesh Venugopal <[email protected]> RB: 21209

… keyring Issue: when the server is started without a keyring, the redo log encryption checks were never performed, redo log encryption wasn't initialized, but the setting was left on, misleading the user. Later operations which triggered this check could possibly fail. Issue #2: redo log encryption didn't work properly when it was turned on using the command line: in some cases, the redo log remained unencrypted (until another operation triggered the redo log encryption setup methods to be called again) Both issues were caused by the refactoring of the periodic, once every second encryption checks in upstream, in PS-5189. This commit: * changes the code so redo log encryption routines are called at least twice during every startup with the encryption settings on. This fixes issue #2. * changes the code so redo log encryption routines are called at least once during every startup, even without a keyring, or a read-only server: this fixes issues #1 * adds a test ensuring that in an invalid configuration (without a keyring) the the server remains running, but the redo log encryption setting correctly displays the OFF status * adds a test ensuring that when the redo log is correctly configured (using the dynamic variable or a command line parameter) the log is encrypted, and no unencrypted data is present in it. * backports additional redo log related bugfixes/refactorings from 8.0 (92198f4). The original commit contains fixes for both the redo and the undo log, the backport only includes the redo log related parts refactored to match our changes.

sensssz added 17 commits October 13, 2016 15:41

Put locks in the correct hash table: use lock_hash_get.

ec49449

Remove unnecessary asserts.

14bb46e

Atempt to jump the queue directly, just like high priority transactio…

f6b0deb

…ns jump the queue.

Undo change to PerconaFT.

e464f73

Merge branch 'vats-5.7' of https://github.com/sensssz/percona-server …

da7a58b

…into vats-5.7

Add forward declaration of lock_rec_has_to_wait_in_queue.

90c7527

Move lock after deadlock is resolved.

5ea9e50

Fix rec_fold computation: m_recid is not avaiable.

903dcd4

Respect slave commit order.

d29f28c

Move definition of thd_priority_preference to sql_class.h

62f5823

Remove consts because get_thd_worker does not support it.

aba22cf

Bug fix for commit_order_manager_priority_preference: incorrect varia…

cf29daf

…ble name.

Add NULL checks for the two workers in commit_order_manager_priority_…

8ea4b48

…preference.

Disable VATS during replication on slave servers.

72ff168

Remove duplicated function definition.

1c51283

Grant the moved lock when possible.

19db48b

Do not move high priority transactions.

35733e4

sensssz added 6 commits October 18, 2016 16:11

Fix tests for new sysvar innodb_lock_schedule_algorithm.

d6e670c

Fix show_sanity output format.

39fec53

Fix show_sanity output format.

f799b29

Force all granted locks to inserted before the wait locks.

250ebba

Bug fix: move previous to the correct lock in lock_grant_and_move_on_…

e87ab71

…page and lock_grant_and_move_on_rec.

Use thd_is_replication_slave_thread instead is is_slave_replication. …

b0f5b9b

…Use higher resolution trx start time.

laurynas-biveinis self-assigned this Oct 24, 2016

sensssz closed this Dec 7, 2016

sensssz deleted the vats-5.7 branch December 7, 2016 15:43

laurynas-biveinis pushed a commit that referenced this pull request Jul 21, 2017

wl#9819 patch #2 : Misc. housekeeping required for worklog

92f3f38

(A) move ndb_basename into utility library (B) new portability function ndb_getsockname()

laurynas-biveinis pushed a commit that referenced this pull request Jul 21, 2017

wl#9819 Version 2. Patch #2: New definition of ProcessInfoRep signal.

63aaebd

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Vats 5.7 #2

Vats 5.7 #2

sensssz commented Oct 17, 2016

laurynas-biveinis commented Oct 18, 2016

Vats 5.7 #2

Vats 5.7 #2

Conversation

sensssz commented Oct 17, 2016

laurynas-biveinis commented Oct 18, 2016