
PS-3410 : LP #1570114: Long running ALTER TABLE ADD INDEX causes semaphore wait > 600 assertion #3143

Merged
merged 1 commit into percona:5.7 on Apr 19, 2019

Conversation

satya-bodapati
Contributor

Long running ALTER TABLE ADD INDEX causes semaphore wait > 600 assertion

Problem:

A long running ALTER TABLE ADD INDEX with concurrent inserts causes semaphore waits and
eventually crashes the server.

To see this problem you need to have:

  1. A table with lots of data, so that ADD INDEX takes significant time and creates many pages.

  2. A compressed table. This is because CPU time is spent in compress() while the mtr is already latching
     index->lock; the more time each mtr spends, the longer the INSERT waits, which helps trigger the crash.

  3. Concurrent inserts while the ALTER is running. The inserts should happen specifically after the read phase
     of the ALTER, once the bulk-load index build (bottom-up build) has started.

The entire bulk-load process latches index->lock in X mode for the whole duration of the bottom-up index build. The index->lock is held across mtrs, because many pages are created during the index build.

An example: the mtr for Page 1 latches index->lock in X mode; when the page is full, a sibling page is created,
and the mtr for sibling Page 2 also acquires index->lock in X mode.

Recursive X latching is allowed for the same thread. The Page 1 mtr then commits, but index->lock is still held on behalf of Page 2. When Page 2 is full, another sibling page is created; the Page 3 mtr acquires index->lock in X mode, and then the Page 2 mtr commits. This goes on and on, and it also happens for pages at non-root levels.

Essentially, the time index->lock is held is directly proportional to the number of pages/mtrs created, and compressed tables make each mtr take a bit longer because of the time spent in compress().
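
As a minimal standalone model of the latching pattern just described (illustrative types only, not InnoDB code): each sibling page's mtr X-latches index->lock before the previous mtr commits, so the recursion count never drops to zero and the latch stays held for the whole build.

// Minimal model: the latch is held continuously across all page mtrs of
// the bottom-up build. RecursiveXLatch stands in for index->lock.
#include <cassert>
#include <cstdio>

struct RecursiveXLatch {
  int recursion = 0;                 // same-thread recursive X latching
  void x_lock() { ++recursion; }
  void x_unlock() { --recursion; }
  bool held() const { return recursion > 0; }
};

int main() {
  RecursiveXLatch index_lock;
  const int n_pages = 4;             // in practice: as many pages as the build creates

  index_lock.x_lock();               // mtr for Page 1 latches index->lock
  for (int page = 2; page <= n_pages; ++page) {
    index_lock.x_lock();             // the sibling page's mtr latches first ...
    index_lock.x_unlock();           // ... then the previous page's mtr commits
    assert(index_lock.held());       // the latch was never released in between
  }
  index_lock.x_unlock();             // the last mtr commits: the build is over
  std::printf("index->lock stayed held across all %d page mtrs\n", n_pages);
  return 0;
}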

At this stage a concurrent INSERT arrives. Since there is a concurrent DDL and the index is uncommitted, this insert should go to the online ALTER log, and it tries to acquire index->lock in S mode.

The bulk-load index build has already taken index->lock in X mode and is not going to release it until the build is over.

The INSERT thread keeps waiting, and when the wait for index->lock crosses 600 seconds, the server crashes.
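
The 600-second limit is enforced by InnoDB's long-semaphore-wait watchdog. A standalone sketch of that behaviour, assuming a single tracked waiter and illustrative function names (this is not InnoDB's actual sync-array/monitor code):

// Sketch: a monitor thread aborts the server once an outstanding latch
// wait exceeds the fatal threshold (600 seconds in the report above).
#include <atomic>
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <thread>

using Clock = std::chrono::steady_clock;

std::atomic<Clock::rep> wait_started_ns{0};        // 0 means nobody is waiting
constexpr std::chrono::seconds kFatalThreshold{600};

void note_wait_start() {                           // a thread starts waiting for a latch
  wait_started_ns.store(Clock::now().time_since_epoch().count());
}

void note_wait_end() {                             // the latch was finally granted
  wait_started_ns.store(0);
}

void watchdog_thread() {
  for (;;) {
    std::this_thread::sleep_for(std::chrono::seconds(1));
    const Clock::rep started = wait_started_ns.load();
    if (started == 0) continue;                    // no outstanding wait
    const auto waited = Clock::now() - Clock::time_point(Clock::duration(started));
    if (waited > kFatalThreshold) {
      std::fprintf(stderr, "semaphore wait has lasted > 600 seconds, aborting\n");
      std::abort();                                // the crash described above
    }
  }
}

int main() {
  std::thread(watchdog_thread).detach();           // monitor runs for the server's lifetime
  note_wait_start();                               // e.g. the INSERT blocks on index->lock ...
  note_wait_end();                                 // ... and is normally granted well in time
  return 0;
}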

Fix:

The INSERT thread acquires index->lock to check the index's online status. During the bulk-load index build there are no concurrent inserts into or reads from the index being built, so the bulk-load path has no need to acquire index->lock at all.

Bulk-load index build is also used to create indexes in table-rebuild cases, for example DROP COLUMN and ADD COLUMN. The indexes on the intermediate table (#sql-ib..) are built using bulk-load insert. Concurrent DMLs at this stage do not acquire index->lock, so acquiring index->lock on the intermediate table, which is not visible to anyone else, does not block concurrent DMLs.

Ideally we could remove all index->lock X acquisitions in the bulk-load index build path. We play it safe and remove the acquisitions only in the case of uncommitted indexes; the other path (bulk load used during rebuild) is not affected anyway.
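
A conceptual, standalone sketch of the shape of this change, with illustrative stub types rather than the actual BtrBulk/PageBulk code:

// Sketch of the fix: X-latch index->lock for the build only when the index
// is already committed (the table-rebuild case); skip the latch for an
// uncommitted ADD INDEX build, whose pages no concurrent DML ever touches
// directly (DML goes to the online ALTER log instead).
#include <mutex>
#include <shared_mutex>

struct IndexStub {
  std::shared_mutex lock;            // stands in for index->lock
  bool committed = false;            // stands in for index->is_committed()
};

void bulk_build(IndexStub &index, int n_pages) {
  std::unique_lock<std::shared_mutex> x(index.lock, std::defer_lock);
  if (index.committed) {
    x.lock();                        // rebuild path: keep the existing latching
  }
  for (int page = 0; page < n_pages; ++page) {
    // build (and, for compressed tables, compress) one page per mtr
  }
}                                    // uncommitted ADD INDEX: no latch was ever taken

int main() {
  IndexStub adding_index;            // an uncommitted index being built by ADD INDEX
  bulk_build(adding_index, 1000);    // runs without blocking any S-latch waiter
  return 0;
}

With the latch skipped on the uncommitted-index path, the mtr_s_lock() call made by the INSERT (see the backtrace later in this conversation) no longer has to wait for the entire build to finish.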

@satya-bodapati
Contributor Author

Submitted a Jenkins job with --big-test on the 5.7 pipeline:

https://ps57.cd.percona.com/job/percona-server-5.7-pipeline/923/

Contributor

@laurynas-biveinis left a comment

Can you show the point where the INSERT thread tried to acquire the index latch in S?

@laurynas-biveinis
Contributor

Also, should 5.6 be fixed too?

@satya-bodapati
Contributor Author

Also, should 5.6 be fixed too?

No, bulk load (WL#7277) was added only in 5.7.

@satya-bodapati
Contributor Author

Can you show the point where the INSERT thread tried to acquire the index latch in S?

Thread 26 (Thread 0x7ffff0bf9700 (LWP 15407)):
#0 0x00007ffff7bc39f3 in futex_wait_cancelable (private=, expected=0, futex_word=0x7fff8c038120) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7fff8c0380d0, cond=0x7fff8c0380f8) at pthread_cond_wait.c:502
#2 __pthread_cond_wait (cond=0x7fff8c0380f8, mutex=0x7fff8c0380d0) at pthread_cond_wait.c:655
#3 0x0000555556f99b30 in os_event::wait (this=0x7fff8c0380c0) at /home/satya/WORK/PS-5.7-3410/storage/innobase/include/os0event.h:156
#4 0x0000555556f99482 in os_event::wait_low (this=0x7fff8c0380c0, reset_sig_count=1) at /home/satya/WORK/PS-5.7-3410/storage/innobase/os/os0event.cc:131
#5 0x0000555556f99897 in os_event_wait_low (event=0x7fff8c0380c0, reset_sig_count=1) at /home/satya/WORK/PS-5.7-3410/storage/innobase/os/os0event.cc:328
#6 0x00005555570b5ab1 in sync_array_wait_event (arr=0x55555853d788, cell=@0x7ffff0bf5240: 0x7ffff135e0a8)
at /home/satya/WORK/PS-5.7-3410/storage/innobase/sync/sync0arr.cc:475
#7 0x00005555570b9198 in rw_lock_s_lock_spin (lock=0x7fff8c038080, pass=0, file_name=0x5555578296e8 "/home/satya/WORK/PS-5.7-3410/storage/innobase/row/row0ins.cc",
line=2953) at /home/satya/WORK/PS-5.7-3410/storage/innobase/sync/sync0rw.cc:419
#8 0x0000555556ff9439 in rw_lock_s_lock_func (lock=0x7fff8c038080, pass=0, file_name=0x5555578296e8 "/home/satya/WORK/PS-5.7-3410/storage/innobase/row/row0ins.cc",
line=2953) at /home/satya/WORK/PS-5.7-3410/storage/innobase/include/sync0rw.ic:433
#9 0x0000555556ff94c6 in pfs_rw_lock_s_lock_func (lock=0x7fff8c038080, pass=0,
file_name=0x5555578296e8 "/home/satya/WORK/PS-5.7-3410/storage/innobase/row/row0ins.cc", line=2953)
at /home/satya/WORK/PS-5.7-3410/storage/innobase/include/sync0rw.ic:796
#10 0x0000555557007b9d in mtr_t::s_lock (this=0x7ffff0bf5950, lock=0x7fff8c038080,
file=0x5555578296e8 "/home/satya/WORK/PS-5.7-3410/storage/innobase/row/row0ins.cc", line=2953)
at /home/satya/WORK/PS-5.7-3410/storage/innobase/include/mtr0mtr.ic:244
#11 0x0000555557005196 in row_ins_sec_index_entry_low (flags=0, mode=2, index=0x7fff8c037e90, offsets_heap=0x7fff881f1248, heap=0x7fff88d59d48, entry=0x7fff88124300,
trx_id=0, thr=0x7fff881f0aa0, dup_chk_only=false) at /home/satya/WORK/PS-5.7-3410/storage/innobase/row/row0ins.cc:2953
#12 0x000055555700699f in row_ins_sec_index_entry (index=0x7fff8c037e90, entry=0x7fff88124300, thr=0x7fff881f0aa0, dup_chk_only=false)
at /home/satya/WORK/PS-5.7-3410/storage/innobase/row/row0ins.cc:3444
#13 0x0000555557006b4c in row_ins_index_entry (index=0x7fff8c037e90, entry=0x7fff88124300, thr=0x7fff881f0aa0)
at /home/satya/WORK/PS-5.7-3410/storage/innobase/row/row0ins.cc:3495
#14 0x00005555570070bc in row_ins_index_entry_step (node=0x7fff881f0730, thr=0x7fff881f0aa0) at /home/satya/WORK/PS-5.7-3410/storage/innobase/row/row0ins.cc:3643
#15 0x0000555557007478 in row_ins (node=0x7fff881f0730, thr=0x7fff881f0aa0) at /home/satya/WORK/PS-5.7-3410/storage/innobase/row/row0ins.cc:3785
#16 0x0000555557007ab5 in row_ins_step (thr=0x7fff881f0aa0) at /home/satya/WORK/PS-5.7-3410/storage/innobase/row/row0ins.cc:3970
#17 0x00005555570298de in row_insert_for_mysql_using_ins_graph (mysql_rec=0x7fff88ce4dd0 "\360", prebuilt=0x7fff881f0130)
at /home/satya/WORK/PS-5.7-3410/storage/innobase/row/row0mysql.cc:2324
#18 0x0000555557029ed0 in row_insert_for_mysql (mysql_rec=0x7fff88ce4dd0 "\360", prebuilt=0x7fff881f0130)
at /home/satya/WORK/PS-5.7-3410/storage/innobase/row/row0mysql.cc:2447
#19 0x0000555556ea57ab in ha_innobase::write_row (this=0x7fff88df5460, record=0x7fff88ce4dd0 "\360")
at /home/satya/WORK/PS-5.7-3410/storage/innobase/handler/ha_innodb.cc:8704
#20 0x0000555556351f57 in handler::ha_write_row (this=0x7fff88df5460, buf=0x7fff88ce4dd0 "\360") at /home/satya/WORK/PS-5.7-3410/sql/handler.cc:8529
#21 0x0000555556c88c5c in write_record (thd=0x7fff88000dd0, table=0x7fff881ef700, info=0x7ffff0bf6fd0, update=0x7ffff0bf7050)
at /home/satya/WORK/PS-5.7-3410/sql/sql_insert.cc:1881
#22 0x0000555556c85a8e in Sql_cmd_insert::mysql_insert (this=0x7fff88d83e38, thd=0x7fff88000dd0, table_list=0x7fff88d836f8)
at /home/satya/WORK/PS-5.7-3410/sql/sql_insert.cc:773
#23 0x0000555556c8cb97 in Sql_cmd_insert::execute (this=0x7fff88d83e38, thd=0x7fff88000dd0) at /home/satya/WORK/PS-5.7-3410/sql/sql_insert.cc:3120
#24 0x00005555569b9d6d in mysql_execute_command (thd=0x7fff88000dd0, first_level=true) at /home/satya/WORK/PS-5.7-3410/sql/sql_parse.cc:3810
#25 0x00005555569c0255 in mysql_parse (thd=0x7fff88000dd0, parser_state=0x7ffff0bf84a0, update_userstat=false) at /home/satya/WORK/PS-5.7-3410/sql/sql_parse.cc:5901
#26 0x00005555569b4379 in dispatch_command (thd=0x7fff88000dd0, com_data=0x7ffff0bf8dc0, command=COM_QUERY) at /home/satya/WORK/PS-5.7-3410/sql/sql_parse.cc:1528
#27 0x00005555569b3104 in do_command (thd=0x7fff88000dd0) at /home/satya/WORK/PS-5.7-3410/sql/sql_parse.cc:1053
#28 0x0000555556af7d8f in handle_connection (arg=0x555558d5ec60) at /home/satya/WORK/PS-5.7-3410/sql/conn_handler/connection_handler_per_thread.cc:312
#29 0x0000555556df0a44 in pfs_spawn_thread (arg=0x555558d57ef0) at /home/satya/WORK/PS-5.7-3410/storage/perfschema/pfs.cc:2190
#30 0x00007ffff7bbd6db in start_thread (arg=0x7ffff0bf9700) at pthread_create.c:463
#31 0x00007ffff5e6288f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

====
row_ins_sec_index_entry_low():

2948   const bool check = !index->is_committed();
2949   if (check) {
2950     DEBUG_SYNC_C("row_ins_sec_index_enter");
2951     if (mode == BTR_MODIFY_LEAF) {
2952       search_mode |= BTR_ALREADY_S_LATCHED;
2953       mtr_s_lock(dict_index_get_lock(index), &mtr);
2954     } else {
2955       mtr_sx_lock(dict_index_get_lock(index), &mtr);

Contributor

@laurynas-biveinis left a comment

I am still having trouble wrapping my head around why it is safe not to X-latch the index lock here, given the comment at row_ins_sec_index_entry_low:
/* Ensure that we acquire index->lock when inserting into an
index with index->online_status == ONLINE_INDEX_COMPLETE, but
could still be subject to rollback_inplace_alter_table().
This prevents a concurrent change of index->online_status.
and upstream fix at f2f7d43

@satya-bodapati
Contributor Author

I am still having trouble wrapping my head around why it is safe not to X-latch the index lock here, given the comment at row_ins_sec_index_entry_low:
/* Ensure that we acquire index->lock when inserting into an
index with index->online_status == ONLINE_INDEX_COMPLETE, but
could still be subject to rollback_inplace_alter_table().
This prevents a concurrent change of index->online_status.
and upstream fix at f2f7d43

I am not removing the index->lock acquisition here (in row_ins_sec_index_entry_low()); it is still there. As the comment says, index->lock is necessary when index->online_status is read or changed.

And the bulk-load index build does not change index->online_status at all (from BtrBulk::init() to BtrBulk::finish()). Please check.
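
As an illustration of the invariant relied on here (illustrative stub types, not InnoDB code; only ONLINE_INDEX_COMPLETE appears in the comment quoted above, the other enum values are placeholders), a scope guard could assert that a bulk build leaves online_status untouched between init() and finish():

// Hypothetical debug helper: the build never writes online_status, so the
// value an INSERT reads under its S latch cannot be changed by the build.
#include <cassert>

enum OnlineStatus { ONLINE_INDEX_CREATION, ONLINE_INDEX_COMPLETE, ONLINE_INDEX_ABORTED };

struct IndexStub {
  OnlineStatus online_status = ONLINE_INDEX_CREATION;   // stands in for index->online_status
};

struct BulkBuildScope {
  const IndexStub &index;
  const OnlineStatus at_init;
  explicit BulkBuildScope(const IndexStub &idx)
      : index(idx), at_init(idx.online_status) {}                 // BtrBulk::init()
  ~BulkBuildScope() { assert(index.online_status == at_init); }   // BtrBulk::finish()
};

int main() {
  IndexStub idx;
  BulkBuildScope scope(idx);   // the whole bottom-up build runs inside this scope
  return 0;
}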

@laurynas-biveinis
Contributor

OK, looks good, only the testcase question remains

@satya-bodapati satya-bodapati merged commit 7855bd5 into percona:5.7 Apr 19, 2019
@satya-bodapati satya-bodapati deleted the PS-5.7-3410 branch June 14, 2019 12:48