…phore wait > 600 assertion
Problem:
--------
A long-running ALTER TABLE ADD INDEX with concurrent inserts causes semaphore waits and
eventually crashes the server.
To see this problem you need:
1. A table with lots of data. ADD INDEX must take significant time and create many pages.
2. A compressed table. CPU time is spent in compress() while the mtr already holds index->lock.
The more time each mtr takes, the longer the INSERT waits, which makes the crash more likely.
3. Concurrent inserts while ALTER is running. The inserts must happen after the read phase
of ALTER and after the bulk load index build (bottom-up build) has started.
The bulk load process holds index->lock in X mode for the whole duration of the bottom-up index build.
The index->lock is held across mtrs (because many pages are created during the index build).
For example: the Page 1 mtr latches index->lock in X mode. When the page is full, a sibling page is created.
The sibling Page 2 mtr also acquires index->lock in X mode.
Recursive X latching by the same thread is allowed. The Page 1 mtr now commits, but index->lock is still held by
the Page 2 mtr. When Page 2 is full, another sibling page is created, and the Page 3 mtr acquires index->lock in
X mode before the Page 2 mtr commits. This goes on and on. The same happens with pages at non-root levels.
Essentially, the time index->lock is held is proportional to the number of pages/mtrs created. Compressed
tables make it worse: each mtr spends extra time in compress(), so the duration of each mtr is higher with
compressed tables.
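
The hand-over of index->lock from one page mtr to the next can be illustrated with a small single-threaded
model. This is only a sketch: ToyIndexLock, BuildTrace and simulate_bulk_load are hypothetical names made up
for illustration, not InnoDB code, and the recursive X latch is reduced to a depth counter.

```cpp
#include <cassert>
#include <cstddef>

// Toy stand-in for index->lock: counts recursive X acquisitions by one thread.
struct ToyIndexLock {
    int x_depth = 0;
    void x_lock() { ++x_depth; }
    void x_unlock() { assert(x_depth > 0); --x_depth; }
    bool held() const { return x_depth > 0; }
};

struct BuildTrace {
    std::size_t pages_built = 0;
    // Number of times the lock became free between two page mtrs.
    std::size_t times_lock_was_free = 0;
};

// Models the bottom-up build: the sibling page's mtr latches the lock
// before the previous page's mtr commits, so the lock is never released.
BuildTrace simulate_bulk_load(ToyIndexLock& lock, std::size_t n_pages) {
    BuildTrace trace;
    if (n_pages == 0) return trace;

    lock.x_lock();  // Page 1 mtr starts
    trace.pages_built = 1;
    for (std::size_t page = 2; page <= n_pages; ++page) {
        lock.x_lock();    // sibling page mtr starts (recursive X latch)
        lock.x_unlock();  // previous page mtr commits
        if (!lock.held()) ++trace.times_lock_was_free;
        ++trace.pages_built;
    }
    lock.x_unlock();  // last page mtr commits
    return trace;
}
```

In this model the lock is free only after the last mtr commits, no matter how many pages are built, which is
the window during which a concurrent S request has to wait.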
At this stage, a concurrent INSERT arrives. Since there is a concurrent DDL and the index is uncommitted,
this insert should go to the online ALTER log. It tries to acquire index->lock in S mode.
The bulk load index build already holds index->lock in X mode and will not release it until the build is over.
The INSERT thread keeps waiting, and when the wait for index->lock crosses 600 seconds, the server crashes.
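
As a rough back-of-the-envelope model of the above: if the X latch is held for (number of pages) x (time per
mtr), an INSERT that arrives right after the build starts waits about that long. The sketch below assumes this
simple linear model; the function names and the numbers in the usage note are hypothetical, and only the
600 second limit comes from the description above.

```cpp
#include <cstdint>

// Hypothetical linear model: total X-latch hold time in milliseconds.
std::int64_t latch_hold_ms(std::int64_t n_pages, std::int64_t ms_per_mtr) {
    return n_pages * ms_per_mtr;
}

// True if an INSERT arriving at the start of the build would wait
// longer than the limit (600 seconds by default).
bool insert_exceeds_wait_limit(std::int64_t n_pages, std::int64_t ms_per_mtr,
                               std::int64_t limit_seconds = 600) {
    return latch_hold_ms(n_pages, ms_per_mtr) > limit_seconds * 1000;
}
```

For instance, under this model 100,000 pages at 1 ms per mtr stay under the limit (100 s), while the same
page count at 7 ms per mtr (700 s) crosses it. This matches the observation that compressed tables, with
their slower mtrs, make the crash easier to reproduce.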
Fix:
----
The INSERT thread acquires index->lock to check the online status of the index. During the bulk load index
build, there is no concurrent insert into or read from the index being built. So the bulk load does not need
to acquire index->lock at all.
Bulk load index build is also used to create indexes in table rebuild cases, for example DROP COLUMN and
ADD COLUMN. The indexes on the intermediate table (#sql-ib..) are built using bulk load insert. Concurrent DMLs
at this stage do not acquire index->lock on the intermediate table. So acquiring index->lock on the intermediate
table, which is not visible to anyone else, does not block concurrent DMLs.
Ideally we could remove all index->lock X acquisitions in the bulk load index build path. We play *safe* and
remove the acquisitions only in the case of uncommitted indexes. The other path (bulk load used during rebuild)
is not affected anyway.
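
The rule described in the fix can be sketched with the same kind of toy model. ToyIndex and
blocked_steps_during_build are hypothetical names for illustration only and do not reflect the actual
InnoDB code change. The sketch assumes the bulk load skips the X latch exactly when the index is uncommitted.

```cpp
#include <cassert>
#include <cstddef>

struct ToyIndex {
    bool uncommitted;  // true while the index is being created by online ALTER
    int x_depth;       // recursive X-latch depth on the toy index lock
    explicit ToyIndex(bool is_uncommitted)
        : uncommitted(is_uncommitted), x_depth(0) {}
};

// Returns the number of page steps during which a concurrent S-latch
// request on the index lock would have to wait.
std::size_t blocked_steps_during_build(ToyIndex& index, std::size_t n_pages) {
    // "Play safe" rule: skip the X latch only for uncommitted indexes.
    const bool take_latch = !index.uncommitted;
    std::size_t blocked = 0;
    for (std::size_t page = 1; page <= n_pages; ++page) {
        if (take_latch) ++index.x_depth;              // this page's mtr latches X
        if (take_latch && page > 1) --index.x_depth;  // previous page's mtr commits
        if (index.x_depth > 0) ++blocked;             // an S request would wait here
    }
    if (take_latch && n_pages > 0) --index.x_depth;   // final mtr commits
    assert(index.x_depth == 0);
    return blocked;
}
```

With an uncommitted index the model never holds the latch, so a concurrent INSERT can check the online status
and write to the online ALTER log without waiting. With a committed index (the rebuild path) the behaviour is
unchanged.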