PS-3410 : LP #1570114: Long running ALTER TABLE ADD INDEX causes sema… #3143

Merged 1 commit on Apr 19, 2019

Commits on Apr 17, 2019

  1. PS-3410 : LP #1570114: Long running ALTER TABLE ADD INDEX causes semaphore wait > 600 assertion
    
    Problem:
    --------
    A long running ALTER TABLE ADD INDEX with concurrent inserts causes semaphore waits and
    eventually crashes the server.
    
    To reproduce this problem you need:
    1. A table with a lot of data, so that ADD INDEX takes significant time and creates many pages.
    
    2. A compressed table. CPU time is spent in compress() while the mtr already holds index->lock.
       The longer each mtr takes, the longer the INSERT waits, which makes the crash more likely.
    
    3. Concurrent inserts while the ALTER is running. The inserts must happen after the read phase
       of the ALTER and after the bulk load index build (bottom-up build) has started.
    
    The bulk load process holds index->lock in X mode for the whole duration of the bottom-up index build.
    The index->lock is held across mtrs (because many pages are created during index build).
    
    For example: the Page 1 mtr latches index->lock in X mode. When the page is full, a sibling page is created.
    The sibling Page 2 mtr also acquires index->lock in X mode.
    
    Recursive X latching by the same thread is allowed. Now the Page 1 mtr commits, but index->lock is still held by
    the Page 2 mtr. When Page 2 is full, another sibling page is created, and the sibling Page 3 mtr acquires
    index->lock in X mode. The Page 2 mtr commits. This goes on and on. The same happens with pages at non-root levels.
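
The hand-over described above can be sketched as a small single-threaded model. This is only an illustration, not InnoDB code: `RecursiveXLatch` and `count_release_windows` are hypothetical names, and the latch is reduced to a nesting counter. The function counts how often the latch becomes free between page mtrs, which is the only moment a waiting S request could be granted.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a recursive X latch such as index->lock.
// It only tracks how deeply one thread has nested its X acquisitions.
struct RecursiveXLatch {
    int depth = 0;
    void x_lock() { ++depth; }
    void x_unlock() { assert(depth > 0); --depth; }
    bool is_free() const { return depth == 0; }
};

// Builds `pages` sibling pages and returns how many times the latch became
// free before the final commit.
// overlapping_mtrs == true : the next page's mtr latches before the previous
//                            page's mtr commits (the behaviour described above).
// overlapping_mtrs == false: each mtr commits before the next one starts
//                            (shown only for contrast).
std::size_t count_release_windows(std::size_t pages, bool overlapping_mtrs) {
    if (pages == 0) return 0;
    RecursiveXLatch index_lock;
    std::size_t windows = 0;
    index_lock.x_lock();                       // Page 1 mtr starts
    for (std::size_t p = 2; p <= pages; ++p) {
        if (overlapping_mtrs) {
            index_lock.x_lock();               // page p mtr starts (recursive X)
            index_lock.x_unlock();             // page p-1 mtr commits
        } else {
            index_lock.x_unlock();             // page p-1 mtr commits
            if (index_lock.is_free()) ++windows;  // an S waiter could get in here
            index_lock.x_lock();               // page p mtr starts
        }
    }
    index_lock.x_unlock();                     // last page mtr commits
    return windows;
}
```

With overlapping mtrs the count stays at 0 for any number of pages. If each mtr committed before the next one started, there would be `pages - 1` windows.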
    
    Essentially, the time index->lock is held is directly proportional to the number of pages/mtrs created.
    Compressed tables make this worse: each mtr spends extra time in compress(), so every mtr lasts longer.
    
    At this stage a concurrent INSERT arrives. Because there is a concurrent DDL and the index is uncommitted,
    this insert should go to the online ALTER log. It tries to acquire index->lock in S mode.
    
    The bulk load index build already holds index->lock in X mode and is not going to release it until the build is over.
    
    The INSERT thread keeps waiting, and when its wait for index->lock exceeds 600 seconds, the server crashes.
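
A back-of-the-envelope model of that wait, as a sketch only. The 600-second limit comes from the description above. The function names, the page counts and the per-mtr duration are assumptions chosen for illustration, not measurements.

```cpp
#include <cassert>
#include <cstdint>

// Limit taken from the commit message: a wait longer than 600 s is fatal.
constexpr double kFatalWaitSeconds = 600.0;

// Hypothetical model: the INSERT waits until every remaining page mtr of the
// bulk load has committed, because the X latch is never released in between.
double insert_wait_seconds(std::uint64_t pages_remaining,
                           double seconds_per_page_mtr) {
    assert(seconds_per_page_mtr >= 0.0);
    return static_cast<double>(pages_remaining) * seconds_per_page_mtr;
}

// True when the modelled wait crosses the fatal limit.
bool wait_is_fatal(std::uint64_t pages_remaining, double seconds_per_page_mtr) {
    return insert_wait_seconds(pages_remaining, seconds_per_page_mtr) >
           kFatalWaitSeconds;
}
```

For instance, with an assumed 1 ms per page mtr, 100,000 remaining pages give a wait of about 100 seconds, while 1,000,000 remaining pages give about 1000 seconds, which is past the limit. A slower mtr (as with compressed pages) lowers the number of pages needed to reach it.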
    
    Fix:
    ----
    The INSERT thread acquires index->lock to check the online status of the index. During the bulk load index build
    there is no concurrent insert or read on the index, so the bulk load does not need to acquire index->lock at all.
    
    Bulk load index build is also used to create indexes in table rebuild cases, for example DROP COLUMN or ADD COLUMN.
    The indexes on the intermediate table (#sql-ib..) are built using bulk load insert. Concurrent DMLs at this
    stage do not acquire index->lock, so acquiring index->lock on the intermediate table, which is not visible to
    anyone else, doesn't block concurrent DMLs.
    
    Ideally we could remove all index->lock X acquisitions in the bulk load index build path. We play *safe* and remove
    the acquisitions only in the case of uncommitted indexes. The other path (bulk load used during rebuild) is not
    affected anyway.
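
The decision described in the fix can be sketched as follows. This is an assumption-laden illustration, not the actual patch: `IndexState`, `bulk_load_takes_index_x_latch` and `x_latches_for_build` are hypothetical names, and the real change lives in InnoDB's bulk load code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical view of an index during a bulk load.
struct IndexState {
    bool committed;  // false for an index still being added by online ALTER
};

// Sketch of the rule: skip the X latch on index->lock only for uncommitted
// indexes; the rebuild path (committed == true here) keeps latching as before.
bool bulk_load_takes_index_x_latch(const IndexState& index) {
    return index.committed;
}

// Number of X acquisitions on index->lock for a build of `pages` page mtrs
// under that rule (one acquisition per page mtr when latching is kept).
std::size_t x_latches_for_build(const IndexState& index, std::size_t pages) {
    std::size_t taken = bulk_load_takes_index_x_latch(index) ? pages : 0;
    assert(taken == 0 || index.committed);  // uncommitted indexes never latch
    return taken;
}
```

Under this rule an online ADD INDEX build of any size takes no X latch on index->lock, so a concurrent INSERT asking for S mode has nothing to wait for, while the rebuild path behaves as before.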
    satya-bodapati committed Apr 17, 2019
    7ebaef8