Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use mysql 8.0 INSTANT DDL if supported #1198

Closed
morgo opened this issue Nov 10, 2022 · 8 comments · Fixed by #1201
Closed

Use mysql 8.0 INSTANT DDL if supported #1198

morgo opened this issue Nov 10, 2022 · 8 comments · Fixed by #1201

Comments

@morgo
Copy link
Contributor

morgo commented Nov 10, 2022

Hi friends,

I would like to propose a new feature (likely on by default) where gh-ost attempts to apply the DDL change first (but with an assertion that it must be INSTANT). This is particularly targeted at MySQL 8.0+ users, but will likely work on some earlier versions as well (Aurora 5.7 supports instant changes, although I've not specifically tested with the instant assertion..).

I am happy to contribute a patch for it - but per your request I am opening an issue here first to discuss :-)

I hacked up something just now to show that it looks easy enough to add:
https://github.com/github/gh-ost/compare/master...morgo:gh-ost:attempt-instant-ddl?expand=1

For an example usage:

go run . \
--max-load=Threads_running=25 \
--critical-load=Threads_running=1000 \
--chunk-size=1000 \
--user="msandbox" \
--password="msandbox" \
--host="127.0.0.1" \
--port=8031 \
--allow-on-master \
--database="test" \
--table="customers" \
--verbose \
--initially-drop-old-table \
--ok-to-drop-table \
--execute  --initially-drop-ghost-table --alter="add newcol9 int"
2022-11-09 20:17:59 INFO starting gh-ost
2022-11-09 20:17:59 INFO Migrating `test`.`customers`
2022-11-09 20:17:59 INFO inspector connection validated on 127.0.0.1:8031
2022-11-09 20:17:59 INFO User has SUPER, REPLICATION SLAVE privileges, and has ALL privileges on `test`.*
2022-11-09 20:17:59 INFO binary logs validated on 127.0.0.1:8031
2022-11-09 20:17:59 INFO Restarting replication on 127.0.0.1:8031 to make sure binlog settings apply to replication thread
2022-11-09 20:17:59 INFO Inspector initiated on mtocker-macbookpro.local:8031, version 8.0.31
2022-11-09 20:17:59 INFO Table found. Engine=InnoDB
2022-11-09 20:17:59 INFO Estimated number of rows via EXPLAIN: 9
2022-11-09 20:17:59 INFO Recursively searching for replication master
2022-11-09 20:17:59 INFO Master found to be mtocker-macbookpro.local:8031
2022-11-09 20:17:59 INFO log_slave_updates validated on 127.0.0.1:8031
2022-11-09 20:17:59 INFO Attempting to execute ALTER TABLE as INSTANT DDL
2022-11-09 20:17:59 INFO applier connection validated on 127.0.0.1:8031
2022-11-09 20:17:59 INFO applier connection validated on 127.0.0.1:8031
2022-11-09 20:17:59 INFO will use time_zone='SYSTEM' on applier
2022-11-09 20:17:59 INFO Examining table structure on applier
2022-11-09 20:17:59 INFO Applier initiated on mtocker-macbookpro.local:8031, version 8.0.31
2022-11-09 20:17:59 INFO INSTANT DDL Query is: ALTER /* gh-ost */ TABLE `test`.`customers` add newcol9 int, ALGORITHM=INSTANT
2022-11-09 20:17:59 INFO Success! Table `test`.`customers` migrated instantly
2022-11-09 20:17:59 INFO Tearing down inspector
2022-11-09 20:17:59 INFO Tearing down applier
# Done
@timvaillancourt
Copy link
Collaborator

@morgo thanks for this idea and branch 🙏! I'm curious if you've been able to test the performance of this on a large or active data set?

The reason I ask is, while "instant" (which I read as non-blocking) to the caller, I wonder if this triggers asynchronous work in InnoDB that could impact the server 🤔. We generally throttle operations that can cause impact, but here we probably cannot

@morgo
Copy link
Contributor Author

morgo commented Nov 13, 2022

@morgo thanks for this idea and branch 🙏! I'm curious if you've been able to test the performance of this on a large or active data set?

We currently use 5.7, so not yet. We don't have many workloads that would trigger the biggest impact this will have, but more on that below.

The reason I ask is, while "instant" (which I read as non-blocking) to the caller, I wonder if this triggers asynchronous work in InnoDB that could impact the server 🤔. We generally throttle operations that can cause impact, but here we probably cannot

INSTANT means it is a metadata change-only. Internally all the instant operations in various flavors all have Tencent games to thank as their ancestor. The way it works is it recycles a couple of bits in the row header that were previously not used to describe the row's version (which maps to a system table internally). Depending on the version when the row is read, InnoDB figures out if it has to patch out (or patch in) columns before handing them to the server. It does not attempt to ever sweep through and rewrite older versions of the rows to the later version, so there is no async work which is triggered.

But in terms of risks: it does require a meta data lock to change the table. This is the same metadata lock which is required at the end of the gh-ost operation to switch the tables. The time this will take is up to the length of the longest running transaction (which will have shared MDL locks preventing the table from being changed until they have finished executing). So the worst case is for users with long running transactions that involve this table (i.e. it's not based on server load, or database size - it's really p100 transaction length.)

Gh-ost requires that MDL lock eventually, but the difference here is that in current usage you get to elect the cut-over time to incur that MDL lock. Here it will try and acquire it immediately.

@timvaillancourt
Copy link
Collaborator

INSTANT means it is a metadata change-only. Internally all the instant operations in various flavors all have Tencent games to thank as their ancestor. The way it works is it recycles a couple of bits in the row header that were previously not used to describe the row's version (which maps to a system table internally)

Awesome, thanks for clarifying @morgo. This doesn't sound too concerning 👍

But in terms of risks: it does require a meta data lock to change the table. This is the same metadata lock which is required at the end of the gh-ost operation to switch the tables

Makes sense, the difference in timing of the metadata lock doesn't seem like a big deal considering the efficiency gains 👍. I think explaining this in the docs/ folder is all that's needed, more opinions appreciated cc: @dm-2 / @rashiq / @shlomi-noach

I suggest a PR with the branch mentioned is opened. My only thought is it might be best to try INSTANT only when the MySQL version is >= a certain value vs try it on all attempts (this is already done in some places IIRC), as it seems many still use 5.7

@morgo
Copy link
Contributor Author

morgo commented Nov 13, 2022

I suggest a PR with the branch mentioned is opened. My only thought is it might be best to try INSTANT only when the MySQL version is >= a certain value vs try it on all attempts (this is already done in some places IIRC), as it seems many still use 5.7

Sure, I'm happy to do that. I will wait a couple of days for other feedback, and then incorporate it into a PR.

@dm-2
Copy link
Contributor

dm-2 commented Nov 14, 2022

👋 @morgo Thanks for the PR, I think this is a great idea!

Makes sense, the difference in timing of the metadata lock doesn't seem like a big deal considering the efficiency gains 👍. I think explaining this in the docs/ folder is all that's needed, more opinions appreciated

Agreed 👍

I'd prefer attempt-instant-ddl to default to false for the time being so that we can add this to the next minor release - we can then default it to true in the next major release of gh-ost.

Would you mind also adding some tests? 🙇

@morgo
Copy link
Contributor Author

morgo commented Nov 14, 2022

I'd prefer attempt-instant-ddl to default to false for the time being so that we can add this to the next minor release - we can then default it to true in the next major release of gh-ost.

SGTM.

Would you mind also adding some tests? 🙇

I'm cool to add tests - I just wanted to make sure this is something that would be accepted first :-)

@morgo morgo mentioned this issue Nov 14, 2022
2 tasks
@hong-yi-yang
Copy link

hong-yi-yang commented Oct 25, 2023

The time this will take is up to the length of the longest running transaction (which will have shared MDL locks preventing the table from being changed until they have finished executing).

@morgo does this mean an instant DDL can finish executing first, but the metadata lock is held longer until the longest running transaction at the time finishes because the lock is shared between them?

@morgo
Copy link
Contributor Author

morgo commented Oct 25, 2023

No, step 1 is the MDL is acquired. Then for step 2 the meta data is changed.

So even though it is instant, that only applies to step 2. Step 1 is typically where all time is spent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants
@morgo @timvaillancourt @dm-2 @hong-yi-yang and others