Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failure in concurrent usage. #643

Closed
fireboy1919 opened this issue Sep 19, 2018 · 3 comments
Closed

Failure in concurrent usage. #643

fireboy1919 opened this issue Sep 19, 2018 · 3 comments

Comments

@fireboy1919
Copy link

fireboy1919 commented Sep 19, 2018

Using the latest AWS instances (MySQL 5.6.34):
gh-ost version 1.0.46
Command:

            --password MYSQL_PASSWORD
            --host  MYSQL_REPLICA
            --assume-master-host MYSQL_HOST
            --database SCHEMA
            --table TABLE
            --max-load=Threads_running=25
            --critical-load=Threads_running=1000
            --verbose
            --assume-rbr
            --panic-flag-file=/tmp/ghost.panic.flag
            --throttle-additional-flag-file=/tmp/ghost.throttle.flag
            --initially-drop-old-table
            --initially-drop-ghost-table
            --replica-server-id  ID
            --execute
            --alter [SQL COMMAND]

I'm running ten processes concurrently with this; each has it's own replica server id and table. The commands are all simply altering the encoding of columns. There is one replica, and one master that all of these tables are connecting to.

Occasionally, one of these ten processes will get a log entry like this:

2018-09-19 15:46:03,458 - tasks - INFO - Copy: 0/21266 0.0%; Applied: 0; Backlog: 0/1000; Time: 4s(total), 0s(copy); streamer: mysql-bin-changelog.001508:12243531; State: migrating; ETA: N/A
2018-09-19 15:46:04,506 - tasks - INFO - Copy: 0/21266 0.0%; Applied: 0; Backlog: 0/1000; Time: 5s(total), 1s(copy); streamer: mysql-bin-changelog.001508:16007408; State: migrating; ETA: N/A
2018-09-19 15:46:05,833 - tasks - INFO - Copy: 2000/21266 9.4%; Applied: 0; Backlog: 0/1000; Time: 6s(total), 2s(copy); streamer: mysql-bin-changelog.001508:19912097; State: migrating; ETA: 19s
2018-09-19 15:46:06,511 - tasks - INFO - Copy: 6000/21266 28.2%; Applied: 0; Backlog: 0/1000; Time: 7s(total), 3s(copy); streamer: mysql-bin-changelog.001508:23394478; State: migrating; ETA: 7s
2018-09-19 15:46:07,519 - tasks - INFO - Copy: 10000/21266 47.0%; Applied: 0; Backlog: 0/1000; Time: 8s(total), 4s(copy); streamer: mysql-bin-changelog.001508:25395574; State: migrating; ETA: 4s
2018-09-19 15:46:08,511 - tasks - INFO - Copy: 14000/21266 65.8%; Applied: 0; Backlog: 0/1000; Time: 9s(total), 5s(copy); streamer: mysql-bin-changelog.001508:27184059; State: migrating; ETA: 2s
2018-09-19 15:46:09,790 - tasks - INFO - Copy: 18000/21266 84.6%; Applied: 0; Backlog: 0/1000; Time: 10s(total), 6s(copy); streamer: mysql-bin-changelog.001508:29431552; State: migrating; ETA: 1s
2018-09-19 15:46:10,477 - tasks - INFO - 2018-09-19 15:46:10 INFO Row copy complete
2018-09-19 15:46:10,502 - tasks - INFO - Copy: 21582/21266 101.5%; Applied: 0; Backlog: 0/1000; Time: 11s(total), 7s(copy); streamer: mysql-bin-changelog.001508:32258059; State: migrating; ETA: due
2018-09-19 15:46:10,929 - tasks - INFO - Copy: 21582/21582 100.0%; Applied: 0; Backlog: 0/1000; Time: 11s(total), 7s(copy); streamer: mysql-bin-changelog.001508:32327067; State: migrating; ETA: due
2018-09-19 15:46:11,027 - tasks - INFO - 2018-09-19 15:46:11 INFO Grabbing voluntary lock: gh-ost.679483.lock
2018-09-19 15:46:11,066 - tasks - INFO - 2018-09-19 15:46:11 INFO Setting LOCK timeout as 6 seconds
2018-09-19 15:46:11,112 - tasks - INFO - 2018-09-19 15:46:11 INFO Looking for magic cut-over table
2018-09-19 15:46:11,158 - tasks - INFO - 2018-09-19 15:46:11 INFO Creating magic cut-over table `flexdb`.`_ercpt_pos_del`
2018-09-19 15:46:11,240 - tasks - INFO - 2018-09-19 15:46:11 INFO Magic cut-over table created
2018-09-19 15:46:11,240 - tasks - INFO - 2018-09-19 15:46:11 INFO Locking `flexdb`.`ercpt_pos`, `flexdb`.`_ercpt_pos_del`
2018-09-19 15:46:11,298 - tasks - INFO - 2018-09-19 15:46:11 INFO Tables locked
2018-09-19 15:46:11,298 - tasks - INFO - 2018-09-19 15:46:11 INFO Session locking original & magic tables is 679483
2018-09-19 15:46:11,298 - tasks - INFO - 2018-09-19 15:46:11 INFO Writing changelog state: AllEventsUpToLockProcessed:1537371971297744967
2018-09-19 15:46:11,406 - tasks - INFO - 2018-09-19 15:46:11 INFO Waiting for events up to lock
2018-09-19 15:46:11,855 - tasks - INFO - Copy: 21582/21582 100.0%; Applied: 0; Backlog: 0/1000; Time: 12s(total), 7s(copy); streamer: mysql-bin-changelog.001508:35220402; State: migrating; ETA: due
2018-09-19 15:46:12,909 - tasks - INFO - Copy: 21582/21582 100.0%; Applied: 0; Backlog: 0/1000; Time: 13s(total), 7s(copy); streamer: mysql-bin-changelog.001508:38088393; State: migrating; ETA: due
2018-09-19 15:46:13,580 - tasks - INFO - 2018-09-19 15:46:13 INFO Intercepted changelog state AllEventsUpToLockProcessed
2018-09-19 15:46:13,581 - tasks - INFO - 2018-09-19 15:46:13 INFO Handled changelog state AllEventsUpToLockProcessed
2018-09-19 15:46:13,870 - tasks - INFO - Copy: 21582/21582 100.0%; Applied: 0; Backlog: 0/1000; Time: 14s(total), 7s(copy); streamer: mysql-bin-changelog.001508:40968182; State: migrating; ETA: due
2018-09-19 15:46:14,298 - tasks - INFO - 2018-09-19 15:46:14 ERROR Timeout while waiting for events up to lock
2018-09-19 15:46:14,298 - tasks - INFO - 2018-09-19 15:46:14 ERROR 2018-09-19 15:46:14 ERROR Timeout while waiting for events up to lock
2018-09-19 15:46:14,298 - tasks - INFO - 2018-09-19 15:46:14 INFO Looking for magic cut-over table
2018-09-19 15:46:14,298 - tasks - INFO - 2018-09-19 15:46:14 INFO Will now proceed to drop magic table and unlock tables
2018-09-19 15:46:14,298 - tasks - INFO - 2018-09-19 15:46:14 INFO Dropping magic cut-over table
2018-09-19 15:46:14,345 - tasks - INFO - 2018-09-19 15:46:14 INFO Dropping magic cut-over table
2018-09-19 15:46:14,345 - tasks - INFO - 2018-09-19 15:46:14 INFO Dropping table `flexdb`.`_ercpt_pos_del`
2018-09-19 15:46:14,380 - tasks - INFO - 2018-09-19 15:46:14 INFO Releasing lock from `flexdb`.`ercpt_pos`, `flexdb`.`_ercpt_pos_del`
2018-09-19 15:46:14,395 - tasks - INFO - 2018-09-19 15:46:14 INFO Table dropped
2018-09-19 15:46:14,429 - tasks - INFO - 2018-09-19 15:46:14 INFO Tables unlocked
2018-09-19 15:46:14,517 - tasks - INFO - Copy: 21582/21582 100.0%; Applied: 0; Backlog: 1/1000; Time: 15s(total), 7s(copy); streamer: mysql-bin-changelog.001508:44188149; State: migrating; ETA: due
2018-09-19 15:46:15,513 - tasks - INFO - 2018-09-19 15:46:15 INFO Grabbing voluntary lock: gh-ost.679494.lock
2018-09-19 15:46:15,564 - tasks - INFO - 2018-09-19 15:46:15 INFO Setting LOCK timeout as 6 seconds
2018-09-19 15:46:15,614 - tasks - INFO - 2018-09-19 15:46:15 INFO Looking for magic cut-over table
2018-09-19 15:46:15,816 - tasks - INFO - Copy: 21582/21582 100.0%; Applied: 0; Backlog: 0/1000; Time: 16s(total), 7s(copy); streamer: mysql-bin-changelog.001508:47539602; State: migrating; ETA: due
2018-09-19 15:46:16,034 - tasks - INFO - 2018-09-19 15:46:16 INFO Creating magic cut-over table `flexdb`.`_ercpt_pos_del`
2018-09-19 15:46:16,102 - tasks - INFO - 2018-09-19 15:46:16 INFO Magic cut-over table created
2018-09-19 15:46:16,102 - tasks - INFO - 2018-09-19 15:46:16 INFO Locking `flexdb`.`ercpt_pos`, `flexdb`.`_ercpt_pos_del`
2018-09-19 15:46:16,148 - tasks - INFO - 2018-09-19 15:46:16 INFO Tables locked
2018-09-19 15:46:16,149 - tasks - INFO - 2018-09-19 15:46:16 INFO Session locking original & magic tables is 679494
2018-09-19 15:46:16,149 - tasks - INFO - 2018-09-19 15:46:16 INFO Writing changelog state: AllEventsUpToLockProcessed:1537371976148623595

...ad infinitum.

While the rest start looking like this:

2018-09-19 15:46:30,807 - tasks - INFO - Copy: 41000/1030675 4.0%; Applied: 0; Backlog: 0/1000; Time: 18s(total), 12s(copy); streamer: mysql-bin-changelog.001508:98844289; State: migrating; ETA: 4m51s
2018-09-19 15:46:32,341 - tasks - INFO - Copy: 42000/1030675 4.1%; Applied: 0; Backlog: 0/1000; Time: 19s(total), 13s(copy); streamer: mysql-bin-changelog.001508:101466812; State: throttled, lag=1.860169s; ETA: 5m7s
2018-09-19 15:46:32,986 - tasks - INFO - Copy: 42000/1030675 4.1%; Applied: 0; Backlog: 0/1000; Time: 20s(total), 14s(copy); streamer: mysql-bin-changelog.001508:104640997; State: throttled, lag=2.061843s; ETA: 5m30s

And then none of the processes complete. This is repeatable; I just need to run it against enough tables and eventually, I'll hit this concurrency bug and all execution will stop and never start again.

This might be a dupe; there are other bugs labeled as relating to locking and concurrency, but I don't see anything with precisely the same output.

Any idea why? Or how to prevent it?

@shlomi-noach
Copy link
Contributor

@fireboy1919 a few days ago we merged #611. I'm not certain, but it's possible that it could mitigate the problem you're seeing. Would you be able to test master branch? (to generate the binary checkout the repository and run ./build.sh or script/build)

@fireboy1919
Copy link
Author

Yes, it looks like this solves the problem. Thanks!

@shlomi-noach
Copy link
Contributor

Awesome, thanks for updating!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants