Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Messaging: Avoid deadlocks related to 0 receiver behavior #10132

Merged
merged 16 commits into from
Apr 28, 2022

Conversation

mattlord
Copy link
Contributor

@mattlord mattlord commented Apr 24, 2022

Description

There's conditional behavior and code paths in the message manager when the number of receivers is 0. The main mutex and the stream mutex locks are used to ensure overall consistency. But:

  1. The order that those locks are taken in was not enforced or consistent
  2. There was a gap in the poller between when the 0 receiver code path was checked — specifically that it was NOT 0 — and the main mutex lock was taken to ensure that this invariant holds during the polling run here:
    // Fast-path. Skip all the work.
    if mm.receiverCount() == 0 {
    return
    }
    mm.streamMu.Lock()
    defer mm.streamMu.Unlock()
    ctx, cancel := context.WithTimeout(tabletenv.LocalContext(), mm.pollerTicks.Interval())
    defer func() {
    mm.tsv.LogError()
    cancel()
    }()
    size := mm.cache.Size()
    bindVars := map[string]*querypb.BindVariable{
    "time_next": sqltypes.Int64BindVariable(time.Now().UnixNano()),
    "max": sqltypes.Int64BindVariable(int64(size)),
    }
    qr, err := mm.readPending(ctx, bindVars)
    if err != nil {
    return
    }
    // Obtain mu lock to verify and preserve that len(receivers) != 0.
    mm.mu.Lock()
    defer mm.mu.Unlock()
    mm.messagesPending = false
    if len(qr.Rows) >= size {
    // There are probably more messages to be sent.
    mm.messagesPending = true
    }
    if len(mm.receivers) == 0 {
    // Almost never reachable because we just checked this.
    return
    }

These two factors can lead to deadlocks when the receiver count goes to/from 0 while the the poller runs:

In unsubscribe() we have the main mutex, and while holding it we then call stopVStream() if the subscriber/receiver count is at 0. Then stopVStream() tries to get the stream mutex.

Concurrently, runPoller() — which by default, or more accurately when using the documented message table structure here via the vt_poller_interval comment directive, runs on a 30 second ticker — gets the stream mutex first, then it later tries to get the main mutex.

Now we are deadlocked. They're both waiting on each other.

This then in turn causes things like PlannedReparentShard to hang because during the transition from primary serving we stop/close the message service, which in turn will wait trying to get the main mutex.

Changes

  1. We clean up the mutex usage. In order to ensure that the cache stays in sync with the database and we are moving forward linearly though the logical GTID stream — updated by the poller and its results streamer, and the binlog streamer — we now use the cacheManagementMu lock. The explicit usage of this new mutex allows us to avoid the deadlocks seen before because it's taken immediately when the poller or binlog streamer process data coming from the database (which is used to update the cache), and we no longer need to take this lock when starting or stopping the binlog streamer itself.
    • After the initial changes made in the PR I could see that we were still prone to deadlocks in various places due to checking the receiver count all the time (I could see this in the race test stack traces, see Notes below), which required the main mutex, while holding the stream mutex. Anywhere we were holding the stream mutex, and checking receiver count — which we did A LOT, for example when processing every row event from the binlog stream via processRowEvent()->Add() — we were at risk of a deadlock. The main mutex became VERY hot because we were checking the receiver count all the time to ensure the fast path (do nothing when there's no receivers).
  2. We add an important predicate to the poller querytime_acked is null — to try and limit the size of scans done on the underlying message table to try and ensure that the poller runs quickly. This aspect is important because the binlog event processing is blocked during this time. It also prevents us from doing unnecessary work as InnoDB does most of the work — all query (WHERE) predicates become index condition pushdowns and the new poller_idx index can be used for the ordering to apply the LIMIT as well — and passes the minimum amount of potentially useful data up the stack InnoDB->MySQL->MessageManager->Cache.
  3. We update the tests so that we're using the new recommended/documented (see docs ticket below) message table structure and thus we're testing behavior with the structure we tell users to use.

Notes

This is able to complete on vitessio/main:

$ make build; failed=0; for i in {1..100}; do if ! go test -race -v -failfast -timeout 1m vitess.io/vitess/go/vt/vttablet/tabletserver/messager; then echo -e '\nUh oh! We have a problem...\n'; failed=1; break; fi; go clean -testcache; done; if [[ ${failed} -eq 0 ]]; then echo -e '\nWe made it! All good...\n'; fi

Whatever changes are made here we should continue to pass that test (it does). It's even more reliable now — you can run it 1,000 times and no flakiness.

Related Issue(s)

Checklist

@mattlord mattlord force-pushed the messaging_deadlock branch from d4ff217 to f31eefa Compare April 24, 2022 20:42
@mattlord mattlord force-pushed the messaging_deadlock branch from f680779 to d177a21 Compare April 25, 2022 00:26
@mattlord mattlord marked this pull request as draft April 25, 2022 00:57
@mattlord mattlord marked this pull request as ready for review April 25, 2022 18:03
@mattlord mattlord requested a review from rohit-nayak-ps April 25, 2022 18:03
@mattlord mattlord changed the title Prevent deadlocks related to 0 receiver behavior Avoid deadlocks related to 0 receiver behavior Apr 25, 2022
@derekperkins
Copy link
Member

Looks like the majority of tests are failing on

--- FAIL: TestMessageStreamingPlan (0.00s)
    plan_test.go:239: 
        	Error Trace:	plan_test.go:239
        	Error:      	Received unexpected error:
        	            	Code: FAILED_PRECONDITION
        	            	'msg' is not a message table
        	Test:       	TestMessageStreamingPlan

This is for safe concurrency with the last receiver unsubscribing

Signed-off-by: Matt Lord <[email protected]>
@derekperkins
Copy link
Member

@mattlord you really rolled through quite a few iterations, thanks again for digging in so deep. I don't think I have enough context around vstream to provide much feedback on the latest design. Once @rohit-nayak-ps approves it, I'm ready to roll it out in prod to test it out.

@mattlord
Copy link
Contributor Author

mattlord commented Apr 27, 2022

Note that the Cluster (20) -> backup_xtrabackup test failure is due to: #10146

@derekperkins
Copy link
Member

@mattlord is this still waiting on @rohit-nayak-ps, or was pairing with @sougou sufficient review?

@mattlord mattlord merged commit 870f13e into vitessio:main Apr 28, 2022
@mattlord mattlord deleted the messaging_deadlock branch April 28, 2022 18:25
@derekperkins
Copy link
Member

🎉 Thanks so much for tracking this down!! I'll be rolling it out later today

@mattlord
Copy link
Contributor Author

🎉 Thanks so much for tracking this down!! I'll be rolling it out later today

Excellent! Please let me know how things go.

tanjinx added a commit to slackhq/vitess that referenced this pull request Aug 30, 2022
@tanjinx tanjinx mentioned this pull request Aug 30, 2022
3 tasks
@mattlord mattlord changed the title Avoid deadlocks related to 0 receiver behavior Messaging: Avoid deadlocks related to 0 receiver behavior Aug 31, 2022
tanjinx added a commit to slackhq/vitess that referenced this pull request Aug 31, 2022
rsajwani pushed a commit to planetscale/vitess that referenced this pull request Nov 8, 2022
* Prevent deadlocks related to 0 receiver behavior

Signed-off-by: Matt Lord <[email protected]>

* Update test tables to use poller_idx

Signed-off-by: Matt Lord <[email protected]>

* Minor changes after mutex usage review in message manager + cache

Signed-off-by: Matt Lord <[email protected]>

* Use atomics for receiver count and messages pending

Signed-off-by: Matt Lord <[email protected]>

* Don't take exclusive lock when in fast path

Signed-off-by: Matt Lord <[email protected]>

* Update tests to use the new recommended message table structure

See: vitessio/website#1015

Signed-off-by: Matt Lord <[email protected]>

* Correct tests

Signed-off-by: Matt Lord <[email protected]>

* Update e2e test to use new recommended table structure

Signed-off-by: Matt Lord <[email protected]>

* Fix TestMessageStreamingPlan test

Signed-off-by: Matt Lord <[email protected]>

* Fix godriver/TestStreamMessaging test

Signed-off-by: Matt Lord <[email protected]>

* Split streamMu into streamProcessingMu and lastPollPositionMu

Signed-off-by: Matt Lord <[email protected]>

* Poller cannot take main lock w/o having X stream processing lock

Signed-off-by: Matt Lord <[email protected]>

* Improve the comments a bit

Signed-off-by: Matt Lord <[email protected]>

* Hold the main mutex during Add

This is for safe concurrency with the last receiver unsubscribing

Signed-off-by: Matt Lord <[email protected]>

* Changes after pair reviewing with Sugu

Signed-off-by: Matt Lord <[email protected]>

* Use my GitHub handle for the self reference

Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Rameez Sajwani <[email protected]>
harshit-gangal pushed a commit that referenced this pull request Nov 22, 2022
…11619)

* Move towards MySQL 8.0 as the default template generation (#11153)

* Move towards MySQL 8.0 as the default template generation

This upgrades the remaining things to Ubuntu 20.04 and makes MySQL 8.0
the default we run tests against. We still have tests for MySQL 5.7 but
those are now explicitly opted into.

This should finish up the Ubuntu 20.04 upgrade and also makes things
easier for the future when we need to upgrade again.

Signed-off-by: Dirkjan Bussink <[email protected]>

* CI: rename shard vtorc_8.0 to vtorc_5.7, change expected test output for 8.0

Signed-off-by: deepthi <[email protected]>

* CI: increase timeout for 8.0 tests on the actual test step from 30 to 45 mins

Signed-off-by: deepthi <[email protected]>

* CI: increase timeout to 45 minutes for mysql57 tests too. We really only need this for vtorc, but I've made the change to the template so all tests get it.

Signed-off-by: deepthi <[email protected]>

* CI: fix vtorc test to work with both 5.7 and 8.0

Signed-off-by: deepthi <[email protected]>

* CI: missed docker flag in mysql57 template, one more fix to vtorc test

Signed-off-by: deepthi <[email protected]>

* removing spaces from pb file

Signed-off-by: Rameez Sajwani <[email protected]>

* removing spaces in pb file part 2

Signed-off-by: Rameez Sajwani <[email protected]>

Signed-off-by: Dirkjan Bussink <[email protected]>
Signed-off-by: deepthi <[email protected]>
Signed-off-by: Rameez Sajwani <[email protected]>
Co-authored-by: deepthi <[email protected]>
Co-authored-by: Rameez Sajwani <[email protected]>

* fixing template files to default to mysql80

Signed-off-by: Rameez Sajwani <[email protected]>

* changing update statement to alter

Signed-off-by: Rameez Sajwani <[email protected]>

* Avoid deadlocks related to 0 receiver behavior (#10132)

* Prevent deadlocks related to 0 receiver behavior

Signed-off-by: Matt Lord <[email protected]>

* Update test tables to use poller_idx

Signed-off-by: Matt Lord <[email protected]>

* Minor changes after mutex usage review in message manager + cache

Signed-off-by: Matt Lord <[email protected]>

* Use atomics for receiver count and messages pending

Signed-off-by: Matt Lord <[email protected]>

* Don't take exclusive lock when in fast path

Signed-off-by: Matt Lord <[email protected]>

* Update tests to use the new recommended message table structure

See: vitessio/website#1015

Signed-off-by: Matt Lord <[email protected]>

* Correct tests

Signed-off-by: Matt Lord <[email protected]>

* Update e2e test to use new recommended table structure

Signed-off-by: Matt Lord <[email protected]>

* Fix TestMessageStreamingPlan test

Signed-off-by: Matt Lord <[email protected]>

* Fix godriver/TestStreamMessaging test

Signed-off-by: Matt Lord <[email protected]>

* Split streamMu into streamProcessingMu and lastPollPositionMu

Signed-off-by: Matt Lord <[email protected]>

* Poller cannot take main lock w/o having X stream processing lock

Signed-off-by: Matt Lord <[email protected]>

* Improve the comments a bit

Signed-off-by: Matt Lord <[email protected]>

* Hold the main mutex during Add

This is for safe concurrency with the last receiver unsubscribing

Signed-off-by: Matt Lord <[email protected]>

* Changes after pair reviewing with Sugu

Signed-off-by: Matt Lord <[email protected]>

* Use my GitHub handle for the self reference

Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Rameez Sajwani <[email protected]>

* remove unwanted sleep

Signed-off-by: Rameez Sajwani <[email protected]>

* Fix failures

Signed-off-by: Rameez Sajwani <[email protected]>

* Fix backup tests

Signed-off-by: Rameez Sajwani <[email protected]>

* fix vtgate_gen4 error

Signed-off-by: Rameez Sajwani <[email protected]>

* upgrading to mysql 8.0 using vinalla mysql

Signed-off-by: Rameez Sajwani <[email protected]>

* pin to specific version 8.0.25

Signed-off-by: Rameez Sajwani <[email protected]>

* Try to pin mysql version to 8.0.25 using tar file

Signed-off-by: Rameez Sajwani <[email protected]>

* Fix bug in last commit for 8.0.25

Signed-off-by: Rameez Sajwani <[email protected]>

* fixing template files to use mysql8.0.25

Signed-off-by: Rameez Sajwani <[email protected]>

* fixing community version

Signed-off-by: Rameez Sajwani <[email protected]>

* removing all mysql version before installation

Signed-off-by: Rameez Sajwani <[email protected]>

* moving cluster 11 to self host

Signed-off-by: Rameez Sajwani <[email protected]>

* trying different combination since mysql is not getting installed correctly

Signed-off-by: Rameez Sajwani <[email protected]>

* use tar instead of deb file

Signed-off-by: Rameez Sajwani <[email protected]>

* remove mysql stop statement

Signed-off-by: Rameez Sajwani <[email protected]>

* setting vt_mysql_root

Signed-off-by: Rameez Sajwani <[email protected]>

* moving export to right place

Signed-off-by: Rameez Sajwani <[email protected]>

* change template to accomodate tar file download

Signed-off-by: Rameez Sajwani <[email protected]>

* fixing mysql80 cluster

Signed-off-by: Rameez Sajwani <[email protected]>

* Adding mysql to the path env variable

Signed-off-by: Rameez Sajwani <[email protected]>

* fixing shardedpitr_tls test

Signed-off-by: Rameez Sajwani <[email protected]>

* fix upgrade-downgrade test

Signed-off-by: Rameez Sajwani <[email protected]>

* adjust epected AUTO_INCREMENT value

Signed-off-by: Shlomi Noach <[email protected]>

* removed 'expected_table_structure' files because there are different outputs in mysql57 and in mysql80

Signed-off-by: Shlomi Noach <[email protected]>

* adding mysql version in workflow logs

Signed-off-by: Rameez Sajwani <[email protected]>

* move to utuntu 18 for upgrade-downgrade test

Signed-off-by: Rameez Sajwani <[email protected]>

Signed-off-by: Dirkjan Bussink <[email protected]>
Signed-off-by: deepthi <[email protected]>
Signed-off-by: Rameez Sajwani <[email protected]>
Signed-off-by: Matt Lord <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Co-authored-by: Dirkjan Bussink <[email protected]>
Co-authored-by: deepthi <[email protected]>
Co-authored-by: Matt Lord <[email protected]>
Co-authored-by: Shlomi Noach <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug Report: vttablet hanging when running PlannedReparentShard
4 participants