Rare goroutine deadlock in M3DB #2127

Closed
jcerniauskas opened this issue Jan 30, 2020 · 0 comments · Fixed by #2128

I believe #2096 introduced a rare goroutine deadlock in M3DB. The tickAndExpire->BlockStatesSnapshot path now takes nested read locks on the shard's RWMutex. If the tick goroutine holds the first read lock in tickAndExpire and another goroutine then tries to acquire the write lock, the tick goroutine blocks when it tries to acquire the nested read lock in BlockStatesSnapshot, because a pending writer blocks new readers. The tick goroutine therefore never releases the first read lock, the writer never acquires the write lock, and the two goroutines deadlock.

https://github.com/m3db/m3/blob/master/src/dbnode/storage/shard.go#L678-L683
https://github.com/m3db/m3/blob/master/src/dbnode/storage/shard.go#L387-L389

Relevant goroutine dump excerpt at the time of deadlock:

goroutine 549173091 [semacquire, 60 minutes]:
sync.runtime_SemacquireMutex(0xc030b6f8cc, 0xc034b3b800, 0x0)
	.../go1.13.4.linux.amd64/src/runtime/sema.go:71 +0x47
sync.(*RWMutex).RLock(...)
	.../go1.13.4.linux.amd64/src/sync/rwmutex.go:50
github.com/m3db/m3/src/dbnode/storage.(*dbShard).BlockStatesSnapshot(0xc030b6f8c0, 0x0, 0x0)
	.../github.com/m3db/m3/src/dbnode/storage/shard.go:387 +0x3aa
github.com/m3db/m3/src/dbnode/storage.(*dbShard).tickAndExpire(0xc030b6f8c0, 0x22882c0, 0xc0ac066a98, 0x0, 0x22b05c0, 0xc02fc46180, 0x0, 0x0, 0x0, 0x0, ...)
	.../github.com/m3db/m3/src/dbnode/storage/shard.go:689 +0x1a8
github.com/m3db/m3/src/dbnode/storage.(*dbShard).Tick(0xc030b6f8c0, 0x22882c0, 0xc0ac066a98, 0xbf84c28fa0be9381, 0x281db7374932, 0x33426e0, 0x22b05c0, 0xc02fc46180, 0x0, 0x0, ...)
	.../github.com/m3db/m3/src/dbnode/storage/shard.go:639 +0x100
github.com/m3db/m3/src/dbnode/storage.(*dbNamespace).Tick.func1()
	.../github.com/m3db/m3/src/dbnode/storage/namespace.go:559 +0x170
github.com/m3db/m3/src/x/sync.(*workerPool).Go.func1(0xc08377d980, 0xc08ff707e0)
	.../github.com/m3db/m3/src/x/sync/worker_pool.go:46 +0x27
created by github.com/m3db/m3/src/x/sync.(*workerPool).Go
	.../github.com/m3db/m3/src/x/sync/worker_pool.go:45 +0x64

...

goroutine 94052 [semacquire, 60 minutes]:
sync.runtime_SemacquireMutex(0xc030b6f8c8, 0x0, 0x0)
	.../go1.13.4.linux.amd64/src/runtime/sema.go:71 +0x47
sync.(*RWMutex).Lock(0xc030b6f8c0)
	.../go1.13.4.linux.amd64/src/sync/rwmutex.go:103 +0x88
github.com/m3db/m3/src/dbnode/storage.(*dbShard).insertSeriesBatch(0xc030b6f8c0, 0xc1f4458000, 0x1, 0x9d, 0x21689cce, 0x250e50befead22)
	.../github.com/m3db/m3/src/dbnode/storage/shard.go:1349 +0x45
github.com/m3db/m3/src/dbnode/storage.(*dbShardInsertQueue).insertLoop(0xc030b7c360)
	.../github.com/m3db/m3/src/dbnode/storage/shard_insert_queue.go:240 +0x250
created by github.com/m3db/m3/src/dbnode/storage.(*dbShardInsertQueue).Start
	.../github.com/m3db/m3/src/dbnode/storage/shard_insert_queue.go:265 +0xc2