
util/tracing: disable span reuse under deadlock detector #75423

Merged: 1 commit merged into cockroachdb:master from the
tracing.fix-deadlock branch on Jan 23, 2022

Conversation

@andreimatei (Contributor) commented on Jan 23, 2022

In #73883 we started reusing tracing spans through a sync.Pool. This
causes the mutexes in the spans to be acquired by a goroutine
repeatedly, in effectively random order. Multiple spans can be locked
at the same time, but the ordering is well defined (a child can be
locked while the parent is also locked). The deadlock detector,
however, tracks the pairwise order in which mutexes are acquired, so
once the pool hands the same mutexes back in a different order, it
sees the same two locks taken in both orders and erroneously reports
a possible deadlock. The following test demonstrates the problem: it's
flagged by the deadlock detector even though the code is
deadlock-free.

This patch fixes the issue by disabling span reuse under deadlock
builds.
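
One way to implement such gating is a build-tag-defined constant. This
is only a minimal sketch; the DeadlockEnabled name and the file names
are illustrative, not necessarily the actual code in this patch:

// deadlock_on.go: compiled only into deadlock builds.
//go:build deadlock

package syncutil

// DeadlockEnabled reports whether the deadlock detector is compiled in.
const DeadlockEnabled = true

// deadlock_off.go: compiled into all other builds.
//go:build !deadlock

package syncutil

const DeadlockEnabled = false

The tracer can then consult syncutil.DeadlockEnabled and decline to
return spans to its sync.Pool when it is true.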

I guess the moral of the story is that recycling structs with locks
through a sync.Pool is incompatible with the deadlock detector. One
option is to contribute upstream a way to mark some mutexes as excluded
from the deadlock detector.

package tracing

import (
	"sync"
	"testing"

	"github.com/cockroachdb/cockroach/pkg/util/syncutil"
)

func TestXXX(t *testing.T) {
	p := sync.Pool{
		New: func() interface{} {
			return new(syncutil.Mutex)
		},
	}

	// Acquire two pooled mutexes in the order m1, m2. The deadlock
	// detector records this lock ordering.
	var m1, m2 *syncutil.Mutex
	m1 = p.Get().(*syncutil.Mutex)
	m2 = p.Get().(*syncutil.Mutex)

	m1.Lock()
	m2.Lock()
	m2.Unlock()
	m1.Unlock()

	// Return both mutexes to the pool and take them out again, with
	// the Get order swapped: m1 and m2 now refer to the opposite
	// underlying mutexes.
	p.Put(m1)
	p.Put(m2)
	m2 = p.Get().(*syncutil.Mutex)
	m1 = p.Get().(*syncutil.Mutex)

	// Locking m1 before m2 now acquires the same two mutexes in the
	// reverse of the recorded order, which the deadlock detector
	// flags as a potential deadlock even though none is possible.
	m1.Lock()
	m2.Lock()
	m2.Unlock()
	m1.Unlock()
}

Fixes #75351 and a million others.

Release note: None

@cockroach-teamcity (Member) commented

This change is Reviewable

@RaduBerinde (Member) left a comment

Rather than disabling span reuse, we could add a syncutil.ReusableMutex or something like that that is always a simple mutex, even for deadlock builds. We should file an issue for github.com/sasha-s/go-deadlock to provide some kind of Reset function and then we could expose that through ReusableMutex.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @erikgrinaker, and @nvanbenschoten)
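
For concreteness, a minimal sketch of that idea (ReusableMutex is a
hypothetical name, not an actual type in the codebase):

package syncutil

import "sync"

// ReusableMutex is always a plain sync.Mutex, even in deadlock builds,
// so structs recycled through a sync.Pool don't feed bogus lock
// orderings to the deadlock detector. The trade-off is that locks of
// this type get no deadlock detection at all.
type ReusableMutex struct {
	sync.Mutex
}

A Reset function in github.com/sasha-s/go-deadlock, if contributed
upstream, would allow a detector-aware variant instead: ReusableMutex
could clear its recorded lock orderings each time its owner is returned
to the pool, keeping detection within a single pool lifetime.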

@andreimatei (Contributor, Author) left a comment

Yeah, I was thinking similar things. But let's do this patch first, I'd say.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @erikgrinaker, and @nvanbenschoten)

@andreimatei (Contributor, Author) commented

bors r+

@craig (bot) commented on Jan 23, 2022

Build failed:

@andreimatei (Contributor, Author) commented

bors r+

@craig (bot) commented on Jan 23, 2022

Build failed:

@andreimatei (Contributor, Author) commented

keeps flaking on #68395

bors r+

@craig (bot) commented on Jan 23, 2022

Build failed (retrying...):

@craig (bot) commented on Jan 23, 2022

Build succeeded:

@craig (bot) merged commit e8914ab into cockroachdb:master on Jan 23, 2022.
@andreimatei deleted the tracing.fix-deadlock branch on Jan 24, 2022.
andreimatei added a commit to andreimatei/cockroach that referenced this pull request on Jan 25, 2022:
As explained in cockroachdb#75423, span reuse is incompatible with the
deadlock detector, which flags the reuse as a false-positive deadlock.
That patch generally disabled reuse when running under the deadlock
detector. This patch skips a few tests that explicitly ask for span
reuse.

I'll separately try to improve the deadlock detector such that it can be
used with our code.

Fixes cockroachdb#75433
Fixes cockroachdb#75488

Release note: None
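
The skips presumably use the pkg/testutils/skip helpers; a minimal
sketch of the pattern, assuming the skip.UnderDeadlock helper and with
an illustrative test name:

package tracing

import (
	"testing"

	"github.com/cockroachdb/cockroach/pkg/testutils/skip"
)

func TestSpanReuse(t *testing.T) {
	// Span reuse trips the deadlock detector's lock-order tracking
	// (see cockroachdb#75423), so only run this test in builds
	// without the detector.
	skip.UnderDeadlock(t, "span reuse is incompatible with the deadlock detector")
	// ... test body that explicitly exercises span reuse ...
}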
craig bot pushed a commit that referenced this pull request on Jan 26, 2022:
75439: util/tracing: skip a few tests under deadlock detector r=andreimatei a=andreimatei


Co-authored-by: Andrei Matei <[email protected]>