
fix(spanner): early unlock of session pool lock during dumping the tracked session handles to avoid deadlock #5777

Merged
merged 4 commits into googleapis:main on Mar 23, 2022

Conversation

rahul2393
Contributor

Fixes #5710.

@rahul2393 rahul2393 requested review from a team as code owners March 22, 2022 08:42
@product-auto-label product-auto-label bot added size: xs Pull request size is extra small. api: spanner Issues related to the Spanner API. labels Mar 22, 2022
@rahul2393 rahul2393 changed the title fix(spanner): remove session handle lock during dumping the tracked session handles fix(spanner): early unlock of session pool lock during dumping the tracked session handles to avoid deadlock Mar 22, 2022
defer p.mu.Unlock()
element := p.trackedSessionHandles.Front()
// NOTE: No need to keep holding the sessionPool lock since we don't modify the list;
// this returns a snapshot of the elements present when the session error occurred
Contributor

trackedSessionHandles could be modified concurrently in other places, for example here: https://github.com/googleapis/google-cloud-go/blob/spanner/v1.30.0/spanner/session.go#L83

How do we guarantee that we dump a consistent snapshot of trackedSessionHandles without locking p.mu?

Contributor Author

I think getTrackedSessionHandleStacksLocked is only called when the client gets errGetSessionTimeout. Even if trackedSessionHandles is updated later, as a customer I want to know what happened when errGetSessionTimeout was thrown, not the current snapshot. So I think we should release the lock as soon as we read trackedSessionHandles.Front().
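A self-contained sketch of the early-unlock dump pattern being described. The type shapes here (sessionPool, sessionHandle, the stack field, and the dumpTrackedSessionHandles name) are stand-ins for illustration, not the real spanner internals:

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

// Minimal stand-ins for the session pool types (assumed shapes).
type sessionHandle struct{ stack string }

type sessionPool struct {
	mu                    sync.Mutex
	trackedSessionHandles list.List
}

// Early-unlock dump: hold the lock only long enough to read the list
// head, then release it before walking the elements, so per-handle
// locking elsewhere cannot deadlock against p.mu.
func (p *sessionPool) dumpTrackedSessionHandles() []string {
	p.mu.Lock()
	element := p.trackedSessionHandles.Front()
	p.mu.Unlock()

	var stacks []string
	for ; element != nil; element = element.Next() {
		stacks = append(stacks, element.Value.(*sessionHandle).stack)
	}
	return stacks
}

func main() {
	p := &sessionPool{}
	p.trackedSessionHandles.PushBack(&sessionHandle{stack: "goroutine A"})
	p.trackedSessionHandles.PushBack(&sessionHandle{stack: "goroutine B"})
	fmt.Println(p.dumpTrackedSessionHandles()) // [goroutine A goroutine B]
}
```

Note that the walk after Unlock is only safe if concurrent mutation of the list is acceptable for the dump, which is exactly the point debated in the following comments.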

Contributor Author

@codyoss can you please help review this?

Contributor

@rahul2393 If I understand correctly, this could print an invalid state in the stack traces.

Let's say trackedSessionHandles contains the following chain of session handles when the client gets errGetSessionTimeout, and at the same time shA is recycled by another goroutine:

shA -> shB -> shC

Then the valid state would be either shA -> shB -> shC or shB -> shC, depending on timing.

But if we allow another goroutine to remove shA from trackedSessionHandles, then the value returned from p.trackedSessionHandles.Front() would lose its next pointer, since Go's list.List.Remove clears the element's next pointer. So the stack trace would be just shA in this case.

Contributor Author

You are right, I have updated the PR. @yfuruyama can you check if it's ok?

@@ -72,7 +72,9 @@ func (sh *sessionHandle) recycle() {
}
p := sh.session.pool
tracked := sh.trackedSessionHandle
sh.mu.Unlock()
sh.session.recycle()
Contributor

I might be wrong, but is there any possibility that sessionHandle.destroy() is called while sh.session.recycle() is running?

Contributor Author

No; as per my understanding, destroy is only called when isSessionNotFoundError is returned from the server.

Contributor Author

Do you find any issue with it?

Contributor

No, I was just wondering if it's ok to release the lock during the sh.session.recycle() call. It looks good to me.

Contributor

I think it would be safer and clearer if you stored sh.session in a local variable s while the lock is still held, and then, after (original) line 80, called s.recycle(). So you get something like:

	p := sh.session.pool
	tracked := sh.trackedSessionHandle
	s := sh.session
	sh.session = nil
	sh.trackedSessionHandle = nil
	sh.checkoutTime = time.Time{}
	sh.stack = nil
	sh.mu.Unlock()
	s.recycle()

@rahul2393 rahul2393 added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Mar 23, 2022
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Mar 23, 2022
Contributor

@ansh0l ansh0l left a comment

lgtm

Contributor

@olavloite olavloite left a comment

The solution itself LGTM, but take a look at my suggestion to reorganize the function a little to avoid releasing and taking the lock multiple times.

@rahul2393 rahul2393 added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Mar 23, 2022
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Mar 23, 2022
@rahul2393 rahul2393 merged commit b007836 into googleapis:main Mar 23, 2022
Labels
api: spanner Issues related to the Spanner API. size: xs Pull request size is extra small.

Successfully merging this pull request may close these issues.

spanner: Enabling TrackSessionHandles causes deadlock
5 participants