-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(spanner): early unlock of session pool lock during dumping the tracked session handles to avoid deadlock #5777
Conversation
…acked session handles to avoid deadlock
spanner/session.go
Outdated
defer p.mu.Unlock() | ||
element := p.trackedSessionHandles.Front() | ||
// NOTE: No need to keep the sessionPool lock since we don't modify it, | ||
// this will return the snapshot of current elements when error getting sessions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trackedSessionHandles
could be being modified in different places like here. https://github.com/googleapis/google-cloud-go/blob/spanner/v1.30.0/spanner/session.go#L83
How do we guarantee that we will dump a consistent snapshot of trackedSessionHandles
without locking p.mu
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think getTrackedSessionHandleStacksLocked is only called when the client gets errGetSessionTimeout
, so even if trackedSessionHandles later gets updated as a customer I want to know what happen when errGetSessionTimeout was thrown and not the current snapshot, so I think we should release the lock as soon as we read trackedSessionHandles.Front()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@codyoss can you please help review this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rahul2393 If I understand correctly, this could print invalid state as stack traces.
Let's say there are the following session handle chains in trackedSessionHandles
when the client gets errGetSessionTimeout
. And same time, let's say shA
was recycled by other goroutine.
shA -> shB -> shC
Then valid state would be either shA -> shB -> shC
or shB -> shC
depending on timinig.
But if we allow other goroutine to get shA
removed from trackedSessionHandles
, then the returned value from p.trackedSessionHandles.Front()
would loose next
pointer as Go's list.List.Remove removes the next pointer from the element. So the stack trace would be just shA
for this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right, I updated the PR @yfuruyama can you check if its ok
spanner/session.go
Outdated
@@ -72,7 +72,9 @@ func (sh *sessionHandle) recycle() { | |||
} | |||
p := sh.session.pool | |||
tracked := sh.trackedSessionHandle | |||
sh.mu.Unlock() | |||
sh.session.recycle() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be wrong, but is there any possibility that sessionHandle.destory() is called during sh.session.recycle()
is being called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, as per my understanding destroy is only called when isSessionNotFoundError is returned from server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you find any issue with it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I was just wondering if it's ok to release a lock during sh.session.recycle()
call. It looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be safer and clearer if you would just store sh.session
in a local variable s
while the lock is still taken, and then after (original) line 80, call s.recycle()
. So you get something like:
p := sh.session.pool
tracked := sh.trackedSessionHandle
s := sh.session
sh.session = nil
sh.trackedSessionHandle = nil
sh.checkoutTime = time.Time{}
sh.stack = nil
sh.mu.Unlock()
s.recycle()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The solution itself LGTM, but take a look at my suggestion to reorganize the function a little to avoid releasing and taking the lock multiple times.
Fixes #5710.