spanner: transaction fails with Session not found error #1527
Comments
The client library creates and maintains a session pool, and takes measures to keep these sessions alive by doing a This could be mitigated by retrying failed transactions and other server calls that operate on a session by first taking a new session and then retrying. This is, however, not completely straightforward, as there are roughly 4 different categories of calls that we would need to consider, in order of increasing complexity:
|
@110y Do you have any specific use case that reproduces this problem more frequently than others (or even always)? |
Unfortunately, I do not understand why this problem occurs... One of my cases is:
The document says:
and, if my understanding is correct, as long as we are continuously sending requests to Spanner, no session should be idle for more than 1 hour, since this library manages sessions in a FIFO manner and a used session is pushed back to the session pool. That's why I'm puzzled by this problem and am guessing that Spanner deletes sessions even when their idle time is under 1 hour. Do you have any thoughts or suggestions? |
@110y To my understanding, it is possible (although not very common) that Cloud Spanner deletes sessions that have been idle for less than 1 hour, which could cause this problem. The reason I asked whether you had a specific use case that would always (or often) cause this problem, was to check whether you were running into some unknown bug in the session pool. Considering your error rate of 1-2 errors per week at 1QPS, I don't think that this is a specific bug in the Go session pool. The Java client library for Cloud Spanner added a protection against this problem a couple of months ago along the lines that I mentioned above. I'll have a look to see if it is feasible to add this protection for the Go client library as well. |
As a trial, I sent a CL that makes |
Session not found errors should be retried by taking a new session from the pool and retrying the gRPC call when that is possible. This change fixes this for multi-use read-only transactions when the error occurs on the BeginTransaction call. This method can safely be retried on a new session, as the user has not yet been able to execute any queries on the transaction. This is also the most probable moment for a Session not found error to occur for a multi-use read-only transaction.
A Session not found error halfway through a multi-use read-only transaction could in theory be mitigated by starting a new read-only transaction on a new session with the same read timestamp as the transaction that produced the error, retrying the query that returned the error on the new transaction, and updating the internal transaction reference so that any future queries that the user executes on the transaction use the new transaction. This is not included in this change.
Updates #1527.
Change-Id: Ibc48e558bf07e8066996c6aaad864c4450abae66
Reviewed-on: https://code-review.googlesource.com/c/gocloud/+/44051
Reviewed-by: kokoro <[email protected]>
Reviewed-by: Emmanuel Odeke <[email protected]>
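The retry described in this commit message can be sketched roughly as follows. This is not the client library's code: `session`, `pool`, `beginReadOnly`, `beginWithRetry` and `errSessionNotFound` are hypothetical names used for illustration, and the `deleted` flag merely simulates the backend having dropped a session.

```go
package main

import "errors"

// errSessionNotFound is a hypothetical sentinel standing in for the
// server's "Session not found" error.
var errSessionNotFound = errors.New("session not found")

// session and pool are illustrative types, not the library's own.
type session struct {
	id      string
	deleted bool // simulates the backend having deleted the session
}

type pool struct {
	idle []*session
}

// take hands out the oldest idle session (FIFO).
func (p *pool) take() (*session, error) {
	if len(p.idle) == 0 {
		return nil, errors.New("pool exhausted")
	}
	s := p.idle[0]
	p.idle = p.idle[1:]
	return s, nil
}

// beginReadOnly simulates the BeginTransaction call on one session.
func beginReadOnly(s *session) (string, error) {
	if s.deleted {
		return "", errSessionNotFound
	}
	return "tx-on-" + s.id, nil
}

// beginWithRetry retries BeginTransaction on a fresh session. This is safe
// because no query has been executed on the transaction yet.
func beginWithRetry(p *pool) (string, error) {
	for {
		s, err := p.take()
		if err != nil {
			return "", err
		}
		tx, err := beginReadOnly(s)
		if errors.Is(err, errSessionNotFound) {
			continue // the dead session is never returned to the pool
		}
		return tx, err
	}
}
```

Because the failed session is simply not handed back, the loop ends either with a working transaction or with the pool's own error once no sessions are left.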
Could you please take a look at this CL, which fixes the problem for ReadWriteTransaction? |
@110y |
I've updated my CL based on your comments. |
@olavloite We still suffer from this error frequently. It happens hundreds of times every day. What's the status of this issue? |
I've fixed my CL: https://code-review.googlesource.com/c/gocloud/+/45910/ to follow the latest master. |
@110y and @kazegusuri Thanks for updating your CL and sorry for taking so long to merge this. I'll have a look at this this morning and try to get it in ASAP. |
By the way, do you have a plan to revive this CL, which makes ROTxn retry on a SessionNotFound error? |
@110y Yes (and additional transaction types). |
Updates #1527 Ref: #1527 Change-Id: Iea12342ca098c8056abc2206b91edbeda630e718 Reviewed-on: https://code-review.googlesource.com/c/gocloud/+/45910 Reviewed-by: kokoro <[email protected]> Reviewed-by: Hengfeng Li <[email protected]>
'Session not found' errors on BeginTransaction calls for a read-only transaction should be retried on a new session, and the invalid session should be removed from the session pool.
Updates #1527.
Change-Id: I49a6cb5e096c8b93c7aec76cdbd1c3d640f50c0d
Reviewed-on: https://code-review.googlesource.com/c/gocloud/+/50510
Reviewed-by: kokoro <[email protected]>
Reviewed-by: Hengfeng Li <[email protected]>
Single use read-only transactions should be retried if the query returns a 'Session not found' error. These queries can safely be retried on a different session as there is no transaction atomicity that can be violated. The retry must be executed by the partial result sets stream, as the query is sent to the server at the first call to RowIterator.Next.
Updates #1527.
Change-Id: Ia3c77643e42adb2f88dba2cfd5d1bba7f9dbf3be
Reviewed-on: https://code-review.googlesource.com/c/gocloud/+/50511
Reviewed-by: Hengfeng Li <[email protected]>
@olavloite Thank you for handling the Spanner issues. It seems many issues have been fixed since the last Spanner client release, v1.1.0. Do you have a plan to cut a new release for Spanner? |
@hengfengli Do you know if there is a date planned for the next release? |
I'll ask @skuruppu and make a release soon. |
@hengfengli, yes please cut a release. |
The release has been cut so closing the issue. Please refer to the release notes. |
@skuruppu I still got |
@kanekv The protection against So if you have a test that explicitly deletes a session on the emulator, and then tries to use that session in the client, then this error is explainable. |
@olavloite Not sure why it happened. I didn't run tests; I had an app connected to an emulator instance, and after coming back a couple of hours later (making a request after some idle time) it started throwing this error. Maybe it is indeed specific to the emulator. |
@kanekv Thanks for the quick reply. That is interesting information, though. This seems to indicate that the client library is not keeping sessions alive on the emulator. In addition to the |
Updates #1527 Ref: googleapis/google-cloud-go#1527 Change-Id: Iea12342ca098c8056abc2206b91edbeda630e718 Reviewed-on: https://code-review.googlesource.com/c/gocloud/+/45910 Reviewed-by: kokoro <[email protected]> Reviewed-by: Hengfeng Li <[email protected]>
Client
Spanner: v0.37.4 (as far as I know, this also occurs with the latest version)

Describe Your Environment
Alpine Docker on GKE

Expected Behavior
If possible, this client library should retry a transaction when it fails with a Session not found error.

Actual Behavior
We sometimes get Session not found errors.
This client library retries transactions only when an Abort error occurs, so when the session it has taken is no longer active, it just returns the NotFound error to the caller without retrying.
google-cloud-go/spanner/client.go
Lines 394 to 398 in e4bd323
google-cloud-go/spanner/client.go
Lines 439 to 454 in e4bd323
Since this library creates the session pool, I think it should retry transactions that fail with Session not found by taking another session. But are there any problems with taking another session from the pool?
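The behavior requested in this report could be sketched as a retry loop that treats a Session not found error the same way as an aborted transaction. This is an assumption-laden illustration, not the library's implementation: `rpcError`, `shouldRetry` and `runTransaction` are hypothetical names, and matching on the message text is just one possible way to recognize the error.

```go
package main

import (
	"errors"
	"strings"
)

// rpcError is a hypothetical error type carrying a status code name and
// a message, standing in for the real RPC status error.
type rpcError struct {
	code string
	msg  string
}

func (e *rpcError) Error() string { return e.code + ": " + e.msg }

// shouldRetry reports whether the whole transaction may be re-run.
func shouldRetry(err error) bool {
	var re *rpcError
	if !errors.As(err, &re) {
		return false
	}
	if re.code == "Aborted" {
		return true
	}
	// Assumption: a dead session surfaces as NotFound with this message.
	return re.code == "NotFound" && strings.Contains(re.msg, "Session not found")
}

// runTransaction calls f up to maxAttempts times. f is expected to take a
// fresh session from the pool on every attempt.
func runTransaction(maxAttempts int, f func() error) error {
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if err = f(); err == nil || !shouldRetry(err) {
			return err
		}
	}
	return err
}
```

Other errors, including a NotFound for a missing row or table, are still returned to the caller on the first attempt.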