Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spanner: Deleted sessions are not removed from the session pool #5827

Closed
taka-oyama opened this issue Jan 27, 2023 · 6 comments · Fixed by #5853
Closed

Spanner: Deleted sessions are not removed from the session pool #5827

taka-oyama opened this issue Jan 27, 2023 · 6 comments · Fixed by #5853
Assignees
Labels
api: spanner Issues related to the Spanner API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@taka-oyama
Copy link
Contributor

taka-oyama commented Jan 27, 2023

Currently, the all sessions stay in the session pool and is not deleted until it expires.
This causes problems when the server side deletes the sessions..

The code keeps reusing the non-existent session and will keep throwing the exception below.

Fatal error: Uncaught Google\Cloud\Core\Exception\NotFoundException: {
    "message": "Session not found: {...}",
    "code": 5,
    "status": "NOT_FOUND",
    "details": [
        {
            "@type": "google.rpc.resourceinfo-bin",
            "resourceType": "type.googleapis.com\/google.spanner.v1.Session",
            "resourceName": "{...}",
            "owner": "",
            "description": "Session does not exist."
        },
        {
            "@type": "grpc-status-details-bin",
            "data": "<Unknown Binary Data>"
        }
    ]
} in /app/vendor/google/cloud-core/src/GrpcRequestWrapper.php:257

I think "Session not found exception"s should trigger code that will delete the stale session from the session pool.

I have prepared a sample code to replicate the issue.

Environment details

  • OS: Alpine Linux 3.16
  • PHP version: PHP 8.1.13
  • Package name and version: google/cloud-spanner: 1.54.0

Steps to reproduce

  1. git clone [email protected]:taka-oyama/google-cloud-php-test.git
  2. git checkout test/spanner-session-not-found
  3. Rename .env.sample to .env and configure it to match your setup.
  4. Run docker-compose build && docker-compose run app -> this will start a bash session inside the container.
  5. Run php session_use.php to create a session a run a single query.
  6. Run php session_delete.php to delete the session on the server (but not from the session pool)
  7. Run php session_use.php again.
  8. Repeat 7 to keep seeing the same error.
@product-auto-label product-auto-label bot added the api: spanner Issues related to the Spanner API. label Jan 27, 2023
@vishwarajanand
Copy link
Contributor

vishwarajanand commented Jan 30, 2023

Hi @taka-oyama, thanks for raising this issue.
CacheSessionPool by design can have server-deleted sessions and one might see failures like Session Not Found. You can call CacheSessionPool::maintain(..) every couple mins (say ~10?) to ensure that the sessions are always ready to use (not orphaned). Related discussions: #2411

Does this workaround sound good to you?

@vishwarajanand vishwarajanand added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. priority: p2 Moderately-important priority. Fix may not be included in next release. labels Jan 30, 2023
@taka-oyama
Copy link
Contributor Author

taka-oyama commented Jan 31, 2023

We already tried using CacheSessionPool::maintain(..) and still got the error.

I don't think CacheSessionPool::maintain(..) covers the following case.

  • The Spanner database service may delete a session if the session is more than 28 days old.

In our case, we have a dequeue worker running for long extended periods and this was causing our workers to go into a unrecoverable state in which the only way to recover from it was to restart the server.

@vishwarajanand
Copy link
Contributor

Does something like this work for your usecase?
#5853

@taka-oyama
Copy link
Contributor Author

taka-oyama commented Feb 6, 2023

Thanks for implementing a fix.

It should, but this is really difficult to test...

The only way we can is to wait 28 days...?

@vishwarajanand
Copy link
Contributor

@taka-oyama waiting for 28d is not practical.
I have added UTs including testMaintainServerDeletedSessions which sort of mimics the scenarios for us.

@taka-oyama
Copy link
Contributor Author

OK. Thanks.

I think this will work for us.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner Issues related to the Spanner API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.
Projects
None yet
3 participants