Locking issues are preventing continued operation with Spanner #23

Closed
jshannon63 opened this issue Jan 14, 2021 · 7 comments

Comments

@jshannon63

jshannon63 commented Jan 14, 2021

We consistently run into locking issues with our Spanner database access. They seem to be related to queued jobs, and they tend to disappear when we use the sync driver for queued job processing. Here is an example of one of our exception traces, which shows the result but not the likely cause of the issue.

Next Illuminate\Database\QueryException: Failed to open lock file. (SQL: select * from `users` where `id` = @p0 limit 1) in /nix/store/krcnq71n6yh724z8dq11lr68lq7gvbwf-composer-laravel-laravel/vendor/colopl/laravel-spanner/src/Colopl/Spanner/Connection.php:441
Stack trace:
#0 /nix/store/krcnq71n6yh724z8dq11lr68lq7gvbwf-composer-laravel-laravel/vendor/laravel/framework/src/Illuminate/Database/Connection.php(629): Colopl\Spanner\Connection->runQueryCallback('select * from `...', Array, Object(Closure))
#1 /nix/store/krcnq71n6yh724z8dq11lr68lq7gvbwf-composer-laravel-laravel/vendor/colopl/laravel-spanner/src/Colopl/Spanner/Connection.php(250): Illuminate\Database\Connection->run('select * from `...', Array, Object(Closure))
#2 /nix/store/krcnq71n6yh724z8dq11lr68lq7gvbwf-composer-laravel-laravel/vendor/laravel/framework/src/Illuminate/Database/Query/Builder.php(2149): Colopl\Spanner\Connection->select('select * from `...', Array, true)

Any help in understanding and overcoming this issue would be appreciated.

@taka-oyama
Collaborator

Are you trying to update the same user multiple times?
And is this query run within a transaction?

If so, this is probably the correct behavior since Spanner will lock the row if you select it in a read-write transaction.

@jshannon63
Author

jshannon63 commented Jan 15, 2021

This happens within a transaction, but the transaction contains only a single insert statement (this is in a custom auth middleware). It only happens when we have background jobs running. The jobs do not write to the users table; they only read from it. If we change the jobs to run synchronously, the issue goes away. It seems we only ever have locking issues when running jobs, and then we see lock issues elsewhere in the code, which may be related.

@taka-oyama
Collaborator

I believe reading will still lock the row in a read-write transaction.

QueryException should have an underlying exception.
I need to see that in order to understand it further.

You might want to try looking at the lock statistics to see what's happening on the server side.

https://cloud.google.com/spanner/docs/introspection/lock-statistics

@jshannon63
Author

jshannon63 commented Feb 20, 2021

We have worked around the issue for now by using the sync driver for job processing. In our many attempts to use jobs for background processing, we have failed with this library due to lock conditions when jobs touch the same row. We even wrote a non-overlapping job middleware to avoid this, but at the end of the day it is really no different from the sync driver. If time allows in the future, I may attempt a proper fix for this problem, but for now it will be synchronous processing.

@taka-oyama
Collaborator

I have no way of reproducing this so I'll go ahead and close it for now.
Please send a pull request if you have a solution to this problem.
Thanks.

@taka-oyama
Collaborator

I found out why this is the case.
It has to do with the lock being written as a file to /tmp/ by Google\Cloud\Core\Lock\FlockLock.

This can happen in the following scenario.

  1. root writes to /tmp to acquire a lock (lock file is written by root with permission set to 644)
  2. www-data tries to access that lock at a later time
  3. www-data does not have the permission to access the file and gets the above error.

I could not reproduce this because my environment used SemaphoreLock, which is the default when the sysvmsg, sysvsem and sysvshm extensions are installed.

So I would recommend that you install the sysv extensions.
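
To check whether an environment is exposed to the scenario above, something like the following sketch may help. The function names are hypothetical helpers written for illustration; they are not part of laravel-spanner or google/cloud-core. It also assumes the lock file is opened for writing (mode 'c' here), which may not match the library's exact call.

```php
<?php

// Illustrative helpers only; these names are hypothetical and are not part of
// laravel-spanner or google/cloud-core.

/**
 * Returns the System V extensions that are not loaded in this PHP runtime.
 */
function missingSysvExtensions(): array
{
    $required = ['sysvmsg', 'sysvsem', 'sysvshm'];

    return array_values(array_filter($required, function (string $ext): bool {
        return !extension_loaded($ext);
    }));
}

/**
 * Reports whether the current OS user can open $path for writing.
 * Mode 'c' creates the file if it does not exist and never truncates it
 * (assumption: the real lock file is opened in a comparable write mode).
 */
function canOpenLockFile(string $path): bool
{
    $handle = @fopen($path, 'c');
    if ($handle === false) {
        return false;
    }
    fclose($handle);

    return true;
}

/**
 * Describes the numeric owner id and permission bits of an existing file.
 */
function describeFile(string $path): string
{
    $mode = substr(sprintf('%o', fileperms($path)), -4);

    return sprintf('%s owner=%d mode=%s', $path, fileowner($path), $mode);
}
```

Running these as the web or queue worker user (for example www-data) against a lock file that root created should show canOpenLockFile() returning false and describeFile() reporting a different owner with mode 0644. An empty result from missingSysvExtensions() suggests the semaphore-based lock would be used instead, going by the comment above. Note that canOpenLockFile() creates the file if it is missing, so point it only at paths where that is acceptable.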

@jshannon63
Author

jshannon63 commented Oct 25, 2021 via email
