Locking issues are preventing continued operation with Spanner #23

jshannon63 · 2021-01-14T16:38:54Z

We consistently have locking issues with our Spanner database access. They seem to be related to queued jobs, and they tend to disappear when we use the sync driver for queued job processing. Here is an example of one of our exception traces, which shows the result but not the likely cause of the issue.

Next Illuminate\Database\QueryException: Failed to open lock file. (SQL: select * from `users` where `id` = @p0 limit 1) in /nix/store/krcnq71n6yh724z8dq11lr68lq7gvbwf-composer-laravel-laravel/vendor/colopl/laravel-spanner/src/Colopl/Spanner/Connection.php:441
Stack trace:
#0 /nix/store/krcnq71n6yh724z8dq11lr68lq7gvbwf-composer-laravel-laravel/vendor/laravel/framework/src/Illuminate/Database/Connection.php(629): Colopl\Spanner\Connection->runQueryCallback('select * from `...', Array, Object(Closure))
#1 /nix/store/krcnq71n6yh724z8dq11lr68lq7gvbwf-composer-laravel-laravel/vendor/colopl/laravel-spanner/src/Colopl/Spanner/Connection.php(250): Illuminate\Database\Connection->run('select * from `...', Array, Object(Closure))
#2 /nix/store/krcnq71n6yh724z8dq11lr68lq7gvbwf-composer-laravel-laravel/vendor/laravel/framework/src/Illuminate/Database/Query/Builder.php(2149): Colopl\Spanner\Connection->select('select * from `...', Array, true)

Any help in understanding and overcoming this issue would be appreciated.


taka-oyama · 2021-01-15T00:42:26Z

Are you trying to update the same user multiple times?
And is this query run within a transaction?

If so, this is probably the correct behavior since Spanner will lock the row if you select it in a read-write transaction.

jshannon63 · 2021-01-15T14:36:45Z

This happens within a transaction, but the transaction contains only a single insert statement (this is in a custom auth middleware). It only happens when we have background jobs running. The jobs do not write to the users table, they only read from it. If we change the jobs to run synchronously, the issue goes away. It seems we only ever have locking issues when running jobs, and then we have lock issues elsewhere in the code which may be related.

taka-oyama · 2021-01-15T23:25:39Z

I believe reading will still lock the row in a read-write transaction.

QueryException should have an underlying exception.
I need to see that in order to understand it further.

You might want to try looking at the lock statistics to see what's happening on the server side.

https://cloud.google.com/spanner/docs/introspection/lock-statistics

jshannon63 · 2021-02-20T21:11:20Z

We have averted our issue temporarily by using the sync driver for job processing. In our many attempts to use jobs for background processing, we have failed with this library due to lock conditions for jobs touching the same row. We even wrote a non-overlapping job middleware to avoid this issue, but at the end of the day it is really no different from the sync driver. If time allows in the future, I may attempt a correct approach to conquer this problem, but for now it will be synchronous processing.

taka-oyama · 2021-03-11T03:09:42Z

I have no way of reproducing this so I'll go ahead and close it for now.
Please send a pull request if you have a solution to this problem.
Thanks.

taka-oyama · 2021-10-25T09:33:34Z

I found out why this is the case.
It happens because the lock is written as a file to /tmp/ by Google\Cloud\Core\Lock\FlockLock.

This can happen in the following scenario.

1. root writes to /tmp to acquire a lock (the lock file is written by root with permission set to 644)
2. www-data tries to access that lock at a later time
3. www-data does not have permission to access the file and gets the above error.
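
To check whether this scenario applies on a given machine, one option is to inspect the owner and mode of the lock file from the process that fails. The snippet below is only a diagnostic sketch: `describeLockFile()` is a hypothetical helper, not part of laravel-spanner or google/cloud-core, and the lock file name under /tmp is not given in this thread, so the path has to be supplied by you.

```php
<?php
// Hypothetical diagnostic helper (an assumption for illustration, not a library API).
// Reports who owns a file, its permission bits, and whether the current
// process is allowed to write to it.
function describeLockFile(string $path): array
{
    // Drop any cached stat data so the result reflects the file as it is now.
    clearstatcache(true, $path);

    if (!file_exists($path)) {
        return ['exists' => false];
    }

    return [
        'exists'    => true,
        'owner_uid' => fileowner($path),                        // numeric uid; 0 is root
        'mode'      => sprintf('%04o', fileperms($path) & 0777), // e.g. "0644"
        'writable'  => is_writable($path),                       // from the current user's view
    ];
}
```

If this is run as www-data against a lock file that reports `owner_uid` 0, mode `0644` and `writable` false, that matches the steps described above.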

The reason I could not reproduce this was because my env used SemaphoreLock, which is the default if you have sysvmsg, sysvsem and sysvshm installed.

So I would recommend that you install the sysv extensions.
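
A quick way to see whether an environment has the three extensions is to ask PHP directly. This is a minimal sketch: `missingSysvExtensions()` is a hypothetical helper name, and it only checks that the extensions are loaded. It does not inspect which lock class the Google client actually selects.

```php
<?php
// Hypothetical helper (an assumption for illustration): returns the names of
// the sysv extensions mentioned above that are NOT loaded in this PHP runtime.
function missingSysvExtensions(): array
{
    $required = ['sysvmsg', 'sysvsem', 'sysvshm'];

    return array_values(array_filter(
        $required,
        static function (string $ext): bool {
            return !extension_loaded($ext);
        }
    ));
}

$missing = missingSysvExtensions();
echo $missing === []
    ? "All sysv extensions are loaded.\n"
    : 'Missing sysv extensions: ' . implode(', ', $missing) . "\n";
```

Run it with the same PHP binary and configuration that the queue workers and the web process use, since the CLI and web SAPIs can load different extension sets.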

jshannon63 · 2021-10-25T12:40:37Z

Thank you very much for getting back with us. We did in fact activate semaphore locking using the shared memory extensions for sysv and solved the issue. I feel like many people do not experience this for the reasons you mentioned. We are running on horizontally scaled Google AppEngine instances that were very specifically configured and did not have those extensions installed. Thank you for the work on your spanner library. It is very much appreciated!


taka-oyama closed this as completed Mar 11, 2021
