
fix: prevent overwrite of cache lock value #667

Merged
merged 14 commits on Jul 20, 2021

Conversation


@chrisrossi chrisrossi commented Jun 2, 2021

Fixes #651 #652 #653

@chrisrossi chrisrossi requested a review from andrewsg as a code owner June 2, 2021 20:58
@chrisrossi chrisrossi requested a review from a team June 2, 2021 20:58
@chrisrossi chrisrossi requested a review from a team as a code owner June 2, 2021 20:58
@product-auto-label product-auto-label bot added the api: datastore Issues related to the googleapis/python-ndb API. label Jun 2, 2021
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Jun 2, 2021
@chrisrossi
Contributor Author

chrisrossi commented Jun 2, 2021

Note that this is currently based on PR #665 (the fix-656 branch).

@chrisrossi
Contributor Author

The spelling is a little different and it's less memcached-specific, but this is more or less the solution proposed in #651.

@@ -146,8 +146,15 @@ def lookup(key, options):
            entity_pb.MergeFromString(result)

        elif use_datastore:
-           yield _cache.global_lock(cache_key, read=True)
-           yield _cache.global_watch(cache_key)
+           lock_acquired = yield _cache.global_lock(cache_key, read=True)


Transient errors that can't be resolved and result in an exception should probably just be caught here, with key_locked set to True as well; there's no need to bubble them up and fail things.

@justinkwaugh justinkwaugh Jun 3, 2021

In writing this comment I just remembered that handle_transient_errors doesn't raise for read operations, which is the source of the problem described in #652; once that is addressed, a try/except would be wanted here as described above. I had implemented a parameter to handle_transient_errors that forces it to raise in all cases.

Contributor Author

ebb2a1c

I believe this also addresses #652

@justinkwaugh justinkwaugh Jun 8, 2021

Agreed, because it will return None. However, I do think a try/except is appropriate for non-transient errors. A cache failure, even a catastrophic one, should not prevent a read thread from going to the database for the value; it should just prevent it from writing back to the cache.
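
For illustration, here is a self-contained sketch of that principle (the cache/datastore interface below is hypothetical, not the ndb code): any cache failure during lookup degrades to a plain Datastore read instead of raising, and only suppresses the cache write-back.

def lookup(key, cache, datastore):
    key_locked = True  # pessimistic default: assume we may not touch the cache
    try:
        # Hypothetical call: returns True if the read lock was acquired.
        key_locked = not cache.lock_for_read(key)
    except Exception:
        pass  # even a catastrophic cache error just means "don't cache"

    value = datastore.get(key)  # the read itself never depends on the cache
    if not key_locked:
        try:
            cache.write_back(key, value)
        except Exception:
            pass  # a failed write-back is not fatal either
    return value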

result = cache_call.result()
if result:
    for key, future in self.futures.items():
        key_result = result.get(key, None)


Should the default value be None or False in the case of set_if_not_exists?

Contributor Author

I don't think so. In the case of set_if_not_exists, there will be a result for the key, so the default won't be used.

@justinkwaugh justinkwaugh Jun 4, 2021

To me it's more of an API contract thing. Both setnx and memcache.add return true if the value was set and false if it was not. So what does None mean? If it is just to be considered falsey, then it is probably OK.
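
For reference, redis-py's SET with nx=True already has a similar truthy/falsey contract: True when the value was stored, None when the key already exists. A tiny sketch, assuming a local Redis:

import redis

client = redis.StrictRedis()
client.delete(b"lock-key")

assert client.set(b"lock-key", b"lock", nx=True)            # stored -> truthy (True)
assert client.set(b"lock-key", b"other", nx=True) is None   # already set -> falsey (None)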

Contributor Author

This code is also used by set, not just set_if_not_exists. In the case of set_if_not_exists, it will always follow the path that eventually returns a boolean value; None won't ever be returned for set_if_not_exists.

        future.set_result(key_result)
else:
    for future in self.futures.values():
        future.set_result(None)


Same question here... None or False for set_if_not_exists

Contributor Author

I don't think so. In the case of set_if_not_exists, there will be a result.

@chrisrossi
Contributor Author

I think the situation is much improved, but I also think there's still a way for things to get out of sync. Namely, in _datastore_api.lookup, if a key is updated by another thread between the call to _cache.global_lock and _cache.global_watch, we could end up watching a value that we didn't set. If my reasoning is off here, let me know.

I think to solve this, we can probably modify the watch command to take the value we just set as an expected value, so we can make sure that when we watch it, the key is still set to that value. I think it may also be necessary to change the lock value to be unique per thread (using uuid, or similar) so that threads can distinguish their own lock from a lock from another thread.
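
For illustration, a minimal sketch of that idea against redis-py (assumes a local Redis; the uuid-based lock value and the final write-back are made up for the example, not the eventual implementation):

import uuid
import redis

client = redis.StrictRedis()
key = b"cache-key"
client.delete(key)

my_lock = b"lock-" + uuid.uuid4().hex.encode()  # unique per thread
client.set(key, my_lock, nx=True, ex=32)        # take the read lock

# ... read the entity from Datastore here ...
entity = b"serialized entity"

# Only watch (and later write back) if the key still holds *our* lock value;
# if another thread replaced it in the meantime, skip the write-back entirely.
pipe = client.pipeline()
try:
    pipe.watch(key)
    if pipe.get(key) == my_lock:
        pipe.multi()
        pipe.set(key, entity)
        pipe.execute()  # raises WatchError if the key changed after WATCH
except redis.exceptions.WatchError:
    pass  # lost the race; leave the cache alone
finally:
    pipe.reset()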

@justinkwaugh

justinkwaugh commented Jun 8, 2021

Yes, and changing the way the lock value works would align with the needs of #653. In reality I don't think it would matter if two read threads conflict in the way you described, as it is a race condition that either should be allowed to win; but should a write thread acquire a lock, no read should ever watch it. And for write threads it is OK for multiple to lock, as long as the key is not considered unlocked until the last write thread finishes.

@chrisrossi
Contributor Author

Yes, and changing the way the lock value works would align with the needs of #653. In reality I don't think it would matter if two read threads conflict in the way you described, as it is a race condition that either should be allowed to win; but should a write thread acquire a lock, no read should ever watch it.

Exactly. And it looks like it would address #653 as well.

@chrisrossi
Contributor Author

Possibly instead of a unique lock value per thread, we could just have two lock values, one for read and one for write.

@justinkwaugh

justinkwaugh commented Jun 8, 2021

Possibly instead of a unique lock value per thread, we could just have two lock values, one for read and one for write.

The challenge is just whether or not you want write locks to be re-entrant (by multiple threads). If so, they need to act like a semaphore to ensure that you only "unlock" once all concurrent write threads have finished.
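
One way to sketch that semaphore-style behaviour (purely illustrative, using a Redis set of per-writer ids; this is not the approach the PR takes):

import uuid
import redis

client = redis.StrictRedis()
lock_key = b"write-lock:cache-key"
client.delete(lock_key)

def lock_for_write():
    writer_id = uuid.uuid4().bytes
    client.sadd(lock_key, writer_id)   # re-entrant: each writer adds its own id
    return writer_id

def unlock_for_write(writer_id):
    client.srem(lock_key, writer_id)

def is_write_locked():
    return client.scard(lock_key) > 0  # unlocked only when the last writer leaves

w1 = lock_for_write()
w2 = lock_for_write()
unlock_for_write(w1)
assert is_write_locked()       # w2 still holds the lock
unlock_for_write(w2)
assert not is_write_locked()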

@justinkwaugh

Thanks for doing all this work. With #653 fixed, I believe it will have correct cache consistency, which is a far cry from what it was a few weeks ago.

@chrisrossi
Contributor Author

Possibly instead of a unique lock value per thread, we could just have two lock values, one for read and one for write.

The challenge is just whether or not you want write locks to be re-entrant (by multiple threads). If so, they need to act like a semaphore to ensure that you only "unlock" once all concurrent write threads have finished.

Right. Another idea I had was, instead of deleting, to use CAS to write an "unlocked" or "deleted" value.

@chrisrossi chrisrossi closed this Jun 9, 2021
@chrisrossi
Contributor Author

Didn't mean to close.

@chrisrossi chrisrossi reopened this Jun 9, 2021
@chrisrossi chrisrossi marked this pull request as draft June 9, 2021 19:07
@justinkwaugh

justinkwaugh commented Jun 9, 2021

Right. Another idea I had was, instead of deleting, to use CAS to write an "unlocked" or "deleted" value.

That can only work if CAS is also used to set the lock value, which is a pessimistic sort of lock; otherwise one write thread can lock, and then another write thread can lock and unlock before the first is done. In reality we are not trying to prevent write threads from locking concurrently, just trying to ensure that the cache stays locked, so that read threads can't update it for the duration of any write operation / transaction.

@chrisrossi chrisrossi marked this pull request as ready for review June 15, 2021 20:31
@tseaver tseaver mentioned this pull request Jul 19, 2021
@chrisrossi
Contributor Author

How is this scenario handled:

  • Reader 1 gets a cache miss.
  • Reader 1 writes a read lock.
  • Writer writes a write lock.
  • Reader 1 reads value v1 from data-store.
  • Writer writes value v2 to data-store.
  • Writer deletes the write lock.
  • Reader 2 gets a cache miss.
  • Reader 2 writes a read lock.
  • Reader 1 writes stale v1 to the cache. CAS allows this because
    the lock value is the same.

It looks like we need read lock ids.

BTW, I think there should be tests that exercise scenarios like this
and those identified in the issues.

FTR, I thought of this scenario while writing locking documentation.
I'm a huge fan of documentation-driven development. :)

I'll start thinking about some integration tests specifically to target these scenarios. Naming them will be fun.

@jimfulton
Contributor

How is this scenario handled:
...
I'll start thinking about some integration tests specifically to target these scenarios. Naming them will be fun.

I was mainly pointing out that I think there's a bug. Do you think the scenario I mentioned is handled correctly?

@justinkwaugh

justinkwaugh commented Jul 19, 2021 via email

@justinkwaugh

justinkwaugh commented Jul 19, 2021 via email

@jimfulton
Contributor

What I should say is, even if a value that is the same is added, because it was deleted and re-added it will have a new CAS value.

I am not a Redis expert, so I'm willing to believe that at some low level, Redis assigns watch ids that can be used to detect something like this :), however, there's no such watch id floating around in the ndb Python code (or in the Python redis API AFAICT). There are just keys and values.

Here's a script that demonstrates the problem:

from google.cloud.ndb.global_cache import RedisCache
from google.cloud.ndb import _cache
from google.cloud.ndb import Client

cache = RedisCache.from_environment()
client = Client()
context = client.context(global_cache=cache)
context.__enter__()

cache_key = 'key'
cache.delete([cache_key]) # Make sure it's absent

# Read client 1
assert _cache.global_get(cache_key).result() is None
read_lock = _cache.global_lock_for_read(cache_key).result()
assert read_lock == _cache._LOCKED_FOR_READ
assert _cache.global_watch(cache_key, read_lock).result() is None
# Read b'foo' from datastore

# Write client
write_lock = _cache.global_lock_for_write(cache_key).result()
assert _cache.global_get(cache_key).result() == _cache._LOCKED_FOR_WRITE + write_lock
# write b'bar' to datastore
assert _cache.global_unlock_for_write(cache_key, write_lock).result() is None

# Read client 2
# Look ma, no lock!
assert _cache.global_get(cache_key).result() is None
read_lock2 = _cache.global_lock_for_read(cache_key).result()
assert read_lock2 == _cache._LOCKED_FOR_READ
assert _cache.global_watch(cache_key, read_lock2).result() is None

# Read client 1
_cache.global_compare_and_swap(cache_key, b'foo', expires=999).result()
assert _cache.global_get(cache_key).result() == b'foo'
# Cache now has old value

Am I missing something?

@justinkwaugh

justinkwaugh commented Jul 20, 2021 via email

@justinkwaugh

justinkwaugh commented Jul 20, 2021 via email

@chrisrossi
Contributor Author

How is this scenario handled:
...
I'll start thinking about some integration tests specifically to target these scenarios. Naming them will be fun.

I was mainly pointing out that I think there's a bug. Do you think the scenario I mentioned is handled correctly?

Sorry, I meant to respond earlier. @justinkwaugh is correct about memcache. I was going to test Redis behavior this morning, as it's not clear what it would do.

As far as tests, I was just referring back to an earlier comment you made about testing all these complex scenarios, which had been nagging me as well. Obviously, orchestrating all the different parts to test the intersection of concurrency and fault tolerance is complex, which is why it hasn't been done yet. The more of these that come up, though, the more worthwhile it seems to go ahead and bite the bullet and try to figure it out.

If we're reasonably satisfied this PR is working, though, I can always make that a separate effort under a new Issue/PR.

@jimfulton
Contributor

I am unfamiliar with redis and was looking at it from a memcache viewpoint, which does use separate values. I'm a little surprised that WATCH/EXEC would not fail the transaction based on the documentation, but if it is shown not to work then I guess not.

Even if memcache does make it possible to catch this, our global cache API, GlobalCache, doesn't allow us to leverage it.

Anyway, the fix is easy.

I still think we need to find a way to test these scenarios (as my script "tests" the bug).

@chrisrossi
Contributor Author

I am unfamiliar with redis and was looking at it from a memcache viewpoint, which does use separate values. I'm a little surprised that WATCH/EXEC would not fail the transaction based on the documentation, but if it is shown not to work then I guess not.

Even if memcache does make it possible to catch this, our global cache API, GlobalCache, doesn't allow us to leverage it.

Anyway, the fix is easy.

I still think we need to find a way to test these scenarios (as my script "tests" the bug).

I agree.

@justinkwaugh

justinkwaugh commented Jul 20, 2021 via email

@jimfulton
Contributor

...

Sorry, I meant to respond earlier. @justinkwaugh is correct about memcache. I was going to test Redis behavior this morning, as it's not clear what it would do.

See my script. I'm 97% sure it would demonstrate the same problem with memcache, as our API doesn't expose any sort of watch id (assuming that memcache has something like that).

As far as tests, I was just referring back to an earlier comment you made about testing all these complex scenarios, which had been nagging me as well. Obviously, orchestrating all the different parts to test the intersection of concurrency and fault tolerance is complex, which is why it hasn't been done yet. The more of these that come up, though, the more worthwhile it seems to go ahead and bite the bullet and try to figure it out.

This code is really complicated and the locking logic is spread out. We test bits and pieces of it in isolation and rely on reasoning for correctness. I don't really trust that approach.

It would be hard to test this at the user-facing level, because it would be too hard to control timing of calls... Although maybe not with the tests providing their own "event loop" logic.

I think my code was able to demonstrate a locking bug pretty easily. Maybe it would be possible to provide a higher-level test, using user-level calls if the tests controlled the event-loop/future evaluations. Let me know if you'd like me to take a crack at that.
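
As a very rough sketch of what that could look like (plain generators standing in for ndb tasklets and dicts standing in for the cache and Datastore, so the test decides exactly when each "client" runs; none of this is ndb's actual event loop):

def reader(cache, db):
    if cache.get("k") is None:
        cache["k"] = "read-lock"
    yield "locked"
    value = db["k"]                      # may be stale by the time we resume
    yield "read"
    if cache.get("k") == "read-lock":    # "watch": only cache if our lock survived
        cache["k"] = value
    yield "done"

def writer(cache, db):
    cache["k"] = "write-lock"
    yield "locked"
    db["k"] = "v2"
    yield "wrote"
    del cache["k"]
    yield "done"

cache, db = {}, {"k": "v1"}
r, w = reader(cache, db), writer(cache, db)
next(r); next(r)              # reader takes its lock, then reads stale "v1"
next(w); next(w); next(w)     # writer locks (overwriting), writes "v2", unlocks
next(r)                       # reader resumes: its lock is gone, so it must not cache
assert "k" not in cache, "stale value must never be written back"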

If we're reasonably satisfied this PR is working, though, I can always make that a separate effort under a new Issue/PR.

I've demonstrated a bug. I defer whether to fix it in a separate PR.

@justinkwaugh

justinkwaugh commented Jul 20, 2021 via email

@jimfulton
Contributor

Note that your test is not actually using separate clients or threads. Doing an EXEC command will automatically unwatch the keys, so a set of serial operations beyond that will not be part of the transaction.

Good point!

Modifying the script to use different redis clients:

from google.cloud.ndb.global_cache import RedisCache
from google.cloud.ndb import _cache
from google.cloud.ndb import Client

cacher1 = RedisCache.from_environment()
cacher2 = RedisCache.from_environment()
cachew = RedisCache.from_environment()

cache_key = 'key'
cacher1.delete([cache_key]) # Make sure it's absent

client = Client()

with client.context(global_cache=cacher1):
    # Read client 1
    assert _cache.global_get(cache_key).result() is None
    read_lock = _cache.global_lock_for_read(cache_key).result()
    assert read_lock == _cache._LOCKED_FOR_READ
    assert _cache.global_watch(cache_key, read_lock).result() is None
    # Read b'foo' from datastore

with client.context(global_cache=cachew):
    # Write client
    write_lock = _cache.global_lock_for_write(cache_key).result()
    assert _cache.global_get(cache_key).result() == (
        _cache._LOCKED_FOR_WRITE + write_lock)
    # write b'bar' to datastore
    assert _cache.global_unlock_for_write(cache_key, write_lock).result() is None

with client.context(global_cache=cacher2):
    # Read client 2
    # Look ma, no lock!
    assert _cache.global_get(cache_key).result() is None
    read_lock2 = _cache.global_lock_for_read(cache_key).result()
    assert read_lock2 == _cache._LOCKED_FOR_READ
    assert _cache.global_watch(cache_key, read_lock2).result() is None

with client.context(global_cache=cacher1):
    # Read client 1
    r = _cache.global_compare_and_swap(cache_key, b'foo', expires=999).result()
    assert r  # Fails because r is False

The compare and swap fails. So really, it's the client itself that keeps track of the watch state internally.

So it looks like there isn't a bug, at least for Redis.

@justinkwaugh

justinkwaugh commented Jul 20, 2021 via email

@chrisrossi
Contributor Author

What I should say is, even if a value that is the same is added, because it was deleted and re-added it will have a new CAS value.

I am not a Redis expert, so I'm willing to believe that at some low level, Redis assigns watch ids that can be used to detect something like this :), however, there's no such watch id floating around in the ndb Python code (or in the Python redis API AFAICT). There are just keys and values.

Here's a script that demonstrates the problem:
...
Am I missing something?

I think so, yes. This test doesn't really test concurrency because all of the "threads" are being run in the same context, so they're sharing the same thread-local RedisCache.pipes.

I think this demonstrates that Redis protects against concurrent writes of the same value, and the code currently does the Right Thing (tm):

import redis

client = redis.StrictRedis()
client.set(b"foo", b"boo")
assert client.get(b"foo") == b"boo"

# Normal case, no race condition
pipe = client.pipeline()
pipe.watch(b"foo")
pipe.multi()
pipe.set(b"foo", b"bar")
pipe.execute()
pipe.reset()

assert client.get(b"foo") == b"bar"

# Concurrently set the same value
pipe = client.pipeline()
pipe.watch(b"foo")
client.set(b"foo", b"baz")  # Set outside of watch pipeline
pipe.multi()
pipe.set(b"foo", b"baz")    # Set same value in watch pipeline

try:
    pipe.execute()
    assert False, "Shouldn't reach this line"

except redis.exceptions.WatchError:
    assert True, "Should raise WatchError"

finally:
    pipe.reset()

That said, I'm sympathetic to the idea that, since we have a more or less pluggable architecture here, it would be better not to make this part of the contract that cache implementers must fulfill, especially if the "fix" is relatively easy. I'm +0 on that, I suppose.

@chrisrossi
Contributor Author

And just for good measure, here it is with two pipelines, more like what would actually happen irl:

import redis

client = redis.StrictRedis()
client.set(b"foo", b"boo")
assert client.get(b"foo") == b"boo"

# Normal case, no race condition
pipe = client.pipeline()
pipe.watch(b"foo")
pipe.multi()
pipe.set(b"foo", b"bar")
pipe.execute()
pipe.reset()

assert client.get(b"foo") == b"bar"

# Concurrently set the same value
pipe1 = client.pipeline()
pipe.watch(b"foo")

pipe2 = client.pipeline()
pipe2.watch(b"foo")

pipe1.multi()
pipe1.set(b"foo", b"baz")

pipe2.multi()
pipe2.set(b"foo", b"baz")

pipe1.execute()

try:
    pipe2.execute()
    assert False, "Shouldn't reach this line"

except redis.exceptions.WatchError:
    assert True, "Should raise WatchError"

pipe1.reset()
pipe2.reset()

assert client.get(b"foo") == b"baz"

@chrisrossi
Contributor Author

See my script. I'm 97% sure it would demonstrate the same problem with memcache, as our API doesn't expose any sort of watch id (assuming that memcache has something like that).

If you look at the Memcache implementation, we get and keep track of a CAS id that comes from memcache. I'm pretty sure we don't have this problem with memcache. (Or Redis.)
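
For reference, a minimal sketch of memcache's CAS-token behaviour with pymemcache (assumes a local memcached; this is not the MemcacheCache code itself):

from pymemcache.client.base import Client

client = Client(("localhost", 11211))
client.set("k", b"lock")

value, cas = client.gets("k")   # the CAS token identifies this exact write
client.delete("k")
client.set("k", b"lock")        # same bytes, but a brand new write

# The swap fails even though the stored value is identical, because the
# delete/re-add produced a fresh CAS token.
assert client.cas("k", b"stale", cas) is False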

This code is really complicated and the locking logic is spread out. We test bits and pieces of it in isolation and rely on reasoning for correctness. I don't really trust that approach.

It would be hard to test this at the user-facing level, because it would be too hard to control timing of calls... Although maybe not with the tests providing their own "event loop" logic.

I think my code was able to demonstrate a locking bug pretty easily. Maybe it would be possible to provide a higher-level test, using user-level calls if the tests controlled the event-loop/future evaluations. Let me know if you'd like me to take a crack at that.

Well, I'm not sure that it did. Testing this properly is actually a fairly hard problem.

@chrisrossi
Contributor Author

Sorry for replying to already out-of-date comments. Yet another concurrency issue. ;-)

@chrisrossi
Contributor Author

That said, I'm sympathetic to the idea that, since we have a more or less pluggable architecture here, it would be better not to make this part of the contract that cache implementers must fulfill, especially if the "fix" is relatively easy. I'm +0 on that, I suppose.

It was easy, so why not? b77c2b9


Successfully merging this pull request may close these issues.

Cloud NDB - Get operation can blindly overwrite key lock in memcache leading to cache inconsistency