Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a thread-safe implementation of cached_property #1316

Merged
merged 6 commits into from
Apr 1, 2021

Conversation

thedrow
Copy link
Member

@thedrow thedrow commented Mar 15, 2021

Fixes #1303.

@thedrow thedrow added this to the 5.1.0 milestone Mar 15, 2021
@thedrow thedrow self-assigned this Mar 15, 2021
@thedrow thedrow requested a review from matusvalo March 15, 2021 15:04
@lgtm-com
Copy link

lgtm-com bot commented Mar 15, 2021

This pull request introduces 2 alerts when merging 8948c86 into c7068ef - view on LGTM.com

new alerts:

  • 2 for Unused import

@thedrow thedrow force-pushed the thread-safe-cached-property branch from 8948c86 to 923369f Compare March 15, 2021 15:29
@thedrow thedrow marked this pull request as draft March 15, 2021 15:38
@thedrow
Copy link
Member Author

thedrow commented Mar 15, 2021

It turns out that our cached_property implementation needs to support a setter as well.

https://github.com/celery/celery/blob/8c5e9888ae10288ae1b2113bdce6a4a41c47354b/celery/app/amqp.py#L569-L576

@lgtm-com
Copy link

lgtm-com bot commented Mar 15, 2021

This pull request introduces 2 alerts when merging 923369f into c7068ef - view on LGTM.com

new alerts:

  • 2 for Unused import

@thedrow thedrow marked this pull request as ready for review March 17, 2021 14:15
@thedrow thedrow requested a review from a team March 17, 2021 14:34
auvipy
auvipy previously approved these changes Mar 18, 2021
Comment on lines +5 to +11
try:
from functools import _NOT_FOUND, cached_property as _cached_property
except ImportError:
# TODO: Remove this fallback once we drop support for Python < 3.8
from cached_property import threaded_cached_property as _cached_property

Caches the return value of the get method on first call.
_NOT_FOUND = object()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[suggestion]: Create local NOT_FOUND sentinel class instead of using a non-public class from functools.

Looks like _NOT_FOUND is a private class used in the functools module. We should not be using it. Instead we can define our own sentinel class:

class MISSING: pass

# ...
# Use it in __delete__
if self.__del and value is not MISSING:
    self.__del(instance, value)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if we inherit from the stdlib cached_property we can't use our own sentinel, right?
It's already using _NOT_FOUND.

Comment on lines +29 to +38
def __set__(self, instance, value):
if instance is None:
return self
if self.__set is not None:
value = self.__set(obj, value)
obj.__dict__[self.__name__] = value

def __delete__(self, obj, _sentinel=object()):
if obj is None:
with self.lock:
if self.__set is not None:
value = self.__set(instance, value)

cache = instance.__dict__
cache[self.attrname] = value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[question]: Why do we have to override set? functools.cached_property and cached_property both implement only get to calculate the value only once.

Do we really need to re-define the value on cached properties?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because that's how it's implemented and used in Celery.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xirdneh Any thoughts?

Copy link
Member

@matusvalo matusvalo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure whether this PR is fixing the issue. The problem of cached_property in celery is that the cache is shared between the threads. Current PR is adding just locks but it lacks storing the cache in thread local storage.

>>> from kombu.utils import cached_property
>>> from threading import Thread
>>>
>>> class Foo:
...     @cached_property
...     def foo(self):
...         print('foo')
...         return 5
...
>>> foo = Foo()
>>>
>>> def call_foo():
...     foo.foo
...
>>>
>>> Thread(target=call_foo).start()
foo
>>> foo.foo        # Here we should see `foo` printed out since the cache should not be hit since it is called from different thread than previous call.
5

@thedrow
Copy link
Member Author

thedrow commented Mar 23, 2021

Wait, we need thread-local storage instead of using locks?
Is this always the case?

@matusvalo
Copy link
Member

To be honest it is difficult to say but in Celery there is a lot of usage of cached_property and at least sometimes it is causing that threads are using global resource which should be thread-local see following comment: celery/celery#5145 (comment) or my previous fix: celery/celery#6416.

Rationale is following:

  1. if celery is using multiprocessing parallelisation, each process has 1 thread storing in thread local storage => this should be equal to current state when it is stored as global variable. So moving to thread-local storage should not break this scenario.
  2. if celery is using multithreading parallelisation, each thread will have it's own resource which should fix problems when there is resource shared which should not shared.

@matusvalo
Copy link
Member

I was thinking more about cachedproperty and I have changed my mind. cachedproperty is storing cached information to the instance of the method. This is correct behavior and we should keep it that way. In celery, we have mainly problem when there is one singleton shared across threads and it contains methods with cachedproperty decorator. This is dangerous because we can end up using single resource with multiple thread - see celery/celery#6416..

So I am aproving this PR. @thedrow the only change I am suggesting is adding docstring to the cachedproperty class explaning where are cached data stored - in the instance.

@thedrow
Copy link
Member Author

thedrow commented Mar 25, 2021

We can add a version of cached_property which is thread-local later on.

@thedrow thedrow merged commit 5605182 into master Apr 1, 2021
@thedrow thedrow deleted the thread-safe-cached-property branch April 1, 2021 10:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kombu.utils.objects.cached_property is not thread safe.
4 participants