Removed mutable default value in _inference_tip_cache #1139
Conversation
The mutable default is problematic when astroid is used as a library, because it effectively becomes a memory leak, see pylint-dev#792. This commit moves the cache to the global namespace and adds a public API entry point to clear it.
_cache: typing.Dict[typing.Tuple[InferFn, NodeNG], typing.Any] = {}

def clear_inference_tip_cache():
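For context, a minimal sketch of what a module-level cache with an explicit clearing entry point can look like. The `_cache` and `clear_inference_tip_cache` names come from the diff above; the decorator body below is illustrative only, not astroid's exact code.

```python
# Illustrative sketch only: module-level cache plus an explicit clearing API.
# The real astroid helper (_inference_tip_cached) differs in its exact wiring.
import typing
from functools import wraps

_cache: typing.Dict[typing.Tuple[typing.Callable, typing.Any], typing.Any] = {}


def clear_inference_tip_cache() -> None:
    """Clear the inference tip cache."""
    _cache.clear()


def _inference_tip_cached(func):
    """Cache an inference tip's results, keyed on the (function, node) pair."""

    @wraps(func)
    def inner(node, context=None):
        try:
            result = _cache[func, node]
        except KeyError:
            result = _cache[func, node] = list(func(node, context))
        return iter(result)

    return inner
```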
Alternatively, I can add a single clear_caches() entry point which also clears LRU caches on various methods in astroid (as discussed in #792). Wdyt?
What do you think about using the @lru_cache(maxsize=?) decorator to handle that? I think it would remove the need for the API. But maybe we'd need a mechanism to set the max size instead?
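For reference, this is roughly what the suggestion amounts to (toy function, not astroid code): functools.lru_cache bounds the cache and exposes cache_clear / cache_info, which might make a separate clearing API unnecessary.

```python
from functools import lru_cache


@lru_cache(maxsize=128)  # the max size would need to be chosen, or made configurable
def infer_toy(node_key: str) -> str:
    # stand-in for an expensive inference computation
    return node_key.upper()


infer_toy("foo")                 # computed
infer_toy("foo")                 # served from the cache
print(infer_toy.cache_info())    # CacheInfo(hits=1, misses=1, maxsize=128, currsize=1)
infer_toy.cache_clear()          # explicit clearing is still available if wanted
```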
We'd need some extra logic to account for the return value being a generator, right? Otherwise, lru_cache would do the job. I'd still prefer for us to have an explicit way to clear the cache, because regardless of capacity, references to nodes will keep (potentially expensive) objects around in memory for longer.
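A toy illustration of the generator concern (not astroid code): lru_cache stores the generator object itself, so the first consumer exhausts it and every later cache hit yields nothing.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def infer_gen(node_key: str):
    # returns a generator, as inference functions do
    return (value for value in (node_key, node_key * 2))


first = list(infer_gen("x"))   # ['x', 'xx'] -- consumes the cached generator
second = list(infer_gen("x"))  # [] -- the same, now exhausted, generator is returned
print(first, second)
```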
I don't know lru_cache well, and I did not know about the potential problem with generators. I just thought that we might as well use a specialized library to get a sane limit without any human intervention (maybe 8 GB?). (We can also have the API on top of it, by the way.)
Sorry if I was unclear. I was referring to the bit of logic in the existing implementation, where the cached function uses itertools.tee to make the cached generator "re-usable". We could, of course, eagerly materialize the generator into a list/tuple and cache that instead. Does that sound good?
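A simplified side-by-side of the two options being discussed (assumed shapes, not the actual astroid code): re-teeing the cached iterator on every hit versus materializing the results once and handing out fresh iterators.

```python
import itertools

_cache = {}


def cached_with_tee(func, node, context=None):
    # itertools.tee keeps the already-produced values internally so the
    # cached side stays re-usable, at the cost of an extra copy.
    if (func, node) not in _cache:
        _cache[func, node] = func(node, context)
    result, copy = itertools.tee(_cache[func, node])
    _cache[func, node] = copy  # keep an unconsumed copy for the next lookup
    return result


def cached_with_tuple(func, node, context=None):
    # Eagerly materialize once; every caller gets a fresh, independent iterator.
    if (func, node) not in _cache:
        _cache[func, node] = tuple(func(node, context))
    return iter(_cache[func, node])
```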
I don't understand the itertools.tee implementation well enough to guess whether it will be better one way or the other. A way to test the speed of an implementation is to use the benchmark tests in pylint, by doing a local install of astroid and comparing the results. But I think that as long as we set a limit on the cache we store, it will probably be better than the current situation.
I think itertools.tee has to cache the elements somewhere, and logically, if one of the resulting iterators is fully consumed, itertools.tee will have to keep a full copy of the original iterable around internally. With this in mind, it seems we can just drop the itertools.tee and always return a list iterator.

As for lru_cache, I'm afraid we cannot use it here because the current implementation only uses the first argument as the key (because the other one is the context?), and lru_cache does not allow that.
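A toy demonstration of the lru_cache limitation mentioned here (not astroid code): every argument takes part in the key, so passing a fresh context object defeats the cache.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def infer_node(node, context=None):
    print("computing for", node)
    return (node,)


infer_node("node-1", context=object())  # computes
infer_node("node-1", context=object())  # computes again: the new context object
                                        # changes the cache key, so it is a miss
```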
This commit is not expected to affect the behavior and, if anything, should improve memory usage, because the result is only materialized once (before, it was also stored in full inside itertools.tee).
Force-pushed eeaf655 to 90cbac3 (pre-commit.ci: "for more information, see https://pre-commit.ci")
Thank you for refactoring this and for making it possible to clear the cache. Do you have an opinion on what needs to be done to make clearing the lru_cache automatic? Maybe a decorator that takes a string of all the arguments given to _inference_tip_cached and is cached itself?
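One hypothetical shape such a decorator could take (purely illustrative; the actual follow-up is tracked in #1150): key an lru_cache on the node alone and route the context around it, so size limiting and clearing come for free.

```python
from functools import lru_cache


def cached_on_node_only(func, maxsize=128):
    # Hypothetical sketch: only the node participates in the cache key; the
    # context is passed around the lru_cache via a closure variable.
    current = {"context": None}

    @lru_cache(maxsize=maxsize)
    def _cached(node):
        return tuple(func(node, current["context"]))

    def wrapper(node, context=None):
        current["context"] = context
        return iter(_cached(node))

    wrapper.cache_clear = _cached.cache_clear  # explicit clearing still possible
    return wrapper
```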
The answer to that will depend on the exact API for inference tips. Is it right that inference tip functions always take exactly two arguments -- a node and a context? If so, what is context exactly? Does it remain constant during an inference session? Is it fine to hash it via
Thank you for your answer, I opened #1150 for the next step and reflection about
@Pierre-Sassoulas thanks for merging the PR! Happy to work on the
Hey, sorry for not answering this sooner. I thought I might as well partially answer: inference tips always have two arguments (a node and a context). I asked the other active maintainers whether the context stays constant during astroid's execution, and it looks like no one knows precisely. So I would have to look at the only source of truth (the CODE 😄) to answer that, but I did not find the time.
Thanks Pierre! Happy to dig through the code myself and address #1150 :)
Description
The mutable default is problematic when astroid is used as a library,
because it effectively becomes a memory leak, see #792.
This commit moves the cache to the global namespace and adds a public
API entry point to clear it.
Type of Changes
Related Issue
This PR partially addresses #792. More changes are needed to make memory consumption stable. See discussion in the issue.