Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removed mutable default value in _inference_tip_cache #1139

Merged

Conversation

superbobry
Copy link
Contributor

Description

The mutable default is problematic when astroid is used as a library,
because it effectively becomes a memory leak, see #792.

This commit moves the cache to the global namespace and adds a public
API entry point to clear it.

Type of Changes

Type
🐛 Bug fix
✨ New feature
🔨 Refactoring
📜 Docs

Related Issue

This PR partially addresses #792. More changes are needed to make memory consumption stable. See discussion in the issue.

The mutable default is problematic when astroid is used as a library,
because it effectively becomes a memory leak, see pylint-dev#792.

This commit moves the cache to the global namespace and adds a public
API entry point to clear it.
_cache: typing.Dict[typing.Tuple[InferFn, NodeNG], typing.Any] = {}


def clear_inference_tip_cache():
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, I can add a single clear_caches() entry point which also clears LRU caches on various methods in astroid (as discussed in #792). Wdyt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about using the @lru_cache(maxsize=?) decorator to handle that ? I think it would remove the need for the API. But maybe we'd need a mechanism to set the max size instead ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd need some extra logic to account for the return value being a generator, right?

Otherwise, lru_cache would do the job. I'd still prefer for us to have an explicit way to clear the cache, because regardless of capacity, references to nodes will keep (potentially expensive) object around in memory for longer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know lru_cache well I did not know of the potential problem with generator. Just thought that we might as well use a specialized library to have a sane limit without any human intervention (Maybe 8 Go ?). (We can also have the API on top of it by the way).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if I was unclear. I was referring to the bit of logic in the existing implementation, where the cached function does itertools.tee to make the cached generator "re-usable". We could, of course, eagerly materialize the generator into a list/tuple and cached that, instead. Does that sg?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the itertools.tee implementation well enough to guess if it will be better one way or the other. A way to test the speed of an implementation is to use the benchmark test in pylint by doing a local install of astroid and comparing the result. But I think as long as we set a limit to the cache we store it will probably be better than the current situation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think itertools.tee has to cache the elements somewhere, and logically if one of the resulting iterators is fully consumed, itertools.tee will have to keep a full copy of the original iterable around internally. With this in mind, it seems we can just drop the itertools.tee and always return a list iterator.

As for lru_cache, I'm afraid we cannot use it here because the current implementation only uses the first argument as key (because the other one is context?), and lru_cache does not allow that.

This commit is not expected to affect the behavior, and if anything, should
improve memory usage, because result is only materilized once (before it was
also stored in-full inside itertools.tee).
@superbobry superbobry force-pushed the inference-tip-mutable-default branch from eeaf655 to 90cbac3 Compare August 22, 2021 21:26
Copy link
Member

@Pierre-Sassoulas Pierre-Sassoulas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for refactoring this and permitting to clear the cache. Do you have an opinion on what need to be done to make lru_cache clear automatic ? Maybe a decorator that take a string of all the argument given to _inference_tip_cached and is cached itself ?

@superbobry
Copy link
Contributor Author

The answer to that will depend on the exact API for inference tips. Is it right that inference tip functions always take exactly two arguments -- a node and a context? If so, what is context exactly? Does it remain constant during an inference session? Is it fine to hash it via id()?

@Pierre-Sassoulas
Copy link
Member

Thank you for your answer, I opened #1150 for the next step and reflection about lru_cache. This is already a lot better than the old way of doing it.

@Pierre-Sassoulas Pierre-Sassoulas merged commit 3d0a025 into pylint-dev:main Aug 27, 2021
@superbobry
Copy link
Contributor Author

@Pierre-Sassoulas thanks for merging the PR! Happy to work on the lru_cache version in a follow up, if you give a bit more context on how inference tips are invoked (see questions in my previous comment).

@superbobry superbobry deleted the inference-tip-mutable-default branch August 28, 2021 18:21
@Pierre-Sassoulas
Copy link
Member

Hey sorry for now answering this sooner. I thought I might as well partially answer: The inference tip always have two arguments (a node and a context). I asked other active maintainers about constant context during astroid's execution and it look like no one knows precisely. So I would have to look at the only place of truth (The CODE 😄) to answer that, but did not find the time.

@superbobry
Copy link
Contributor Author

Thanks Pierre! Happy to dig through the code myself and address #1150 :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants