docs: remove note on disk space for caching #5534

Merged: 5 commits, merged on May 24, 2022 (changes shown from 2 commits)
docs/source/topics/provenance/caching.rst (3 changes: 2 additions & 1 deletion)

@@ -159,7 +159,8 @@ Limitations and Guidelines
While AiiDA's hashes include the version of the Python package containing the calculation/data classes, it cannot detect cases where the underlying Python code was changed without increasing the version number.
Another scenario that can lead to an erroneous cache hit is if the parser and calculation are not implemented as part of the same Python package, because the calculation nodes store only the name, but not the version of the used parser.

- #. Note that while caching saves unnecessary computations, it does not save disk space: the output nodes of the cached calculation are full copies of the original outputs.
+ #. While caching saves unnecessary computations, it does not directly prevent duplication of data: the cached calculation and its output nodes are duplicated.
+ In practice, however, AiiDA's file repository implementation will detect that any files associated with these nodes are already present and simply point to those, reducing duplication to metadata stored at the database level.
Review comment from a contributor:
Suggested change
- #. While caching saves unnecessary computations, it does not directly prevent duplication of data: the cached calculation and its output nodes are duplicated.
- In practice, however, AiiDA's file repository implementation will detect that any files associated with these nodes are already present and simply point to those, reducing duplication to metadata stored at the database level.
+ #. While caching saves unnecessary computations, it does not necessarily prevent duplication of data: the cached calculation and its output nodes are duplicated in the storage.
+ Whether the duplicated nodes actually result in the _size_ of the storage increasing, depends on the storage implementation, which may implement automatic deduplication mechanisms to save space.
+ This is actually the case for the default storage implementation `psql_dos`; this storage automatically detects files that already exist and will not store them again.

Reply from ltalirz (member, PR author), May 23, 2022:
Thanks! I wanted to phrase it slightly differently, but I've tried to incorporate your points.


#. Finally, When modifying the hashing/caching behaviour of your classes, keep in mind that cache matches can go wrong in two ways:
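The first limitation in the context lines above (hashes include the package version but not the code itself) can be illustrated with a minimal sketch. It assumes a hypothetical hash function `cache_hash` that combines the class name, the package version and the inputs; this is not AiiDA's actual hashing implementation, and the class name `mypkg.MyCalculation` and the version strings are made up. Because the source code never enters the hash, changing the code without bumping the version yields the same hash and therefore a possible erroneous cache hit.

```python
import hashlib
import json


def cache_hash(class_name: str, package_version: str, inputs: dict) -> str:
    """Hypothetical cache hash: covers class name, package version and inputs, but not source code."""
    payload = json.dumps(
        {"class": class_name, "version": package_version, "inputs": inputs},
        sort_keys=True,  # stable key order so equal payloads give equal hashes
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Imagine the calculation's code changed between these runs, but the version string did not.
before = cache_hash("mypkg.MyCalculation", "1.0.0", {"x": 1})
after_code_change = cache_hash("mypkg.MyCalculation", "1.0.0", {"x": 1})
# Bumping the version changes the hash, so the old result is no longer reused.
after_version_bump = cache_hash("mypkg.MyCalculation", "1.0.1", {"x": 1})

print(before == after_code_change)   # True: the code change is invisible to the hash
print(before == after_version_bump)  # False: the version bump invalidates the cache entry
```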

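The deduplication described in the added lines of the diff can be sketched with a toy content-addressed store, assuming that each file is keyed by the SHA-256 hash of its content. The class `ContentAddressedStore` and the node dictionaries below are hypothetical and do not reflect the API of `psql_dos` or its file repository; the sketch only shows why duplicated nodes need not duplicate file content: the node metadata exists twice, while the file is stored once.

```python
import hashlib


class ContentAddressedStore:
    """Hypothetical in-memory store that keys each file by the SHA-256 of its content."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        key = hashlib.sha256(content).hexdigest()
        # Identical content maps to the same key, so it is stored only once.
        self._objects.setdefault(key, content)
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def __len__(self) -> int:
        return len(self._objects)


store = ContentAddressedStore()

# The original calculation produces an output node with a file ...
original_output = {"uuid": "original-node", "file_key": store.put(b"energy = -1.23\n")}
# ... and the cached calculation gets a new node whose file has identical content.
cached_output = {"uuid": "cached-node", "file_key": store.put(b"energy = -1.23\n")}

print(len(store))  # 1: both nodes point to a single stored object
print(original_output["file_key"] == cached_output["file_key"])  # True
```

The key is derived from the file content rather than from the node, so any two nodes with byte-identical files share storage, and only the per-node metadata is duplicated.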