Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add hook and exit_code keyword for more nuanced cache validation. #3637

Merged
merged 2 commits into from
Dec 12, 2019

Conversation

greschd
Copy link
Member

@greschd greschd commented Dec 10, 2019

Fixes #3368.

Adds a method is_valid_cache to the Process class, which is called with a ProcessNode instance as an argument. This method is called by ProcessNode.is_valid_cache. It allows implementing
custom behavior in Process sub-classes, such as the CalcJob classes implemented in plugins.

As a default, the Process.is_valid_cache checks whether the exit code returned from the process has a - newly introduced - invalidates_cache attribute set to True. This is a simple way of marking the as cache invalid for a specific exit code only.

Question: Should all the documentation go into the "Caching: implementation details" section, or do we want to mention the spec.exit_code option in the "main" caching section?

@greschd
Copy link
Member Author

greschd commented Dec 10, 2019

Mentioning also @sphuber, who helped me with the design.

@greschd greschd force-pushed the add_valid_cache_hook_at_calcjob branch from 67ebf2b to 2a206ff Compare December 10, 2019 17:47
@ltalirz
Copy link
Member

ltalirz commented Dec 10, 2019

Should all the documentation go into the "Caching: implementation details" section, or do we want to mention the spec.exit_code option in the "main" caching section?

Let's put it this way - since the documentation is going to be restructured into tutorials/howtos/topics/reference, this information is certainly not going to go into the "how to enable caching".

So, I would keep it in the implementation section for the moment. We'll decide what to do with this section later.

@greschd greschd changed the title Add hook and exit_code keyword for more nuanced cache validation. [WIP] Add hook and exit_code keyword for more nuanced cache validation. Dec 10, 2019
@greschd greschd force-pushed the add_valid_cache_hook_at_calcjob branch 2 times, most recently from d70b40d to 4495486 Compare December 10, 2019 19:06
@greschd greschd changed the title [WIP] Add hook and exit_code keyword for more nuanced cache validation. Add hook and exit_code keyword for more nuanced cache validation. Dec 10, 2019
@greschd
Copy link
Member Author

greschd commented Dec 10, 2019

Ready for review now.

Copy link
Contributor

@sphuber sphuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @greschd some minor changes and we need to discuss if the change in ExitCode which is technically breaking is acceptable

aiida/engine/processes/exit_code.py Show resolved Hide resolved
aiida/engine/processes/exit_code.py Show resolved Hide resolved
aiida/engine/processes/process.py Outdated Show resolved Hide resolved
# Catch errors when getting the process_class, or when it does not have
# a 'is_valid_cache' method.
try:
is_valid_cache_func = self.process_class.is_valid_cache
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't this also throw other errors, specifically something like EntryPointError if the class cannot be loaded?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yes, indeed it can. It's not quite clear to me what the correct response to that is, though. If the process class can not be loaded due to an entry point error, essentially we can no longer perform the checks that might be implemented in that class. Should that fall back to being a valid cache, or invalid one?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we perhaps err on the safe side and consider it to be invalid. If the user really want them to be cached they should simply install the relevant plugin. I am just wondering if in this case we should emit a warning otherwise these calculations may be silently skipped for caching and might be difficult to track down

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we keep catching ValueError and return True (because that also happens in calcfunctions, I think), and for the rest log a warning and return False?

Side note: This seems like a pretty weird edge case -- you're just launching a new process, but then its process class somehow isn't available. How does this even happen? Not saying we shouldn't handle it, though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the code now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems I have misunderstood this before: The ValueError happens whenever the process_class can not be loaded.

This clashes with the existing calcfunctions tests not because calcfunctions are inherently problematic, but because those tests define new calcfunctions inside the class. Since these can not be imported, they are not allowed to cache. That could be fixed by moving all calcfunctions to the module level.

Since calcfunctions should anyway be importable for the daemon to find them, I think it is a safe choice to disable caching when they are not importable. It is definitely a change in behavior though, so if you want to be strict on backwards-compatibility, it should be enabled instead.

docs/source/developer_guide/core/caching.rst Show resolved Hide resolved
aiida/orm/nodes/process/process.py Show resolved Hide resolved
@greschd greschd requested a review from sphuber December 11, 2019 17:09
@greschd greschd force-pushed the add_valid_cache_hook_at_calcjob branch from b327b80 to b75a649 Compare December 12, 2019 08:34
Adds a method `is_valid_cache` to the `Process` class, which is
called with a `ProcessNode` instance as an argument. This method
is called by `ProcessNode.is_valid_cache`. It allows implementing
custom behavior in `Process` sub-classes, such as the `CalcJob`
classes implemented in plugins.

As a default, the `Process.is_valid_cache` checks whether the
exit code returned from the process has a -- newly introduced --
`invalidates_cache` attribute set to `True`. This is a simple
way of marking the as cache invalid for a specific exit code only.

Note that this is a backwards-incompatible change, which can
break code that uses unpacking of ExitCode, for example

    status, message = ExitCode(...)

The new ways of controlling caching are documented in the
developer's documentation.
@greschd greschd force-pushed the add_valid_cache_hook_at_calcjob branch from b75a649 to 91e85c5 Compare December 12, 2019 10:16
This makes the calcfuntions importable, which is now required for
them to be used in caching.

Also catching only ValueError when getting the process_class,
instead of a broad except.
@greschd greschd requested a review from sphuber December 12, 2019 13:51
@greschd
Copy link
Member Author

greschd commented Dec 12, 2019

I've implemented the changes in the calcfunctions tests - for me this is good to go.

Let me know if you'd rather have the version where inaccessible process_class returns True, and I'll revert the test changes and implement that instead.

@sphuber sphuber merged commit f68b304 into aiidateam:develop Dec 12, 2019
@sphuber
Copy link
Contributor

sphuber commented Dec 12, 2019

Thanks a lot @greschd

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Limit caching to successful calculations?
3 participants