Limit caching to successful calculations? #3368
Comments
We actually considered this problem and decided to treat failed calculations as cacheable as well. The use case comes from developing workchains: caching is really useful for not having to repeat the calculations themselves when rerunning a workchain. However, you often want to test the error handling, which is only triggered by "failed" calculations; excluding those from the caching then forces them to be rerun. The difference with the old system is that the old FAILED state really covered all problems, not just soft failures indicated by the parser (e.g. convergence not reached) but also plain exceptions etc. The new "failed" concept of …
Yeah, but it might… The reason I've encountered this is that I was fixing some intermittent issue on the computer, after which the calculation still failed because the failed result was cached. I don't know how to make this calculation usable again with the current setup. In my opinion, being cautious with caching is the preferable approach -- i.e., only cache when you're certain the calculation consistently behaves that way.
As a plugin developer, you can always override the …
In https://github.com/aiidateam/aiida-quantumespresso/blob/develop/aiida_quantumespresso/calculations/pw.py, for example, the 3xx error codes would be of the kind that should probably not be cached, while the 4xx and 5xx ones could be.
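A minimal sketch of what such an exit-code-based rule could look like, following the aiida-quantumespresso convention mentioned above; the function name and the exact ranges are assumptions for illustration, not aiida-core API:

```python
def exit_status_is_cacheable(exit_status: int) -> bool:
    """Return True if a calculation that finished with `exit_status`
    should be allowed to act as a cache source (illustrative rule)."""
    if exit_status == 0:
        return True   # clean success: always cacheable
    if 300 <= exit_status < 400:
        return False  # outputs missing/corrupt: rerunning might succeed
    # Soft failures such as convergence not reached are deterministic,
    # so rerunning would just reproduce them -- allow caching.
    return 400 <= exit_status < 600
```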
You can remove the hash extra from the calculation that you don't want to be used as a cache source. Not an ideal mechanism, but that is currently the only way to "disable" individual nodes from being considered. Or is that not what you were asking here?
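For reference, a minimal sketch of that manual workaround, assuming the hash is stored in the `_aiida_hash` extra as in aiida-core 1.x; the pk is a placeholder:

```python
from aiida.orm import load_node

node = load_node(1234)            # pk of the calculation to exclude (placeholder)
node.delete_extra('_aiida_hash')  # without its hash extra, the node is no
                                  # longer matched as a cache source
```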
I see, fair enough, but I still think that this is going to be a big limiting factor for the workchain-development use case. Without caching, debugging workchains is not really feasible.
Except the problem is that this is a … So in conclusion, I think you raise a good point: caching, when used in production, needs to be applied carefully, always erring on the safe side. However, I still think that it is currently hugely important when debugging, and only allowing successful calculations might be too restrictive. I am not sure how this could be turned on through configuration either.
Yeah, manually disabling it would work (as would just deleting the broken calc), but it's indeed not a great solution.
Ah yes, that seems to be an unintended consequence of the node/calc split. So maybe the …
More generally, I think there needs to be a way for plugin developers to disable caching for a given calculation class -- which was the purpose of …

EDIT: Yes, you can do that in configuration, but the end user shouldn't have to care -- just putting …
This can be done, though, through the configuration file. We had the same problem there: after the process/node split, the original syntax of that file, which expected node subclasses, didn't work. But I fixed that, and you can now specify a process class, and those will not be cached.
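For illustration, a sketch of what that configuration could look like, assuming the `.aiida/cache_config.yml` format used by AiiDA 1.0; the profile name and entry point are placeholders:

```yaml
my-profile:
  default: true
  disabled:
    - aiida.calculations:quantumespresso.pw
```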
Sorry, I should've made a separate comment instead of an edit: I think the end user shouldn't have to care about this. The broader point is that I think …
In workchain development, would restricting it to certain exit codes be enough? Or are there cases where you need to test more generic errors as well? Another option would be to introduce different levels of caching -- "normal" for production use, "aggressive" for workchain development -- differing in how they handle these cases.
I am also interested in this -- allowing some calculations' caching to be turned off (or invalidated) for specific …
I think this could also be solved by giving the …

In the particular case of the wallclock limit, whether caching should be done depends on whether you can restart from an interrupted calculation. If you can, then this restart is something that should be done at the workflow level -- starting again from scratch would be a waste of resources. In any case, this is probably something I will look into at the coding week.
In the previous AiiDA version, caching checked that a calculation was in the FINISHED state before using it as a basis for caching -- the purpose being to allow failed calculations to be re-run.
With the updates to 1.0, the concept of calculation states was overhauled, and a calculation can now be finished even if it failed (i.e. with a non-zero exit code).
I believe the caching implementation should be adapted so that it's only triggered when the exit status is zero.
@sphuber @giovannipizzi do you agree with this?
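A minimal sketch of the proposed check, using public `ProcessNode` properties; the function itself is illustrative, not the actual caching internals:

```python
def is_valid_cache_source(node) -> bool:
    """Return True if `node` may be used as a cache source under the
    proposed rule: it must have terminated, and with exit status zero."""
    # `is_finished` is True for any terminated calculation, even a failed
    # one; `is_finished_ok` additionally requires exit_status == 0.
    return node.is_sealed and node.is_finished_ok
```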