Hashing of nodes #119

Closed
aiida-bot opened this issue Nov 6, 2014 · 2 comments · Fixed by #652

@aiida-bot

Originally reported by: Andrea Cepellotti (Bitbucket: acepellotti, GitHub: cepellotti)


Implement node hashing. This would make it possible to recognise immediately whether a node has already been stored in the DB, or whether a calculation has already been run, in which case its result could be returned immediately.
Note also that this would massively speed up the debugging of workflows, which repeat the same things many times: we could immediately determine (by adding a method for it) whether a workflow step still has calculations to execute.
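A minimal sketch of the idea, assuming a node can be reduced to a type name plus a dictionary of JSON-serialisable attributes. The names node_hash, calculation_hash and HashCache are hypothetical, the in-memory dictionaries stand in for an indexed hash column in the DB, and the type names are plain illustrative strings rather than real node classes. A calculation is keyed by its type and the hashes of its inputs, which supports the "already stored?" and "already run?" checks as well as listing which calculations of a workflow step still need to be executed.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional


def node_hash(node_type: str, attributes: Dict[str, Any]) -> str:
    # Canonical JSON (sorted keys) so that equal content always gives an equal
    # hash, independently of the order in which attributes were set.
    payload = json.dumps({"type": node_type, "attributes": attributes}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def calculation_hash(calc_type: str, input_hashes: Dict[str, str]) -> str:
    # A calculation is identified by its type plus the hashes of its inputs,
    # keyed by link label so that swapping two inputs changes the hash.
    return node_hash(calc_type, {"inputs": input_hashes})


class HashCache:
    """Hypothetical in-memory stand-in for DB tables indexed by hash."""

    def __init__(self) -> None:
        self.stored: Dict[str, Dict[str, Any]] = {}  # node hash -> attributes
        self.results: Dict[str, Any] = {}  # calculation hash -> result

    def store(self, node_type: str, attributes: Dict[str, Any]) -> str:
        h = node_hash(node_type, attributes)
        self.stored.setdefault(h, attributes)  # storing an identical node is a no-op
        return h

    def cached_result(self, calc_hash: str) -> Optional[Any]:
        # Returns the result immediately if this calculation was already run.
        return self.results.get(calc_hash)

    def pending(self, calc_hashes: List[str]) -> List[str]:
        # For a workflow step: which calculations still need to be executed?
        return [h for h in calc_hashes if h not in self.results]


cache = HashCache()
structure = cache.store("structure", {"cell": [[1, 0, 0], [0, 1, 0], [0, 0, 1]]})
cache.store("structure", {"cell": [[1, 0, 0], [0, 1, 0], [0, 0, 1]]})  # duplicate
params = cache.store("parameters", {"ecutwfc": 30})
calc = calculation_hash("pw_calculation", {"structure": structure, "parameters": params})

print(len(cache.stored))  # → 2: the duplicate structure was not stored again
print(cache.pending([calc]) == [calc])  # → True: not run yet
cache.results[calc] = {"energy": -1.0}  # pretend the calculation ran
print(cache.cached_result(calc))  # → {'energy': -1.0}
print(cache.pending([calc]))  # → []
```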


@aiida-bot

Original comment by Andrea Cepellotti (Bitbucket: acepellotti, GitHub: cepellotti):


Moreover, to be sure that results are the same, we should make sure that neither the code nor the parsers have changed. For the code, we should probably introduce an md5sum; we should also find some way to check whether the parser has changed.
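One possible way to fold the md5sum of the code, plus a checksum of the parser, into the lookup key is sketched below. This is an assumption, not a design from the thread: the helpers md5_of_file, md5_of_text and cache_key are hypothetical, and the parser is fingerprinted from its source text (which in practice might be obtained with something like inspect.getsource). A previous result would be reused only when the inputs, the code and the parser all match.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 65536) -> str:
    # md5sum of a file (e.g. the code executable), read in chunks so that
    # large binaries are not loaded into memory at once.
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_of_text(text: str) -> str:
    # md5 of a text, e.g. the source code of a parser.
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def cache_key(inputs_hash: str, code_md5: str, parser_md5: str) -> str:
    # A stored result is reused only if inputs, code and parser all match:
    # changing any one of the three produces a different key.
    combined = "|".join([inputs_hash, code_md5, parser_md5])
    return hashlib.sha256(combined.encode("utf-8")).hexdigest()


# Usage: a fake executable and two versions of a (hypothetical) parser.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"fake executable contents")
executable = Path(tmp.name)

code_md5 = md5_of_file(executable)
parser_v1 = md5_of_text("def parse(out): return out.split()")
parser_v2 = md5_of_text("def parse(out): return out.split(',')")

key_v1 = cache_key("inputs-hash", code_md5, parser_v1)
key_v2 = cache_key("inputs-hash", code_md5, parser_v2)
print(key_v1 == key_v2)  # → False: the parser changed, so the old result is not reused
executable.unlink()
```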

@aiida-bot

Original comment by Andrea Cepellotti (Bitbucket: acepellotti, GitHub: cepellotti):


As noted by Andrius:
... I would propose reusing existing nodes instead of creating new ones on each request, to cope with node duplication. I have implemented a measure to control this issue in verdi data {upf,cif} import {upf,cif} by using a get_or_create() classmethod, and would suggest moving get_or_create() to the Data class. I am not quite sure whether get_or_create() should return only parentless nodes or all matching nodes, and would like to invite discussion on this. If I understand correctly, reusing data nodes that have parents might introduce connections even between unrelated calculations; that's why I would at first reuse only parentless nodes.
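A rough sketch of a get_or_create() classmethod on a base Data class, following the "reuse only parentless nodes" rule proposed above. The Data class here is a hypothetical stand-in that uses an in-memory list as its database, not the project's actual Data class, and PseudoFile is an invented subclass. Matching is by class and exact attribute equality; comparing node hashes instead would be a natural variant.

```python
from typing import Any, ClassVar, List, Optional, Tuple


class Data:
    """Hypothetical minimal Data node backed by an in-memory list."""

    _db: ClassVar[List["Data"]] = []  # stand-in for the database

    def __init__(self, parents: Optional[List["Data"]] = None, **attributes: Any) -> None:
        self.attributes = attributes
        self.parents = list(parents or [])

    def store(self) -> "Data":
        Data._db.append(self)
        return self

    @classmethod
    def get_or_create(cls, **attributes: Any) -> Tuple["Data", bool]:
        # Reuse a stored node only if it has the same class and attributes and
        # no parents: a node produced by some calculation is never reused, so
        # unrelated calculations do not get connected through it.
        for node in Data._db:
            if type(node) is cls and not node.parents and node.attributes == attributes:
                return node, False
        return cls(**attributes).store(), True


class PseudoFile(Data):
    """Hypothetical subclass, e.g. for imported pseudopotential files."""


si, created = PseudoFile.get_or_create(element="Si", md5="aaa")
si_again, created_again = PseudoFile.get_or_create(element="Si", md5="aaa")
print(si is si_again, created, created_again)  # → True True False

# A node with a parent is not reused, even if its attributes match.
PseudoFile(parents=[si], element="O", md5="bbb").store()
oxygen, created_o = PseudoFile.get_or_create(element="O", md5="bbb")
print(created_o)  # → True
```

Restricting reuse to parentless nodes is the conservative choice discussed in the comment: a node with parents carries provenance, and reusing it as an input would link the new calculation to whatever produced that node.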

6 participants