Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use calcfunction for getitem in For loop workgraph #283

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

agoscinski
Copy link
Contributor

@agoscinski agoscinski commented Aug 28, 2024

To track the provenance in the for loop we now use a calcfunction to track provenance for the For loop workgraph. I would like to allow more types than integers for the sequence, but I am not sure how to generically test for types that are supported by aiida core. Some orm types have a _type class attribute, but not all of them.

TODO:

  • add notebook reason why to choose this over a normal for loop (less tasks created), or at least move this to an issue

@superstar54
Copy link
Member

Hi @agoscinski , thanks for the work.

There is a function orm.to_aiida_type which can serialize the data automatically, and you can check which Python data types it support.

It's also would be good if the user can supply a list of AiiDA data as the input for sequence, e.g.,

[structure0, structure1, ...]

Because AiiDA does not support a list of AiiDA data as input, internally, we can change it to a dict with keys, the keys keep the order information.

{"sequence_0": structure0, ...}

and then, in the WorkGraph engine, if the sequence is a dict, we get the AiiDA data directly without serialization.

@agoscinski
Copy link
Contributor Author

Thanks @superstar54 applied your suggestion. The test pass, but it does not test the other use cases you mentioned. I tried to implement a dictionary, but it is a bit opaque to me how to change the iterator to be compatible with dictionaries that use hashable keys. It seems the iterator is hard coded to only work for collections that can be accessed by incremental increasing numbers like lists. Maybe we should generalize this for arbitrary generators. Not sure if this PR is the right place.

@agoscinski agoscinski marked this pull request as ready for review September 2, 2024 10:09
Copy link
Member

@superstar54 superstar54 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @agoscinski , thanks for the work!

Please test for the changes.

As mentioned in my previous comment, if you use a list of orm.data as the sequence, it will raise an error, because we need to change the list to a dict.

Therefore, you need also modify the aiida_workgraph.workgraph.WorkGraph to convert it into a dict. Then inside the engine, the sequence is a dict.

@calcfunction
def __getitem__(sequence, count):
value = sequence[count.value]
# only convert value f not already orm type because sequence
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# only convert value f not already orm type because sequence
# only convert value if not already orm type because sequence

@codecov-commenter
Copy link

codecov-commenter commented Sep 17, 2024

Codecov Report

Attention: Patch coverage is 78.37838% with 8 lines in your changes missing coverage. Please review.

Project coverage is 80.70%. Comparing base (5937b88) to head (81b5bac).
Report is 76 commits behind head on main.

Files with missing lines Patch % Lines
aiida_workgraph/workgraph.py 65.21% 8 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #283      +/-   ##
==========================================
+ Coverage   75.75%   80.70%   +4.95%     
==========================================
  Files          70       66       -4     
  Lines        4615     5157     +542     
==========================================
+ Hits         3496     4162     +666     
+ Misses       1119      995     -124     
Flag Coverage Δ
python-3.11 80.62% <78.37%> (+4.96%) ⬆️
python-3.9 80.66% <78.37%> (+4.92%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@agoscinski
Copy link
Contributor Author

agoscinski commented Sep 17, 2024

As mentioned in my previous comment, if you use a list of orm.data as the sequence, it will raise an error, because we need to change the list to a dict.

That is for non json serializable data types like orm.StructureData for others it will automatically convert it and is not a problem. My problem with the dynamic workaround is that in this case it is shown differently in the provenance graph and also will create a lot of links.

That was the initial idea

@calcfunction
def __getitem__(seq, idx):
    return orm.to_aiida_type(seq[idx.value])

node = __getitem__([1,2,3], 0)
generate_node_graph(node.pk)
Screenshot 2024-09-17 at 11 34 09

but the dynamic input ports approach gives

import numpy as np

@calcfunctione
def __getitem__(**kwargs):
    idx = kwargs.pop('idx')
    list_of_orms = list(kwargs.values())
    return list_of_orms[idx.value].clone()

inp = {f'list_of_orms_{i}': orm.StructureData(cell=np.random.rand(3,3)) for i in range(3)}
inp['idx'] = orm.Int(0)
node = __getitem__(**inp)
generate_node_graph(node.pk)
Screenshot 2024-09-17 at 11 35 36

There is also the new StructureData object that might be json serializable because of the BaseModel that resolves this issue (EDIT: no Miki tried). I am also concerned by performance, since we now have to convert a Generator to a list which can be quite time consuming. I feel more and more that we should drop this PR. With the For workgraph we can maybe go more into a similar direction as with the while loop and just focus and a nice GUI representation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants