Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ProcessFunction: Add support for variadic arguments #5691

Conversation

sphuber
Copy link
Contributor

@sphuber sphuber commented Oct 6, 2022

Fixes #1172

Up till now, variadic arguments, i.e., arguments defined as *args in a
function signature which collects any remaining positional arguments,
were not supported for process functions. The main reason was that it
wasn't immediately clear what the link label should be for these inputs.

For normal positional arguments we can take the name of the argument
declaration in the function signature, and for keyword arguments we take
the keyword with which the argument is passed in the function invocation.
But for variadic arguments there is no specific argument name, not in
the function signature, nor in the function invocation. However, we can
simply create a link label. We just have to ensure that it doesn't clash
with the link labels that will be generated for the positional and
keyword arguments.

Here the link label will be determined with the following format:

`{label_prefix}_{index}`

The label_prefix is determined the name of the variadic argument. If a
function is declared as function(a, *args, **kwargs) the prefix will
be equal to args and if it is function(*some_var_args) it will be
some_var_args. The index will simply be the zero-base index of the
argument within the variadic arguments tuple. This would therefore give
link labels args_0, args_1 etc. in the first example.

If there would be a clash of labels, for example with the function def:

def function(args_0, *args):

which when invoked as:

function(1, *(2, 3))

would generate the labels args_0 for the first positional argument,
but also args_0 for the first variadic argument. This clash is
detected and a RuntimeError is raised instructing the user to fix it.

@sphuber sphuber requested review from ltalirz and unkcpz October 6, 2022 09:16
@chrisjsewell
Copy link
Member

Hey @sphuber do you have a specific use case in mind for this?

I would point to the zen python "explicit is better than implicit" and ask why can't we just allow for kwargs and avoid such implicit behaviour?

@sphuber
Copy link
Contributor Author

sphuber commented Oct 6, 2022

Hey @sphuber do you have a specific use case in mind for this?

See #1172 . Apparently there were multiple use-cases from users. I don't have one personally but I can definitely see the point. I don't see a strong reason not to allow it. I mean, Python itself supports variadic arguments, so why would we prohibit it?

And finally, users are not forced to use it. If they prefer to be more explicit with positional and keyword arguments, they can do so without any issues.

Together with #5688 to support automatic base type input serialization, I think process functions can become even easier to use. You would be able to

@calcfunction
def sum_integers(*args):
    return sum(args)

assert sum_integers(1, 2, 3) == 6

@sphuber
Copy link
Contributor Author

sphuber commented Oct 6, 2022

By the way, once agreed upon the interface, I will update the documentation.

@sphuber sphuber force-pushed the feature/1172/process-function-variadic-arguments branch from a902244 to afe4e74 Compare October 6, 2022 12:36
@chrisjsewell
Copy link
Member

chrisjsewell commented Oct 6, 2022

See #1172 . Apparently there were multiple use-cases from users.

This is not a use case to use varadic arguments though, its an argument to use a list of nodes an input, and similarly we should make it easy to use a dict of nodes as an input.

There is an obvious way how to specify in the function construction, what the inputs should be (also allowing for type validation): type annotations:

@calcfunction
def sum_integers(integers: list[orm.Int]) -> orm.Int:
    return orm.Int(sum(*(i.value for i in integers)))

You could even extend this kind of type validation further in the future with: https://docs.python.org/3/library/typing.html#typing.Annotated

Secondly, with the use of input/output lists and dicts, it should be easier to "reconstruct" them.
At present the link table of AiiDA is quite restrictive: https://github.com/aiidateam/aiida-core/blob/1c5dfef09d18895ce327b4b145466373f9f05118/aiida/storage/psql_dos/models/node.py#L149

I think it might be beneficial to add a metadata colmn, where you could signify that this link is part of a list or dict, and the index/key for that list/dict

@danielhollas
Copy link
Collaborator

danielhollas commented Oct 7, 2022

Thanks for working on this! I spent quite some time banging my hand against this restriction. 🤕

Hey @sphuber do you have a specific use case in mind for this?

Just as an example of my use case, functionality like this is needed if the number of inputs is not known beforehand. Right now instead of simple calcfunction I need to use a WorkChain with an input with dynamic namespace. Here's an example where I simply want to combine multiple StructureData node into a single TrajectoryData node.

https://github.com/danielhollas/aiidalab-ispg/blob/13f294438422a4eccfe2647baca621860d2416fd/workflows/aiidalab_atmospec_workchain/__init__.py#L55

(happy to hear if there's an easier way)

@sphuber
Copy link
Contributor Author

sphuber commented Oct 7, 2022

Thanks for the input @danielhollas . You could in principle use kwargs for your use case. You just have to create a dictionary with arbitrary keys as an input for your StructureData. Something like this:

@calcfunction
def structures_trajectory(**structures):
    return TrajectoryData([structure for structure in structures.values()])

structures = {
    'structure_a': StructureData(),
    'structure_b': StructureData(),
    'structure_c': StructureData(),
}

structures_trajectory(**structures)

With the proposed new syntax, one could do the following:

@calcfunction
def structures_trajectory(*structures):
    return TrajectoryData(structures)

structures = (StructureData(), StructureData())

structures_trajectory(*structures)

I personally think both are fine. As you can see the current existing solution is quite reasonable, it is just that quite a number of users were stumped to find out that varargs are not supported when that is possible in normal Python. I think it couldn't hurt to support the option for those who want it.

@ltalirz
Copy link
Member

ltalirz commented Oct 7, 2022

I agree with Chris that the underlying feature people are looking for here is not so much variadic arguments but being able to work with, and seamlessly pass around compounds (lists, dicts, ...) of AiiDA objects.

That the provenance graph does not have a concept of a list of objects (or, in other words, that you can't have an orm.List of AiiDA objects) is quite a significant limitation of the AiiDA language that seems worth tackling.

As for the specific issue in this thread, given that we need to create named links for function inputs, I think requiring inputs to functions to be named is reasonable; I think it just needs to be well documented with examples for canonical use cases like the one above.

@sphuber
Copy link
Contributor Author

sphuber commented Oct 7, 2022

As for the specific issue in this thread, given that we need to create named links for function inputs, I think requiring inputs to functions to be named is reasonable; I think it just needs to be well documented with examples for canonical use cases like the one above.

@ltalirz
I think as the various responses from users testify, using varargs is something that people would naturally do. Even if it is documented somewhere that this is not supported with a counter-example based on kwargs, people are unlikely to find this beforehand and not run into this limitation.

I agree that ideally lists and dictionaries of inputs would be supported, but it is not clear how much work it would take to implement this. Is there a real objection to add this functionality in the meantime? Is there a real downside? Adding this now doesn't prevent us from adding direct support for lists and dictionaries later on, would it?

@sphuber sphuber force-pushed the feature/1172/process-function-variadic-arguments branch from b7824c4 to 571577f Compare October 7, 2022 12:47
@ltalirz
Copy link
Member

ltalirz commented Oct 7, 2022

Is there a real objection to add this functionality in the meantime?

My objection would be that the logic in the implementation and the labels it generates are non-obvious. Once it is in aiida-core, users can start relying on it, and it becomes difficult to change.

An alternative stop-gap solution would be to replace the error message variadic arguments are not supported with something more helpful like

Variadic arguments are not supported. Instead of a list `*args`, pass a dictionary `**kwargs`, whose keys will be used as link labels.

Anyhow, I'm not going to block the PR if people feel strongly that support for *args is needed.

@chrisjsewell
Copy link
Member

My objection would be that the logic in the implementation and the labels it generates are non-obvious. Once it is in aiida-core, users can start relying on it, and it becomes difficult to change.

Yeh this was kinda my feeling also 😬

@danielhollas
Copy link
Collaborator

danielhollas commented Oct 7, 2022

You could in principle use kwargs for your use case. You just have to create a dictionary with arbitrary keys as an input for your StructureData

Oh cool! I didn't know that! Thanks so much @sphuber Nevertheless, I'd say it is still weird to use a dictionary for passing in a list, imo weirder than having surprising link labels, but I don't know enough about how the link labels are generally used by users. In any case, arg_x seems like a very straightforward naming, and the potential clashes with positional arguments should be very rare, why would anybody name the positional argument as arg_0 for example?

An alternative stop-gap solution would be to replace the error message variadic arguments are not supported with something more helpful like

This would go a long way tbh, please implement this regardless of the result of the discussion here. 😊

@danielhollas
Copy link
Collaborator

I'd also say that calcfunctions, workfunctions are so much more lightweight than CalcJobs / WorkChains, so making them as useful and as painless to use is a goal worth pursuing.

@sphuber
Copy link
Contributor Author

sphuber commented Oct 7, 2022

Anyhow, I'm not going to block the PR if people feel strongly that support for *args is needed.

For me it is also not crucial to have this functionality, I have personally never felt the need, but that is probably because I, as a developer, was aware of the limitation and the alternatives. I simply picked up the existing issue since I worked on the process functions anyway (for the automatic input serialization) and since there seemed to be multiple people asking for it, I just implemented it.

In any case, arg_x seems like a very straightforward naming, and the potential clashes with positional arguments should be very rare, why would anybody name the positional argument as arg_0 for example?

This was exactly my thinking. In most cases, people don't even really care about the generated link labels, but in this case, having args_0 etc. seemed perfectly logical to me. The exception in the case of a clash should indeed be super rare and practically never occur, but I put in the code to cover all the bases and not have exceptions in those rare cases.

Anyway, I think all has been said what needs to be said. I propose I quickly present this during the next AiiDA meeting and then we can see what other people think. If they share your concern, then I am perfectly fine with leaving it as is (with an update of the docs explaining this use-case and pointing to the **kwargs usage).

@giovannipizzi
Copy link
Member

Thanks @sphuber, and all for the feedback. Here is my take.

  • There are two issues.
    • One is allowing this behaviour (I see why people want it, there is an example above where this is useful and we don't really care of the label so I would support this PR (with the comment/change below)
    • Then, there is the issue of "list of AiiDA nodes" and "dicts of AiiDA nodes". I totally see the need, but this is more complex to fix (and only partially overlapping - in the case of @danielhollas, I think he would be fine with this PR and does not really need to define a node being the list of multiple StructureData nodes). Other cases instead would require this concept to be in the graph and immutable. So for this latter case, we should open another issue and investigate what needs to be changed (e.g., one of the many possible approaches: create a new new node type "DictOfNodes", and define a new link type "MemberOf" between data nodes - but we should really discuss in another issue).

So, I would be OK with merging this, with one change: I don't like names being created with UUID and in a non-reproducible/understandable way. I suggest that if a potential name clash is found (e.g. one has *args plus any other argument starting with args_ followed by a positive integer), we just raise (similar to what we are now doing with the message Variadic arguments are not supported), explaining the limitation. This should be enough to avoid unexpected behaviour, and at the same point to have the developed fix this at their first use of the (calc/work)function they just wrote.
Probably nobody will ever see this exception, but if one sees it, it's a 5-second thing to change the variable name accordingly.

@sphuber sphuber force-pushed the feature/1172/process-function-variadic-arguments branch from 571577f to 22c59dc Compare December 7, 2022 12:21
@@ -103,6 +99,24 @@ Note that the inputs **have to be passed as keyword arguments** because they are
If the inputs would simply have been passed as positional arguments, the engine could have impossibly determined what label to use for the links that connect the input nodes with the calculation function node.
For this reason, invoking a 'dynamic' function, i.e. one that supports ``**kwargs`` in its signature, with more positional arguments that explicitly named in the signature, will raise a ``TypeError``.

.. versionadded:: 2.1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This version should be updated?

@giovannipizzi
Copy link
Member

Thanks! Please udpate the versionadded (see my other comment) and also the main PR message (it still mentions UUIDs). Then consider it approved by me!

Up till now, variadic arguments, i.e., arguments defined as `*args` in a
function signature which collects any remaining positional arguments,
were not supported for process functions. The main reason was that it
wasn't immediately clear what the link label should be for these inputs.

For normal positional arguments we can take the name of the argument
declaration in the function signature, and for keyword arguments we take
the keyword with which the argument is passed in the function invocation.
But for variadic arguments there is no specific argument name, not in
the function signature, nor in the function invocation. However, we can
simply create a link label. We just have to ensure that it doesn't clash
with the link labels that will be generated for the positional and
keyword arguments.

Here the link label will be determined with the following format:

    `{label_prefix}_{index}`

The `label_prefix` is determined the name of the variadic argument. If a
function is declared as `function(a, *args, **kwargs)` the prefix will
be equal to `args` and if it is `function(*some_var_args)` it will be
`some_var_args`. The index will simply be the zero-base index of the
argument within the variadic arguments tuple. This would therefore give
link labels `args_0`, `args_1` etc. in the first example.

If there would be a clash of labels, for example with the function def:

    def function(args_0, *args):

which when invoked as:

    function(1, *(2, 3))

would generate the labels `args_0` for the first positional argument,
but also `args_0` for the first variadic argument. This clash is
detected and a `RuntimeError` is raised instructing the user to fix it.
@sphuber sphuber force-pushed the feature/1172/process-function-variadic-arguments branch from 22c59dc to 36118c8 Compare December 14, 2022 11:13
@sphuber sphuber merged commit 68cd30d into aiidateam:main Dec 14, 2022
@sphuber sphuber deleted the feature/1172/process-function-variadic-arguments branch December 14, 2022 11:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use of lists as arguments to a workfunction
5 participants