
Ensure that ProcessNode.get_builder_restart fully restores all inputs including non_db ones #4089

Closed
adegomme opened this issue May 20, 2020 · 14 comments · Fixed by #5801

@adegomme
Contributor

adegomme commented May 20, 2020

I have a simple workchain that wraps a calculation. I used expose_inputs to forward everything, and until recently it exposed the inputs in a namespace.
But in this case I would like to get rid of the namespace and simply forward every input to the calculation job directly, to remove the extra layer of indirection. However, metadata.options seems impossible to forward directly, since validation of the options key only works for calculations.

The offending line seems to be:

aiida/engine/processes/process.py in _setup_metadata(self)
650 elif name == 'options':
651 for option_name, option_value in metadata.items():
--> 652 self.node.set_option(option_name, option_value)
653 else:
654 raise RuntimeError('unsupported metadata key: {}'.format(name))

AttributeError: 'WorkChainNode' object has no attribute 'set_option'

set_option is only defined in calcjob.py, not in workchain.py (or process.py). Adding it manually seems to just work in my case. I'm not sure whether this has unwanted side effects, though; I have only tried a local job so far.

@sphuber
Contributor

sphuber commented May 20, 2020

I see what you are trying to do, but I would advise against it, and in any case, as you have noticed, it is not possible. So the exception is more or less intentional. By exposing inputs in the top-level namespace, you override any existing ports with the same name. In your case you are overriding the metadata port of the WorkChain with that of the CalcJob. When you then pass options to the WorkChain, you are not actually passing them to the CalcJob but to the WorkChain, which doesn't have the concept of options, and so it fails.

I am not sure we should add support for this, because it isn't really what the mechanism is supposed to do. I would indeed expect that adding a set_option to WorkChainNode would work, because all it does is set an attribute; however, I am not sure this is a good idea. The attributes we would be setting on the workchain node are not actually for the workchain but for the calcjob. If there is a way we can change the code in an intelligible way to "understand" this, I am more than willing to discuss it. But for now I would really recommend keeping the namespace. What do you think is problematic about having the namespace? You call it indirection, but it really just nests the inputs in a dictionary.

Thinking about this, maybe there is a more generic flaw or problem with expose_inputs and the fact that all processes share the metadata input. Exposing inputs of a workchain in another workchain without a namespace also has the same effect that the port of the wrapped workchain overrides that of the wrapping workchain, but since the metadata for workchains doesn't contain any particular sub ports, this "problem" is not really noticed. Not sure where the real problem with the design is or how to address it.

@greschd what do you think about this?

@bosonie

bosonie commented May 20, 2020

Sorry for jumping in, but I have thought a bit about this issue and will briefly add my idea.
I'm not a fan of namespaces because I think they are often redundant, so I don't use them, but I completely agree with @sphuber that the metadata port should never be overridden. It seems natural to me, for instance, to give the WorkChain a different label than the underlying CalcJob.
What I thought to be a good compromise is to expose all the inputs, excluding the metadata and then add an extra port where one gathers the options for the underneath CalcJob. Something like:

    class WC(WorkChain):
        @classmethod
        def define(cls, spec):
            super().define(spec)
            spec.expose_inputs(CalcJob, exclude=('metadata',))
            spec.input("options_for_calcjob", valid_type=DataFactory("dict"))

And of course in WC you then need to place "options_for_calcjob" into the metadata of the CalcJob. Which makes me wonder: why don't we exclude metadata by default when we do expose_inputs?
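
Inside such a workchain, the extra port then has to be translated back into the nested metadata.options namespace before the CalcJob is submitted. A plain-Python sketch of that translation (build_calcjob_inputs is a hypothetical helper, not AiiDA API; no AiiDA needed to follow it):

```python
# Sketch: translate a flat "options_for_calcjob" dictionary back into the
# nested metadata.options namespace that a CalcJob expects.
def build_calcjob_inputs(exposed_inputs, options_for_calcjob):
    """Merge exposed inputs with options nested under metadata.options."""
    inputs = dict(exposed_inputs)  # shallow copy of the exposed ports
    metadata = dict(inputs.get('metadata', {}))
    # The flat options dict becomes the metadata.options sub-namespace.
    metadata['options'] = dict(options_for_calcjob)
    inputs['metadata'] = metadata
    return inputs

inputs = build_calcjob_inputs(
    {'a': 1, 'b': 1.2},
    {'resources': {'num_machines': 1}},
)
# inputs['metadata']['options'] == {'resources': {'num_machines': 1}}
```

In the real workchain one would additionally have to unwrap the Dict node (e.g. via get_dict()) before nesting it, which is exactly the wrapping/unwrapping overhead discussed later in this thread.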

@adegomme
Contributor Author

adegomme commented May 20, 2020

Thanks for the replies. I understand the concerns and potential risks; I just wanted to see whether it was possible to avoid builder.mycode.metadata.options.resources and the like, which I find not very user friendly because it is so deeply nested.

Excluding metadata was indeed my go-to workaround in this case, before trying to play with set_option directly. I had a little trouble forwarding it to the job underneath (Dict/dict/AttributeDict issues, mainly, all sorted out), but I now have a working way to pass run_opts to the workchain, which become metadata for the job.

I agree that excluding metadata when using expose_inputs might be a good idea, and so might a more explicit error when someone tries to forward it as I did. But it's not really critical either way, as the current solution satisfies me.

@greschd
Member

greschd commented May 20, 2020

I agree with @bosonie that it's often nicer to expose some inputs at the top-level, but as you've noticed it needs some care as to how these inputs interact with the existing inputs. Just for completeness' sake, note that you can also mix top-level and namespaced inputs. Here's a complete example:

from aiida import orm
from aiida.engine import WorkChain, CalcJob


class CJ(CalcJob):
    @classmethod
    def define(cls, spec):
        super().define(spec)
        spec.input('a', valid_type=orm.Int)
        spec.input('b', valid_type=orm.Float)
        spec.input('c', valid_type=orm.Int)


class WC(WorkChain):
    @classmethod
    def define(cls, spec):
        super().define(spec)
        spec.expose_inputs(CJ, include=('a', 'b'))
        spec.expose_inputs(CJ, namespace='calcjob', exclude=('a', 'b'))

        spec.outline(cls.run_step)

    # Named 'run_step' rather than 'run' to avoid shadowing the existing
    # Process.run method.
    def run_step(self):
        self.report(self.exposed_inputs(CJ, namespace='calcjob'))

That could be run with

run(WC,
    a=orm.Int(1),
    b=orm.Float(1.2),
    calcjob={
        'code': orm.Code.get(label='testcode'),
        'c': orm.Int(3),
        'metadata': {
            'options': {
                'resources': {
                    'num_machines': 1,
                    'num_mpiprocs_per_machine': 1
                }
            }
        }
    })

and the exposed_inputs call will collect inputs both from the namespace and the top-level. This can be especially useful when you want to forward dynamic inputs - although in that case you will have to manually forward the dynamic inputs like so:

from collections import ChainMap

from aiida import orm
from aiida.engine import WorkChain, CalcJob


class CJ(CalcJob):
    @classmethod
    def define(cls, spec):
        super().define(spec)
        spec.input('a', valid_type=orm.Int)
        spec.input('b', valid_type=orm.Float)
        spec.input('c', valid_type=orm.Int)
        spec.inputs.dynamic = True


class WC(WorkChain):
    @classmethod
    def define(cls, spec):
        super().define(spec)
        spec.expose_inputs(CJ, include=('a', 'b'))
        spec.expose_inputs(CJ, namespace='calcjob', exclude=('a', 'b'))
        spec.inputs['calcjob'].dynamic = True

        spec.outline(cls.run_step)

    # Named 'run_step' rather than 'run' to avoid shadowing the existing
    # Process.run method.
    def run_step(self):
        self.report(
            ChainMap(self.exposed_inputs(CJ, namespace='calcjob'),
                     self.inputs.calcjob))
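
A note on the ChainMap here: lookup order matters, since keys in the first mapping (the explicitly collected exposed inputs) shadow those in the second (the raw dynamic calcjob namespace). A quick stdlib-only illustration:

```python
from collections import ChainMap

# The first mapping wins on key collisions; remaining keys fall through.
explicit = {'c': 'explicit-c'}
dynamic = {'c': 'dynamic-c', 'extra': 42}

merged = ChainMap(explicit, dynamic)
print(merged['c'])      # 'explicit-c', taken from the first mapping
print(merged['extra'])  # 42, falls through to the second mapping
```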

Back to the question at hand: how to deal with the metadata. In my view, exposing something at the top level means it has the same meaning for both the wrapping and the wrapped process. I'm not sure this ever makes sense for metadata; maybe for workchains wrapping other workchains, sharing e.g. metadata.store_provenance makes sense?

In general, we can't know whether a particular input should be shared (top-level, potentially overriding existing input ports) or not, so we leave this choice up to the developer. What makes this specific case troublesome, however, is that metadata isn't defined explicitly: the developer won't think about its existence, and it gets sneakily exposed at the top level.
Which leads to the question: is metadata the only input behaving in this special way, or are there others?

If we decide that metadata sharing isn't a use case we want / need to support, we could add a concept of "unoverridable" inputs, that can't be "exposed over" (if they exist already expose_inputs fails). That would make the error explicit, telling the developer that metadata needs to be excluded.

Excluding metadata by default as @bosonie suggests could also be a way forward, although there are a few hurdles in how the interface works right now:

  • You cannot specify both include and exclude; if we set a default for exclude, you'd have to remove the default when using include, or we'd have to implement some complicated logic to deal with this case.
  • The case where you expose in a namespace should still expose metadata as well, to support the use case of @sphuber and what is shown in the example above.

@greschd
Member

greschd commented May 20, 2020

And to make @bosonie's approach simpler: We could add a "rename" option to the "expose_inputs", like

spec.expose_inputs(CalcJob, rename=[('metadata', 'calcjob_metadata')])

that would rename the port and automatically translate back in exposed_inputs.

If I understood correctly you'd want it to work with a nested port:

spec.expose_inputs(CalcJob, rename=[('metadata.options', 'calcjob_options')])

Not sure how simple that would be to implement.
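
Neither rename variant exists in expose_inputs at the time of writing. As a rough illustration of what translating a dotted rename rule back would involve, here is a plain-dictionary sketch (apply_reverse_rename and the rule format are hypothetical, mirroring the proposed API above):

```python
# Sketch: reverse a dotted rename rule such as
# ('metadata.options', 'calcjob_options') when collecting inputs for the
# wrapped process. Purely illustrative, not AiiDA API.
def apply_reverse_rename(inputs, rename):
    result = dict(inputs)
    for original, exposed_name in rename:
        if exposed_name not in result:
            continue
        value = result.pop(exposed_name)
        # Re-create the nested path described by the dotted name.
        *parents, leaf = original.split('.')
        target = result
        for key in parents:
            target = target.setdefault(key, {})
        target[leaf] = value
    return result

restored = apply_reverse_rename(
    {'calcjob_options': {'resources': {'num_machines': 1}}, 'a': 1},
    [('metadata.options', 'calcjob_options')],
)
# restored == {'a': 1, 'metadata': {'options': {'resources': {'num_machines': 1}}}}
```

The tricky part in the real implementation would be doing this on port namespaces (with validation and defaults) rather than on plain dictionaries.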

@sphuber
Contributor

sphuber commented May 20, 2020

Thanks for all the input, guys. I agree with a number of the points made:

  • An exposed port at the top level should have the same meaning for both the wrapping and the wrapped process.
  • metadata.options is a clear exception and is so far the only port defined by AiiDA itself that behaves differently depending on the process type.
  • The fact that metadata is defined like any other port, yet works differently from other input ports, is a weak point of the design.
  • Especially so given that it is defined by the base Process class, so people may not know about its existence and accidentally override it.

As for solutions, I am not a fan of the one proposed by @bosonie, because it makes the options an actual input to the workchain instead of the calculation job. The whole point of the exposed-inputs functionality (among other things, of course) was to avoid having to wrap options in an actual node and then unwrap them. You lose the auto-completion and validation of the CalcJob process spec this way, which hurts usability and discoverability in lots of ways.

Finally, I think having a namespace should not be seen as a burden, but rather as an aid to remind one that those inputs are not meant for the called process but for one of its sub processes. We should remind ourselves that exposing does not really mean "re-purposing". The wrapper workchain is merely allowing the ports of its child process to "poke through" its own interface; it is not claiming those ports as its own.

@greschd
Member

greschd commented May 20, 2020

Finally, I think having a namespace should not be seen as a burden, but rather as an aid to remind one that those inputs are not meant for the called process but for one of its sub processes. We should remind ourselves that exposing does not really mean "re-purposing". The wrapper workchain is merely allowing the ports of its child process to "poke through" its own interface; it is not claiming those ports as its own.

This depends a bit on the intention of the wrapping process: if the goal is to be "opaque" to the user, such that the user shouldn't really have to care about which sub-process will be called, I can see that the namespace is a bit of an eyesore.
In practice, I think it's rarely achievable to be completely opaque, in part because the user needs to select resources, codes etc. for the child process anyway.

All that's to say, I think there are valid reasons to use either full namespacing, toplevel-only exposing, or a mixed approach.

To make the metadata problem more obvious, what do you think about the idea of disallowing exposing over an existing metadata port?

@sphuber
Contributor

sphuber commented May 20, 2020

To make the metadata problem more obvious, what do you think about the idea of disallowing exposing over an existing metadata port?

I think that is fine, but it makes the solutions for wrapping a CalcJob only uglier. You would either have to do:

@classmethod
def define(cls, spec):
    spec.expose_inputs(SomeCalcJob, exclude=('metadata',))
    spec.expose_inputs(SomeCalcJob, include=('metadata',), namespace='sub')

inputs = {
    'code': ...,
    'parameters': ...,
    'sub': {
        'metadata': {
            'options': {}
        }
    }
}
submit(WrapperWorkChain, **inputs)

or something like

@classmethod
def define(cls, spec):
    spec.expose_inputs(SomeCalcJob, exclude=('metadata',))
    spec.input('calcjob_options', valid_type=orm.Dict)

inputs = {
    'code': ...,
    'parameters': ...,
    'calcjob_options': orm.Dict(dict={}),
}
submit(WrapperWorkChain, **inputs)

I find that both those options are more complex than:

@classmethod
def define(cls, spec):
    spec.expose_inputs(SomeCalcJob, namespace='sub')


inputs = {
    'sub': {
        'code': ...,
        'parameters': ...,
        'metadata': {
            'options': {}
        }
    }
}
submit(WrapperWorkChain, **inputs)

With the latter, note how you can literally take the input dictionary as one would have passed it straight to the CalcJob and just nest it in sub. In the first two options, you always have to mix and match. The same goes, of course, for the wrapping workchain implementation when actually preparing the inputs to launch the sub process.

The second option I still consider the worst because, as said before, you lose the entire specification of the options defined on the CalcJob base class. By wrapping them opaquely in a Dict node, you no longer have the granular auto-documentation, discovery and validation.

@greschd
Member

greschd commented May 20, 2020

Yeah, I agree with your style assessment here. Personally, I'd use option 1) only if there are multiple sub-workchains that all share the same input (e.g. structure). This allows you to avoid duplication by exposing the same input from all sub-workchains. But I'd only ever use it with the include option (specifying the shared inputs), not with exclude.

To clarify what I was proposing though: I would consider all three options correct, even if maybe not recommended. What I think should eventually (after deprecation, and maybe with an override switch) produce an error is straight up

@classmethod
def define(cls, spec):
    spec.expose_inputs(SomeCalcJob)

because it shadows an existing metadata port.

The option

@classmethod
def define(cls, spec):
    spec.expose_inputs(SomeCalcJob, namespace='sub')

would of course still be allowed, because there's no existing metadata port there.

In terms of implementation, we could add an overridable (default True) attribute to the ports, which would get checked in the absorb method of the namespaces.
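
A toy sketch of that idea, with minimal stand-ins for the real port and namespace classes (the overridable flag and the check in absorb are the proposal itself, not existing AiiDA/plumpy API):

```python
# Toy model: a namespace refuses to absorb a port on top of an existing
# port that is marked as non-overridable.
class Port:
    def __init__(self, name, overridable=True):
        self.name = name
        self.overridable = overridable

class PortNamespace:
    def __init__(self):
        self.ports = {}

    def add(self, port):
        self.ports[port.name] = port

    def absorb(self, other):
        for name, port in other.ports.items():
            existing = self.ports.get(name)
            if existing is not None and not existing.overridable:
                raise ValueError(
                    f'cannot expose over non-overridable port {name!r}; '
                    'exclude it or expose in a namespace'
                )
            self.ports[name] = port

spec = PortNamespace()
spec.add(Port('metadata', overridable=False))  # defined by the base Process

exposed = PortNamespace()
exposed.add(Port('metadata'))  # exposed from the wrapped CalcJob

try:
    spec.absorb(exposed)  # raises: 'metadata' is not overridable
except ValueError as exc:
    print(exc)
```

Exposing in a namespace would not trigger the check, since the absorbed ports then land in a fresh sub-namespace with no pre-existing metadata port.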

@zhubonan
Contributor

One problem with using exposed_inputs to pass metadata is that WorkChainNode.get_builder_restart does not work at the workchain level, since metadata.options is non_db, i.e. not stored as a Node.

On the other hand, wrapping it into an options input port does not have this problem, although arguably it loses tab completion and automatic validation prior to starting the workchain.

Is it possible to have get_builder_restart work with exposed metadata.options?

@sphuber
Contributor

sphuber commented Jan 18, 2022

I think this may actually be possible, because the CalcJobNode has the get_options method, which returns the dictionary that was used as the metadata.options namespace when it was run. I am not 100% sure this is complete, but from a quick glance it looks close.

Then there is just the trick that get_builder_restart will have to find all the locations in the namespace where there is a metadata.options input, find the corresponding CalcJobNode, and then fill it with the dictionary returned by get_options.
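
The traversal described above could be sketched over plain nested dictionaries as follows (find_metadata_paths is a hypothetical helper, not AiiDA API; matching each path to the right CalcJobNode is the part left open):

```python
# Sketch: walk a nested inputs structure and collect the paths of every
# 'metadata' namespace, i.e. the slots that get_builder_restart would have
# to refill with the dictionary returned by CalcJobNode.get_options().
def find_metadata_paths(namespace, prefix=()):
    paths = []
    for name, value in namespace.items():
        if name == 'metadata':
            paths.append(prefix + (name,))
        elif isinstance(value, dict):
            paths.extend(find_metadata_paths(value, prefix + (name,)))
    return paths

inputs = {
    'scf': {'metadata': {}, 'parameters': {}},
    'bands': {'metadata': {}},
}
print(find_metadata_paths(inputs))
# [('scf', 'metadata'), ('bands', 'metadata')]
```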

@zhubonan
Contributor

zhubonan commented Jan 18, 2022

Yes, but more complicated workchains may expose multiple metadata ports, for example a band-structure workchain where the SCF and non-SCF calculations request different resources. In that case, the result of the get_options call on each child CalcJobNode needs to be linked to the correct input port, but that linkage is not inferrable from the spec of the WorkChainNode alone.

I think one solution could be to have get_builder_restart call certain methods of the CalcJob (e.g. self.process_class.<some_method>) to establish the linkage. The fallback would still be to use the options of the first calculation launched, if there is only a single exposed metadata port.

Or we could have the linkage stored under some attribute of the WorkChainNode, since its only use case is to reconstruct the inputs.

Then we can ditch passing metadata.options as Dict....

@sphuber
Contributor

sphuber commented Jan 18, 2022

Actually, this won't work, even if we record the explicit link (which CalcJobNode corresponds to which namespace in the exposed inputs) because the inputs that are passed to the workchain can actually be modified inside the workchain, and so the options that ultimately get passed to the CalcJob cannot be guaranteed to be the same as those that were passed to the workchain.

@zhubonan
Contributor

Actually, this won't work, even if we record the explicit link (which CalcJobNode corresponds to which namespace in the exposed inputs) because the inputs that are passed to the workchain can actually be modified inside the workchain, and so the options that ultimately get passed to the CalcJob cannot be guaranteed to be the same as those that were passed to the workchain.

Good point. Then I think the only way out is to record these non_db inputs as attributes of the WorkChainNode, just as is done for the CalcJobNode.
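
Storing those non_db inputs as node attributes could be sketched as flattening them into dotted keys at submission time and unflattening them on restart (the dotted-key scheme is purely an assumption, illustrated with plain dictionaries):

```python
# Sketch: flatten non_db inputs into dotted attribute keys, as a
# WorkChainNode could store them, and unflatten them again on restart.
def flatten(namespace, prefix=''):
    flat = {}
    for name, value in namespace.items():
        key = f'{prefix}{name}'
        if isinstance(value, dict):
            flat.update(flatten(value, key + '.'))
        else:
            flat[key] = value
    return flat

def unflatten(flat):
    nested = {}
    for key, value in flat.items():
        *parents, leaf = key.split('.')
        target = nested
        for part in parents:
            target = target.setdefault(part, {})
        target[leaf] = value
    return nested

options = {'metadata': {'options': {'withmpi': True}}}
flat = flatten(options)
# flat == {'metadata.options.withmpi': True}
assert unflatten(flat) == options  # lossless round trip
```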

@sphuber sphuber self-assigned this Nov 25, 2022
@sphuber sphuber changed the title set_option for workchains ? Ensure that ProcessNode.get_builder_restart fully restores all inputs including non_db ones Dec 6, 2022
@sphuber sphuber added this to the v2.3.0 milestone Dec 14, 2022
@sphuber sphuber added type/accepted feature approved feature request and removed requires discussion labels Dec 14, 2022