Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure that ProcessNode.get_builder_restart fully restores all inputs including metadata inputs #5801

Merged

Commits on Dec 14, 2022

  1. ProcessBuilder: Fix bug in _recursive_merge

    The `_recursive_merge` method could raise a `KeyError` for a process
    builder that contains a dynamic port namespace. In this case it would be
    possible to merge in a value that contains a nested dictionay which
    would try to access directly `dictionary[key]`, where `dictionary` is
    the namespace of the builder, and so hit a `KeyError` since this nested
    namespace didn't exist yet in the dynamical namespace.
    
    The solution is to explicitly check if the `key` already exists in the
    builder's namespace, and if not, we assig the entire `value` to it, as
    there is no nothing to recurse into.
    sphuber committed Dec 14, 2022
    Configuration menu
    Copy the full SHA
    3b579ae View commit details
    Browse the repository at this point in the history
  2. ProcessBuilder: Move _prune method to standalone utility

    The method is a generic utility to operate on mappings and is not
    specific to the `ProcessBuilder`. It needs to be used elsewhere soon so
    therefore it is moved to a standalone function in a new utility module
    `aiida.engine.processes.utils`.
    sphuber committed Dec 14, 2022
    Configuration menu
    Copy the full SHA
    86265ae View commit details
    Browse the repository at this point in the history
  3. Port: Add the is_metadata keyword

    In the engine redesign from `v0.x` to `v1.0` the process interface
    needed a way to distinguish inputs that are lined up to the process node
    as nodes themselves and those that are stored directly on the process
    node, for example as an attribute. To this end the `non_db` keyword was
    added to the `InputPort`. This keyword was used for the `metadata` input
    namespace, to designate that these inputs would not be `Data` instances.
    
    The name is quite unfortunate, however, since the inputs of these ports
    actually do get stored in the database, contrary to what the keyword
    suggests. The most straightforward solution would be to rename the
    keyword, however, for better or worse, the keyword has been adopted by
    (a limited amount of) plugin packages. A known application is to pass in
    `Group` instances which are not storable as nodes, but can be passed
    through a `non_db` port. Hypothetically, users may have used the port as
    well to pass in sensitive data that should never be stored. Renaming the
    port or changing its behavior is therefore likely to break existing
    code.
    
    Instead, the `is_metadata` keyword is added to the `InputPort` and
    `PortNamespace` through the `WithMetadata` mixin. When set to `True`
    this keyword functions as the original intention of the `non_db` flag
    and an `is_metadata` port signals that its value will be stored in the
    database but directly on the `ProcessNode` instead of being linked as a
    `Data` node. The naming makes sense, because the only use of this
    keyword is by the `metadata` input namespace that all `Process` classes
    have. The inputs in this namespace are stored, through custom logic in
    the `Process` and `CalcJob` class, in the attributes of the node or
    dedicated columns of the node database model, such as the label and the
    description.
    
    This addition leaves the behavior of `non_db` inputs unchanged and so
    plugin packages that use this keyword should continue to function as
    before.
    sphuber committed Dec 14, 2022
    Configuration menu
    Copy the full SHA
    48cf609 View commit details
    Browse the repository at this point in the history
  4. Process: Store JSON-serializable metadata inputs on the node

    The promise of AiiDA's provenance is that all inputs to a `Process` are
    stored as nodes in the provenance graph linked to a node representing
    the process. From the very early beginning, however, there needed to be
    exceptions of inputs to processes that were not nodes. Notable examples
    were the various "options" set for calculation jobs. Even in AiiDA v0.x
    these "settings" as they were called back then, were stored in the
    attributes of the node.
    
    In the AiiDA v1.0 redesign, where all inputs of a process are defined
    through the process spec, these non-node inputs were implemented by
    allowing input ports to be made non-database storable, indicated by the
    `non-db` argument in the port declaration. These inputs would be passed
    to the `Process` instance and would be available during its lifetime,
    but would not be stored in the database. Once again there are exceptions
    as certain inputs defined by `aiida-core` are stored on the node, but in
    various places. Notable examples are the `label` and `description` of
    the process, and the `metadata.options` of the `CalcJob` class.
    
    This historical decision has as a direct result in that it is difficult
    if not impossible in certain cases to reconstruct the exact input
    dictionary that was used to run a `Process` from the data stored on the
    `ProcessNode`. From a provenance point of view, this is a huge weakpoint
    and is what is being corrected here.
    
    The input ports marked `non_db=True` on the base process classes
    provided by `aiida-core`, `Process` and `CalcJob` were changed to use the
    new `is_metadata` keyword instead in the previous commit, to remove the
    inconsistency between the naming and behavior. In this commit, all inputs
    that correspond to `is_metadata` ports and are JSON-serializable are stored
    in the attributes of the process node under the key `metadata_inputs`.
    All `is_metadata` input ports that are defined on process base classes by
    `aiida-core`, such as `Process` and `CalcJob` *are* JSON serializable.
    However, plugin packages can implement process classes with ports that
    accept inputs that are not JSON serializable, which is why this
    additional condition has to be added. But all inputs defined by
    `aiida-core` should be covered.
    sphuber committed Dec 14, 2022
    Configuration menu
    Copy the full SHA
    06d53a6 View commit details
    Browse the repository at this point in the history
  5. ProcessBuilder: Include metadata inputs in get_builder_restart

    The `get_builder_restart` method on the `ProcessNode` base class would
    return a `ProcessBuilder` with the inputs set to those attached to that
    node instance. The `CalcJobNode` would override this to add the metadata
    options as well. This would be ok for `CalcJobNode`s, but if a restart
    builder was created from a `WorkChainNode` that calls a `CalcJobNode` it
    would have also received options, but those would not be restored. One
    could think that when calling `get_restart_builder` on a `WorkChainNode`
    that we can just go down the callstack, find all the `CalcJobNode`s and
    set the options in the respective input namespaces. But this would not
    exactly reproduce the original inputs, as the options that a calculation
    job has received could have been a modified version of the original
    options passed to the workchain, changed in the logic of the workchain.
    
    Instead, now that the exact metadata inputs for each process are stored
    in the attribute of the node, added in the previous commit, it is this
    dictionary that is used to restore the exact original inputs. It not
    only addresses the problem of incorrect `CalcJob` options, but it also
    restores any other metadata inputs such as the `label` and `description`.
    sphuber committed Dec 14, 2022
    Configuration menu
    Copy the full SHA
    d46d77b View commit details
    Browse the repository at this point in the history