
get_builder_restart() failing due to absence of resources #5977

Closed
mbercx opened this issue Apr 22, 2023 · 7 comments · Fixed by #6284

mbercx (Member) commented Apr 22, 2023

Describe the bug

When trying to use get_builder_restart() on a more convoluted work chain I wrote, the operation failed with a ValueError related to metadata.options.resources:

In [3]: epw_successful = load_node(1983806)

In [4]: epw_successful.get_builder_restart()

[...] Trimmed for brevity, full traceback below

~/envs/aiida-super/code/aiida-core/aiida/engine/processes/builder.py in __setattr__(self, attr, value)
    123                 validation_error = port.validate(value)  # type: ignore[union-attr]
    124                 if validation_error:
--> 125                     raise ValueError(f'invalid attribute value {validation_error.message}')
    126 
    127             # If the attribute that is being set corresponds to a port that is a ``PortNamespace`` we need to make sure

ValueError: invalid attribute value input `metadata.options.resources` is required but is not specified
Full Traceback
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-d04f3d61c05f> in <module>
----> 1 epw_successful.get_builder_restart()

~/envs/aiida-super/code/aiida-core/aiida/orm/nodes/process/process.py in get_builder_restart(self)
    197         """
    198         builder = self.process_class.get_builder()
--> 199         builder._update(self.base.links.get_incoming(link_type=(LinkType.INPUT_CALC, LinkType.INPUT_WORK)).nested())  # pylint: disable=protected-access
    200         builder._merge(self.get_metadata_inputs() or {})  # pylint: disable=protected-access
    201 

~/envs/aiida-super/code/aiida-core/aiida/engine/processes/builder.py in _update(self, *args, **kwds)
    205             for key, value in args[0].items():
    206                 if isinstance(value, Mapping):
--> 207                     self[key].update(value)
    208                 else:
    209                     self.__setattr__(key, value)

/usr/lib/python3.8/_collections_abc.py in update(self, other, **kwds)
    830         if isinstance(other, Mapping):
    831             for key in other:
--> 832                 self[key] = other[key]
    833         elif hasattr(other, "keys"):
    834             for key in other.keys():

~/envs/aiida-super/code/aiida-core/aiida/engine/processes/builder.py in __setitem__(self, item, value)
    155 
    156     def __setitem__(self, item, value):
--> 157         self.__setattr__(item, value)
    158 
    159     def __delitem__(self, item):

~/envs/aiida-super/code/aiida-core/aiida/engine/processes/builder.py in __setattr__(self, attr, value)
    123                 validation_error = port.validate(value)  # type: ignore[union-attr]
    124                 if validation_error:
--> 125                     raise ValueError(f'invalid attribute value {validation_error.message}')
    126 
    127             # If the attribute that is being set corresponds to a port that is a ``PortNamespace`` we need to make sure

ValueError: invalid attribute value input `metadata.options.resources` is required but is not specified

But it does seem to work for a PwBaseWorkChain example:

In [7]: base_success = load_node(1985828)

In [8]: base_success.get_builder_restart()
Out[8]: 
Process class: PwBaseWorkChain
Inputs:
clean_workdir: false
kpoints: 'Kpoints mesh: 10x10x10 (+0.0,0.0,0.0)'
kpoints_distance: 0.2
kpoints_force_parity: false
max_iterations: 5
metadata:
  call_link_label: scf
  store_provenance: true
pw:
  code: Quantum ESPRESSO pw.x v6.8 for JQ's Wannier workflows
  metadata:
    call_link_label: CALL
    dry_run: false
    options:
      account: mr32
      append_text: ''
      custom_scheduler_commands: ''
      environment_variables_double_quotes: false
      import_sys_environment: true
      input_filename: aiida.in
      max_wallclock_seconds: 1800
      mpirun_extra_params: []
      output_filename: aiida.out
      parser_name: quantumespresso.pw
      prepend_text: ''
      resources:
        num_machines: 1
      scheduler_stderr: _scheduler-stderr.txt
[...]

Steps to reproduce

See above, will do some more testing myself and report back.

Expected behavior

Get a nice and fully populated builder, including metadata and all.

Your environment

  • Operating system: Ubuntu 20.04.1 LTS (Focal Fossa)
  • Python version: 3.8.10
  • aiida-core version:
sphuber (Contributor) commented Apr 23, 2023

I think this should already have been fixed in v2.3 by #5801. The problem was that the metadata inputs were not being stored in a way that would allow them to be fully restored. The only downside is that the fix doesn't apply retroactively: it is not really possible to provide a data migration for existing nodes. For any work chains run with v2.3 and up, however, the problem should be fixed.

mbercx (Member, Author) commented Apr 23, 2023

Alright, let me give it a try with a freshly run work chain in v2.3 and get back to you! If it's fixed, I will close the issue.

sphuber (Contributor) commented Apr 27, 2023

Did you get a chance to try, @mbercx? Can this be closed?

mbercx (Member, Author) commented Apr 27, 2023

Hmm, I just gave it another try, and it still seems broken. I restarted the daemon and everything. Still, I will try again in a fresh environment and see if I can reproduce it there.

So let's leave it open a bit longer; I'll report back by the AiiDA meeting next week (I made a note in the HackMD).

sphuber (Contributor) commented Apr 28, 2023

For the work chain where it seems to be broken, can you check the attributes of that node? It should contain the metadata_inputs attribute, which should hold all the inputs that were passed to the metadata namespace.

t-reents commented Feb 9, 2024

Hi @sphuber @mbercx

I was just about to start a discussion about this on the AiiDA Discourse before I found this issue. You probably discussed it in the AiiDA meeting back then, but since this issue is still open and I still encounter it in AiiDA 2.4 and 2.5, I'll just add the output you mentioned to this discussion.

This example is based on a PwRelaxWorkChain, for which node.get_builder_restart() fails, while it works perfectly for the child PwBaseWorkChains. The attributes of the broken PwRelaxWorkChain look as follows:

{'sealed': True,
 'version': {'core': '2.5.1', 'plugin': '4.4.0'},
 'exit_status': 0,
 'process_label': 'PwRelaxWorkChain',
 'process_state': 'finished',
 'metadata_inputs': {'base': {'pw': {'metadata': {'dry_run': False,
     'options': {'account': 'mr32',
      'withmpi': True,
      'resources': {'num_machines': 1,
       'num_cores_per_mpiproc': 1,
       'num_mpiprocs_per_machine': 128},
      'queue_name': 'debug',
      'append_text': '',
      'parser_name': 'quantumespresso.pw',
      'prepend_text': '',
      'input_filename': 'aiida.in',
      'output_filename': 'aiida.out',
      'scheduler_stderr': '_scheduler-stderr.txt',
      'scheduler_stdout': '_scheduler-stdout.txt',
      'mpirun_extra_params': [],
      'max_wallclock_seconds': 1800,
      'import_sys_environment': True,
      'submit_script_filename': '_aiidasubmit.sh',
      'custom_scheduler_commands': '#SBATCH --constraint=mc',
      'environment_variables_double_quotes': False},
     'call_link_label': 'CALL',
     'store_provenance': True}},
   'metadata': {'call_link_label': 'CALL', 'store_provenance': True}},
  'metadata': {'call_link_label': 'iteration_01_relax',
   'store_provenance': True}},
 'stepper_state_info': '3:results'}

I traced some parts of the source code along the code path of get_builder_restart, and the error is already raised at this point:

builder._update(self.base.links.get_incoming(link_type=(LinkType.INPUT_CALC, LinkType.INPUT_WORK)).nested())

So this happens before the metadata inputs are even merged:
builder._merge(self.get_metadata_inputs() or {})

Independent of whether this has already been discussed, I would be really interested in the reasoning behind it. So far I haven't been able to understand why this happens when more nested levels are present. In the end, the update function is basically called recursively on nested mappings, so I don't really see the difference.

Thanks in advance!
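The failure mode described above can be sketched in plain Python. The RequiredPort and Builder classes below are illustrative stand-ins, not aiida-core classes; they only mimic how ProcessBuilder.__setattr__ validates a namespace on every assignment, which is why an update that lacks the still-to-be-merged metadata fails immediately:

```python
# Hypothetical, simplified model of the behaviour in the traceback above.
# RequiredPort and Builder are illustrative names, not aiida-core classes.
class RequiredPort:
    """A namespace with one required key, like ``metadata.options.resources``."""

    def __init__(self, required_key):
        self.required_key = required_key

    def validate(self, value):
        """Return an error message if the required key is missing, else None."""
        if self.required_key not in value:
            return f'input `{self.required_key}` is required but is not specified'
        return None


class Builder:
    """Validates on *every* assignment, like ``ProcessBuilder.__setattr__``."""

    def __init__(self, port):
        self._port = port
        self._data = {}

    def __setitem__(self, key, value):
        error = self._port.validate(value)
        if error:
            raise ValueError(f'invalid attribute value {error}')
        self._data[key] = value


builder = Builder(RequiredPort('resources'))
try:
    # First update: normal inputs only. The ``resources`` value lives in the
    # metadata that would be merged in a *second* step, so validation fires
    # here already, before that merge ever happens.
    builder['options'] = {'account': 'mr32'}
except ValueError as exc:
    print(exc)  # invalid attribute value input `resources` is required but is not specified
```

Because validation fires on each assignment rather than once at the end, the order in which the regular inputs and the metadata are merged into the builder matters.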

sphuber (Contributor) commented Feb 9, 2024

Thanks for the report @t-reents. Could you perhaps export the PwRelaxWorkChain in question and send it over to me? (You have access to Slack, right?) Then I can debug it better and find the fix. I think the cause may be that the code merges the metadata only after merging the normal inputs: validation happens on every update of the builder, so at the first update, when the normal inputs are assigned, the metadata is still incomplete and the validation triggers. A solution might be to merge the normal inputs and the metadata first, before assigning them to the builder in one go. But having a case that reproduces it would make this a lot easier.
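The merge-first idea can be illustrated with a small sketch. The deep_merge helper below is hypothetical, not aiida-core code; it just shows how combining the regular inputs with the stored metadata inputs before assignment would give the builder a complete namespace to validate:

```python
# Hypothetical sketch of a merge-first approach; deep_merge is an
# illustrative helper, not part of aiida-core.
def deep_merge(base, update):
    """Recursively merge ``update`` into a copy of ``base``."""
    merged = dict(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Regular inputs, as returned by the incoming-links lookup (simplified).
normal_inputs = {'pw': {'parameters': {'CONTROL': {'calculation': 'scf'}}}}
# Stored metadata inputs, as returned by ``get_metadata_inputs`` (simplified).
metadata_inputs = {'pw': {'metadata': {'options': {'resources': {'num_machines': 1}}}}}

complete = deep_merge(normal_inputs, metadata_inputs)
# The merged namespace now contains both the parameters and the required
# metadata.options.resources, so a single builder update would validate.
print(complete['pw']['metadata']['options']['resources'])  # {'num_machines': 1}
```

Assigning one merged dictionary to the builder in a single step would mean every namespace is already complete when its validation runs.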
