`XspectraParser`: Parse parameters from CalcJob inputs #853

PNOGillespie · 2022-10-27T17:41:05Z

Addresses a request made in PR: #852

Updates the method for the XspectraParser class to parse the following parameters from the input of the CalcJob class, rather than the stdout:

The polarisation (epsilon) vectors used in the calculation
The coordinate system used for the calculation (either crystallographic or cartesian)
Whether the calculation was a re-plot or a full calculation (xonly_plot)

Keyword plot_only in output_parameters is also changed to only_plot_spectrum in order to clarify its meaning.

The changes were made necessary as, during further testing involving calculation restarts, it was found that XSpectra stores the polarisation and k-vector used for the calculation in its save file using a cartesian basis regardless of the setting used in the calculation itself. When performing a re-start or re-plot calculation the code reports the polarisation/k-vector stored in the save file, on a cartesian basis, which causes confusion for the Parser - particularly in the case of a restarted calculation as the polarisation vector is then reported twice: once at the initial read of the input file and then again when the save file is read (and in potentially different coordinate systems, depending on calculation settings). It was therefore decided to move the parsing of these parameters from the raw output to just taking them directly from the CalcJob inputs.

sphuber · 2022-10-27T22:01:28Z

src/aiida_quantumespresso/parsers/xspectra.py

@@ -87,6 +87,28 @@ def parse(self, **kwargs):
        xy_data.set_y(y_arrays, y_names, y_units)

        parsed_data, logs = parse_stdout_xspectra(filecontent=out_file, codename='XSpectra')
+
+        # Parse some additional info which the stdout does not reliably report
+        input_params = self.node.get_incoming().get_node_by_label('parameters').get_dict()


To get an input node, node.inputs is easier to read and more elegant.

Suggested change

input_params = self.node.get_incoming().get_node_by_label('parameters').get_dict()

parameters = self.node.inputs.parameters.get_attribute('INPUT_XSPECTRA')

Since you also only ever use the INPUT_XSPECTRA, might as well get that subdictionary straight away.

sphuber · 2022-10-27T22:01:57Z

src/aiida_quantumespresso/parsers/xspectra.py

+        input_params = self.node.get_incoming().get_node_by_label('parameters').get_dict()
+
+        epsilon_vector = [
+            float(input_params['INPUT_XSPECTRA'][f'xepsilon({n})']) for n in [1, 2, 3]


Suggested change

float(input_params['INPUT_XSPECTRA'][f'xepsilon({n})']) for n in [1, 2, 3]

float(parameters[f'xepsilon({n})']) for n in [1, 2, 3]

Should maybe consider parameters.get(f'xepsilon({n})', default) here to prevent KeyError if the key is not defined. But don't know what a sensible default could be here.

sphuber · 2022-10-27T22:05:38Z

src/aiida_quantumespresso/parsers/xspectra.py

+        if 'xonly_plot' in input_params['INPUT_XSPECTRA']:
+            parsed_data['only_plot_spectrum'] = input_params['INPUT_XSPECTRA']['xonly_plot']
+        else:
+            parsed_data['only_plot_spectrum'] = False


This common pattern can be simplified with the get(key, default) paradigm:

Suggested change

if 'xonly_plot' in input_params['INPUT_XSPECTRA']:

parsed_data['only_plot_spectrum'] = input_params['INPUT_XSPECTRA']['xonly_plot']

else:

parsed_data['only_plot_spectrum'] = False

parsed_data['only_plot_spectrum'] = parameters.get('xonly_plot', False)

sphuber · 2022-10-27T22:10:11Z

src/aiida_quantumespresso/parsers/xspectra.py

+        if 'xcoordcrys' in input_params['INPUT_XSPECTRA']:
+            if input_params['INPUT_XSPECTRA']['xcoordcrys']:
+                parsed_data['vector_coord_system'] = 'crystal'
+            else:
+                parsed_data['vector_coord_system'] = 'cartesian'
+        else:
+            parsed_data['vector_coord_system'] = 'crystal'


This logic seems a bit weird to me, is it correct? Basically vector_coord_system is getting set to crystal if xcoordcrys is not set or xcoordcrys is "truthy" (i.e. not None, empty string etc.). Otherwise it is cartesian.

I see in the input doc that this is a boolean. Why convert to a string? Why not simply do

parsed_data['xcoordcrys'] = parameters.get('xcoordcrys', True)

Actually, now that I read all this. These are all input parameters defined by the user. Why do we "parse" them and add them to the output parameters? If they are already in the input, why not get them from there if needed?

sphuber · 2022-10-27T22:11:00Z

src/aiida_quantumespresso/parsers/xspectra.py

+    # We need to assign this before the loop over each line due to a bug spotted in the code
+    # when `only_plot=True`. Since the epsilon vector is reported before the value of
+    # `xonly_plot`, we need a starting value before the loop begins.
+    # parsed_data['plot_only'] = False


The code seems commented out? Is the comment actually correct? Or can this be removed?

sphuber · 2022-10-27T22:14:36Z

tests/parsers/test_xspectra.py

+            'xepsilon(1)': 1,
+            'xepsilon(2)': 0,
+            'xepsilon(3)': 0,


Are these keys now guaranteed to be in the parameters always? Because if not, the newly added line 95 in the parser will throw a KeyError if one of these is not defined.

tests/parsers/test_xspectra.py

PNOGillespie · 2022-10-28T08:29:05Z

Hi @sphuber, thanks for the review. I'll add an explanation here since there are a number of comments directed at the idea of parsing of the vector coordinate system and the polarisation vectors from the inputs - particularly:

Actually, now that I read all this. These are all input parameters defined by the user. Why do we "parse" them and add them to the output parameters? If they are already in the input, why not get them from there if needed?

The main reason for the changes proposed are for user convenience, as knowing the polarisation vector and its coordinate system used for the calculation is crucial information that (I felt, at least) should be added to the output parameters in order to make them easier to access. This is particularly useful for the polarisation vector since the Parser automatically compiles each value of xepsilon(n) individually into a list object. Similarly, stating in the output parameters whether the coordinate system is "crystal" or "cartesian" makes it more obvious which system was used, and stating if the calculation was a re-plot or a full calculation also helps.

Ultimately, the two options to address the issues with parsing were: (1) look in the inputs to get the parameters, to avoid the problems stated in parsing from the stdout, or (2) remove the parsing of these parameters altogether and leave it to the user to search for them manually.

Unless I've missed a trick somewhere, is there a way to access these inputs as easy as one could access the output parameter dictionary?

sphuber · 2022-10-28T08:53:33Z

Thanks for the reply @PNOGillespie that helps a lot.

Unless I've missed a trick somewhere, is there a way to access these inputs as easy as one could access the output parameter dictionary?

Well, if the user defines them in the input parameters, then it would be as easy as node.inputs.parameters.get_attribute('INPUT_XSPECTRA')['xcoordcrys'].

But as I understand now, these parameters are not required to be defined in the input parameters because a lot of them have defaults that are automatically set by xspectra.x. So if you think, some of these input parameters are important to have for the user, I can see why you might want to have the parser add them in the output parameters.

However, that means you have to a be a lot more careful and I would recommend the following rules:

If an input is optional for xspectra.x don't assume in the parser that it will be in the input parameters (such as you have done for xepsilon({n})) because that will raise KeyErrors
I wouldn't change the name of the input. So if in xspectra.x it is called xcoordcrys I wouldn't use the vector_coord_system key. If you do this, you will have to start documenting all the conversions, which is a lot of work.

So to summarize I would change the parsing code to:

parameters = self.node.inputs.parameters.get_attribute('INPUT_XSPECTRA')
xepsilon_defaults = {
    '1': 0,
    '2': 0,
    '3': 1,
}
parsed_data['xepsilon'] = [float(parameters.get(f'xepsilon({n})', xepsilon_defaults[n]) for n in (1, 2, 3)]
parsed_data['xcoordcrys'] = parameters.get('xcoordcrys', True)
parsed_data['xonly_plot'] = parameters.get('xonly_plot', False

PNOGillespie · 2022-10-28T09:24:23Z

Thanks, I think these are reasonable suggestions for the Parser. I'll bundle these into a commit with your other suggestions and push it shortly.

But as I understand now, these parameters are not required to be defined in the input parameters because a lot of them have defaults that are automatically set by xspectra.x. So if you think, some of these input parameters are important to have for the user, I can see why you might want to have the parser add them in the output parameters.

Each of these do have defaults, though one thing to bear in mind is that most analysis using XSpectra involves computing the spectra for different polarisation vectors - particularly in the case of crystal systems with low symmetry (e.g. Monoclinic) or ones which contain defects since the spectrum will not be invariant of polarisation vector (see past works on SiO₂ & La₂CuO₄ as well as NiO) - so it would be reasonable to expect the user to always define this themselves. In particular, calculations for different polarisation vectors and analysing polarisation dependence is part of the XspectraCoreWorkChain logic and is a key feature that XSpectra has in its favour.

In any case, I'll make sure the defaults are set properly

Requested for aiidateam#853 For parameters Parsed from the calculation input, the commit changes their keys to those used in the input parameters: * `epsilon_vectors` => `xepsilon` * `vector_coord_system` => `xcoordcrys` * `only_plot_spectrum` => `xonly_plot` Also adds default values for these keys which match those set by `xspectra.x` and updates the relevant parser tests.

sphuber

Thanks a lot @PNOGillespie , just one final problem that would actually cause exceptions. Once you fix this, I will merge.

I just want to still note that with this new method of parsing the parameters from the inputs and resorting to default if not defined, it means that the parser can set the wrong defaults if the defaults in xspectra.x change. For example if QE 7.0 changes default of xcoordcrys to False, then the Parser would still set True if not explicitly defined by the user. Again, not sure how likely the defaults are subject to change, but something to be aware of.

src/aiida_quantumespresso/parsers/xspectra.py

PNOGillespie · 2022-10-28T14:38:10Z

Thanks again @sphuber.

I just want to still note that with this new method of parsing the parameters from the inputs and resorting to default if not defined, it means that the parser can set the wrong defaults if the defaults in xspectra.x change. For example if QE 7.0 changes default of xcoordcrys to False, then the Parser would still set True if not explicitly defined by the user. Again, not sure how likely the defaults are subject to change, but something to be aware of.

I understand, it's something I bear in mind while I'm working on the implementation. If it's any comfort, development work on xspectra.x itself seems to have stopped quite a while ago (which maybe explains why it doesn't use an xml file as well) so it's likely that the FORTRAN code is frozen in its current state and won't change for some time yet.

sphuber

Thanks a lot @PNOGillespie

Updates the method for the `XspectraParser` class to parse the following parameters from the input of the `CalcJob` class rather than the stdout: * The polarisation (epsilon) vectors used in the calculation * The coordinate system used for the calculation (either `crystallographic` or `cartesian`) * Whether the calculation was a re-plot or a full calculation (`xonly_plot`) The keyword `plot_only` in the `output_parameters` is also changed to `only_plot_spectrum` in order to clarify its meaning. The changes were made necessary as, during further testing involving calculation restarts, it was found that XSpectra stores the polarisation and k-vector used for the calculation in its save file using a cartesian basis regardless of the setting used in the calculation itself. When performing a re-start or re-plot calculation the code reports the polarisation/k-vector stored in the save file, on a cartesian basis, which causes confusion for the Parser - particularly in the case of a restarted calculation as the polarisation vector is then reported twice: once at the initial read of the input file and then again when the save file is read (and in potentially different coordinate systems, depending on calculation settings). It was therefore decided to move the parsing of these parameters from the raw output to just taking them directly from the `CalcJob` inputs.

sphuber reviewed Oct 27, 2022

View reviewed changes

PNOGillespie requested a review from sphuber October 28, 2022 13:44

sphuber requested changes Oct 28, 2022

View reviewed changes

src/aiida_quantumespresso/parsers/xspectra.py Outdated Show resolved Hide resolved

sphuber previously approved these changes Oct 28, 2022

View reviewed changes

sphuber dismissed their stale review via fedf953 October 28, 2022 15:02

sphuber force-pushed the fix/xspectra-parsers branch from 2cd742b to fedf953 Compare October 28, 2022 15:02

sphuber self-requested a review October 28, 2022 15:04

sphuber previously approved these changes Oct 28, 2022

View reviewed changes

sphuber dismissed their stale review via 4db2759 October 28, 2022 15:07

sphuber force-pushed the fix/xspectra-parsers branch from fedf953 to 4db2759 Compare October 28, 2022 15:07

sphuber self-requested a review October 28, 2022 15:14

sphuber approved these changes Oct 28, 2022

View reviewed changes

sphuber merged commit a6fb502 into aiidateam:main Oct 28, 2022

PNOGillespie deleted the fix/xspectra-parsers branch October 28, 2022 17:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`XspectraParser`: Parse parameters from CalcJob inputs #853

`XspectraParser`: Parse parameters from CalcJob inputs #853

PNOGillespie commented Oct 27, 2022

sphuber Oct 27, 2022

sphuber Oct 27, 2022

sphuber Oct 27, 2022

sphuber Oct 27, 2022

sphuber Oct 27, 2022

sphuber Oct 27, 2022

sphuber Oct 27, 2022

sphuber Oct 27, 2022

PNOGillespie commented Oct 28, 2022

sphuber commented Oct 28, 2022

PNOGillespie commented Oct 28, 2022

sphuber left a comment

PNOGillespie commented Oct 28, 2022

sphuber left a comment

	input_params = self.node.get_incoming().get_node_by_label('parameters').get_dict()
	parameters = self.node.inputs.parameters.get_attribute('INPUT_XSPECTRA')

	float(input_params['INPUT_XSPECTRA'][f'xepsilon({n})']) for n in [1, 2, 3]
	float(parameters[f'xepsilon({n})']) for n in [1, 2, 3]

XspectraParser: Parse parameters from CalcJob inputs #853

XspectraParser: Parse parameters from CalcJob inputs #853

Conversation

PNOGillespie commented Oct 27, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

PNOGillespie commented Oct 28, 2022

sphuber commented Oct 28, 2022

PNOGillespie commented Oct 28, 2022

sphuber left a comment

Choose a reason for hiding this comment

PNOGillespie commented Oct 28, 2022

sphuber left a comment

Choose a reason for hiding this comment

`XspectraParser`: Parse parameters from CalcJob inputs #853

`XspectraParser`: Parse parameters from CalcJob inputs #853