Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XspectraParser: Parse parameters from CalcJob inputs #853

Merged
merged 1 commit into from
Oct 28, 2022

Conversation

PNOGillespie
Copy link
Contributor

Addresses a request made in PR: #852

Updates the method for the XspectraParser class to parse the following parameters from the input of the CalcJob class, rather than the stdout:

  • The polarisation (epsilon) vectors used in the calculation
  • The coordinate system used for the calculation (either crystallographic or cartesian)
  • Whether the calculation was a re-plot or a full calculation (xonly_plot)

Keyword plot_only in output_parameters is also changed to only_plot_spectrum in order to clarify its meaning.

The changes were made necessary as, during further testing involving calculation restarts, it was found that XSpectra stores the polarisation and k-vector used for the calculation in its save file using a cartesian basis regardless of the setting used in the calculation itself. When performing a re-start or re-plot calculation the code reports the polarisation/k-vector stored in the save file, on a cartesian basis, which causes confusion for the Parser - particularly in the case of a restarted calculation as the polarisation vector is then reported twice: once at the initial read of the input file and then again when the save file is read (and in potentially different coordinate systems, depending on calculation settings). It was therefore decided to move the parsing of these parameters from the raw output to just taking them directly from the CalcJob inputs.

@@ -87,6 +87,28 @@ def parse(self, **kwargs):
xy_data.set_y(y_arrays, y_names, y_units)

parsed_data, logs = parse_stdout_xspectra(filecontent=out_file, codename='XSpectra')

# Parse some additional info which the stdout does not reliably report
input_params = self.node.get_incoming().get_node_by_label('parameters').get_dict()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To get an input node, node.inputs is easier to read and more elegant.

Suggested change
input_params = self.node.get_incoming().get_node_by_label('parameters').get_dict()
parameters = self.node.inputs.parameters.get_attribute('INPUT_XSPECTRA')

Since you also only ever use the INPUT_XSPECTRA, might as well get that subdictionary straight away.

input_params = self.node.get_incoming().get_node_by_label('parameters').get_dict()

epsilon_vector = [
float(input_params['INPUT_XSPECTRA'][f'xepsilon({n})']) for n in [1, 2, 3]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
float(input_params['INPUT_XSPECTRA'][f'xepsilon({n})']) for n in [1, 2, 3]
float(parameters[f'xepsilon({n})']) for n in [1, 2, 3]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should maybe consider parameters.get(f'xepsilon({n})', default) here to prevent KeyError if the key is not defined. But don't know what a sensible default could be here.

Comment on lines 107 to 110
if 'xonly_plot' in input_params['INPUT_XSPECTRA']:
parsed_data['only_plot_spectrum'] = input_params['INPUT_XSPECTRA']['xonly_plot']
else:
parsed_data['only_plot_spectrum'] = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This common pattern can be simplified with the get(key, default) paradigm:

Suggested change
if 'xonly_plot' in input_params['INPUT_XSPECTRA']:
parsed_data['only_plot_spectrum'] = input_params['INPUT_XSPECTRA']['xonly_plot']
else:
parsed_data['only_plot_spectrum'] = False
parsed_data['only_plot_spectrum'] = parameters.get('xonly_plot', False)

Comment on lines 99 to 105
if 'xcoordcrys' in input_params['INPUT_XSPECTRA']:
if input_params['INPUT_XSPECTRA']['xcoordcrys']:
parsed_data['vector_coord_system'] = 'crystal'
else:
parsed_data['vector_coord_system'] = 'cartesian'
else:
parsed_data['vector_coord_system'] = 'crystal'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic seems a bit weird to me, is it correct? Basically vector_coord_system is getting set to crystal if xcoordcrys is not set or xcoordcrys is "truthy" (i.e. not None, empty string etc.). Otherwise it is cartesian.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see in the input doc that this is a boolean. Why convert to a string? Why not simply do

parsed_data['xcoordcrys'] = parameters.get('xcoordcrys', True)

Actually, now that I read all this. These are all input parameters defined by the user. Why do we "parse" them and add them to the output parameters? If they are already in the input, why not get them from there if needed?

Comment on lines 184 to 187
# We need to assign this before the loop over each line due to a bug spotted in the code
# when `only_plot=True`. Since the epsilon vector is reported before the value of
# `xonly_plot`, we need a starting value before the loop begins.
# parsed_data['plot_only'] = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code seems commented out? Is the comment actually correct? Or can this be removed?

Comment on lines 13 to 15
'xepsilon(1)': 1,
'xepsilon(2)': 0,
'xepsilon(3)': 0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these keys now guaranteed to be in the parameters always? Because if not, the newly added line 95 in the parser will throw a KeyError if one of these is not defined.

tests/parsers/test_xspectra.py Outdated Show resolved Hide resolved
@PNOGillespie
Copy link
Contributor Author

Hi @sphuber, thanks for the review. I'll add an explanation here since there are a number of comments directed at the idea of parsing of the vector coordinate system and the polarisation vectors from the inputs - particularly:

Actually, now that I read all this. These are all input parameters defined by the user. Why do we "parse" them and add them to the output parameters? If they are already in the input, why not get them from there if needed?

The main reason for the changes proposed are for user convenience, as knowing the polarisation vector and its coordinate system used for the calculation is crucial information that (I felt, at least) should be added to the output parameters in order to make them easier to access. This is particularly useful for the polarisation vector since the Parser automatically compiles each value of xepsilon(n) individually into a list object. Similarly, stating in the output parameters whether the coordinate system is "crystal" or "cartesian" makes it more obvious which system was used, and stating if the calculation was a re-plot or a full calculation also helps.

Ultimately, the two options to address the issues with parsing were: (1) look in the inputs to get the parameters, to avoid the problems stated in parsing from the stdout, or (2) remove the parsing of these parameters altogether and leave it to the user to search for them manually.

Unless I've missed a trick somewhere, is there a way to access these inputs as easy as one could access the output parameter dictionary?

@sphuber
Copy link
Contributor

sphuber commented Oct 28, 2022

Thanks for the reply @PNOGillespie that helps a lot.

Unless I've missed a trick somewhere, is there a way to access these inputs as easy as one could access the output parameter dictionary?

Well, if the user defines them in the input parameters, then it would be as easy as node.inputs.parameters.get_attribute('INPUT_XSPECTRA')['xcoordcrys'].

But as I understand now, these parameters are not required to be defined in the input parameters because a lot of them have defaults that are automatically set by xspectra.x. So if you think, some of these input parameters are important to have for the user, I can see why you might want to have the parser add them in the output parameters.

However, that means you have to a be a lot more careful and I would recommend the following rules:

  • If an input is optional for xspectra.x don't assume in the parser that it will be in the input parameters (such as you have done for xepsilon({n})) because that will raise KeyErrors
  • I wouldn't change the name of the input. So if in xspectra.x it is called xcoordcrys I wouldn't use the vector_coord_system key. If you do this, you will have to start documenting all the conversions, which is a lot of work.

So to summarize I would change the parsing code to:

parameters = self.node.inputs.parameters.get_attribute('INPUT_XSPECTRA')
xepsilon_defaults = {
    '1': 0,
    '2': 0,
    '3': 1,
}
parsed_data['xepsilon'] = [float(parameters.get(f'xepsilon({n})', xepsilon_defaults[n]) for n in (1, 2, 3)]
parsed_data['xcoordcrys'] = parameters.get('xcoordcrys', True)
parsed_data['xonly_plot'] = parameters.get('xonly_plot', False

@PNOGillespie
Copy link
Contributor Author

Thanks, I think these are reasonable suggestions for the Parser. I'll bundle these into a commit with your other suggestions and push it shortly.

But as I understand now, these parameters are not required to be defined in the input parameters because a lot of them have defaults that are automatically set by xspectra.x. So if you think, some of these input parameters are important to have for the user, I can see why you might want to have the parser add them in the output parameters.

Each of these do have defaults, though one thing to bear in mind is that most analysis using XSpectra involves computing the spectra for different polarisation vectors - particularly in the case of crystal systems with low symmetry (e.g. Monoclinic) or ones which contain defects since the spectrum will not be invariant of polarisation vector (see past works on SiO2 & La2CuO4 as well as NiO) - so it would be reasonable to expect the user to always define this themselves. In particular, calculations for different polarisation vectors and analysing polarisation dependence is part of the XspectraCoreWorkChain logic and is a key feature that XSpectra has in its favour.

In any case, I'll make sure the defaults are set properly

PNOGillespie added a commit to PNOGillespie/aiida-quantumespresso that referenced this pull request Oct 28, 2022
Requested for aiidateam#853

For parameters Parsed from the calculation input, the commit changes
their keys to those used in the input parameters:
* `epsilon_vectors` => `xepsilon`
* `vector_coord_system` => `xcoordcrys`
* `only_plot_spectrum` => `xonly_plot`

Also adds default values for these keys which match those set by
`xspectra.x` and updates the relevant parser tests.
Copy link
Contributor

@sphuber sphuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @PNOGillespie , just one final problem that would actually cause exceptions. Once you fix this, I will merge.

I just want to still note that with this new method of parsing the parameters from the inputs and resorting to default if not defined, it means that the parser can set the wrong defaults if the defaults in xspectra.x change. For example if QE 7.0 changes default of xcoordcrys to False, then the Parser would still set True if not explicitly defined by the user. Again, not sure how likely the defaults are subject to change, but something to be aware of.

src/aiida_quantumespresso/parsers/xspectra.py Outdated Show resolved Hide resolved
@PNOGillespie
Copy link
Contributor Author

Thanks again @sphuber.

I just want to still note that with this new method of parsing the parameters from the inputs and resorting to default if not defined, it means that the parser can set the wrong defaults if the defaults in xspectra.x change. For example if QE 7.0 changes default of xcoordcrys to False, then the Parser would still set True if not explicitly defined by the user. Again, not sure how likely the defaults are subject to change, but something to be aware of.

I understand, it's something I bear in mind while I'm working on the implementation. If it's any comfort, development work on xspectra.x itself seems to have stopped quite a while ago (which maybe explains why it doesn't use an xml file as well) so it's likely that the FORTRAN code is frozen in its current state and won't change for some time yet.

sphuber
sphuber previously approved these changes Oct 28, 2022
sphuber
sphuber previously approved these changes Oct 28, 2022
Copy link
Contributor

@sphuber sphuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @PNOGillespie

Updates the method for the `XspectraParser` class to parse the following
parameters from the input of the `CalcJob` class rather than the stdout:

 * The polarisation (epsilon) vectors used in the calculation
 * The coordinate system used for the calculation
   (either `crystallographic` or `cartesian`)
 * Whether the calculation was a re-plot or a full calculation (`xonly_plot`)

The keyword `plot_only` in the `output_parameters` is also changed to
`only_plot_spectrum` in order to clarify its meaning.

The changes were made necessary as, during further testing involving
calculation restarts, it was found that XSpectra stores the polarisation
and k-vector used for the calculation in its save file using a cartesian
basis regardless of the setting used in the calculation itself. When
performing a re-start or re-plot calculation the code reports the
polarisation/k-vector stored in the save file, on a cartesian basis,
which causes confusion for the Parser - particularly in the case of a
restarted calculation as the polarisation vector is then reported twice:
once at the initial read of the input file and then again when the save
file is read (and in potentially different coordinate systems, depending
on calculation settings). It was therefore decided to move the parsing
of these parameters from the raw output to just taking them directly
from the `CalcJob` inputs.
@sphuber sphuber merged commit a6fb502 into aiidateam:main Oct 28, 2022
@PNOGillespie PNOGillespie deleted the fix/xspectra-parsers branch October 28, 2022 17:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants