Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

♻️ REFACTOR: move basic parsing into BaseParser class #756

Merged
merged 9 commits into from
Jun 10, 2023

Conversation

mbercx
Copy link
Member

@mbercx mbercx commented Nov 23, 2021

Fixes #750
Fixes #797

Large refactoring of the basic parsing features to deduplicate code and rely on regular expressions via re instead of iterations over lines. The main changes are:

  • The Parser class is renamed to BaseParser class to avoid confusion with the aiida.core Parser class.
  • The BaseParser.parse_stdout_from_retrieved method can be used by all subclasses to obtain the stdout, basic parsed data and logs from retrieving and parsing the stdout.
  • The error and warning messages are defined as class attributes. Base error messages are defined on the BaseParser class, which can be extended on the subclasses via the class_[error,warning]_map attributes.
  • A BaseParser.check_base_errors() method is added, which checks for basic errors that indicate the stdout cannot be parsed for more data.
  • Shared exit codes are moved to the NamelistsCalculation spec from that of the subclasses.
  • Several changes in the logic of returning exit codes and emitting the logs. Some of the discussions on these topics have not been fully concluded, and will be revisited when integrating the BaseParser into the PwParser, see:
    Revisit BaseParser when integrating into PwParser #953

@mbercx
Copy link
Member Author

mbercx commented Jan 27, 2022

Some timing information for a very long aiida.out file (~140k lines) without errors:

  • Old parse_output_base:
14.4 ms ± 297 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
  • New parse_output_base:
10.7 ms ± 125 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

and a very short (131 lines) file with errors:

  • Old parse_output_base:
11.4 µs ± 262 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
  • New parse_output_base:
23.2 µs ± 271 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

So for large files the new parser is actually faster, but for small files the parsing is slower. We're taking microseconds here though, so I don't think this is a massive issue? At least I think it's worth the simplified code.

@mbercx mbercx force-pushed the fix/748/base-stdout-parsing branch from b08b206 to 0522e9e Compare January 28, 2022 02:19
@mbercx mbercx changed the title WIP: refactoring base stdout parsing ♻️ REFACTOR: move basic parsing into BaseParser class Jan 28, 2022
@mbercx mbercx marked this pull request as ready for review January 28, 2022 02:49
@mbercx mbercx requested a review from sphuber January 28, 2022 02:50

:param stdout: the stdout content as a string.
"""
out_info_dict['out_file'] = stdout.split('\n')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know using the out_info_dict is messy as hell, but that'll be fixed in #749. :)

@mbercx
Copy link
Member Author

mbercx commented Jan 28, 2022

@sphuber currently the PwParser and CpParser don't reuse the base parsing methods yet, but these are rather large and messy parsers I'd like to clean up in separate PRs for v4.0.

@mbercx mbercx force-pushed the fix/748/base-stdout-parsing branch 2 times, most recently from e27573f to 92d7bf3 Compare January 28, 2022 10:16
Copy link
Contributor

@sphuber sphuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @mbercx great work! This really cleans things up nicely, just have some minor requests/suggestions

aiida_quantumespresso/calculations/projwfc.py Outdated Show resolved Hide resolved
aiida_quantumespresso/calculations/pw2gw.py Outdated Show resolved Hide resolved
aiida_quantumespresso/parsers/base.py Outdated Show resolved Hide resolved
aiida_quantumespresso/parsers/base.py Outdated Show resolved Hide resolved
aiida_quantumespresso/parsers/base.py Outdated Show resolved Hide resolved
@@ -10,6 +10,6 @@ dos:
- states/eV
- states/eV
parameters:
code_version: v.6.4.1
code_version: 6.4.1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we keep this the same to not break backwards-compatibility?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, perhaps. I preferred removing the v., because it's cleaner and more consistent with e.g. the plugin version added to the metadata. But it's not a big deal, so probably backwards compatibility is the stronger argument.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This had been reverted to also include the v.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this again, I'd be less shy about breaking backwards compatibility regarding this version number. It's probably not being used at all atm, and removing the v. is cleaner in my view.

tests/parsers/test_neb.py Outdated Show resolved Hide resolved
aiida_quantumespresso/parsers/base.py Outdated Show resolved Hide resolved
@mbercx mbercx force-pushed the fix/748/base-stdout-parsing branch from 92d7bf3 to 6373e0b Compare January 28, 2022 12:22
@mbercx mbercx force-pushed the fix/748/base-stdout-parsing branch 2 times, most recently from 0c8c48a to 3edacd7 Compare May 6, 2023 09:32
@mbercx
Copy link
Member Author

mbercx commented May 6, 2023

@sphuber she's almost ready, just a bit more cleaning. I would also:

  1. Remove the v. for the code versions.
  2. Remote the wall_time. It's a useless string and there is wall_time_seconds.


if filename_frequencies not in retrieved.base.repository.list_object_names():
return self.exit(self.exit_codes.ERROR_OUTPUT_STDOUT_READ)
Copy link
Member Author

@mbercx mbercx May 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is removed on purpose:

  1. I don't think this ever happens if the stdout is "complete"
  2. It should use its own exit code, and I don't want to define one if [1] is true.

@mbercx mbercx force-pushed the fix/748/base-stdout-parsing branch 4 times, most recently from c9fe647 to 4e0caf7 Compare May 7, 2023 06:53

:param exit_code: an `ExitCode`
:return: the exit code
"""
if logs:
self.emit_logs(logs)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering what to do here. I liked the idea of always passing the logs to the exit method, so the logs are always emitted before the parsing stops. But this lead to a few questions:

  1. What does the emit_logs method really do? It seems to only use getattr, how does it really "emit" anything?
  2. What's the purpose of this logger exactly?
  3. this reminds me, perhaps we should end each parse method with an explicit return of a zero exit code via the exit method? Explicit is better than implicit, and then it passes through this method which emits the logs (now).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the emit_logs method really do? It seems to only use getattr, how does it really "emit" anything?

Believe it or not, it emits log messages 😄 It is easy to miss but the line

getattr(self.logger, level)(stripped)

will retrieve a method of the Logger instance stored at self.logger. Since level is a log-level in lower-case, e.g., warning or error, getattr(self.logger, level) will return the bound method warning or error in that case. We then call it passing the message ...(stripped). So essentially it is doing something like self.logger.warning(stripped).

What's the purpose of this logger exactly?

It is configured with the associated CalcJobNode whose outputs are being parsed and so any log messages are stored as Log entries linked to the node. They will show up in the output of verdi process report for example. This is useful information for the user.

this reminds me, perhaps we should end each parse method with an explicit return of a zero exit code via the exit method? Explicit is better than implicit, and then it passes through this method which emits the logs (now).

Yes, but we should then also remember to first check whether an exit status was already set by the scheduler plugin. For example if the scheduler parser detected the job was killed due to an OOW error, the associated exit status (I believe 120) is already set on the node. So before we return ExitCode(0), we should check if self.node.exit_status is not None in which case we probably want to return None to keep that status, because returning ExitCode(0) will override it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks for the clarification, @sphuber! 🙏

So essentially it is doing something like self.logger.warning(stripped).

Right, of course, it's quite clear after this explanation. I guess for this kind of use case, a CalcJobNode.get_logger(level) method would be more readable than a logger property. ^^

Now that I understand better, I'd be inclined to merge the emit_logs and exit methods into one method with signature:

def exit(self, logs: AttributeDict = None, exit_code: ExitCode = None) -> ExitCode:

which:

  1. Emits any logs left in the logs input, if provided.
  2. If exit_code has been provided, emit its message via self.logger.
  3. If not, check if there is already an exit_status set on the node and log this message (or will this have been done already?)
  4. Return
    • exit_code if it was provided.
    • None if the node already has an exit_status.
    • ExitCode(0) else

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, of course, it's quite clear after this explanation. I guess for this kind of use case, a CalcJobNode.get_logger(level) method would be more readable than a logger property. ^^

Don't think that would be a correct name, because you cannot return a logger for a particular level, you log a message at a particular level through a logger. That being said, I think you can easily make the current code more readable. I think the Logger class has a log method where you can pass the desired level as an argument, so that line would become self.logger.log(level, stripped). Pretty sure that should work.

Now that I understand better, I'd be inclined to merge the emit_logs and exit methods into one method with signature:

I would keep emit_logs as a separate function, because it can be useful to call it directly elsewhere still. Fine to move the call in all parsers to the exit method.

If not, check if there is already an exit_status set on the node and log this message (or will this have been done already?)

This will already have been done by the engine before calling the Parser. So nothing to do here but to "keep" it if the parser itself cannot provide a more specific exit code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've adapted the method as discussed, see 6529ea6.

I would keep emit_logs as a separate function, because it can be useful to call it directly elsewhere still. Fine to move the call in all parsers to the exit method.

If we always end the parse method with the exit method for clarity, and always emit all the logs there, what would be the use case of calling the emit_logs method separate?

@mbercx
Copy link
Member Author

mbercx commented May 7, 2023

@sphuber ok, I refactored my initial refactor quite a bit, so perhaps this needs a full review again, apologies. It was also quite a bit of work to fix all the merge conflicts after rebase, so I would now really push to get this PR in.

A couple of quick notes that I decided upon:

  • The code version in the output_parameters shouldn't have a v. prefix. It's redundant.
  • The wall_time output is similarly redundant and less useful than wall_time_seconds.

Now a more elaborate design discussion on the changes here and TODO's for parsers.

Keeping the parse method readable

Understanding what the parser parses when and which exit codes it returns at what point is difficult with many of the parser classes. We want to make sure that:

  1. The "best" exit code is returned, i.e. the one that is most specific or allows for useful error handling.
  2. As much data as possible is parsed and added to the outputs of the calculation job.

To improve clarity and make these two requirements possible, I think the parse method should:

  1. Not contain any actual parsing. Any real parsing should happen in a separate method.
  2. Instantiate a single logging container that is passed through these parsing methods.
  3. Take care of decisions on when to return which exit code.
  4. Output data at the appropriate time.

Note that this PR does not fully implement these ideas for all parser classes. It's already big enough as it is. But it already shows how this would be possible for e.g. the open_grid.x parser.

Other improvements

  • Exit code registry: Although this PR already make an effort of making sure exit codes are consistent across the calculation job classes, it's still tricky to keep track of all the exit codes they are used any why they are there or have a certain number. I think we should create an (automated) exit code registry in the documentation where we keep track of all this. Although two calculation jobs can in principle have the same exit code for different issues, we should probably also avoid this.

  • Stop using strings for exit codes: Now we use strings to keep track of the exit codes. I think this is a bit fragile and doesn't allow for nice features like tab-completion. It would be better to have a different approach, but I'd have to think a bit on how to do this.

  • PhCalculation and PpCalculation aren't subclasses from NamelistCalculation, and hence have to redefine the basic exit codes. So do PwCalculation and CpCalculation. There may be reasons to treat these calculation jobs differently, but we should investigate if we can't further consolidate this calculation job hierarchy.

@mbercx mbercx requested a review from sphuber May 7, 2023 07:55
@mbercx mbercx force-pushed the fix/748/base-stdout-parsing branch from 4e0caf7 to b3655a9 Compare May 10, 2023 06:17
@mbercx
Copy link
Member Author

mbercx commented May 10, 2023

Because of the size of the PR, and the fact that even though the tests pasts it can definitely break things, perhaps:

@qiaojunfeng
Copy link
Collaborator

* @qiaojunfeng can you have a look at the `open_grid.x` parsing changes?

Had a quick look and seems good to me!
Anyhow open_grid is very simple so if the tests passed then I wouldn't worry.

@mbercx mbercx force-pushed the fix/748/base-stdout-parsing branch from 6529ea6 to 765e0de Compare May 12, 2023 08:30
"""Read and parse the ``stdout`` content of a Quantum ESPRESSO calculation.

:param logs: Logging container that will be updated during parsing.
:returns: size 3 tuple: (``stdout`` content, parsed data, updated logs).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why return the logs if they are passed in by reference and so are updated in place? Typically not a huge fan of tuple return values as it is often not immediately clear what is being returned.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do see your point. To me, I like making it clear that the logs input is updated by also passing it as an output. I guess it also related to the question I pose here, and makes me inclined to simply generate separate logging containers in each parser method, but that also feels than ideal...

Maybe we should keep track of the logs on a class variable? That seems sensible. Initially I also wasn't sure why we needed this extra logging item because we can just actually use the logger. Then I figured this is so we can keep track of errors and then pick and choose which one to return as an exit code. But can't we just inspect the logger as well? Feels like we doing double work.

Re returning tuples: I can see your point, but what is the alternative for returning multiple outputs? A dictionary is even more opaque to me. Also: with the hover features in VSCode it's quite clear what is being returned: 🙃

Screenshot 2023-05-18 at 20 00 20

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might not believe it, but I am actually VSCode myself now. But I think that in general making code design decisions based on a feature of the particular editor that we ourselves just happen to be using, is a bad idea, because not everyone is using the same tools. But fine, let's keep this as is.

Copy link
Member Author

@mbercx mbercx Jun 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good-palpatine

src/aiida_quantumespresso/parsers/base.py Outdated Show resolved Hide resolved
src/aiida_quantumespresso/parsers/base.py Show resolved Hide resolved
src/aiida_quantumespresso/parsers/base.py Show resolved Hide resolved
src/aiida_quantumespresso/parsers/base.py Outdated Show resolved Hide resolved
'ERROR_OUTPUT_STDOUT_PARSE'
]:
if exit_code in logs.error:
exception = logs.error[logs.index(exit_code) + 1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, interesting approach. I can see how this is the most straightforward as you don't have to adapt the structure of the container and can just keep a list, but it is quite fragile I think. Ideally, the container should become a dictionary, with the key the error code, and the value can be a dictionary with additional information, for example argument to format the exit message.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hehe, when implementing this I was already 95% sure it wouldn't pass review. Yes, I think adapting the logging container probably makes more sense. Let me resolve the other issues first and give this some thought, also in relation to the other discussions on the container. It's clear we should rethink this aspect.


def exit(self, exit_code):
"""Log the exit message of the give exit code with level `ERROR` and return the exit code.
def check_base_errors(self, logs: AttributeDict) -> Optional[ExitCode]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be problems with this approach in the future. For example the PwParser, it may be possible that the stdout could not be parsed, but the XML can and so we can still parse and return useful data.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But in the end the Parser implementation still has full control over what to do, no? This method simply does these basic checks and does some extra formatting. It might not be necessary anymore if we redesign the structure of the logging container, however.

Comment on lines +155 to 164
if exit_code is not None:
self.logger.error(exit_code.message)
elif self.node.exit_status is not None:
exit_code = ExitCode(self.node.exit_status, self.node.exit_message)
else:
exit_code = ExitCode(0)

return exit_code
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is not correct. Imagine the following situation: job gets submitted, but the node crashes. This is parsed by the SlurmScheduler and it sets 140:ERROR_NODE_FAILURE on the CalcJobNode. Since the job crashed, stdout will either be missing or probably incomplete and so the base output parser will return the ERROR_STDOUT_MISSING exit code (or comparable) and in this logic, this will therefore always override the scheduler status. But I would argue that the scheduler exit code is more useful in this case and we definitely want to keep it.

I think ultimately, each concrete parser implementation needs to parse the output and check itself whether a scheduler exit code already exists and determine whether it can provide a more specific/informative exit code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This currently not causing the tests to fail, because there only is such a test for PwParser and there you haven't made use of this method since you wanted to not touch it. But this also means that we probably won't be able to use it. Maybe we can leave this for now then as it is, but we will have to anticipate changing it in the future when other parsers will want to become aware of scheduler exit codes and not override them

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points. I think the logic was indeed that in case the developer wants to return a scheduler exit code they should look for and return it in the parse method. This one just checks that in case the developer doesn't want to do this for whatever reason, the exit code is still added in case self.exit() is called at the end.

But I suppose scheduler failures will almost always be more informative/useful?

src/aiida_quantumespresso/parsers/dos.py Outdated Show resolved Hide resolved
@mbercx mbercx force-pushed the fix/748/base-stdout-parsing branch from fa3d23f to c88a681 Compare June 4, 2023 08:54
@sphuber
Copy link
Contributor

sphuber commented Jun 4, 2023

@mbercx did you want to get this merged before the release? If so, is there anything to be done from your side? Otherwise I will approve. We can deal with the open points later. I also would like to have this done so I can finalize my refactor PR for PwParser.

@mbercx
Copy link
Member Author

mbercx commented Jun 8, 2023

@mbercx did you want to get this merged before the release? If so, is there anything to be done from your side? Otherwise I will approve. We can deal with the open points later. I also would like to have this done so I can finalize my refactor PR for PwParser.

I've opened #953 to keep track of the discussions that are left here. I think it makes sense to simply reconsider these decisions when integrating the BaseParser in the PwParser, we'll have a clearer idea of what to do during this process, I think.

@mbercx
Copy link
Member Author

mbercx commented Jun 8, 2023

@sphuber it seems the autoapi extension doesn't always play nice with type hints. To avoid going down this rabbit hole here, I simply added the problematic classes to nitpick_ignore. I'll probably look into using autoapi2 soon anyways:

https://pypi.org/project/sphinx-autoapi2/

@mbercx mbercx requested a review from sphuber June 8, 2023 19:00
@sphuber
Copy link
Contributor

sphuber commented Jun 9, 2023

Thanks @mbercx let's get this merged

@mbercx mbercx merged commit 1c223c7 into aiidateam:main Jun 10, 2023
@mbercx mbercx deleted the fix/748/base-stdout-parsing branch June 10, 2023 08:28
@yakutovicha
Copy link
Collaborator

Hi 👋

Just wondering, are you guys planning on making a major release after this PR is merged?

There were API changes, as far as I can see.

@sphuber
Copy link
Contributor

sphuber commented Jun 20, 2023

We were getting close to releasing v4.4.0: #951 @mbercx want to finish that one up and release it?

@mbercx
Copy link
Member Author

mbercx commented Jun 22, 2023

want to finish that one up and release it?

Yessir! Sorry for the slow response.

There were API changes, as far as I can see.

The internal parsing API was changed quite a bit, but I think the vast majority of the changes are backwards incompatible. Looking at the data regression files for the tests, the only changes in parsing output is the code_version and wall_time_seconds/wall_time. Which API changes are you worried about breaking e.g. the QEapp?

@yakutovicha
Copy link
Collaborator

yakutovicha commented Jun 22, 2023

Which API changes are you worried about breaking e.g. the QEapp?

Mostly about the pp parser (that is the one I noticed). The parse_gaussian method, for instance. Since the name doesn't start with an underscore, it is part of the public API of the class, strictly speaking.

I didn't closely look at the PR, but there might be others.

However, it depends on how closely you want to follow the semantic versioning principles.

@mbercx
Copy link
Member Author

mbercx commented Jun 22, 2023

However, it depends on how closely you want to follow the semantic versioning principles.

Yeah, that's fair. Perhaps we should be clearer in defining the public API. I don't expect users to use the Parser classes directly, since in many cases you need to parse from a node (which perhaps we should change over time, but then you definitely break the API).

The parser classes are also quite messy, and we've been working on improving that. Fixing this technical debt would be more important than strictly maintaining backwards compatibility for a part of the API that I don't think users are relying on directly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants