Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor multiple warning levels for same data validation filter #461

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

dc-almeida
Copy link
Collaborator

Closes #439
Allows for a more compact writing of multiple warning level criteria (if warning level is ommitted, it defaults to error). Backwards compatible with more verbose syntax.
Stores the criteria for each level in descending order of severity (useful for upcoming PR that skips validation for lower warnings if a higher one failed).

@dc-almeida dc-almeida added the enhancement New feature or request label Jan 21, 2025
@dc-almeida dc-almeida self-assigned this Jan 21, 2025
@dc-almeida dc-almeida marked this pull request as ready for review January 21, 2025 10:06
@danielhuppmann
Copy link
Member

The PR looks good to me, but I'm wondering how the "skipping" will work in the follow-up PR?

The current approach separates one validation-item with multiple criteria/warning-levels in separate items - how will the validation know that an alternative variant of the same item was already triggered?

I'm wondering if a better solution wouldn't be the other direction:

  1. if the legacy-format is used, translate to validation-item to the new multiple-criteria-variant
  2. extend the validation to go over the criteria within a validation-item until one is triggered and write to log

@dc-almeida
Copy link
Collaborator Author

Makes sense, and yes, makes the descending severity validation sequence easier to implement.

Question? in translating the legacy format to the new format, say the file contains multiple (separate) criteria for the same validation filter, but these may appear in any order in the file. It should be checked if the same type of item has been passed already and update it instead of creating a new one? That is, even if starting as separate legacy items, they end up aggregated in the object.

@danielhuppmann
Copy link
Member

In translating the legacy format to the new format, say the file contains multiple (separate) criteria for the same validation filter, but these may appear in any order in the file.

Make it easy for us, implementation-wise:

  • Don't worry about overlaps of criteria in the simple format. If there are multiple criteria items that check the same thing, there will be multiple warnings.
  • Instead of actively sorting the multiple criteria in one validation item, just do a validation-step of the criteria that the sub-items must be given in descending order.

Copy link
Member

@danielhuppmann danielhuppmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm confused about the order of validation-criteria? But it would be useful to have another variant of the test where only the low-warning-level criteria is triggered...

tests/test_cli.py Outdated Show resolved Hide resolved

failed_validation_message = (
"Data validation with error(s)/warning(s) "
f"""(file {(DATA_VALIDATION_TEST_DIR / f"validate_warning_{file}.yaml").relative_to(Path.cwd())}):
Criteria: variable: ['Primary Energy'], year: [2010], upper_bound: 5.0, lower_bound: 1.0
model scenario region variable unit year value warning_level
0 model_a scen_a World Primary Energy EJ/yr 2010 6.0 error
1 model_a scen_b World Primary Energy EJ/yr 2010 7.0 error
1 model_a scen_b World Primary Energy EJ/yr 2010 7.0 error"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to add a third variant here where the relevant data-value of simple_df is changed to check that only the error is printed in one case and only the low-warning-level is printed in another case.

Co-authored-by: Daniel Huppmann <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow easier setting of multiple warning levels for same criterion in DataValidator
2 participants