Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/stray tags #163

Merged
merged 15 commits into from
Aug 24, 2022
Merged

Conversation

luciecastella
Copy link
Contributor

@luciecastella luciecastella commented Aug 23, 2022

Closes #128

The program now checks if processed codes still contain '{'

For now, the check is done in definition.py and not in testing.py as discussed in the Issue because it uses existing function calls and checking in testing.py would imply more code lines (I think). @danielhuppmann @phackstock do you think it is fine like this or should I relocate the test?

@phackstock
Copy link
Contributor

In principle this should work but I would say that this check would be more logical as a validator in the CodeList class. It's always good to catch errors as early as possible so I think putting it there would be good.

@phackstock
Copy link
Contributor

Also testing errors more closely to where they originate is always a good idea.
What I mean by that is in your current case the error originates in DataStructureDefinition so you could use class directly instead of going through assert_valid_structure. Additionally, for testing this you don't need any scenario or region code lists. I would leave them out.

@danielhuppmann
Copy link
Member

Looks good to me in principle!

@phackstock
Copy link
Contributor

Thanks @luciecastella for implementing the suggested changes. I would have two additional recommendations then it should be fine from my side.

  1. The check if there are any stray tags should be inside a pydantic validator. pydantic is a type checking library that we use for a number of components of the nomenclature package. In my opinion there are two reasons why the check should be implemented as a validator. The first being the way that nomenclature collects all errors during validation and outputs them all at the same time so you don't have to go back multiple times in order to fix all mistakes in a codelist. The second reason is for code consistency. All checks on whether or not we consider a codelist valid are part of a validator so it's easy to see all the different checks we run. The pydantic documentation is quite good in explaining how its validator work (https://pydantic-docs.helpmanual.io/usage/validators/). You can find an example of how we use validators here: https://github.com/IAMconsortium/nomenclature/blob/main/nomenclature/codelist.py#L62
  2. For the test, inside the with pytest.raises I would recommend using something like:
...
CodeList.from_directory("variable", TEST_DATA_DIR / "stray_tag" / "variable")

instead of using assert_valid_structure. Reason being here that we want to make sure that a unit test is as close to where the error originates as possible.

@luciecastella luciecastella marked this pull request as ready for review August 23, 2022 14:09
Copy link
Contributor

@phackstock phackstock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks almost ready to me, good work @luciecastella.
One comment below and two more general thoughts:

  • test_stray_tag_fails would probably fit better in test_codelist than in test_testing as it no longer uses assert_valid_structure.
  • Maybe you could add a hint to the user in the error message how the issue can be resolved. Perhaps something along the lines of "Check if the tag was spelled correctly or if it wrongly does not exist".

nomenclature/codelist.py Outdated Show resolved Hide resolved
@danielhuppmann
Copy link
Member

Agree with @phackstock about adding a hint on how to resolve, but I suggest to shorten it to

"Check if the tag was spelled correctly."

Copy link
Contributor

@phackstock phackstock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to be merged from my side.

def test_stray_tag_fails():
"""Check that typos in a tag raises expected error"""

match = "Unexpected {} in codelist : Primary Energy|{Feul}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this test now fail?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! Ok sorry got it. I thought the match looks for this string in the error, and it does not matter if there are more characters. but maybe not... Should I change it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK no you're right, something is wrong, thank you! I'll look into that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just played around for a minute, it seems that the : does something to the match-validation.

I suggest to remove the whitespace, so that the test passes as expected (by failing with the right error message).

Suggested change
match = "Unexpected {} in codelist : Primary Energy|{Feul}"
match = "Unexpected {} in codelist: Primary Energy|{Feul}"

And I suggest that you look into this more closely tomorrow (but not related to this PR), for your own knowledge...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The match is interpreted as a regular expression and the character messing up the match is the pipe | which is evaluated as or. To evaluate it as a literal pipe character you'll have to prefix it with a slash \.

match = "Unexpected {} in codelist : Primary Energy\|{Feul}"

should fail as expected and

Suggested change
match = "Unexpected {} in codelist : Primary Energy|{Feul}"
match = "Unexpected {} in codelist: Primary Energy\|{Feul}"

should be evaluated correctly and make the test pass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great thank you!!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, the pipe in the string allows for any string to pass the test. We actually need a double \\ to make it work :)

Copy link
Member

@danielhuppmann danielhuppmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks great!

@danielhuppmann danielhuppmann merged commit 4b9995e into IAMconsortium:main Aug 24, 2022
@luciecastella luciecastella deleted the fix/stray-tags branch August 29, 2022 07:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Validate against stray tags
3 participants