Add a validation check to rename()
#202
@gidden, ready for review if you have a minute...
pyam/core.py
Outdated
@@ -580,6 +589,7 @@ def rename(self, mapping=None, inplace=False, append=False, **kwargs):
# renaming is only applied where a filter matches for all given columns
rows = ret._apply_filters(filters)
idx = ret.meta.index.isin(_make_index(ret.data[rows]))
_data = ret.data.copy()
Why do we need to make a copy here?
In case `check_duplicates=True` and a `ValueError` is raised, we don't want any changes to the `IamDataFrame` to persist.
But I guess `_data = ret.data.copy() if check_duplicates else ret.data` would be slightly more efficient, right?
Agreed. Can you add a warning in the docstring stating that this makes a copy as well? Also, are you sure we want that to be `True` by default? What are the pro/cons?
Added the more elaborate only-make-interim-copy-if-`check_duplicates` and an inline comment to explain that the interim copy is kept until after the validation check.
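For readers following the thread, here is a minimal sketch of the interim-copy pattern discussed above. It is not the code from this PR: `TinyFrame`, `rename_variable`, and the error message are hypothetical stand-ins, and the data is a plain list of row dicts rather than a pandas object. Aggregation of clashing rows is left out to keep the focus on when the copy is made.

```python
import copy


class TinyFrame:
    """Hypothetical stand-in for an IamDataFrame: a list of row dicts."""

    def __init__(self, rows):
        self.data = [dict(r) for r in rows]

    def rename_variable(self, old, new, check_duplicates=True):
        # Work on an interim copy only when validation may still fail,
        # so that self.data is untouched if a ValueError is raised.
        _data = copy.deepcopy(self.data) if check_duplicates else self.data

        for row in _data:
            if row["variable"] == old:
                row["variable"] = new

        if check_duplicates:
            keys = [(r["variable"], r["year"]) for r in _data]
            if len(keys) != len(set(keys)):
                raise ValueError("Renaming to non-unique timeseries data!")

        # Validation passed (or was skipped): keep the renamed rows.
        self.data = _data
        return self
```

If the check fails, `self.data` still holds the original rows because only the interim copy was modified; with `check_duplicates=False` the rows are renamed in place and no copy is made.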
> Also, are you sure we want that to be `True` by default? What are the pro/cons?
I think that the more careful behaviour (`check_duplicates == True`) should be the default. Users should explicitly have to override the sanity check when (as in the unit test) renaming timeseries `test_1` to `test_3` and aggregating with existing timeseries data of `test_3`.
Looks good, thanks @danielhuppmann!
Please confirm that this PR has done the following:
Description of PR
This PR adds a validation check to `df.rename()` to prevent accidentally renaming a variable to the name of an existing variable and thereby aggregating the two.

In terms of the new unit test `test_rename_duplicates()`, the user may not be (actively) aware that a variable `test_3` exists in the `IamDataFrame` and may not actually want to aggregate `test_1` and `test_3`. With the new feature, the user will receive an error message as default behaviour and has to actively override the validation step.

closes #182
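The behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not pyam's implementation: `rename_variable` is a hypothetical helper, the data is a plain dict keyed by `(variable, year)`, and summing is assumed as the aggregation applied when the check is overridden.

```python
from collections import defaultdict


def rename_variable(data, mapping, check_duplicates=True):
    """Rename variables in `data`, a dict of (variable, year) -> value.

    Hypothetical helper for illustration only.
    """
    renamed = defaultdict(list)
    for (variable, year), value in data.items():
        renamed[(mapping.get(variable, variable), year)].append(value)

    # A key with more than one value means the rename hit existing data.
    clashes = sorted(key for key, values in renamed.items() if len(values) > 1)
    if clashes and check_duplicates:
        raise ValueError(f"Renaming to non-unique timeseries data: {clashes}")

    # With the check overridden, clashing rows are aggregated (assumed: sum).
    return {key: sum(values) for key, values in renamed.items()}


data = {("test_1", 2005): 1.0, ("test_3", 2005): 3.0}

try:
    rename_variable(data, {"test_1": "test_3"})  # default: raises
except ValueError as err:
    print(err)

# Explicit override: test_1 and test_3 are merged into one timeseries.
print(rename_variable(data, {"test_1": "test_3"}, check_duplicates=False))
```

The helper builds a new dict rather than mutating its input, so a failed check leaves `data` unchanged, which mirrors the intent of the interim copy in the PR.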