-
Notifications
You must be signed in to change notification settings - Fork 358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement DataFrame/Series rename_axis #1843
Conversation
cc @itholic can you review this please? |
Sure, let me take a look sooner or later :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically looks fine to me, but let me check this once again on this weekend .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise, Seems fine to me.
A scalar, list-like, dict-like or functions transformations to | ||
apply to the axis name attribute. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this explanation for mapper
is not correct?
In the pandas latest docs, they say:
Value to set the axis name attribute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I should have precised that pandas does not support using a dict or a function as mapper
. They say :
However, if mapper is dict-like or a function, it will use the deprecated behavior of modifying the axis labels.
So, I am not sure this is the correct behavior, that is why I added the possibility to use dict / function as mapper
, and therefore updated the docs.
I am interested in your opinion on this !
See Also | ||
-------- | ||
DataFrame.rename : Alter DataFrame index labels or name. | ||
Index.rename : Set new names on index. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can also have Series.rename
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I did not put them as I split Series
/DataFrame
docs unlike pandas, but I will add them back. :)
Maybe we could also refer to Series.rename_axis
or instead ?
mapper, index : scalar, list-like, dict-like or function, optional | ||
A scalar, list-like, dict-like or functions transformations to | ||
apply to the index values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this it's also not correct? I think you can refer to here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for the same reason I explained above.
See Also | ||
-------- | ||
Series.rename : Alter Series index labels or name. | ||
Index.rename : Set new names on index. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can also have DataFrame.rename
here ?
databricks/koalas/series.py
Outdated
monkey 2 | ||
Name: num_legs, dtype: int64 | ||
""" | ||
return first_series(self.to_frame().rename_axis(mapper=mapper, index=index)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe inplace=inplace
is missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes indeed !
self.assertRaises(ValueError, lambda: kdf.rename_axis(["cols2", "cols3"], axis=1)) | ||
|
||
# index/columns parameters and dict_like/functions mappers introduced in pandas 0.24.0 | ||
if LooseVersion(pd.__version__) >= LooseVersion("0.24.0"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you make tests for pandas < 0.24.0
by manually declaring the expected result like we did before ??
) | ||
|
||
# index/columns parameters and dict_like/functions mappers introduced in pandas 0.24.0 | ||
if LooseVersion(pd.__version__) >= LooseVersion("0.24.0"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto ?
self.assert_eq( | ||
pdf.rename_axis(["index2"]).sort_index(), kdf.rename_axis(["index2"]).sort_index(), | ||
) | ||
|
||
self.assert_eq( | ||
pdf.rename_axis(["index2"], axis=1).sort_index(), | ||
kdf.rename_axis(["index2"], axis=1).sort_index(), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we just add these tests to the above??
For example,
for axis in [0, "index"]:
self.assert_eq(
pdf.rename_axis("index2", axis=axis).sort_index(),
kdf.rename_axis("index2", axis=axis).sort_index(),
)
self.assert_eq(
pdf.rename_axis(["index2"], axis=axis).sort_index(),
kdf.rename_axis(["index2"], axis=axis).sort_index(),
)
for axis in [1, "columns"]:
self.assert_eq(
pdf.rename_axis("cols2", axis=axis).sort_index(),
kdf.rename_axis("cols2", axis=axis).sort_index(),
)
self.assert_eq(
pdf.rename_axis(["cols2"], axis=axis).sort_index(),
kdf.rename_axis(["cols2"], axis=axis).sort_index(),
)
databricks/koalas/frame.py
Outdated
spark_frame = self._internal.resolved_copy.spark_frame | ||
internal = InternalFrame( | ||
spark_frame=spark_frame, | ||
index_map=index_map, | ||
column_labels=self._internal.column_labels, | ||
data_spark_columns=[ | ||
scol_for(spark_frame, col) for col in self._internal.data_spark_column_names | ||
], | ||
column_label_names=column_label_names, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I think we could just simply update the index_map
and column_label_names
using self._internal.copy
rather than create the new InternalFrame
here ??
Because this API updates only the name of index or columns.
internal = self._internal.copy(
index_map=index_map,
column_label_names=column_label_names,
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely !
By the way, I actually do not think anymore that a copy of the underlying data is performed in this implementation. However, I am not sure copying the immutable |
Codecov Report
@@ Coverage Diff @@
## master #1843 +/- ##
==========================================
- Coverage 94.16% 94.15% -0.02%
==========================================
Files 40 40
Lines 9725 9772 +47
==========================================
+ Hits 9158 9201 +43
- Misses 567 571 +4
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise, LGTM.
Thanks ! |
Hi, this PR implements
DataFrame.rename_axis
andSeries.rename_axis
.I did not add
copy
parameter as a copy is performed anyway.