Implement DataFrame/Series rename_axis #1843

LucasG0 · 2020-10-12T22:51:44Z

Hi, this PR implements DataFrame.rename_axis and Series.rename_axis.
I did not add the copy parameter, as a copy is performed anyway.

HyukjinKwon · 2020-10-15T10:59:10Z

cc @itholic can you review this please?

itholic · 2020-10-15T14:03:57Z

Sure, let me take a look sooner or later :)

itholic

Basically looks fine to me, but let me check this once more this weekend.

databricks/koalas/frame.py

databricks/koalas/tests/test_series.py

databricks/koalas/frame.py

databricks/koalas/series.py

itholic

Otherwise, seems fine to me.

itholic · 2020-10-18T05:36:35Z

databricks/koalas/frame.py

+            A scalar, list-like, dict-like or functions transformations to
+            apply to the axis name attribute.


Maybe this explanation for mapper is not correct?

In the pandas latest docs, they say:

Value to set the axis name attribute.

Yes, I should have made clear that pandas does not fully support using a dict or a function as mapper. The docs say:
However, if mapper is dict-like or a function, it will use the deprecated behavior of modifying the axis labels.
So I am not sure this is the correct behavior, which is why I added the possibility to use a dict or a function as mapper and updated the docs accordingly.
I am interested in your opinion on this!
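
To make the distinction discussed here concrete, the following is a minimal sketch against plain pandas (0.24 or later is assumed), not Koalas. The frame and the names in it are made up for illustration. A scalar passed as mapper replaces the axis name, while a function or dict passed through the index keyword transforms the existing name and leaves the labels alone.

```python
import pandas as pd

# Illustrative frame; the data and names are assumptions, not from the PR.
df = pd.DataFrame(
    {"num_legs": [4, 4, 2]},
    index=pd.Index(["dog", "cat", "monkey"], name="animal"),
)

# Scalar mapper: the value becomes the new index name.
by_scalar = df.rename_axis("pet")

# Function passed via index=: applied to the current index name.
by_function = df.rename_axis(index=str.upper)

# Dict passed via index=: maps the old name to the new one.
by_dict = df.rename_axis(index={"animal": "creature"})

# The index labels themselves are untouched in every case.
labels = list(by_dict.index)
```

Passing the dict or function through index or columns, rather than as the positional mapper, sidesteps the deprecated label-modifying path that the pandas docs mention.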

itholic · 2020-10-18T05:37:24Z

databricks/koalas/frame.py

+        See Also
+        --------
+        DataFrame.rename : Alter DataFrame index labels or name.
+        Index.rename : Set new names on index.


Maybe we can also have Series.rename ?

Yes, I did not include them because I split the Series/DataFrame docs, unlike pandas, but I will add them back. :)
Maybe we could also refer to Series.rename_axis, or use it instead?

itholic · 2020-10-18T05:39:53Z

databricks/koalas/series.py

+        mapper, index :  scalar, list-like, dict-like or function, optional
+            A scalar, list-like, dict-like or functions transformations to
+            apply to the index values.


Maybe this is also not correct? I think you can refer to here.

This is for the same reason I explained above.

itholic · 2020-10-18T05:40:29Z

databricks/koalas/series.py

+        See Also
+        --------
+        Series.rename : Alter Series index labels or name.
+        Index.rename : Set new names on index.


Maybe we can also have DataFrame.rename here ?

itholic · 2020-10-18T05:40:58Z

databricks/koalas/series.py

+                monkey    2
+        Name: num_legs, dtype: int64
+        """
+        return first_series(self.to_frame().rename_axis(mapper=mapper, index=index))


Maybe inplace=inplace is missing?

Yes indeed !
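
For reference, this is a small sketch of the pandas behavior that a forwarded inplace argument would need to match. It uses plain pandas rather than Koalas, and the series contents are illustrative assumptions.

```python
import pandas as pd

# Illustrative series; the data and names are assumptions, not from the PR.
s = pd.Series(
    [4, 4, 2],
    index=pd.Index(["dog", "cat", "monkey"], name="animal"),
    name="num_legs",
)

# Default (inplace=False): a new Series is returned, the original is unchanged.
renamed = s.rename_axis("pet")
name_after_copy = s.index.name

# inplace=True: the original is modified and nothing is returned.
result = s.rename_axis("creature", inplace=True)
```

If the Series method delegates to a frame without passing inplace along, the second call would return a new object instead of updating the original, which is the gap pointed out above.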

itholic · 2020-10-18T05:43:54Z

databricks/koalas/tests/test_dataframe.py

+        self.assertRaises(ValueError, lambda: kdf.rename_axis(["cols2", "cols3"], axis=1))
+
+        # index/columns parameters and dict_like/functions mappers introduced in pandas 0.24.0
+        if LooseVersion(pd.__version__) >= LooseVersion("0.24.0"):


Would you add tests for pandas < 0.24.0 by manually declaring the expected result, like we did before?
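
One way to read this request, sketched with plain pandas only: build the expected frame by hand so that it does not rely on the index keyword of rename_axis, and use the version check only to cross-check against pandas when that keyword is available. The frame, the variable names, and the simplified version parsing are all assumptions for illustration.

```python
import pandas as pd

# Illustrative frame; not the one used in the PR's tests.
pdf = pd.DataFrame(
    {"a": [1, 2], "b": [3, 4]},
    index=pd.Index(["x", "y"], name="index"),
)

# Simplified version check (assumption): compare major and minor only.
major, minor = (int(part) for part in pd.__version__.split(".")[:2])
supports_function_mapper = (major, minor) >= (0, 24)

# Expected result declared by hand, independent of rename_axis.
expected = pdf.copy()
expected.index.name = pdf.index.name.upper()

if supports_function_mapper:
    # Newer pandas can confirm the hand-written expectation.
    actual = pdf.rename_axis(index=str.upper)
    assert list(actual.index.names) == list(expected.index.names)
```

In the Koalas test, the hand-built expected frame would then be compared against the Koalas result on every pandas version.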

itholic · 2020-10-18T05:44:04Z

databricks/koalas/tests/test_dataframe.py

+        )
+
+        # index/columns parameters and dict_like/functions mappers introduced in pandas 0.24.0
+        if LooseVersion(pd.__version__) >= LooseVersion("0.24.0"):


itholic · 2020-10-18T05:55:26Z

databricks/koalas/tests/test_dataframe.py

+        self.assert_eq(
+            pdf.rename_axis(["index2"]).sort_index(), kdf.rename_axis(["index2"]).sort_index(),
+        )
+
+        self.assert_eq(
+            pdf.rename_axis(["index2"], axis=1).sort_index(),
+            kdf.rename_axis(["index2"], axis=1).sort_index(),
+        )


Why don't we just add these tests to the ones above?

For example,

for axis in [0, "index"]:
    self.assert_eq(
        pdf.rename_axis("index2", axis=axis).sort_index(),
        kdf.rename_axis("index2", axis=axis).sort_index(),
    )
    self.assert_eq(
        pdf.rename_axis(["index2"], axis=axis).sort_index(),
        kdf.rename_axis(["index2"], axis=axis).sort_index(),
    )

for axis in [1, "columns"]:
    self.assert_eq(
        pdf.rename_axis("cols2", axis=axis).sort_index(),
        kdf.rename_axis("cols2", axis=axis).sort_index(),
    )
    self.assert_eq(
        pdf.rename_axis(["cols2"], axis=axis).sort_index(),
        kdf.rename_axis(["cols2"], axis=axis).sort_index(),
    )

itholic · 2020-10-18T07:09:34Z

databricks/koalas/frame.py

+        spark_frame = self._internal.resolved_copy.spark_frame
+        internal = InternalFrame(
+            spark_frame=spark_frame,
+            index_map=index_map,
+            column_labels=self._internal.column_labels,
+            data_spark_columns=[
+                scol_for(spark_frame, col) for col in self._internal.data_spark_column_names
+            ],
+            column_label_names=column_label_names,
+        )


I think we could simply update index_map and column_label_names using self._internal.copy rather than creating a new InternalFrame here?

Because this API updates only the name of index or columns.

internal = self._internal.copy(
    index_map=index_map,
    column_label_names=column_label_names,
)

Definitely !
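
The design point, copying with overrides instead of rebuilding every field, can be sketched in plain Python. FrameMeta below is a hypothetical stand-in built on dataclasses, not the Koalas InternalFrame, and its fields are assumptions chosen to mirror the names in this thread.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class FrameMeta:
    """Hypothetical stand-in for an immutable internal frame description."""

    data: Tuple[int, ...]
    column_labels: Tuple[str, ...]
    index_names: Tuple[Optional[str], ...]
    column_label_names: Tuple[Optional[str], ...]


original = FrameMeta(
    data=(1, 2, 3),
    column_labels=("num_legs",),
    index_names=("animal",),
    column_label_names=(None,),
)

# Copy with overrides: only the name fields change; data and column
# labels are carried over without being listed again.
renamed = replace(original, index_names=("pet",), column_label_names=("attr",))
```

Since a rename only touches names, listing just the changed fields keeps the call short and avoids forgetting to pass an unrelated field.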

LucasG0 · 2020-10-18T20:52:31Z

By the way, I no longer think that a copy of the underlying data is performed in this implementation. However, I am not sure copying the immutable spark_frame is relevant here, so I wonder whether we should keep the copy parameter.

codecov-io · 2020-10-18T21:09:13Z

Codecov Report

Merging #1843 into master will decrease coverage by 0.01%.
The diff coverage is 97.22%.

@@            Coverage Diff             @@
##           master    #1843      +/-   ##
==========================================
- Coverage   94.16%   94.15%   -0.02%     
==========================================
  Files          40       40              
  Lines        9725     9772      +47     
==========================================
+ Hits         9158     9201      +43     
- Misses        567      571       +4

Impacted Files	Coverage Δ
databricks/koalas/missing/frame.py	`100.00% <ø> (ø)`
databricks/koalas/missing/series.py	`100.00% <ø> (ø)`
databricks/koalas/frame.py	`96.76% <96.66%> (-0.01%)`	⬇️
databricks/koalas/series.py	`96.86% <100.00%> (+0.01%)`	⬆️
databricks/koalas/generic.py	`95.35% <0.00%> (-0.29%)`	⬇️
databricks/koalas/plot.py	`91.92% <0.00%> (-0.24%)`	⬇️
databricks/koalas/indexing.py	`92.73% <0.00%> (ø)`
databricks/koalas/internal.py	`96.44% <0.00%> (ø)`
... and 1 more

Last update d01d9a7...edcef50.

ueshin

Otherwise, LGTM.

databricks/koalas/frame.py

LucasG0 · 2020-10-20T18:11:56Z

Thanks !

LucasG0 added 4 commits October 13, 2020 20:54

Implement DataFrame/Series rename_axis

caafa70

support previous pandas versions

bfc5677

add check on pandas version in tests

f813c13

specify columns when instantiating df in doctest

d45f65b

LucasG0 force-pushed the rename_axis branch from 619bc98 to d45f65b Compare October 13, 2020 18:54

itholic reviewed Oct 16, 2020

ueshin reviewed Oct 16, 2020

databricks/koalas/tests/test_series.py (resolved)

ueshin reviewed Oct 16, 2020

databricks/koalas/frame.py (outdated, resolved)

databricks/koalas/series.py (outdated, resolved)

use resolved spark frame + minor changes

5085f77

LucasG0 force-pushed the rename_axis branch from d589d1a to 5085f77 Compare October 17, 2020 20:29

itholic approved these changes Oct 18, 2020

itholic reviewed Oct 18, 2020

LucasG0 added 4 commits October 18, 2020 22:13

use inplace for series + modify internal creation

0e5d4f0

complete tests

7c56e21

update docs

9e819aa

remove extra lines

f21b6ba

add tests for inplace parameter + fix series with inplace=True

edcef50

HyukjinKwon approved these changes Oct 19, 2020

ueshin approved these changes Oct 19, 2020

databricks/koalas/frame.py (outdated, resolved)

use is_name_like_tuple

ae98c31

HyukjinKwon merged commit 0036768 into databricks:master Oct 20, 2020

LucasG0 deleted the rename_axis branch October 20, 2020 11:03
