-
Notifications
You must be signed in to change notification settings - Fork 358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix melt for multi-index columns support. #920
Conversation
Codecov Report
@@ Coverage Diff @@
## master #920 +/- ##
==========================================
+ Coverage 94.38% 94.46% +0.08%
==========================================
Files 34 34
Lines 6355 6383 +28
==========================================
+ Hits 5998 6030 +32
+ Misses 357 353 -4
Continue to review full report at Codecov.
|
@@ -6684,7 +6684,7 @@ def _reindex_columns(self, columns): | |||
|
|||
return self._internal.copy(sdf=sdf, data_columns=columns, column_index=idx) | |||
|
|||
def melt(self, id_vars=None, value_vars=None, var_name='variable', | |||
def melt(self, id_vars=None, value_vars=None, var_name=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems the default value of melt
's var_name
at namespace.py should be changed as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, good catch!
id_vars = list(id_vars) | ||
elif isinstance(id_vars, str): | ||
id_vars = [(id_vars,)] | ||
elif isinstance(id_vars, tuple): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems the tuple alone is not allowed when multi-index:
>>> pdf.melt(id_vars=('X', 'A'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.7/site-packages/pandas/core/frame.py", line 6500, in melt
col_level=col_level,
File "/usr/local/lib/python3.7/site-packages/pandas/core/reshape/melt.py", line 42, in melt
"id_vars must be a list of tuples when columns" " are a MultiIndex"
ValueError: id_vars must be a list of tuples when columns are a MultiIndex
vs
>>> kdf.melt(id_vars=('X', 'A'))
('X', 'A') variable_0 variable_1 value
0 1 X B 2
1 1 Y C 7
2 3 X B 4
3 3 Y C 8
4 5 X B 6
5 5 Y C 9
Maybe we should check if via len(self._internal.index_map) == 1:
and disallow a tuple alone.
and .. I think it should be considered as multiple columns. See
>>> kdf.melt(id_vars=('A', 'B'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/.../koalas/databricks/koalas/frame.py", line 6818, in melt
for name in var_name[:self._internal.column_index_level]] +
File "/.../koalas/databricks/koalas/frame.py", line 6816, in <listcomp>
for idx in id_vars] +
File "/.../koalas/databricks/koalas/internal.py", line 534, in scol_for
return scol_for(self._sdf, self.column_name_for(column_name_or_index))
File "/.../koalas/databricks/koalas/internal.py", line 523, in column_name_for
raise KeyError(column_name_or_index)
KeyError: ('A', 'B')
vs
>>> pdf.melt(id_vars=('A', 'B'))
A B variable value
0 1 2 C 7
1 3 4 C 8
2 5 6 C 9
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I thought it's a weird behavior but let's follow pandas for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good otherwise.
Softagram Impact Report for pull/920 (head commit: eb34150)⭐ Change Overview
📄 Full report
Impact Report explained. Give feedback on this report to [email protected] |
Thanks, merged. |
No description provided.