Fix melt for multi-index columns support. #920

Merged
merged 3 commits into from
Oct 15, 2019

Conversation

ueshin
Collaborator

@ueshin ueshin commented Oct 11, 2019

No description provided.

@codecov-io

codecov-io commented Oct 11, 2019

Codecov Report

Merging #920 into master will increase coverage by 0.08%.
The diff coverage is 92.3%.

@@            Coverage Diff             @@
##           master     #920      +/-   ##
==========================================
+ Coverage   94.38%   94.46%   +0.08%     
==========================================
  Files          34       34              
  Lines        6355     6383      +28     
==========================================
+ Hits         5998     6030      +32     
+ Misses        357      353       -4
| Impacted Files | Coverage Δ | |
|---|---|---|
| databricks/koalas/namespace.py | 86.83% <100%> (ø) | ⬆️ |
| databricks/koalas/frame.py | 96.02% <92%> (-0.03%) | ⬇️ |
| databricks/koalas/utils.py | 97.94% <0%> (+0.01%) | ⬆️ |
| databricks/koalas/series.py | 95.59% <0%> (+0.13%) | ⬆️ |
| databricks/koalas/indexing.py | 94.4% <0%> (+1.88%) | ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b7f1fe0...eb34150.

@@ -6684,7 +6684,7 @@ def _reindex_columns(self, columns):

return self._internal.copy(sdf=sdf, data_columns=columns, column_index=idx)

- def melt(self, id_vars=None, value_vars=None, var_name='variable',
+ def melt(self, id_vars=None, value_vars=None, var_name=None,
Member

Seems the default value of melt's var_name at namespace.py should be changed as well.

Collaborator Author

ah, good catch!
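
For context, here is a minimal sketch of why `None` is the more flexible default, assuming the rule pandas applies when `var_name` is not given: a single-level column index falls back to the index name or `'variable'`, while a MultiIndex falls back to its level names if they are all distinct, or to `variable_0`, `variable_1`, ... otherwise. The helper `default_var_names` is hypothetical and is not part of Koalas or pandas.

```python
def default_var_names(column_names, n_levels):
    """Hypothetical helper: pick the variable column name(s) when var_name is None.

    column_names: the names of the column index levels (may contain None).
    n_levels: the number of column index levels.
    """
    if n_levels == 1:
        # Single-level columns: use the index name, or fall back to 'variable'.
        name = column_names[0]
        return [name if name is not None else "variable"]
    if len(set(column_names)) == len(column_names):
        # MultiIndex with distinct level names: reuse them as-is.
        return list(column_names)
    # MultiIndex with duplicate (e.g. all-None) names: generate numbered names.
    return ["variable_{}".format(i) for i in range(n_levels)]


print(default_var_names([None], 1))
print(default_var_names([None, None], 2))
```

A hard-coded `'variable'` default cannot express the multi-level case, which is presumably why the signature switches to `None` and resolves the name later.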

id_vars = list(id_vars)
elif isinstance(id_vars, str):
id_vars = [(id_vars,)]
elif isinstance(id_vars, tuple):
Member

@HyukjinKwon HyukjinKwon Oct 15, 2019

It seems a tuple on its own is not allowed when the columns are a MultiIndex:

>>> pdf.melt(id_vars=('X', 'A'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/site-packages/pandas/core/frame.py", line 6500, in melt
    col_level=col_level,
  File "/usr/local/lib/python3.7/site-packages/pandas/core/reshape/melt.py", line 42, in melt
    "id_vars must be a list of tuples when columns" " are a MultiIndex"
ValueError: id_vars must be a list of tuples when columns are a MultiIndex

vs

>>> kdf.melt(id_vars=('X', 'A'))
   ('X', 'A') variable_0 variable_1  value
0           1          X          B      2
1           1          Y          C      7
2           3          X          B      4
3           3          Y          C      8
4           5          X          B      6
5           5          Y          C      9

Maybe we should check via len(self._internal.index_map) == 1 and disallow a tuple on its own.

Also, I think a tuple should otherwise be treated as multiple columns. See:

>>> kdf.melt(id_vars=('A', 'B'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../koalas/databricks/koalas/frame.py", line 6818, in melt
    for name in var_name[:self._internal.column_index_level]] +
  File "/.../koalas/databricks/koalas/frame.py", line 6816, in <listcomp>
    for idx in id_vars] +
  File "/.../koalas/databricks/koalas/internal.py", line 534, in scol_for
    return scol_for(self._sdf, self.column_name_for(column_name_or_index))
  File "/.../koalas/databricks/koalas/internal.py", line 523, in column_name_for
    raise KeyError(column_name_or_index)
KeyError: ('A', 'B')

vs

>>> pdf.melt(id_vars=('A', 'B'))
   A  B variable  value
0  1  2        C      7
1  3  4        C      8
2  5  6        C      9

Collaborator Author

Actually, I thought that was weird behavior, but let's follow pandas for now.
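
The pandas behavior discussed in this thread can be reproduced with a small sketch like the one below. The frames are rebuilt from the values visible in the outputs above (the names `flat` and `multi` are mine, not from the PR), and this only exercises pandas, not Koalas.

```python
import pandas as pd

# Single-level columns: a tuple is treated like a list of column names.
flat = pd.DataFrame({"A": [1, 3, 5], "B": [2, 4, 6], "C": [7, 8, 9]})
flat_melted = flat.melt(id_vars=("A", "B"))

# Two-level columns holding the same data under ('X', 'A'), ('X', 'B'), ('Y', 'C').
multi = flat.copy()
multi.columns = pd.MultiIndex.from_tuples([("X", "A"), ("X", "B"), ("Y", "C")])

# A tuple on its own is rejected here; pandas asks for a list of tuples.
try:
    multi.melt(id_vars=("X", "A"))
    bare_tuple_error = None
except ValueError as exc:
    bare_tuple_error = str(exc)

# Wrapping the tuple in a list selects the single column ('X', 'A').
multi_melted = multi.melt(id_vars=[("X", "A")])

print(flat_melted)
print(bare_tuple_error)
print(multi_melted)
```

Following pandas here means a tuple has two meanings depending on the column index: a sequence of column names for single-level columns, and an error (in favor of a list of tuples) for a MultiIndex.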

Member

@HyukjinKwon HyukjinKwon left a comment

Looks good otherwise.

@softagram-bot

Softagram Impact Report for pull/920 (head commit: eb34150)

@HyukjinKwon HyukjinKwon merged commit a7f4249 into databricks:master Oct 15, 2019
@HyukjinKwon
Member

Thanks, merged.

@ueshin ueshin deleted the melt branch October 15, 2019 12:52