Implement sort_values for Index/MultiIndex #1120

Merged
HyukjinKwon merged 7 commits into databricks:master from i_sort_values on Dec 19, 2019

Conversation

itholic
Contributor

@itholic itholic commented Dec 12, 2019

Implement sort_values for Index/MultiIndex
(https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Index.sort_values.html#pandas.Index.sort_values)

>>> idx = ks.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')

>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')

>>> idx.sort_values(ascending=False)
Int64Index([1000, 100, 10, 1], dtype='int64')

Support for MultiIndex.

>>> kidx = ks.MultiIndex.from_tuples([('a', 'x', 1), ('c', 'y', 2), ('b', 'z', 3)])
>>> kidx
MultiIndex([('a', 'x', 1),
            ('c', 'y', 2),
            ('b', 'z', 3)],
           )

>>> kidx.sort_values()
MultiIndex([('a', 'x', 1),
            ('b', 'z', 3),
            ('c', 'y', 2)],
           )

>>> kidx.sort_values(ascending=False)
MultiIndex([('c', 'y', 2),
            ('b', 'z', 3),
            ('a', 'x', 1)],
           )
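The ordering in the examples above can be reproduced without Koalas: a flat index sorts by value, and a MultiIndex sorts its tuples level by level (lexicographically). The sketch below is a pure-Python illustration of that ordering only; the `sort_values` helper here is a hypothetical stand-in, not the method added by this PR.

```python
# Pure-Python illustration of the ordering shown above; this is not Koalas code.

def sort_values(values, ascending=True):
    """Return a new sorted list; tuples compare level by level (lexicographically)."""
    return sorted(values, reverse=not ascending)


flat = [10, 100, 1, 1000]
multi = [('a', 'x', 1), ('c', 'y', 2), ('b', 'z', 3)]

flat_asc = sort_values(flat)
flat_desc = sort_values(flat, ascending=False)
multi_asc = sort_values(multi)
multi_desc = sort_values(multi, ascending=False)

print(flat_asc, flat_desc)
print(multi_asc)
print(multi_desc)
```

The inputs are left unchanged and a new sequence is returned, which matches how the examples above display `idx` and `kidx` before sorting and the sorted result separately.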

@codecov-io

codecov-io commented Dec 12, 2019

Codecov Report

Merging #1120 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #1120      +/-   ##
==========================================
+ Coverage   95.19%   95.19%   +<.01%     
==========================================
  Files          35       35              
  Lines        7071     7075       +4     
==========================================
+ Hits         6731     6735       +4     
  Misses        340      340
Impacted Files Coverage Δ
databricks/koalas/missing/indexes.py 100% <ø> (ø) ⬆️
databricks/koalas/indexes.py 96.45% <100%> (+0.07%) ⬆️

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 468bf3a...a8dda7d. Read the comment docs.
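The "<.01%" figure in the report can be checked by hand. The short calculation below uses the hits and lines from the coverage diff and assumes coverage is simply hits divided by lines (here lines equal hits plus misses, with no partial hits reported).

```python
# Figures copied from the coverage diff above; assumes coverage = hits / lines.
hits_before, lines_before = 6731, 7071
hits_after, lines_after = 6735, 7075

coverage_before = 100 * hits_before / lines_before
coverage_after = 100 * hits_after / lines_after
delta = coverage_after - coverage_before

print(f"{coverage_before:.2f}% -> {coverage_after:.2f}% (delta {delta:.4f} points)")
```

Both values round to 95.19%, and the increase from four newly covered lines is well under a hundredth of a percentage point.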

Comment on lines 784 to 787
if isinstance(self, MultiIndex):
result.names = self.names
else:
result.name = self.name
Collaborator

Do we need this?

Contributor Author

Without this, I think we can't keep the names in a case like the one below.

>>> kidx = ks.MultiIndex.from_tuples([('a', 'x', 1), ('c', 'y', 2), ('b', 'z', 3)])
>>> kidx.names = ['A', 'B', 'C']
>>> kidx.sort_values()
MultiIndex([('a', 'x', 1),
            ('b', 'z', 3),
            ('c', 'y', 2)],
           )

Collaborator

Ah, I see. The @names.setter seems wrong.

Contributor Author

@itholic itholic Dec 13, 2019

Ah, I got it.

In @names.setter, the new index_map is written to self._kdf._internal, not to self._internal, like below:

self._kdf._internal = internal.copy(index_map=list(zip(internal.index_columns, names)))

At this point, I'm curious why we overwrite self._kdf._internal rather than simply self._internal.

For now, I've fixed it to the current implementation.
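The concern in this comment can be illustrated with a toy model. The sketch below is not Koalas code: `Frame`, `IndexView`, and both setter methods are hypothetical names, and it assumes the index object reads names from its own snapshot of the metadata. Under that assumption, a setter that replaces only the parent's copy leaves the snapshot stale.

```python
# Toy model only: class and attribute names are hypothetical, not Koalas internals.

class Frame:
    """Stands in for the parent DataFrame that owns the index metadata."""

    def __init__(self, names):
        self.internal = {"names": list(names)}


class IndexView:
    """Stands in for an index that keeps its own snapshot of the metadata."""

    def __init__(self, frame):
        self.frame = frame
        self.internal = dict(frame.internal)  # snapshot taken at construction

    @property
    def names(self):
        return self.internal["names"]  # reads the snapshot, not the parent

    def set_names_on_parent_only(self, names):
        # Mirrors the concern in the comment: only the parent's copy is replaced.
        self.frame.internal = {"names": list(names)}

    def set_names_on_both(self, names):
        # Refreshing the snapshot as well keeps the view consistent.
        self.set_names_on_parent_only(names)
        self.internal = dict(self.frame.internal)


view = IndexView(Frame([None, None, None]))

view.set_names_on_parent_only(["A", "B", "C"])
stale = view.names  # the snapshot has not seen the update

view.set_names_on_both(["A", "B", "C"])
fresh = view.names  # the snapshot now matches the parent

print(stale, fresh)
```

Whether Koalas actually resolves names through a separate snapshot is an assumption here; the sketch only shows why writing to one of two copies can make names appear to be lost.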

Member

@HyukjinKwon HyukjinKwon left a comment

Looks fine, but let me leave it to @ueshin.

@softagram-bot

Softagram Impact Report for pull/1120 (head commit: 8ae7722)

⚠️ Copy paste found

ℹ️ test_indexes.py: Copy paste fragment on line 30 shared with ../test_dataframe.py, ../test_numpy_compat.py:


    @property
    def pdf(self):
        return pd.DataFrame({
            'a': [1, 2, 3, 4, 5, 6, 7, 8, 9],
            'b': [4, 5, 6, 3, 2, 1, ...(truncated 160 chars)

ℹ️ indexes.py: Copy paste fragment inside the same file on lines 720, 1163:

            raise NotImplementedError(
                "Doesn't support symmetric_difference between Index & MultiIndex for now")

        sdf_self = self._kdf._s...(truncated 477 chars)

Now that you are on the file, it would be easier to pay back some tech. debt.

@HyukjinKwon
Member

@itholic can you resolve conflicts?

@itholic
Contributor Author

itholic commented Dec 19, 2019

@HyukjinKwon resolved :)

Collaborator

@ueshin ueshin left a comment

LGTM.

@HyukjinKwon HyukjinKwon merged commit c03b3a6 into databricks:master Dec 19, 2019
@itholic itholic deleted the i_sort_values branch December 20, 2019 04:17
5 participants