Fix idxmax() / idxmin() for Series to work properly #1078

Merged: 8 commits, merged Dec 4, 2019

itholic (Contributor) commented Nov 26, 2019

Reopen of #1065

This fixes idxmax() / idxmin() more properly, taking into account the additional examples raised in a comment on #1065:

>>> import pandas as pd
>>> import databricks.koalas as ks
>>> pser = pd.Series([1, 100, None, 100, 1, 100], index=[10, 3, 5, 2, 1, 8])
>>> kser = ks.from_pandas(pser)
>>>
>>> pser.idxmax()
3
>>> kser.idxmax()
3
>>> pser.idxmin()
10
>>> kser.idxmin()
10
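The examples above hinge on tie-breaking: 100 appears at labels 3, 2 and 8, and 1 appears at labels 10 and 1. pandas returns the first row label holding the extreme value, so breaking ties by the smallest index label instead would give 2 and 1. The pure-Python sketch below (not Koalas code; the helper `idx_extreme` and its `by_label` flag are hypothetical) contrasts the two rules on the same data.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing, similar to skipna=True in pandas."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def idx_extreme(values, index, find_max=True, by_label=False):
    """Return the index label of the max (or min) value, skipping missing values.

    Ties are broken by row position (first occurrence wins) unless
    by_label=True, in which case the smallest tied label wins.
    """
    candidates = [
        (value, pos, label)
        for pos, (value, label) in enumerate(zip(values, index))
        if not is_missing(value)
    ]
    if not candidates:
        raise ValueError("no non-missing values")
    pick = max if find_max else min
    best = pick(value for value, _, _ in candidates)
    ties = [(pos, label) for value, pos, label in candidates if value == best]
    if by_label:
        return min(label for _, label in ties)
    return min(ties)[1]  # smallest row position = first occurrence


values = [1, 100, None, 100, 1, 100]
index = [10, 3, 5, 2, 1, 8]

print(idx_extreme(values, index))                                 # 3, like pser.idxmax()
print(idx_extreme(values, index, by_label=True))                  # 2
print(idx_extreme(values, index, find_max=False))                 # 10, like pser.idxmin()
print(idx_extreme(values, index, find_max=False, by_label=True))  # 1
```

Only the row-position rule reproduces pandas here; a Spark DataFrame has no built-in row position, which is the gap the review discussion addresses.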

codecov-io commented Nov 26, 2019

Codecov Report

Merging #1078 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #1078      +/-   ##
==========================================
- Coverage    95.2%   95.19%   -0.02%     
==========================================
  Files          34       34              
  Lines        6868     6935      +67     
==========================================
+ Hits         6539     6602      +63     
- Misses        329      333       +4

Impacted Files                          Coverage Δ
databricks/koalas/series.py             96.5% <100%> (-0.04%) ⬇️
databricks/koalas/config.py             98.98% <0%> (-1.02%) ⬇️
databricks/koalas/base.py               94.88% <0%> (-0.36%) ⬇️
databricks/koalas/utils.py              98.15% <0%> (-0.02%) ⬇️
databricks/koalas/missing/frame.py      100% <0%> (ø) ⬆️
databricks/koalas/missing/indexes.py    100% <0%> (ø) ⬆️
databricks/koalas/missing/series.py     100% <0%> (ø) ⬆️
databricks/koalas/frame.py              96.76% <0%> (+0.02%) ⬆️
databricks/koalas/indexes.py            96.2% <0%> (+0.03%) ⬆️
... and 1 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 80e9ebe...9b41867.

ueshin (Collaborator) left a comment

Hmm, the current implementation runs three Spark jobs.
I guess we should just use F.monotonically_increasing_id() instead of index_scols in the previous PR #1065?

databricks/koalas/series.py (outdated, resolved)
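The suggestion relies on `F.monotonically_increasing_id()` producing IDs that grow with row order without being consecutive; the PySpark documentation describes putting the partition ID in the upper 31 bits and the record number within each partition in the lower 33 bits. The pure-Python sketch below simulates that layout, assuming the partitions keep the Series' original row order, and breaks ties on the smallest ID in a single `max()`/`min()` over the rows. It is not the merged Koalas code; the helper names and the three-way partitioning are hypothetical.

```python
import math
import random


def with_monotonic_ids(partitions):
    """Attach IDs laid out like Spark's monotonically_increasing_id():
    partition ID in the upper 31 bits, record number in the lower 33 bits."""
    rows = []
    for partition_id, part in enumerate(partitions):
        for record_number, (value, label) in enumerate(part):
            rows.append((value, (partition_id << 33) + record_number, label))
    return rows


def is_missing(value):
    return value is None or (isinstance(value, float) and math.isnan(value))


def idx_extreme_by_id(rows, find_max=True):
    """Pick the label of the max (or min) value; ties go to the smallest ID.

    One max()/min() over (value, id) keys, so the order in which rows
    are scanned does not affect the result.
    """
    valid = [r for r in rows if not is_missing(r[0])]
    if not valid:
        return None
    if find_max:
        _, _, label = max(valid, key=lambda r: (r[0], -r[1]))
    else:
        _, _, label = min(valid, key=lambda r: (r[0], r[1]))
    return label


# The example Series from this PR, split into three hypothetical partitions
# that keep the original row order; each row is (value, index label).
partitions = [[(1, 10), (100, 3)], [(None, 5), (100, 2)], [(1, 1), (100, 8)]]
rows = with_monotonic_ids(partitions)
random.Random(0).shuffle(rows)  # stand-in for Spark's arbitrary processing order

print(idx_extreme_by_id(rows))                  # 3
print(idx_extreme_by_id(rows, find_max=False))  # 10
```

Because the ID rather than the scan order decides ties, shuffling the rows does not change the answer, and the gaps between partitions' IDs do not matter, only their relative order.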
HyukjinKwon (Member) commented:
Can you make the tests pass on Python 3.5 too, @itholic?

ueshin (Collaborator) commented Dec 3, 2019

What do you think about my earlier review on #1078?

I guess we should just use F.monotonically_increasing_id() instead of index_scols in the previous PR #1065?

itholic (Contributor, Author) commented Dec 4, 2019

@ueshin Ah, sorry, I missed that comment. 😢

That makes sense. Thanks for the comment, I've fixed it! 👍

softagram-bot commented:
Softagram Impact Report for pull/1078 (head commit: 9b41867)

⚠️ Copy paste found

ℹ️ series.py: Copy paste fragment on line 1249 shared with ../frame.py:


    def to_latex(self, buf=None, columns=None, col_space=None, header=True, index=True,
                 na_rep='NaN',...(truncated 256 chars)

ℹ️ series.py: Copy paste fragment inside the same file on lines 3186, 3294:

        results = sdf.select([scol] + index_scols).take(1)
        if len(results) == 0:
           ...(truncated 409 chars)

ℹ️ series.py: Copy paste fragment inside the same file on lines 4075, 4229:

        sdf = self._internal.sdf \
            .select(cols) \
            .where(reduce(lambda x, y: x & y, rows))

        if len(self._inter...(truncated 255 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 312, 329:

                     pd.Series([True, False], name='x'),
                     pd.Series([0, 1], name='x'),
                     pd.Series([1, 2,...(truncated 330 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 731, 894:

                              ['speed', 'weight', 'length']],
                             [[0, 0, 0, 1, 1, 1, 2, 2, 2],
                      ...(truncated 117 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 817, 834:


        pser1 = pd.Series([-1, -2, -3, -4, -5], name=0)
        pser2 = pd.Series([-100, -200, -300, -400, -500], name=0)
        k...(truncated 123 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 180, 190:

        pdf = pd.DataFrame({
            'left':  [True, False, True, False, np.nan, np.nan, True, False, np.nan],
            'right': [True, False, False, True, True, False, n...(truncated 119 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 778, 896:

                              [0, 1, 2, 0, 1, 2, 0, 1, 2]])
        kser = ks.Series([45, 200, 1.2, 30, 250, 1.5, 320, 1, 0.3],
             ...(truncated 145 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 649, 674:


        index = pd.MultiIndex.from_arrays([
            ['a', 'a', 'b', 'b'], ['c', 'd', 'e', 'f']], names=('first', 'se...(truncated 151 chars)

Now that you are working on this file, it is a good time to pay back some tech debt.

⭐ Change Overview

[Diagram of the changed files, dependency changes, and their impact; not included]

💡 Insights

  • Co-change Alert: You modified series.py; frame.py (databricks/koalas) is often modified at the same time.

ueshin (Collaborator) left a comment

LGTM.

ueshin (Collaborator) commented Dec 4, 2019

Thanks! Merging.

ueshin merged commit 17d067d into databricks:master on Dec 4, 2019
itholic (Contributor, Author) commented Dec 4, 2019

Thanks :)

itholic deleted the fix_s_idxmax branch on December 10, 2019, 15:22
5 participants