Fix idxmax() / idxmin() for Series work properly #1078

itholic · 2019-11-26T02:17:31Z

Reopen of #1065

This fixes the behavior more thoroughly, taking into account the additional examples raised in #1065 (comment).

>>> pser = pd.Series([1, 100, None, 100, 1, 100], index=[10, 3, 5, 2, 1, 8])
>>> kser = ks.from_pandas(pser)
>>>
>>> pser.idxmax()
3
>>> kser.idxmax()
3
>>> pser.idxmin()
10
>>> kser.idxmin()
10
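
For reference, the semantics the example above relies on can be modelled in plain Python. This is only a sketch of the expected behavior (missing values skipped, first occurrence wins on ties), not the Koalas or pandas implementation; `is_missing` and `first_label_of_extreme` are hypothetical helper names introduced for illustration.

```python
import math


def is_missing(value):
    # Treat both None and float NaN as missing values.
    return value is None or (isinstance(value, float) and math.isnan(value))


def first_label_of_extreme(values, labels, largest=True):
    # Hypothetical helper: return the label of the first max (or min),
    # skipping missing values.
    best_label, best_value = None, None
    for value, label in zip(values, labels):
        if is_missing(value):
            continue
        # Strict comparison keeps the first occurrence when values tie.
        better = best_value is None or (
            value > best_value if largest else value < best_value
        )
        if better:
            best_label, best_value = label, value
    if best_value is None:
        raise ValueError("no non-missing values to compare")
    return best_label


values = [1, 100, None, 100, 1, 100]
labels = [10, 3, 5, 2, 1, 8]
idxmax_label = first_label_of_extreme(values, labels, largest=True)
idxmin_label = first_label_of_extreme(values, labels, largest=False)
print(idxmax_label, idxmin_label)  # 3 10
```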

codecov-io · 2019-11-26T02:56:28Z

Codecov Report

Merging #1078 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #1078      +/-   ##
==========================================
- Coverage    95.2%   95.19%   -0.02%     
==========================================
  Files          34       34              
  Lines        6868     6935      +67     
==========================================
+ Hits         6539     6602      +63     
- Misses        329      333       +4

| Impacted Files | Coverage Δ | |
|---|---|---|
| databricks/koalas/series.py | `96.5% <100%> (-0.04%)` | ⬇️ |
| databricks/koalas/config.py | `98.98% <0%> (-1.02%)` | ⬇️ |
| databricks/koalas/base.py | `94.88% <0%> (-0.36%)` | ⬇️ |
| databricks/koalas/utils.py | `98.15% <0%> (-0.02%)` | ⬇️ |
| databricks/koalas/missing/frame.py | `100% <0%> (ø)` | ⬆️ |
| databricks/koalas/missing/indexes.py | `100% <0%> (ø)` | ⬆️ |
| databricks/koalas/missing/series.py | `100% <0%> (ø)` | ⬆️ |
| databricks/koalas/frame.py | `96.76% <0%> (+0.02%)` | ⬆️ |
| databricks/koalas/indexes.py | `96.2% <0%> (+0.03%)` | ⬆️ |
... and 1 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 80e9ebe...9b41867.

ueshin

Hm, this current implementation runs Spark jobs three times.
I guess we should just use F.monotonically_increasing_id() instead of index_scols in the previous PR #1065?
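
One way to read this suggestion: sort by value and break ties with a row-order id rather than with the index labels, so a single ordered lookup returns the first occurrence. The sketch below models that idea in plain Python only; it is not the Koalas code, `idxmax_by_sort` is a hypothetical name, and the `use_row_id=False` branch is an assumption about how a label-based tie-break would behave.

```python
def idxmax_by_sort(values, labels, use_row_id=True):
    # Attach a row-order id to each row, similar in spirit to what
    # F.monotonically_increasing_id() provides in Spark, and drop missing values.
    rows = [
        (value, label, row_id)
        for row_id, (value, label) in enumerate(zip(values, labels))
        if value is not None
    ]
    if not rows:
        raise ValueError("no non-missing values to compare")
    if use_row_id:
        # Largest value first; ties broken by original row order.
        key = lambda row: (-row[0], row[2])
    else:
        # Largest value first; ties broken by index label (assumed alternative).
        key = lambda row: (-row[0], row[1])
    return min(rows, key=key)[1]


values = [1, 100, None, 100, 1, 100]
labels = [10, 3, 5, 2, 1, 8]
by_row_id = idxmax_by_sort(values, labels, use_row_id=True)
by_label = idxmax_by_sort(values, labels, use_row_id=False)
print(by_row_id, by_label)  # 3 2
```

With the data from the PR description, the row-order tie-break gives 3, matching pandas, while the label tie-break gives 2.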

databricks/koalas/series.py

HyukjinKwon · 2019-12-02T06:23:26Z

Can you make the tests pass on Python 3.5 too, @itholic?

databricks/koalas/series.py

ueshin · 2019-12-03T22:14:54Z

What do you think about #1078 (review)?

I guess we should just use F.monotonically_increasing_id() instead of index_scols in the previous PR #1065?

itholic · 2019-12-04T02:09:37Z

@ueshin Ah! Sorry, I missed that comment. 😢

It makes sense. Thanks for the comment, I fixed it!! 👍

softagram-bot · 2019-12-04T02:11:02Z

Softagram Impact Report for pull/1078 (head commit: `9b41867`)

⚠️ Copy paste found

ℹ️ series.py: Copy paste fragment on line 1249 shared with ../frame.py:


    def to_latex(self, buf=None, columns=None, col_space=None, header=True, index=True,
                 na_rep='NaN',...(truncated 256 chars)

ℹ️ series.py: Copy paste fragment inside the same file on lines 3186, 3294:

        results = sdf.select([scol] + index_scols).take(1)
        if len(results) == 0:
           ...(truncated 409 chars)

ℹ️ series.py: Copy paste fragment inside the same file on lines 4075, 4229:

        sdf = self._internal.sdf \
            .select(cols) \
            .where(reduce(lambda x, y: x & y, rows))

        if len(self._inter...(truncated 255 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 312, 329:

                     pd.Series([True, False], name='x'),
                     pd.Series([0, 1], name='x'),
                     pd.Series([1, 2,...(truncated 330 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 731, 894:

                              ['speed', 'weight', 'length']],
                             [[0, 0, 0, 1, 1, 1, 2, 2, 2],
                      ...(truncated 117 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 817, 834:


        pser1 = pd.Series([-1, -2, -3, -4, -5], name=0)
        pser2 = pd.Series([-100, -200, -300, -400, -500], name=0)
        k...(truncated 123 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 180, 190:

        pdf = pd.DataFrame({
            'left':  [True, False, True, False, np.nan, np.nan, True, False, np.nan],
            'right': [True, False, False, True, True, False, n...(truncated 119 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 778, 896:

                              [0, 1, 2, 0, 1, 2, 0, 1, 2]])
        kser = ks.Series([45, 200, 1.2, 30, 250, 1.5, 320, 1, 0.3],
             ...(truncated 145 chars)

ℹ️ test_series.py: Copy paste fragment inside the same file on lines 649, 674:


        index = pd.MultiIndex.from_arrays([
            ['a', 'a', 'b', 'b'], ['c', 'd', 'e', 'f']], names=('first', 'se...(truncated 151 chars)

Now that you are on the file, it would be easier to pay back some tech. debt.

⭐ Change Overview

(Open in Softagram Desktop for full details)

💡 Insights

Co-change Alert: You modified series.py. Often frame.py (databricks/koalas) is modified at the same time.


ueshin

LGTM.

ueshin · 2019-12-04T18:47:15Z

Thanks! merging.

itholic · 2019-12-04T23:00:18Z

Thanks :)

itholic added 5 commits November 22, 2019 12:02

Fix Series.idxmax() to make result properly

c409d77

fix doc

709b168

fix idxmin also

9ec5eea

Fix idxmax/idxmin for Seires work properly

7708b0a

add test cases

1d6e772

ueshin reviewed Nov 27, 2019

databricks/koalas/series.py (outdated)

move exception logic to proper place

e4c8843

Empty commit for build test

a0e3346

HyukjinKwon reviewed Dec 3, 2019

databricks/koalas/series.py

fix

9b41867

ueshin approved these changes Dec 4, 2019

ueshin merged commit 17d067d into databricks:master Dec 4, 2019

itholic deleted the fix_s_idxmax branch December 10, 2019 15:22
