Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implements Series.filter #1511

Merged
merged 5 commits into from
May 28, 2020
Merged

Conversation

beobest2
Copy link
Contributor

Implementing Series.filter

>>> kser = ks.Series([0, 1, 2], index=['one', 'two', 'three'])

>>> kser
one      0
two      1
three    2
Name: 0, dtype: int64

>>> kser.filter(items=['one', 'three'])
one      0
three    2
Name: 0, dtype: int64

>>> kser.filter(regex='e$')
one      0
three    2
Name: 0, dtype: int64

>>> kser.filter(like='hre')
three    2
Name: 0, dtype: int64

@beobest2 beobest2 changed the title Implement Series.filter Implements Series.filter May 18, 2020
@beobest2 beobest2 force-pushed the add_series_filter branch 2 times, most recently from a82cff3 to 8a0b9dc Compare May 20, 2020 14:26
@codecov-commenter
Copy link

codecov-commenter commented May 27, 2020

Codecov Report

Merging #1511 into master will increase coverage by 0.00%.
The diff coverage is 98.66%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #1511   +/-   ##
=======================================
  Coverage   94.19%   94.20%           
=======================================
  Files          38       38           
  Lines        8595     8608   +13     
=======================================
+ Hits         8096     8109   +13     
  Misses        499      499           
Impacted Files Coverage Δ
databricks/koalas/frame.py 95.68% <ø> (-0.08%) ⬇️
databricks/koalas/missing/series.py 100.00% <ø> (ø)
databricks/koalas/generic.py 97.05% <98.66%> (+0.45%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3f08d80...b3791e3. Read the comment docs.

@HyukjinKwon
Copy link
Member

Ah, #1512 caused a conflict here. Can you rebase and resolve the conflicts please?

@beobest2 beobest2 force-pushed the add_series_filter branch 2 times, most recently from 34abf72 to 153b930 Compare May 28, 2020 12:01
@beobest2
Copy link
Contributor Author

@HyukjinKwon please check this.

  1. Resolving code conflicts
  2. Remove filter of frame.py
  3. Add multi-index support implemented at 4a18a09
  4. Add multi-index test cases in Series

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @beobest2

@HyukjinKwon HyukjinKwon merged commit 8fb85de into databricks:master May 28, 2020
@ueshin
Copy link
Collaborator

ueshin commented May 28, 2020

Sorry for the late review.
Actually I'd say it's not good to move the function to generic.
I'll submit a PR to fix it soon.

Btw, this broke the docstring, I'll submit the PR to fix it anyway.

the specified index.
Note that this routine does not filter a dataframe on its
contents. The filter is applied to the labels of the index.
Parameters
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah .. I didn't notice the newlines disappeared here.. it breaks the doc rendering. Let's be careful next time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oppse I'll be careful next time. I'm Sorry

@HyukjinKwon
Copy link
Member

HyukjinKwon commented May 29, 2020

Okay, I had a offline discussion. We have been following pandas' structure so far. However, now it became kind of a bit weird for some instances in generic.py as we should handle both differently with an if-else.
We can avoid an if-else on self in generic.py which actually looks a bit odds, but only share the common codes.

These will be cleaned up in a separate PR.

HyukjinKwon pushed a commit that referenced this pull request May 29, 2020
@itholic
Copy link
Contributor

itholic commented May 29, 2020

@HyukjinKwon Thanks for sharing, makes sense.

@ueshin
Copy link
Collaborator

ueshin commented May 29, 2020

I don't look into all of the function in generic.py file yet so I'm not sure we should move all of them to the subclasses, but here is a pattern especially I'd avoid in generic.py:

def some_function(self, ...):

    ... preprocess ...

    if isinstance(self, Series):
        kdf = self.to_frame()
    else:
        kdf = self

    ... working with kdf and the result is ret ...

    if isinstance(self, Series):
        return first_series(ret)
    else:
        return ret

Then we should move it to DataFrame and make it work with only DataFrame, then we can add something like the following in Series:

def some_function(self, ...):

    ... preprocess ...

    return first_series(self.to_frame().some_function(...))

@ueshin
Copy link
Collaborator

ueshin commented May 29, 2020

Anyway I'll submit PRs when I encounter the pattern.

@beobest2 beobest2 deleted the add_series_filter branch June 15, 2020 17:10
rising-star92 added a commit to rising-star92/databricks-koalas that referenced this pull request Jan 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants