Implement DataFrame.query #1273

Merged 12 commits on Feb 14, 2020

itholic (Contributor) commented Feb 12, 2020

This PR proposes to implement DataFrame.query, which filters the rows of a DataFrame using a boolean expression given as a string.

>>> df = ks.DataFrame({'A': range(1, 6),
...                    'B': range(10, 0, -2),
...                    'C C': range(10, 5, -1)})
>>> df
   A   B  C C
0  1  10   10
1  2   8    9
2  3   6    8
3  4   4    7
4  5   2    6

>>> df.query('A > B')
   A  B  C C
4  5  2    6

The previous expression is equivalent to:

>>> df[df.A > df.B]
   A  B  C C
4  5  2    6

For columns with spaces in their name, you can use backtick quoting.

>>> df.query('B == `C C`')
   A   B  C C
0  1  10   10

The previous expression is equivalent to:

>>> df[df.B == df['C C']]
   A   B  C C
0  1  10   10
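
The two equivalences above can also be checked without a Spark session by running the same expressions in plain pandas, whose `DataFrame.query` behavior the doctest follows. This is only a sketch under that assumption: it uses `pd.DataFrame` in place of `ks.DataFrame`, the variable names are illustrative, and backtick quoting needs pandas 0.25 or later.

```python
import pandas as pd

# Same data as the doctest above, built with pandas instead of Koalas.
df = pd.DataFrame({'A': range(1, 6),
                   'B': range(10, 0, -2),
                   'C C': range(10, 5, -1)})

# String expression versus the equivalent boolean mask.
by_query = df.query('A > B')
by_mask = df[df.A > df.B]

# Backtick quoting for a column name that contains a space.
quoted_query = df.query('B == `C C`')
quoted_mask = df[df.B == df['C C']]

# Both comparisons should report that the two forms select the same rows.
print(by_query.equals(by_mask))
print(quoted_query.equals(quoted_mask))
```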

codecov-io commented Feb 13, 2020

Codecov Report

Merging #1273 into master will decrease coverage by 0.03%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #1273      +/-   ##
==========================================
- Coverage   95.14%   95.11%   -0.04%     
==========================================
  Files          35       34       -1     
  Lines        7202     7219      +17     
==========================================
+ Hits         6852     6866      +14     
- Misses        350      353       +3

Impacted Files                                    Coverage Δ
databricks/koalas/missing/frame.py                100% <ø> (ø) ⬆️
databricks/koalas/frame.py                        96.55% <100%> (-0.14%) ⬇️
databricks/koalas/usage_logging/__init__.py       97.29% <0%> (ø) ⬆️
databricks/koalas/usage_logging/usage_logger.py   100% <0%> (ø) ⬆️
databricks/conftest.py                            96.22% <0%> (ø) ⬆️
databricks/koalas/__init__.py                     85.1% <0%> (ø) ⬆️
databricks/koalas/namespace.py                    87.87% <0%> (ø) ⬆️
databricks/koalas/plot.py                         94.28% <0%> (ø) ⬆️
databricks/koalas/testing/utils.py                78.51% <0%> (ø) ⬆️
... and 4 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8556443...8300045.

HyukjinKwon (Member) left a comment

Looks good otherwise.

itholic (Contributor, Author) commented Feb 14, 2020

@HyukjinKwon Thanks for the review!

HyukjinKwon pushed a commit that referenced this pull request Feb 18, 2020
As requested at #1273 (comment), and since `map_in_pandas` (#1276) has been merged, this commit uncomments the existing doctest for DataFrame.query.
ueshin added a commit that referenced this pull request Feb 19, 2020
… as expected. (#1283)

This is a follow-up to #1273.
A Spark column name is not always the same as its column label.
This PR renames the data columns before filtering so that the column names are as expected.
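
The rename-before-filter idea can be illustrated with a small pandas sketch. It is not the Koalas change itself: the physical column names (`__c0__`, `__c1__`), the `labels` mapping, and the helper `query_by_label` are all hypothetical, and a pandas frame stands in for the Spark frame.

```python
import pandas as pd

# Hypothetical physical column names, standing in for Spark column names
# that differ from the labels a user would write in a query expression.
physical = pd.DataFrame({'__c0__': [1, 2, 3], '__c1__': [3, 2, 1]})

# Hypothetical mapping from physical column name to user-facing label.
labels = {'__c0__': 'A', '__c1__': 'B'}


def query_by_label(frame, name_to_label, expr):
    """Rename columns to their labels first, then filter with the expression."""
    renamed = frame.rename(columns=name_to_label)
    return renamed.query(expr)


# 'A > B' would fail against the physical names; it resolves after renaming.
result = query_by_label(physical, labels, 'A > B')
print(result)
```

Without the rename step, the expression would refer to names the underlying frame does not have, which is the mismatch the follow-up describes.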
ueshin mentioned this pull request Apr 9, 2020
itholic deleted the f_query branch on September 10, 2020, 11:48
rising-star92 added a commit to rising-star92/databricks-koalas that referenced this pull request Jan 27, 2023
As requested at databricks/koalas#1273 (comment), and since `map_in_pandas` (#1276) has been merged, this commit uncomments the existing doctest for DataFrame.query.
3 participants