Implement DataFrame.query #1273

itholic · 2020-02-12T04:18:02Z

This PR proposes to implement DataFrame.query.

>>> df = ks.DataFrame({'A': range(1, 6),
...                    'B': range(10, 0, -2),
...                    'C C': range(10, 5, -1)})
>>> df
   A   B  C C
0  1  10   10
1  2   8    9
2  3   6    8
3  4   4    7
4  5   2    6

>>> df.query('A > B')
   A  B  C C
4  5  2    6

The previous expression is equivalent to:

>>> df[df.A > df.B]
   A  B  C C
4  5  2    6

For columns with spaces in their name, you can use backtick quoting.

>>> df.query('B == `C C`')
   A   B  C C
0  1  10   10

The previous expression is equivalent to:

>>> df[df.B == df['C C']]
   A   B  C C
0  1  10   10
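
The examples above mirror pandas behaviour, so the two equivalences can be checked with plain pandas and no Spark session. This is a minimal sketch, assuming pandas 0.25 or later (needed for backtick quoting). It exercises `pandas.DataFrame.query` only, not the Koalas implementation added in this PR, and the variable names are illustrative.

```python
import pandas as pd

# Same data as the docstring example, built with pandas instead of Koalas.
df = pd.DataFrame({'A': range(1, 6),
                   'B': range(10, 0, -2),
                   'C C': range(10, 5, -1)})

# A query string and the matching boolean mask select the same rows.
by_query = df.query('A > B')
by_mask = df[df.A > df.B]
assert by_query.equals(by_mask)

# Backticks quote a column name that contains a space.
quoted = df.query('B == `C C`')
assert quoted.equals(df[df.B == df['C C']])

print(by_query)
print(quoted)
```

Swapping `pd.DataFrame` for `ks.DataFrame` should give the Koalas version of the same check, assuming a working Spark environment.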

codecov-io · 2020-02-13T02:43:59Z

Codecov Report

Merging #1273 into master will decrease coverage by 0.03%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #1273      +/-   ##
==========================================
- Coverage   95.14%   95.11%   -0.04%     
==========================================
  Files          35       34       -1     
  Lines        7202     7219      +17     
==========================================
+ Hits         6852     6866      +14     
- Misses        350      353       +3

Impacted Files	Coverage Δ
databricks/koalas/missing/frame.py	`100% <ø> (ø)`	⬆️
databricks/koalas/frame.py	`96.55% <100%> (-0.14%)`	⬇️
databricks/koalas/usage_logging/__init__.py	`97.29% <0%> (ø)`	⬆️
databricks/koalas/usage_logging/usage_logger.py	`100% <0%> (ø)`	⬆️
databricks/conftest.py	`96.22% <0%> (ø)`	⬆️
databricks/koalas/__init__.py	`85.1% <0%> (ø)`	⬆️
databricks/koalas/namespace.py	`87.87% <0%> (ø)`	⬆️
databricks/koalas/plot.py	`94.28% <0%> (ø)`	⬆️
databricks/koalas/testing/utils.py	`78.51% <0%> (ø)`	⬆️
... and 4 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8556443...8300045. Read the comment docs.
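
The coverage percentages in the report can be re-derived from the Hits and Lines rows. The sketch below assumes coverage is simply hits divided by lines; the helper name `coverage` is made up for illustration and is not part of Codecov.

```python
def coverage(hits, lines):
    """Return coverage as a percentage, assuming coverage = hits / lines."""
    return 100.0 * hits / lines

# Figures copied from the Coverage Diff table above.
master = coverage(6852, 7202)
pr = coverage(6866, 7219)

print(f"master: {master:.2f}%")        # master: 95.14%
print(f"PR:     {pr:.2f}%")            # PR:     95.11%
print(f"delta:  {pr - master:+.2f}")   # delta:  -0.03

# Misses are the lines that were not hit.
assert 7202 - 6852 == 350
assert 7219 - 6866 == 353
```

The computed difference is about -0.03 points, which matches the headline figure. The -0.04% shown in the table is Codecov's own number and is not reproduced by this calculation.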

HyukjinKwon

Looks good otherwise.

itholic · 2020-02-14T08:00:28Z

@HyukjinKwon Thanks for the review!

As requested in #1273 (comment), and since `map_in_pandas` (#1276) has been merged, this uncomments the existing doctest for DataFrame.query.

… as expected. (#1283) This is a follow-up to #1273. Spark column names are not always the same as the corresponding column labels, so this PR renames the data columns before filtering to make sure the column names are as expected.

itholic added 4 commits February 12, 2020 13:09

Implement DataFrame.query

2f5e32b

Fix docstring

9a3666f

add repr

f196fc4

change col_name

399942d

HyukjinKwon reviewed Feb 13, 2020

databricks/koalas/frame.py (resolved)

change col_name

89fde46

itholic added 2 commits February 13, 2020 13:06

exception for MultiIndex columns

fe84919

Adding note

8ac3fde

HyukjinKwon reviewed Feb 13, 2020

databricks/koalas/frame.py (outdated, resolved)

HyukjinKwon reviewed Feb 13, 2020

databricks/koalas/frame.py (outdated, resolved)

Fixing note

59a0bdd

HyukjinKwon reviewed Feb 13, 2020

databricks/koalas/tests/test_dataframe.py (resolved)

HyukjinKwon reviewed Feb 13, 2020

databricks/koalas/frame.py (resolved)

itholic added 2 commits February 14, 2020 10:36

Add workaround example with map_in_pandas

33a12ef

SKIP doctest for map_in_pandas

bacd984

HyukjinKwon reviewed Feb 14, 2020

databricks/koalas/frame.py (outdated, resolved)

add indentation to example of docstring

f46be02

HyukjinKwon reviewed Feb 14, 2020

databricks/koalas/frame.py (outdated, resolved)

HyukjinKwon reviewed Feb 14, 2020

databricks/koalas/frame.py (outdated, resolved)

HyukjinKwon approved these changes Feb 14, 2020

fix docstring

8300045

HyukjinKwon merged commit d536a51 into databricks:master Feb 14, 2020

ueshin mentioned this pull request Feb 14, 2020

Rename data columns prior to filter to make sure the column names are as expected. #1283

Merged

HyukjinKwon reviewed Feb 17, 2020

databricks/koalas/frame.py (resolved)

itholic mentioned this pull request Feb 18, 2020

[MINOR] Uncomment doctest for DataFrame.query #1291

Merged

HyukjinKwon pushed a commit that referenced this pull request Feb 18, 2020

Uncomment doctest for DataFrame.query (#1291)

a2c5a22

As requested in #1273 (comment), and since `map_in_pandas` (#1276) has been merged, this uncomments the existing doctest for DataFrame.query.

ueshin mentioned this pull request Apr 9, 2020

DataFrame.query #876

Closed

itholic deleted the f_query branch September 10, 2020 11:48
