Implement DataFrame.map_in_pandas #1276

HyukjinKwon · 2020-02-13T09:14:01Z

This PR implements a Koalas-specific API, `DataFrame.map_in_pandas`, so that pandas DataFrame APIs can be used directly.

For pandas Series APIs we already have a workaround, for instance:

>>> ks.range(10)[['id']].apply(lambda series: series)['id']
0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
Name: id, dtype: int64

However, there is currently no equivalent way to use the pandas DataFrame APIs.

Dask has a similar API called map_partitions, and this API is inspired by it.
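To make the idea concrete, here is a minimal pandas-only sketch of the "map over chunks" pattern that map_partitions-style APIs follow. It is not the Koalas implementation: `map_in_batches`, `add_double`, and `batch_size` are hypothetical names chosen for illustration, and a local pandas DataFrame stands in for distributed data.

```python
import pandas as pd


def map_in_batches(pdf, func, batch_size):
    """Apply `func` to consecutive row slices of `pdf` and concatenate the results.

    Hypothetical helper for illustration only. A distributed engine would
    run `func` on each chunk in parallel; here the chunks are processed
    one after another. An empty input is not handled.
    """
    batches = [
        pdf.iloc[start:start + batch_size]
        for start in range(0, len(pdf), batch_size)
    ]
    return pd.concat([func(batch) for batch in batches])


def add_double(batch):
    # `batch` is a plain pandas DataFrame, so any pandas API is available.
    return batch.assign(doubled=batch["id"] * 2)


pdf = pd.DataFrame({"id": range(10)})
result = map_in_batches(pdf, add_double, batch_size=4)
```

The user-supplied function receives and returns an ordinary pandas DataFrame, which is what makes arbitrary pandas DataFrame APIs usable inside it.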

codecov-io · 2020-02-13T09:35:52Z

Codecov Report

Merging #1276 into master will decrease coverage by 0.06%.
The diff coverage is 92.85%.

@@            Coverage Diff             @@
##           master    #1276      +/-   ##
==========================================
- Coverage   95.15%   95.09%   -0.07%     
==========================================
  Files          34       34              
  Lines        7208     7236      +28     
==========================================
+ Hits         6859     6881      +22     
- Misses        349      355       +6

Impacted Files	Coverage Δ
databricks/koalas/frame.py	`96.47% <92.85%> (-0.06%)`	⬇️
databricks/koalas/usage_logging/__init__.py	`97.29% <0%> (ø)`	⬆️
databricks/koalas/usage_logging/usage_logger.py	`100% <0%> (ø)`	⬆️
databricks/koalas/__init__.py	`85.1% <0%> (-8.52%)`	⬇️
databricks/conftest.py	`96.22% <0%> (ø)`	⬆️
databricks/koalas/namespace.py	`87.87% <0%> (ø)`	⬆️
databricks/koalas/plot.py	`94.28% <0%> (ø)`	⬆️
databricks/koalas/testing/utils.py	`78.51% <0%> (ø)`	⬆️
databricks/koalas/groupby.py	`91.43% <0%> (ø)`	⬆️
... and 1 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6662ae6...fa8db4d.

HyukjinKwon · 2020-02-17T08:49:55Z

@ueshin, let me merge this for now. The implementation is similar to DataFrame.apply, so there shouldn't be much that needs close review.

Also, this PR is modeled on the Dask API, and it enables many workarounds. For example:

        >>> def take_func(pdf) -> ks.DataFrame[int, int]:
        ...     return pdf.query('A == 1')
        >>> df.map_in_pandas(take_func)

Let me know if you have any concerns!
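One point worth keeping in mind with this kind of workaround: assuming, as with Dask's map_partitions, that the function only ever sees one chunk of the data at a time, row-wise operations such as `query` give the same result as on the whole frame, while anything that depends on a global statistic does not. The sketch below illustrates this with plain pandas; `apply_per_batch` and `center` are hypothetical helpers, not Koalas APIs.

```python
import pandas as pd


def apply_per_batch(pdf, func, batch_size=2):
    # Hypothetical stand-in for a chunk-wise API: call `func` on each
    # slice of `batch_size` rows and concatenate the outputs.
    pieces = [
        func(pdf.iloc[i:i + batch_size])
        for i in range(0, len(pdf), batch_size)
    ]
    return pd.concat(pieces)


def take_func(batch):
    # Row-wise filter: each row is kept or dropped independently.
    return batch.query("A == 1")


def center(batch):
    # Depends on the mean of whatever rows happen to be in the batch.
    return batch - batch.mean()


pdf = pd.DataFrame({"A": [1, 2, 1, 3, 1], "B": [10, 20, 30, 40, 50]})

filtered = apply_per_batch(pdf, take_func)  # same rows as pdf.query("A == 1")
centered = apply_per_batch(pdf, center)     # differs from center(pdf)
```

Here `filtered` matches the result of filtering the whole frame, whereas `centered` subtracts a different mean in each chunk, so it does not match `center(pdf)`.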

Implement DataFrame.map_in_pandas

fa8db4d

HyukjinKwon requested a review from ueshin February 13, 2020 09:14

HyukjinKwon mentioned this pull request Feb 13, 2020

Implement DataFrame.query #1273

Merged

HyukjinKwon merged commit 0799c83 into databricks:master Feb 17, 2020

itholic mentioned this pull request Feb 18, 2020

[MINOR] Uncomment doctest for DataFrame.query #1291

Merged

HyukjinKwon pushed a commit that referenced this pull request Feb 18, 2020

Uncomment doctest for DataFrame.query (#1291)

a2c5a22

As requested at #1273 (comment), and since `map_in_pandas` (#1276) has been merged, this just uncomments the existing doctest for DataFrame.query.

HyukjinKwon deleted the map_in_pandas branch September 11, 2020 07:52
