Complete NumPy universal functions compat for DataFrames #1127

HyukjinKwon · 2019-12-13T04:24:48Z

This PR proposes completing support for NumPy's universal functions (ufuncs) on Koalas DataFrames.

>>> import databricks.koalas as ks
>>> import numpy as np
>>> kdf = ks.range(10)
>>> np.log(kdf)
         id
0       NaN
1  0.000000
2  0.693147
3  1.098612
4  1.386294
5  1.609438
6  1.791759
7  1.945910
8  2.079442
9  2.197225
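For readers unfamiliar with how a call such as `np.log(kdf)` can reach a non-ndarray object at all, the following is a minimal sketch of NumPy's `__array_ufunc__` dispatch protocol. It assumes the DataFrame support goes through that protocol; the PR's actual implementation is not shown in this thread. `TinyFrame` is a hypothetical stand-in, not Koalas code, and it applies the ufunc to in-memory arrays rather than Spark columns.

```python
import numpy as np


class TinyFrame:
    """Hypothetical stand-in for a DataFrame-like container (not Koalas code)."""

    def __init__(self, columns):
        # Map each column name to a 1-D float array.
        self.columns = {name: np.asarray(values, dtype=float)
                        for name, values in columns.items()}

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only plain calls such as np.log(x) are handled here; reduce,
        # accumulate, and keyword arguments are declined.
        if method != "__call__" or kwargs:
            return NotImplemented
        result = {}
        for name in self.columns:
            # Swap each TinyFrame argument for its column; pass scalars through.
            args = [x.columns[name] if isinstance(x, TinyFrame) else x
                    for x in inputs]
            result[name] = ufunc(*args)
        return TinyFrame(result)


frame = TinyFrame({"id": [1.0, 2.0, 4.0]})
logged = np.log(frame)          # unary ufunc, routed to __array_ufunc__
summed = np.add(frame, frame)   # binary ufunc with two TinyFrame inputs
```

The sketch assumes both operands share the same column names; a real implementation would also have to handle alignment and unsupported ufuncs.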

codecov-io · 2019-12-13T05:01:16Z

Codecov Report

Merging #1127 into master will increase coverage by <.01%.
The diff coverage is 95.65%.

@@            Coverage Diff             @@
##           master    #1127      +/-   ##
==========================================
+ Coverage   95.15%   95.15%   +<.01%     
==========================================
  Files          35       35              
  Lines        7017     7039      +22     
==========================================
+ Hits         6677     6698      +21     
- Misses        340      341       +1

Impacted Files	Coverage Δ
databricks/koalas/frame.py	`96.81% <95.65%> (-0.02%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update eb763ea...1737eb0. Read the comment docs.

HyukjinKwon · 2019-12-13T07:09:41Z

databricks/koalas/tests/test_numpy_compat.py

+        # Test only top 5 for now. 'compute.ops_on_diff_frames' option increases too much time.
+        try:
+            set_option('compute.ops_on_diff_frames', True)
+            for np_name, spark_func in list(binary_np_spark_mappings.items())[:5]:


I checked that all the tests pass against the complete list.

Hmm, in my local environment, the right_shift function always fails.

>>> pdf
     a   b
0   65  83
1   50 -14
2  -95  35
3   98  97
4  -19  52
5  -39  53
6  -60  45
7   37  51
8  -25   7
9  -11  24
10  34   8
11 -53   8
12  -6 -12
13 -40 -71
14 -14 -36
15 -62 -98
16  93 -38
17 -81  56
18  45  27
19  84  97
20  38  46
21 -30 -76
22 -29 -24
23 -69 -67
>>> np.right_shift(pdf, pdf)
    a  b
0   0  0
1   0  0
2   0  0
3   0  0
4   0  0
5   0  0
6   0  0
7   0  0
8   0  0
9   0  0
10  0  0
11  0  0
12  0  0
13  0  0
14  0  0
15  0  0
16  0  0
17  0  0
18  0  0
19  0  0
20  0  0
21  0  0
22  0  0
23  0  0

whereas

>>> np.right_shift(kdf, kdf)
     a   b
0   32   0
1    0  -1
2   -1   0
3    0   0
4   -1   0
5   -1   0
6   -4   0
7    0   0
8   -1   0
9   -1   0
10   0   0
11  -1   0
12  -1  -1
13  -1  -1
14  -1  -1
15 -16  -1
16   0  -1
17  -1   0
18   0   0
19   0   0
20   0   0
21  -1  -1
22  -1  -1
23  -1  -1

np.right_shift(kdf, 1)

seems fine.

Hm, they are the same in my local environment:

>>> np.right_shift(pdf, pdf)
     a   b
0    0   0
1    0   0
2   -1   0
3    0  -1
4   -1   0
5    0   0
6   -1  -4
7   -1  -1
8    0  -1
9   16  -1
10  16  -1
11   0  -1
12  -1  -1
13  -1  -1
14   0  -2
15   1  -1
16  -1   0
17   0  -1
18   0  -1
19   0  -1
20   0  -1
21   0   0
22  -1   0
23   0   0
24  -1  -1
25   0   0
26  -1   0
27  -1   0
28   0  -1
29  -8   0
30   0  -1
31   0  -1
32  -1   0
33   0   0
34   0   0
35  -1   0
36  -1  -1
37   0  -1
38  -1  -1
39   0  -2
40  -1   0
41   0   0
42   0   0
43   0  -1
44   0   0
45  -1   0
46  -1  -1
47   0  -1
48   0  -1
49   0  -1
50  -1  -1
51   0 -32
52   0   0
>>> np.right_shift(kdf, kdf)
     a   b
0    0   0
1    0   0
2   -1   0
3    0  -1
4   -1   0
5    0   0
6   -1  -4
7   -1  -1
8    0  -1
9   16  -1
10  16  -1
11   0  -1
12  -1  -1
13  -1  -1
14   0  -2
15   1  -1
16  -1   0
17   0  -1
18   0  -1
19   0  -1
20   0  -1
21   0   0
22  -1   0
23   0   0
24  -1  -1
25   0   0
26  -1   0
27  -1   0
28   0  -1
29  -8   0
30   0  -1
31   0  -1
32  -1   0
33   0   0
34   0   0
35  -1   0
36  -1  -1
37   0  -1
38  -1  -1
39   0  -2
40  -1   0
41   0   0
42   0   0
43   0  -1
44   0   0
45  -1   0
46  -1  -1
47   0  -1
48   0  -1
49   0  -1
50  -1  -1
51   0 -32
52   0   0

It seems the versions matter ... (?)

I will investigate it separately in another PR.
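One observation that may help that follow-up: for the pdf with 24 rows posted in this thread, the first Koalas result equals an arithmetic right shift whose shift count is reduced modulo 64. The sketch below checks this on four of those rows. `shift_right_mod64` is a hypothetical helper written for illustration; it describes the posted numbers only and does not establish which library or version produced either result.

```python
def shift_right_mod64(value, count):
    """Arithmetic right shift with the count reduced modulo 64 (illustrative)."""
    # Python's >> on ints preserves the sign, and `count & 63` maps negative
    # or oversized counts into the range 0..63.
    return value >> (count & 63)


# (a, b) pairs copied from rows 0, 1, 6, and 15 of the pdf in the thread.
rows = [(65, 83), (50, -14), (-60, 45), (-62, -98)]

# np.right_shift(x, x) shifts each value by itself, column by column.
shifted = [(shift_right_mod64(a, a), shift_right_mod64(b, b)) for a, b in rows]
print(shifted)  # [(32, 0), (0, -1), (-4, 0), (-16, -1)]
```

These values agree with rows 0, 1, 6, and 15 of the first `np.right_shift(kdf, kdf)` output, whereas the pandas output posted alongside it is zero in every row.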

softagram-bot · 2019-12-16T10:55:37Z

Softagram Impact Report for pull/1127 (head commit: `1737eb0`)

⚠️ Copy paste found

ℹ️ test_numpy_compat.py: Copy paste fragment on line 50 shared with ../test_dataframe.py, ../test_indexes.py:

    def pdf(self):
        return pd.DataFrame({
            'a': [1, 2, 3, 4, 5, 6, 7, 8, 9],
            'b': [4, 5, 6, 3, 2, 1, 0, 0, 0],
        }, index=[0, 1, 3, 5, 6, 8, 9, 9, 9])...(truncated 105 chars)

ℹ️ test_numpy_compat.py: Copy paste fragment inside the same file on lines 85, 131:

        # Use randomly generated dataFrame
        pdf = pd.DataFrame(
            np.random.randint(-100, 100, size=(np.random.randint(100), 2)), columns=['...(truncated 498 chars)

ℹ️ frame.py: Copy paste fragment on line 5771 shared with ../namespace.py:

              on: Union[str, List[str], Tuple[str, ...], List[Tuple[str, ...]]] = None,
              left_on: Union[str, List[str], Tuple[s...(truncated 273 chars)

ℹ️ frame.py: Copy paste fragment inside the same file on lines 7269, 7352:


        # TODO: there is a similar logic to transpose in, for instance,
        #  DataFrame.any, Series.quantile. Maybe ...(truncated 1065 chars)

ℹ️ frame.py: Copy paste fragment inside the same file on lines 4900, 4921:

            sdf = self._sdf.select(
                self._internal.index_scols +
                [self._internal.scol_for(idx...(truncated 466 chars)

Now that you are on the file, it would be easier to pay back some tech. debt.


HyukjinKwon force-pushed the numpy-compay-frame branch 3 times, most recently from 742e285 to 67cd75c, on December 13, 2019 04:42

HyukjinKwon requested a review from ueshin December 13, 2019 05:39


HyukjinKwon changed the title ~~Complete NumPy universal functions for DataFrames~~ Complete NumPy universal functions compat for DataFrames Dec 13, 2019

Complete NumPy universial functions for DataFrames

1737eb0

HyukjinKwon force-pushed the numpy-compay-frame branch from 67cd75c to 1737eb0 on December 16, 2019 10:54

HyukjinKwon merged commit 9343a1d into databricks:master Dec 16, 2019

HyukjinKwon deleted the numpy-compay-frame branch September 11, 2020 07:52
