Refactor Frame scans #9021

Merged
merged 13 commits into from
Aug 17, 2021

@vyasr vyasr commented Aug 12, 2021

This pull request is a substantial refactor of the internals of scan operations like cummax and cumsum. The new implementation moves nearly all logic to the Frame level. The resulting code improves performance and adds support for new features. In particular:

  • For data sizes where Python overheads dominate, Series operations are now 10-20% faster. More importantly, DataFrame operations are 2-3x faster.
  • Scans are now automatically supported for Index types as well.
  • Scans on DataFrame now support axis=1 (previously only reductions like sum did so).
  • The total amount of code is halved.
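To illustrate the idea of moving scan logic to a shared base class, here is a minimal pure-Python sketch. It is not cuDF's implementation: the `Frame`, `Series`, `DataFrame`, and `Index` classes below are hypothetical stand-ins that store columns as plain lists, and the `_scan` helper is an assumed name used only for this example. It shows how a single base-class method can provide every scan, on both axes, to each subclass.

```python
from itertools import accumulate
import operator


class Frame:
    """Hypothetical base class that stores named columns as plain lists."""

    def __init__(self, data):
        self._data = {name: list(col) for name, col in data.items()}

    def _scan(self, op, axis=0):
        """Apply a running binary operation along the requested axis."""
        if axis == 0:
            # Scan down each column independently.
            result = {
                name: list(accumulate(col, op))
                for name, col in self._data.items()
            }
        elif axis == 1:
            # Scan across the columns, one row at a time.
            names = list(self._data)
            rows = [list(accumulate(row, op)) for row in zip(*self._data.values())]
            result = {name: [row[i] for row in rows] for i, name in enumerate(names)}
        else:
            raise ValueError(f"unsupported axis: {axis!r}")
        # Return the same type that the scan was called on.
        return type(self)(result)

    # Every public scan is a one-line wrapper around the shared helper.
    def cumsum(self, axis=0):
        return self._scan(operator.add, axis)

    def cumprod(self, axis=0):
        return self._scan(operator.mul, axis)

    def cummax(self, axis=0):
        return self._scan(max, axis)

    def cummin(self, axis=0):
        return self._scan(min, axis)

    def to_dict(self):
        return {name: list(col) for name, col in self._data.items()}


class DataFrame(Frame):
    """Multi-column frame; inherits all scans unchanged."""


class Series(Frame):
    """Single-column frame built from a plain sequence."""

    def __init__(self, data):
        # _scan passes back a one-column dict, so accept both forms.
        if not isinstance(data, dict):
            data = {None: data}
        super().__init__(data)

    def to_list(self):
        return list(next(iter(self._data.values())))


class Index(Series):
    """Gets scans without any code of its own in this sketch."""
```

With this layout, `DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}).cumsum(axis=1)` and `Index([1, 2, 3]).cumprod()` both go through the same `_scan` path, which is the kind of sharing that lets new types and a new axis come almost for free.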

@vyasr added the labels 3 - Ready for Review, Python, Performance, tech debt, improvement, and non-breaking on Aug 12, 2021
@vyasr added this to the CuDF Python Refactoring milestone on Aug 12, 2021
@vyasr self-assigned this on Aug 12, 2021
@vyasr requested a review from a team as a code owner on August 12, 2021 00:06

vyasr commented Aug 12, 2021

Here are some detailed performance numbers for comparison.

Benchmarks
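
The test IDs in the tables follow the pattern `test_scans[flag-size-class-op]`. The sketch below rebuilds that grid with the standard library to show where the count of 36 tests comes from. The variable names are assumptions made for illustration, and the meaning of the leading boolean is not stated in this thread (it may toggle nulls, but that is a guess).

```python
from itertools import product

# Parameter values read off the test IDs in the benchmark tables.
flags = [False, True]              # meaning not stated in the thread
sizes = [1000, 100000, 10000000]   # number of rows
classes = ["Series", "DataFrame"]
ops = ["cummax", "cumprod", "cumsum"]

# One benchmark per combination: 2 * 3 * 2 * 3 = 36.
test_ids = [
    f"test_scans[{flag}-{size}-{cls}-{op}]"
    for flag, size, cls, op in product(flags, sizes, classes, ops)
]

print(len(test_ids))
```

Each row of the Before and After tables corresponds to one entry of `test_ids`, so the two tables can be compared row by row on the test name.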

Before:

------------------------------------------------------------------------------------------------------- benchmark: 36 tests --------------------------------------------------------------------------------------------------------
Name (time in us)                                       Min                   Max                  Mean              StdDev                Median                 IQR            Outliers          OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_scans[False-1000-Series-cummax]                73.9228 (1.0)      3,848.2994 (33.55)       76.0459 (1.0)       45.6918 (23.80)       75.0646 (1.0)        0.3986 (1.0)         2;810  13,149.9479 (1.0)        6845           1
test_scans[False-100000-Series-cummax]              77.8828 (1.05)       114.6924 (1.0)         80.8747 (1.06)       2.3472 (1.22)        80.5482 (1.07)       3.4459 (8.64)       687;78  12,364.8120 (0.94)       7395           1
test_scans[True-1000-Series-cummax]                 81.4851 (1.10)       119.8649 (1.05)        84.4076 (1.11)       4.6344 (2.41)        82.3531 (1.10)       4.4983 (11.29)     312;262  11,847.2782 (0.90)       6267           1
test_scans[False-1000-Series-cumprod]               91.3497 (1.24)       126.5109 (1.10)        94.5035 (1.24)       2.9042 (1.51)        92.9190 (1.24)       4.3111 (10.82)      445;64  10,581.6148 (0.80)       5700           1
test_scans[False-100000-Series-cumsum]              93.8084 (1.27)       131.5419 (1.15)        96.9635 (1.28)       3.6539 (1.90)        95.2389 (1.27)       4.3986 (11.04)     360;186  10,313.1559 (0.78)       6644           1
test_scans[True-100000-Series-cummax]               93.8661 (1.27)       124.8010 (1.09)        95.8134 (1.26)       2.4631 (1.28)        95.2166 (1.27)       0.5364 (1.35)      485;796  10,436.9496 (0.79)       5610           1
test_scans[False-1000-Series-cumsum]                93.9965 (1.27)       149.2389 (1.30)        95.3764 (1.25)       1.9201 (1.0)         94.9819 (1.27)       0.5579 (1.40)       95;210  10,484.7784 (0.80)       2001           1
test_scans[False-100000-Series-cumprod]             95.0806 (1.29)     2,134.1927 (18.61)       98.3298 (1.29)      24.9408 (12.99)       96.5036 (1.29)       4.2506 (10.66)        2;66  10,169.8576 (0.77)       6745           1
test_scans[True-1000-Series-cumsum]                 98.0236 (1.33)       139.5363 (1.22)       100.5975 (1.32)       3.5820 (1.87)        99.0704 (1.32)       1.9060 (4.78)     580;1030   9,940.6023 (0.76)       4901           1
test_scans[True-1000-Series-cumprod]                99.6497 (1.35)       133.0208 (1.16)       102.0304 (1.34)       3.2772 (1.71)       100.7300 (1.34)       0.9914 (2.49)     907;1019   9,801.0051 (0.75)       5197           1
test_scans[True-100000-Series-cumprod]             110.1997 (1.49)     2,355.5607 (20.54)      114.8942 (1.51)      32.3755 (16.86)      113.9380 (1.52)       1.1288 (2.83)        4;931   8,703.6628 (0.66)       4842           1
test_scans[True-100000-Series-cumsum]              110.6542 (1.50)       156.7807 (1.37)       114.4759 (1.51)       4.0665 (2.12)       112.3939 (1.50)       5.4725 (13.73)     390;119   8,735.4640 (0.66)       4828           1
test_scans[False-1000-DataFrame-cummax]            182.8000 (2.47)     2,533.2738 (22.09)      186.9722 (2.46)      37.6408 (19.60)      185.2484 (2.47)       1.4082 (3.53)        3;459   5,348.3878 (0.41)       3934           1
test_scans[False-100000-DataFrame-cummax]          183.7574 (2.49)       228.1386 (1.99)       187.2274 (2.46)       3.8018 (1.98)       186.1509 (2.48)       1.4226 (3.57)      381;454   5,341.0974 (0.41)       3931           1
test_scans[True-1000-DataFrame-cummax]             199.2304 (2.70)       265.1550 (2.31)       205.1653 (2.70)       6.4100 (3.34)       202.4323 (2.70)       2.6878 (6.74)      581;674   4,874.1174 (0.37)       3275           1
test_scans[False-1000-DataFrame-cumsum]            204.5929 (2.77)       504.9799 (4.40)       217.7285 (2.86)       7.6352 (3.98)       217.0028 (2.89)       2.0247 (5.08)      316;652   4,592.8769 (0.35)       2430           1
test_scans[False-1000-DataFrame-cumprod]           205.3790 (2.78)       252.2934 (2.20)       216.1440 (2.84)       7.8782 (4.10)       217.7618 (2.90)      15.6504 (39.26)      1703;7   4,626.5449 (0.35)       3411           1
test_scans[False-100000-DataFrame-cumsum]          206.2637 (2.79)     3,398.9083 (29.63)      215.9525 (2.84)      66.6541 (34.71)      211.0600 (2.81)      10.6506 (26.72)        3;25   4,630.6477 (0.35)       3605           1
test_scans[False-100000-DataFrame-cumprod]         207.3105 (2.80)       259.6062 (2.26)       212.0353 (2.79)       5.7615 (3.00)       209.9983 (2.80)       1.6359 (4.10)      437;568   4,716.1966 (0.36)       3485           1
test_scans[True-100000-DataFrame-cummax]           209.2980 (2.83)       261.1242 (2.28)       216.8990 (2.85)       6.5191 (3.40)       213.2459 (2.84)      10.5426 (26.45)      588;38   4,610.4413 (0.35)       3234           1
test_scans[True-1000-DataFrame-cumsum]             220.4292 (2.98)       280.0953 (2.44)       229.8392 (3.02)       6.1747 (3.22)       227.7326 (3.03)       7.8068 (19.59)      734;50   4,350.8680 (0.33)       3057           1
test_scans[True-1000-DataFrame-cumprod]            225.4657 (3.05)       266.9841 (2.33)       233.0897 (3.07)       6.8478 (3.57)       228.9843 (3.05)      12.1044 (30.37)      783;17   4,290.1943 (0.33)       2937           1
test_scans[True-100000-DataFrame-cumsum]           234.5908 (3.17)       287.4155 (2.51)       243.6458 (3.20)       7.6356 (3.98)       238.9885 (3.18)      12.6273 (31.68)      359;39   4,104.3193 (0.31)       2891           1
test_scans[True-100000-DataFrame-cumprod]          235.6097 (3.19)       304.1718 (2.65)       243.8593 (3.21)       7.3361 (3.82)       239.7355 (3.19)      11.7868 (29.57)      522;31   4,100.7257 (0.31)       2974           1
test_scans[False-10000000-DataFrame-cumprod]     1,749.1747 (23.66)    5,481.5058 (47.79)    2,159.4166 (28.40)    235.4324 (122.62)   2,104.5655 (28.04)      7.5297 (18.89)      10;142     463.0880 (0.04)        525           1
test_scans[False-10000000-DataFrame-cumsum]      1,960.0447 (26.51)    5,224.4812 (45.55)    2,126.4473 (27.96)    214.0582 (111.48)   2,104.0002 (28.03)      2.9579 (7.42)         7;92     470.2679 (0.04)        514           1
test_scans[False-10000000-Series-cumprod]        2,009.2838 (27.18)    2,204.7088 (19.22)    2,104.2421 (27.67)     14.7170 (7.66)     2,104.4854 (28.04)      2.7847 (6.99)        34;91     475.2305 (0.04)        503           1
test_scans[False-10000000-Series-cumsum]         2,014.4917 (27.25)    4,334.9601 (37.80)    2,111.5678 (27.77)    109.4375 (57.00)    2,104.6009 (28.04)      8.8657 (22.24)       1;139     473.5818 (0.04)        449           1
test_scans[False-10000000-Series-cummax]         2,031.5275 (27.48)    7,232.2730 (63.06)    2,121.1100 (27.89)    254.0227 (132.30)   2,104.4593 (28.04)      3.7970 (9.53)         3;92     471.4513 (0.04)        501           1
test_scans[False-10000000-DataFrame-cummax]      2,244.7575 (30.37)    4,894.4242 (42.67)    2,290.3809 (30.12)    123.2371 (64.18)    2,281.3752 (30.39)      8.6967 (21.82)        4;27     436.6086 (0.03)        462           1
test_scans[True-10000000-DataFrame-cummax]       2,646.8299 (35.81)    5,902.1506 (51.46)    3,103.7298 (40.81)    167.3519 (87.16)    3,090.8491 (41.18)      7.5642 (18.98)        3;36     322.1930 (0.02)        310           1
test_scans[True-10000000-Series-cumprod]         2,878.8857 (38.94)    3,174.3255 (27.68)    3,007.6830 (39.55)     17.3248 (9.02)     3,005.1908 (40.03)      6.4801 (16.26)       16;26     332.4818 (0.03)        306           1
test_scans[True-10000000-Series-cumsum]          2,917.6325 (39.47)    3,622.0588 (31.58)    3,206.7022 (42.17)    230.8731 (120.24)   3,036.6564 (40.45)    268.5534 (673.73)       50;0     311.8469 (0.02)        232           1
test_scans[True-10000000-Series-cummax]          2,988.6998 (40.43)    5,301.4960 (46.22)    3,026.5171 (39.80)    182.0607 (94.82)    3,006.4434 (40.05)     16.2595 (40.79)         3;3     330.4128 (0.03)        305           1
test_scans[True-10000000-DataFrame-cumsum]       3,011.6253 (40.74)    3,473.8686 (30.29)    3,099.0246 (40.75)     26.3214 (13.71)    3,094.0389 (41.22)     11.2504 (28.22)       12;23     322.6822 (0.02)        306           1
test_scans[True-10000000-DataFrame-cumprod]      3,068.1714 (41.51)    3,162.5628 (27.57)    3,097.9659 (40.74)     12.9873 (6.76)     3,093.1048 (41.21)     18.1273 (45.48)        76;6     322.7924 (0.02)        308           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

After:

------------------------------------------------------------------------------------------------------- benchmark: 36 tests --------------------------------------------------------------------------------------------------------
Name (time in us)                                       Min                   Max                  Mean              StdDev                Median                 IQR            Outliers          OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_scans[False-1000-Series-cumprod]               64.6412 (1.0)        200.7727 (2.11)        69.4623 (1.02)       4.1797 (3.28)        69.7784 (1.02)       3.5176 (8.78)      850;363  14,396.2982 (0.99)       7899           1
test_scans[False-1000-Series-cummax]                65.0175 (1.01)     3,964.5657 (41.62)       68.4332 (1.0)       46.5356 (36.47)       68.3218 (1.0)        2.6254 (6.56)        2;106  14,612.7912 (1.0)        7028           1
test_scans[False-1000-Series-cumsum]                67.0999 (1.04)       109.7862 (1.15)        69.1531 (1.01)       1.7904 (1.40)        69.2680 (1.01)       1.4063 (3.51)        90;50  14,460.6707 (0.99)       1671           1
test_scans[False-1000-DataFrame-cumsum]             68.2771 (1.06)       103.2911 (1.08)        72.2374 (1.06)       1.8788 (1.47)        72.3675 (1.06)       0.6706 (1.67)    1005;1123  13,843.2451 (0.95)       6207           1
test_scans[False-100000-Series-cumsum]              68.3982 (1.06)        95.8610 (1.01)        70.4691 (1.03)       2.0422 (1.60)        69.5474 (1.02)       2.5528 (6.37)      801;199  14,190.6170 (0.97)       7629           1
test_scans[False-1000-DataFrame-cummax]             68.5528 (1.06)        99.4634 (1.04)        71.9204 (1.05)       2.4497 (1.92)        72.3060 (1.06)       2.9379 (7.34)     1880;226  13,904.2522 (0.95)       7677           1
test_scans[False-100000-Series-cumprod]             68.8955 (1.07)     2,182.1279 (22.91)       72.9000 (1.07)      22.6537 (17.76)       72.5035 (1.06)       0.7190 (1.80)      12;1915  13,717.4271 (0.94)       8850           1
test_scans[False-100000-Series-cummax]              68.9626 (1.07)        95.2575 (1.0)         71.4510 (1.04)       2.1428 (1.68)        70.6334 (1.03)       2.9411 (7.34)      859;201  13,995.6003 (0.96)       8300           1
test_scans[False-1000-DataFrame-cumprod]            71.5517 (1.11)     1,280.2929 (13.44)       72.9522 (1.07)      13.6962 (10.73)       72.3302 (1.06)       0.4005 (1.0)        25;999  13,707.6088 (0.94)       8055           1
test_scans[True-1000-Series-cumprod]                71.5554 (1.11)     2,318.3301 (24.34)       74.2704 (1.09)      27.6539 (21.67)       74.0848 (1.08)       2.1979 (5.49)        2;214  13,464.3191 (0.92)       6636           1
test_scans[False-100000-DataFrame-cumsum]           71.9205 (1.11)     2,116.6652 (22.22)       74.3041 (1.09)      23.5596 (18.47)       73.1573 (1.07)       2.0321 (5.07)        6;150  13,458.2075 (0.92)       7579           1
test_scans[True-1000-Series-cummax]                 72.1365 (1.12)       193.5083 (2.03)        74.5036 (1.09)       3.0908 (2.42)        73.1740 (1.07)       3.8729 (9.67)       333;83  13,422.1679 (0.92)       6357           1
test_scans[True-1000-Series-cumsum]                 72.2948 (1.12)     1,763.6251 (18.51)       74.8690 (1.09)      21.9090 (17.17)       73.0958 (1.07)       3.6340 (9.07)        9;207  13,356.6603 (0.91)       6096           1
test_scans[False-100000-DataFrame-cummax]           75.2956 (1.16)     3,025.2747 (31.76)       76.9998 (1.13)      33.2952 (26.10)       76.3200 (1.12)       0.4219 (1.05)        2;854  12,987.0459 (0.89)       7872           1
test_scans[False-100000-DataFrame-cumprod]          75.5507 (1.17)       112.7496 (1.18)        76.8369 (1.12)       1.2759 (1.0)         76.5510 (1.12)       0.5215 (1.30)      415;652  13,014.5848 (0.89)       7988           1
test_scans[True-1000-DataFrame-cumsum]              76.5137 (1.18)       122.5155 (1.29)        80.3340 (1.17)       3.1015 (2.43)        81.2914 (1.19)       4.3195 (10.79)    1173;106  12,448.0280 (0.85)       5937           1
test_scans[True-1000-DataFrame-cumprod]             81.1964 (1.26)       110.8777 (1.16)        83.2531 (1.22)       2.2320 (1.75)        82.4127 (1.21)       1.2564 (3.14)    1195;1250  12,011.5590 (0.82)       6152           1
test_scans[True-100000-Series-cumprod]              83.6235 (1.29)     2,073.4556 (21.77)       89.4058 (1.31)      25.6466 (20.10)       89.6174 (1.31)       2.1248 (5.31)        5;780  11,184.9545 (0.77)       6051           1
test_scans[True-100000-Series-cummax]               85.0521 (1.32)       147.5997 (1.55)        87.1641 (1.27)       2.5157 (1.97)        86.3709 (1.26)       1.0133 (2.53)     904;1112  11,472.6143 (0.79)       6009           1
test_scans[True-1000-DataFrame-cummax]              85.5606 (1.32)       120.1890 (1.26)        87.0407 (1.27)       1.9773 (1.55)        86.5925 (1.27)       0.4508 (1.13)      315;634  11,488.8758 (0.79)       5879           1
test_scans[True-100000-Series-cumsum]               88.0435 (1.36)       121.3811 (1.27)        89.8735 (1.31)       1.6932 (1.33)        89.5262 (1.31)       0.7972 (1.99)      341;444  11,126.7531 (0.76)       5921           1
test_scans[True-100000-DataFrame-cumsum]            88.6358 (1.37)       130.9160 (1.37)        93.5264 (1.37)       5.7924 (4.54)        93.2962 (1.37)       4.0173 (10.03)     351;332  10,692.1681 (0.73)       5523           1
test_scans[True-100000-DataFrame-cummax]            89.0475 (1.38)       129.9642 (1.36)        91.3718 (1.34)       2.5741 (2.02)        90.3420 (1.32)       1.4841 (3.71)    1063;1146  10,944.2953 (0.75)       5575           1
test_scans[True-100000-DataFrame-cumprod]           92.4468 (1.43)       126.4028 (1.33)        94.8777 (1.39)       2.7811 (2.18)        94.1008 (1.38)       0.6538 (1.63)      370;628  10,539.8828 (0.72)       5248           1
test_scans[False-10000000-DataFrame-cumsum]      1,947.5352 (30.13)    6,618.2502 (69.48)    2,121.0710 (30.99)    240.7774 (188.72)   2,104.4873 (30.80)      3.3351 (8.33)         4;98     471.4599 (0.03)        571           1
test_scans[False-10000000-Series-cumsum]         1,983.7096 (30.69)    5,376.0968 (56.44)    2,113.7990 (30.89)    155.5040 (121.88)   2,104.8151 (30.81)      5.7686 (14.40)       1;119     473.0819 (0.03)        457           1
test_scans[False-10000000-DataFrame-cumprod]     1,998.9237 (30.92)    4,640.4134 (48.71)    2,247.1477 (32.84)    133.8755 (104.93)   2,266.8345 (33.18)     14.4150 (36.00)      85;104     445.0086 (0.03)        558           1
test_scans[False-10000000-Series-cummax]         2,004.9866 (31.02)    4,319.1724 (45.34)    2,110.9397 (30.85)    112.8906 (88.48)    2,104.5562 (30.80)      3.1278 (7.81)         2;75     473.7227 (0.03)        489           1
test_scans[False-10000000-Series-cumprod]        2,013.1078 (31.14)    4,283.4170 (44.97)    2,110.9186 (30.85)    111.6678 (87.52)    2,104.7145 (30.81)      3.3192 (8.29)         2;60     473.7274 (0.03)        486           1
test_scans[False-10000000-DataFrame-cummax]      2,257.6004 (34.93)    4,773.1586 (50.11)    2,299.7850 (33.61)    206.6277 (161.95)   2,276.8117 (33.32)     10.9696 (27.39)        6;27     434.8233 (0.03)        451           1
test_scans[True-10000000-Series-cumprod]         2,735.3931 (42.32)    3,303.3844 (34.68)    3,031.7698 (44.30)     25.7615 (20.19)    3,026.6363 (44.30)     21.0805 (52.64)        11;4     329.8403 (0.02)        340           1
test_scans[True-10000000-Series-cummax]          2,743.8831 (42.45)    3,298.3012 (34.63)    3,021.5191 (44.15)     24.7199 (19.38)    3,023.1047 (44.25)      8.2231 (20.53)        7;57     330.9593 (0.02)        307           1
test_scans[True-10000000-DataFrame-cumsum]       2,962.5632 (45.83)    3,146.6819 (33.03)    3,029.4471 (44.27)     12.9080 (10.12)    3,030.2657 (44.35)      7.1209 (17.78)       68;62     330.0932 (0.02)        324           1
test_scans[True-10000000-DataFrame-cumprod]      2,979.7908 (46.10)    4,793.8377 (50.33)    3,023.1659 (44.18)    101.2040 (79.32)    3,013.4032 (44.11)     14.5677 (36.38)         1;7     330.7791 (0.02)        314           1
test_scans[True-10000000-DataFrame-cummax]       3,001.2503 (46.43)    3,835.5403 (40.26)    3,033.7725 (44.33)     55.6535 (43.62)    3,029.7581 (44.35)      7.4301 (18.55)        3;43     329.6226 (0.02)        315           1
test_scans[True-10000000-Series-cumsum]          3,013.4544 (46.62)    3,759.3022 (39.46)    3,217.5846 (47.02)    273.7924 (214.59)   3,029.1546 (44.34)    584.2550 (>1000.0)      75;0     310.7921 (0.02)        233           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
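
To relate these tables to the speedups claimed in the description, the sketch below divides the Before mean by the After mean for a handful of rows. The mean times (in microseconds) are copied from the tables above; the selection of rows and the `speedup` helper are choices made for this example only.

```python
# (before_mean_us, after_mean_us) copied from the Mean column of each table.
means = {
    "False-1000-Series-cummax": (76.0459, 68.4332),
    "False-1000-Series-cumsum": (95.3764, 69.1531),
    "False-1000-DataFrame-cummax": (186.9722, 71.9204),
    "False-1000-DataFrame-cumsum": (217.7285, 72.2374),
    "True-100000-DataFrame-cumprod": (243.8593, 94.8777),
    "False-10000000-Series-cumsum": (2111.5678, 2113.7990),
}


def speedup(before_us, after_us):
    """Return how many times faster the new code is (values > 1 are faster)."""
    return before_us / after_us


speedups = {name: speedup(*pair) for name, pair in means.items()}

for name, ratio in speedups.items():
    print(f"{name:32s} {ratio:.2f}x")
```

On these rows the Series ratios come out at roughly 1.1x to 1.4x and the DataFrame ratios at roughly 2.6x to 3.0x. The 10,000,000-row case is essentially unchanged, which fits the description's qualifier that the gains apply to data sizes where Python overheads dominate.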

codecov bot commented Aug 12, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.10@4968a96).
The diff coverage is n/a.

❗ Current head 887a0ba differs from pull request most recent head d4abda9. Consider uploading reports for the commit d4abda9 to get more accurate results

@@               Coverage Diff               @@
##             branch-21.10    #9021   +/-   ##
===============================================
  Coverage                ?   10.66%           
===============================================
  Files                   ?      114           
  Lines                   ?    18659           
  Branches                ?        0           
===============================================
  Hits                    ?     1990           
  Misses                  ?    16669           
  Partials                ?        0           


vyasr commented Aug 13, 2021

rerun tests

@galipremsagar galipremsagar left a comment

Overall LGTM, with minor comments on the pytests.

python/cudf/cudf/tests/test_dataframe.py (outdated, resolved)
python/cudf/cudf/tests/test_dataframe.py (outdated, resolved)

@marlenezw marlenezw left a comment

Looks really good to me!

vyasr commented Aug 17, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 368890f into rapidsai:branch-21.10 Aug 17, 2021
@vyasr vyasr deleted the refactor/frame_scans branch January 14, 2022 18:06
3 participants