Move the binary_ops common dispatcher logic to be executed on the CPU #9816


@robertmaynard commented Dec 1, 2021

Fixes #9813

This builds on the work of #9802 and addresses #9813.

In short, we see significantly better performance for IMBALANCED, ADD, SUB, MUL, and DIV (roughly 55% to 95% lower benchmark time in the comparison below), with no significant change in the other binary_ops.
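As an illustration of the general idea in the PR title (selecting the operator once on the CPU instead of inside the per-element work), here is a minimal CPU-only sketch. It is not the libcudf implementation: `op_kind`, the functor structs, `apply_per_element`, `apply_loop`, and `apply_hoisted` are all hypothetical names, and plain loops stand in for a kernel launch.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for an operator enum such as cudf::binary_operator.
enum class op_kind { ADD, SUB, MUL, DIV };

// One small functor per operator, so a loop can be specialized on it.
struct add_op { template <typename T> T operator()(T a, T b) const { return a + b; } };
struct sub_op { template <typename T> T operator()(T a, T b) const { return a - b; } };
struct mul_op { template <typename T> T operator()(T a, T b) const { return a * b; } };
struct div_op { template <typename T> T operator()(T a, T b) const { return a / b; } };

// Baseline: the operator is selected again for every element.
template <typename T>
std::vector<T> apply_per_element(op_kind op, std::vector<T> const& lhs, std::vector<T> const& rhs)
{
  if (lhs.size() != rhs.size()) { throw std::invalid_argument("size mismatch"); }
  std::vector<T> out(lhs.size());
  for (std::size_t i = 0; i < lhs.size(); ++i) {
    switch (op) {
      case op_kind::ADD: out[i] = add_op{}(lhs[i], rhs[i]); break;
      case op_kind::SUB: out[i] = sub_op{}(lhs[i], rhs[i]); break;
      case op_kind::MUL: out[i] = mul_op{}(lhs[i], rhs[i]); break;
      case op_kind::DIV: out[i] = div_op{}(lhs[i], rhs[i]); break;
    }
  }
  return out;
}

// Loop specialized at compile time for one operator type.
template <typename Op, typename T>
std::vector<T> apply_loop(Op op, std::vector<T> const& lhs, std::vector<T> const& rhs)
{
  if (lhs.size() != rhs.size()) { throw std::invalid_argument("size mismatch"); }
  std::vector<T> out(lhs.size());
  for (std::size_t i = 0; i < lhs.size(); ++i) { out[i] = op(lhs[i], rhs[i]); }
  return out;
}

// Hoisted dispatch: select the operator once, then run the specialized loop.
template <typename T>
std::vector<T> apply_hoisted(op_kind op, std::vector<T> const& lhs, std::vector<T> const& rhs)
{
  switch (op) {
    case op_kind::ADD: return apply_loop(add_op{}, lhs, rhs);
    case op_kind::SUB: return apply_loop(sub_op{}, lhs, rhs);
    case op_kind::MUL: return apply_loop(mul_op{}, lhs, rhs);
    case op_kind::DIV: return apply_loop(div_op{}, lhs, rhs);
  }
  throw std::invalid_argument("unsupported operator");
}
```

Both entry points return the same values; the difference is only where the operator selection happens. Callers are assumed to avoid zero divisors for integral `DIV`.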

@robertmaynard added the "! - Hotfix", "Performance", "improvement", and "non-breaking" labels Dec 1, 2021
@robertmaynard requested review from a team as code owners December 1, 2021 23:08
@github-actions bot added the "CMake" and "libcudf" labels Dec 1, 2021
@robertmaynard

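Performance comparison (columns: benchmark, relative change in time, relative change in CPU time, old time, new time, old CPU time, new CPU time; negative changes are improvements):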

BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, false>/binaryop_int32_imbalanced_unique/100000/1/manual_time                                           -0.7842         -0.5238             0             0             0             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, false>/binaryop_int32_imbalanced_unique/100000/2/manual_time                                           -0.8191         -0.6653             0             0             0             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, false>/binaryop_int32_imbalanced_unique/100000/5/manual_time                                           -0.8563         -0.7844             0             0             0             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, false>/binaryop_int32_imbalanced_unique/100000/10/manual_time                                          -0.8681         -0.8350             0             0             0             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, false>/binaryop_int32_imbalanced_unique/1000000/1/manual_time                                          -0.9151         -0.8787             0             0             0             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, false>/binaryop_int32_imbalanced_unique/1000000/2/manual_time                                          -0.9206         -0.8999             1             0             1             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, false>/binaryop_int32_imbalanced_unique/1000000/5/manual_time                                          -0.9228         -0.9150             2             0             2             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, false>/binaryop_int32_imbalanced_unique/1000000/10/manual_time                                         -0.9243         -0.9200             3             0             3             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, false>/binaryop_int32_imbalanced_unique/10000000/1/manual_time                                         -0.9355         -0.9310             3             0             3             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, false>/binaryop_int32_imbalanced_unique/10000000/2/manual_time                                         -0.9360         -0.9337             7             0             7             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, false>/binaryop_int32_imbalanced_unique/10000000/5/manual_time                                         -0.9362         -0.9353            17             1            17             1
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, false>/binaryop_int32_imbalanced_unique/10000000/10/manual_time                                        -0.9365         -0.9360            34             2            34             2
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, false>/binaryop_int32_imbalanced_unique/100000000/1/manual_time                                        -0.9376         -0.9372            34             2            34             2
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, false>/binaryop_int32_imbalanced_unique/100000000/2/manual_time                                        -0.9378         -0.9375            68             4            68             4
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, false>/binaryop_int32_imbalanced_unique/100000000/5/manual_time                                        -0.9378         -0.9377           169            11           169            11
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, false>/binaryop_int32_imbalanced_unique/100000000/10/manual_time                                       -0.9382         -0.9381           340            21           340            21
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, true>/binaryop_int32_imbalanced_reuse/100000/1/manual_time                                             -0.8133         -0.5300             0             0             0             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, true>/binaryop_int32_imbalanced_reuse/100000/2/manual_time                                             -0.8554         -0.6925             0             0             0             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, true>/binaryop_int32_imbalanced_reuse/100000/5/manual_time                                             -0.8842         -0.8077             0             0             0             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, true>/binaryop_int32_imbalanced_reuse/100000/10/manual_time                                            -0.8880         -0.8472             0             0             0             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, true>/binaryop_int32_imbalanced_reuse/1000000/1/manual_time                                            -0.9367         -0.8969             0             0             0             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, true>/binaryop_int32_imbalanced_reuse/1000000/2/manual_time                                            -0.9307         -0.9089             1             0             1             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, true>/binaryop_int32_imbalanced_reuse/1000000/5/manual_time                                            -0.9271         -0.9191             2             0             2             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, true>/binaryop_int32_imbalanced_reuse/1000000/10/manual_time                                           -0.9263         -0.9219             3             0             3             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, true>/binaryop_int32_imbalanced_reuse/10000000/1/manual_time                                           -0.9524         -0.9474             3             0             3             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, true>/binaryop_int32_imbalanced_reuse/10000000/2/manual_time                                           -0.9443         -0.9419             7             0             7             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, true>/binaryop_int32_imbalanced_reuse/10000000/5/manual_time                                           -0.9397         -0.9387            17             1            17             1
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, true>/binaryop_int32_imbalanced_reuse/10000000/10/manual_time                                          -0.9381         -0.9376            33             2            33             2
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, true>/binaryop_int32_imbalanced_reuse/100000000/1/manual_time                                          -0.9542         -0.9537            33             1            33             2
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, true>/binaryop_int32_imbalanced_reuse/100000000/2/manual_time                                          -0.9460         -0.9458            66             4            67             4
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, true>/binaryop_int32_imbalanced_reuse/100000000/5/manual_time                                          -0.9410         -0.9409           168            10           168            10
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, true>/binaryop_int32_imbalanced_reuse/100000000/10/manual_time                                         -0.9395         -0.9395           337            20           337            20
BINARYOP<double, TreeType::IMBALANCED_LEFT, false>/binaryop_double_imbalanced_unique/100000/1/manual_time                                           -0.7095         -0.4930             0             0             0             0
BINARYOP<double, TreeType::IMBALANCED_LEFT, false>/binaryop_double_imbalanced_unique/100000/2/manual_time                                           -0.7715         -0.6440             0             0             0             0
BINARYOP<double, TreeType::IMBALANCED_LEFT, false>/binaryop_double_imbalanced_unique/100000/5/manual_time                                           -0.8225         -0.7641             0             0             0             0
BINARYOP<double, TreeType::IMBALANCED_LEFT, false>/binaryop_double_imbalanced_unique/100000/10/manual_time                                          -0.8438         -0.8128             0             0             0             0
BINARYOP<double, TreeType::IMBALANCED_LEFT, false>/binaryop_double_imbalanced_unique/1000000/1/manual_time                                          -0.8652         -0.8280             0             0             0             0
BINARYOP<double, TreeType::IMBALANCED_LEFT, false>/binaryop_double_imbalanced_unique/1000000/2/manual_time                                          -0.8668         -0.8457             1             0             1             0
BINARYOP<double, TreeType::IMBALANCED_LEFT, false>/binaryop_double_imbalanced_unique/1000000/5/manual_time                                          -0.8677         -0.8599             2             0             2             0
BINARYOP<double, TreeType::IMBALANCED_LEFT, false>/binaryop_double_imbalanced_unique/1000000/10/manual_time                                         -0.8681         -0.8639             3             0             3             0
BINARYOP<double, TreeType::IMBALANCED_LEFT, false>/binaryop_double_imbalanced_unique/10000000/1/manual_time                                         -0.8812         -0.8770             4             0             4             0
BINARYOP<double, TreeType::IMBALANCED_LEFT, false>/binaryop_double_imbalanced_unique/10000000/2/manual_time                                         -0.8814         -0.8793             7             1             7             1
BINARYOP<double, TreeType::IMBALANCED_LEFT, false>/binaryop_double_imbalanced_unique/10000000/5/manual_time                                         -0.8818         -0.8809            18             2            18             2
BINARYOP<double, TreeType::IMBALANCED_LEFT, false>/binaryop_double_imbalanced_unique/10000000/10/manual_time                                        -0.8819         -0.8814            36             4            36             4
BINARYOP<double, TreeType::IMBALANCED_LEFT, false>/binaryop_double_imbalanced_unique/100000000/1/manual_time                                        -0.8834         -0.8829            36             4            36             4
BINARYOP<double, TreeType::IMBALANCED_LEFT, false>/binaryop_double_imbalanced_unique/100000000/2/manual_time                                        -0.8832         -0.8830            72             8            72             8
BINARYOP<double, TreeType::IMBALANCED_LEFT, false>/binaryop_double_imbalanced_unique/100000000/5/manual_time                                        -0.8838         -0.8838           181            21           181            21
BINARYOP<double, TreeType::IMBALANCED_LEFT, false>/binaryop_double_imbalanced_unique/100000000/10/manual_time                                       -0.8840         -0.8839           362            42           362            42
COMPILED_BINARYOP<float, int64_t, int32_t, cudf::binary_operator::ADD>/ADD/10000/manual_time                                                        -0.5575         -0.2299            12             5            30            23
COMPILED_BINARYOP<float, int64_t, int32_t, cudf::binary_operator::ADD>/ADD/100000/manual_time                                                       -0.7400         -0.5087            33             9            51            25
COMPILED_BINARYOP<float, int64_t, int32_t, cudf::binary_operator::ADD>/ADD/1000000/manual_time                                                      -0.8920         -0.8592           326            35           344            48
COMPILED_BINARYOP<float, int64_t, int32_t, cudf::binary_operator::ADD>/ADD/10000000/manual_time                                                     -0.9167         -0.9132          3421           285          3456           300
COMPILED_BINARYOP<float, int64_t, int32_t, cudf::binary_operator::ADD>/ADD/100000000/manual_time                                                    -0.9196         -0.9227         34473          2772         36216          2798
COMPILED_BINARYOP<duration_s, duration_D, duration_ms, cudf::binary_operator::SUB>/SUB/10000/manual_time                                            -0.5760         -0.2425            12             5            31            23
COMPILED_BINARYOP<duration_s, duration_D, duration_ms, cudf::binary_operator::SUB>/SUB/100000/manual_time                                           -0.7434         -0.5148            34             9            51            25
COMPILED_BINARYOP<duration_s, duration_D, duration_ms, cudf::binary_operator::SUB>/SUB/1000000/manual_time                                          -0.8806         -0.8433           337            40           355            56
COMPILED_BINARYOP<duration_s, duration_D, duration_ms, cudf::binary_operator::SUB>/SUB/10000000/manual_time                                         -0.8980         -0.8943          3523           359          3560           376
COMPILED_BINARYOP<duration_s, duration_D, duration_ms, cudf::binary_operator::SUB>/SUB/100000000/manual_time                                        -0.9002         -0.9041         35468          3540         37267          3575
COMPILED_BINARYOP<float, float, int64_t, cudf::binary_operator::MUL>/MUL/10000/manual_time                                                          -0.5487         -0.2223            11             5            30            23
COMPILED_BINARYOP<float, float, int64_t, cudf::binary_operator::MUL>/MUL/100000/manual_time                                                         -0.7748         -0.5234            33             7            50            24
COMPILED_BINARYOP<float, float, int64_t, cudf::binary_operator::MUL>/MUL/1000000/manual_time                                                        -0.9030         -0.8631           333            32           351            48
COMPILED_BINARYOP<float, float, int64_t, cudf::binary_operator::MUL>/MUL/10000000/manual_time                                                       -0.9155         -0.9113          3467           293          3502           310
COMPILED_BINARYOP<float, float, int64_t, cudf::binary_operator::MUL>/MUL/100000000/manual_time                                                      -0.9174         -0.9205         34851          2880         36618          2910
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::DIV>/DIV/10000/manual_time                                                      -0.5596         -0.2353            12             5            31            23
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::DIV>/DIV/100000/manual_time                                                     -0.7129         -0.4972            34            10            51            26
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::DIV>/DIV/1000000/manual_time                                                    -0.8650         -0.8279           343            46           362            62
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::DIV>/DIV/10000000/manual_time                                                   -0.8811         -0.8774          3587           427          3624           444
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::DIV>/DIV/100000000/manual_time                                                  -0.8832         -0.8879         36052          4212         37969          4255
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::TRUE_DIV>/TRUE_DIV/10000/manual_time                                            +0.0198         +0.0064            10            10            28            28
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::TRUE_DIV>/TRUE_DIV/100000/manual_time                                           +0.0393         -0.0109            12            13            29            29
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::TRUE_DIV>/TRUE_DIV/1000000/manual_time                                          -0.0028         -0.0197            54            54            70            69
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::TRUE_DIV>/TRUE_DIV/10000000/manual_time                                         -0.0079         -0.0118           467           463           485           479
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::TRUE_DIV>/TRUE_DIV/100000000/manual_time                                        -0.0083         -0.0087          4599          4561          4647          4607
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::FLOOR_DIV>/FLOOR_DIV/10000/manual_time                                          +0.0080         +0.0028            11            11            29            29
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::FLOOR_DIV>/FLOOR_DIV/100000/manual_time                                         +0.0760         +0.0077            13            14            30            30
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::FLOOR_DIV>/FLOOR_DIV/1000000/manual_time                                        +0.0074         -0.0098            58            59            74            74
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::FLOOR_DIV>/FLOOR_DIV/10000000/manual_time                                       -0.0009         -0.0046           508           508           527           524
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::FLOOR_DIV>/FLOOR_DIV/100000000/manual_time                                      -0.0025         -0.0028          5011          4998          5065          5051
COMPILED_BINARYOP<double, double, double, cudf::binary_operator::MOD>/MOD/10000/manual_time                                                         -0.0780         -0.0238             8             7            26            25
COMPILED_BINARYOP<double, double, double, cudf::binary_operator::MOD>/MOD/100000/manual_time                                                        +0.0625         -0.0088            10            11            27            27
COMPILED_BINARYOP<double, double, double, cudf::binary_operator::MOD>/MOD/1000000/manual_time                                                       +0.0079         -0.0163            47            48            64            63
COMPILED_BINARYOP<double, double, double, cudf::binary_operator::MOD>/MOD/10000000/manual_time                                                      -0.0030         -0.0063           429           428           448           445
COMPILED_BINARYOP<double, double, double, cudf::binary_operator::MOD>/MOD/100000000/manual_time                                                     -0.0048         -0.0052          4233          4212          4277          4255
COMPILED_BINARYOP<int32_t, int64_t, double, cudf::binary_operator::PMOD>/PMOD/10000/manual_time                                                     -0.0263         -0.0028             9             9            27            27
COMPILED_BINARYOP<int32_t, int64_t, double, cudf::binary_operator::PMOD>/PMOD/100000/manual_time                                                    +0.0617         -0.0128            11            11            28            27
COMPILED_BINARYOP<int32_t, int64_t, double, cudf::binary_operator::PMOD>/PMOD/1000000/manual_time                                                   +0.0109         -0.0135            48            48            64            64
COMPILED_BINARYOP<int32_t, int64_t, double, cudf::binary_operator::PMOD>/PMOD/10000000/manual_time                                                  +0.0006         -0.0035           419           419           438           436
COMPILED_BINARYOP<int32_t, int64_t, double, cudf::binary_operator::PMOD>/PMOD/100000000/manual_time                                                 -0.0040         -0.0044          4143          4126          4186          4167
COMPILED_BINARYOP<int32_t, uint8_t, int64_t, cudf::binary_operator::PYMOD>/PYMOD/10000/manual_time                                                  +0.0817         +0.0207             6             6            24            24
COMPILED_BINARYOP<int32_t, uint8_t, int64_t, cudf::binary_operator::PYMOD>/PYMOD/100000/manual_time                                                 +0.0416         -0.0237             7             7            24            24
COMPILED_BINARYOP<int32_t, uint8_t, int64_t, cudf::binary_operator::PYMOD>/PYMOD/1000000/manual_time                                                +0.0052         -0.0300            31            31            49            47
COMPILED_BINARYOP<int32_t, uint8_t, int64_t, cudf::binary_operator::PYMOD>/PYMOD/10000000/manual_time                                               -0.0058         -0.0114           268           266           287           283
COMPILED_BINARYOP<int32_t, uint8_t, int64_t, cudf::binary_operator::PYMOD>/PYMOD/100000000/manual_time                                              -0.0102         -0.0107          2577          2551          2605          2577
COMPILED_BINARYOP<int64_t, int64_t, double, cudf::binary_operator::POW>/POW/10000/manual_time                                                       -0.0008         -0.0008            62            62            80            80
COMPILED_BINARYOP<int64_t, int64_t, double, cudf::binary_operator::POW>/POW/100000/manual_time                                                      +0.0131         -0.0003            64            64            80            80
COMPILED_BINARYOP<int64_t, int64_t, double, cudf::binary_operator::POW>/POW/1000000/manual_time                                                     +0.0007         -0.0023           420           420           436           435
COMPILED_BINARYOP<int64_t, int64_t, double, cudf::binary_operator::POW>/POW/10000000/manual_time                                                    -0.0005         -0.0009          4016          4014          4057          4054
COMPILED_BINARYOP<int64_t, int64_t, double, cudf::binary_operator::POW>/POW/100000000/manual_time                                                   -0.0008         -0.0007         39933         39903         42165         42136
COMPILED_BINARYOP<float, double, double, cudf::binary_operator::LOG_BASE>/LOG_BASE/10000/manual_time                                                -0.0059         +0.0035            46            46            65            65
COMPILED_BINARYOP<float, double, double, cudf::binary_operator::LOG_BASE>/LOG_BASE/100000/manual_time                                               +0.0073         -0.0108            47            48            64            64
COMPILED_BINARYOP<float, double, double, cudf::binary_operator::LOG_BASE>/LOG_BASE/1000000/manual_time                                              +0.0004         -0.0021           307           307           324           323
COMPILED_BINARYOP<float, double, double, cudf::binary_operator::LOG_BASE>/LOG_BASE/10000000/manual_time                                             -0.0002         -0.0001          2933          2932          2964          2964
COMPILED_BINARYOP<float, double, double, cudf::binary_operator::LOG_BASE>/LOG_BASE/100000000/manual_time                                            +0.0001         +0.0003         29133         29134         30367         30375
COMPILED_BINARYOP<float, double, double, cudf::binary_operator::ATAN2>/ATAN2/10000/manual_time                                                      -0.0097         -0.0003            29            29            47            47
COMPILED_BINARYOP<float, double, double, cudf::binary_operator::ATAN2>/ATAN2/100000/manual_time                                                     +0.0093         -0.0182            30            30            47            46
COMPILED_BINARYOP<float, double, double, cudf::binary_operator::ATAN2>/ATAN2/1000000/manual_time                                                    -0.0004         -0.0098           186           186           203           201
COMPILED_BINARYOP<float, double, double, cudf::binary_operator::ATAN2>/ATAN2/10000000/manual_time                                                   +0.0000         -0.0004          1761          1761          1785          1784
COMPILED_BINARYOP<float, double, double, cudf::binary_operator::ATAN2>/ATAN2/100000000/manual_time                                                  +0.0002         +0.0003         17467         17470         17921         17926
COMPILED_BINARYOP<int, int, int, cudf::binary_operator::SHIFT_LEFT>/SHIFT_LEFT/10000/manual_time                                                    +0.0472         +0.0154             5             5            23            24
COMPILED_BINARYOP<int, int, int, cudf::binary_operator::SHIFT_LEFT>/SHIFT_LEFT/100000/manual_time                                                   +0.0763         -0.0030             7             7            24            24
COMPILED_BINARYOP<int, int, int, cudf::binary_operator::SHIFT_LEFT>/SHIFT_LEFT/1000000/manual_time                                                  +0.0384         +0.0061            26            27            41            42
COMPILED_BINARYOP<int, int, int, cudf::binary_operator::SHIFT_LEFT>/SHIFT_LEFT/10000000/manual_time                                                 +0.0006         -0.0062           217           217           235           234
COMPILED_BINARYOP<int, int, int, cudf::binary_operator::SHIFT_LEFT>/SHIFT_LEFT/100000000/manual_time                                                -0.0045         -0.0126          2116          2106          2141          2113
COMPILED_BINARYOP<int16_t, int64_t, int, cudf::binary_operator::SHIFT_RIGHT>/SHIFT_RIGHT/10000/manual_time                                          -0.0651         -0.0110             6             5            24            23
COMPILED_BINARYOP<int16_t, int64_t, int, cudf::binary_operator::SHIFT_RIGHT>/SHIFT_RIGHT/100000/manual_time                                         +0.0913         -0.0152             8             8            25            25
COMPILED_BINARYOP<int16_t, int64_t, int, cudf::binary_operator::SHIFT_RIGHT>/SHIFT_RIGHT/1000000/manual_time                                        +0.0120         -0.0082            32            33            46            46
COMPILED_BINARYOP<int16_t, int64_t, int, cudf::binary_operator::SHIFT_RIGHT>/SHIFT_RIGHT/10000000/manual_time                                       -0.0029         -0.0065           255           254           272           270
COMPILED_BINARYOP<int16_t, int64_t, int, cudf::binary_operator::SHIFT_RIGHT>/SHIFT_RIGHT/100000000/manual_time                                      -0.0090         -0.0085          2485          2463          2511          2490
COMPILED_BINARYOP<int64_t, int32_t, int64_t, cudf::binary_operator::SHIFT_RIGHT_UNSIGNED>/SHIFT_RIGHT_UNSIGNED/10000/manual_time                    -0.0697         -0.0085             5             5            23            23
COMPILED_BINARYOP<int64_t, int32_t, int64_t, cudf::binary_operator::SHIFT_RIGHT_UNSIGNED>/SHIFT_RIGHT_UNSIGNED/100000/manual_time                   +0.0500         -0.0127             8             9            25            25
COMPILED_BINARYOP<int64_t, int32_t, int64_t, cudf::binary_operator::SHIFT_RIGHT_UNSIGNED>/SHIFT_RIGHT_UNSIGNED/1000000/manual_time                  +0.0087         -0.0157            39            39            56            55
COMPILED_BINARYOP<int64_t, int32_t, int64_t, cudf::binary_operator::SHIFT_RIGHT_UNSIGNED>/SHIFT_RIGHT_UNSIGNED/10000000/manual_time                 -0.0059         -0.0089           362           359           381           377
COMPILED_BINARYOP<int64_t, int32_t, int64_t, cudf::binary_operator::SHIFT_RIGHT_UNSIGNED>/SHIFT_RIGHT_UNSIGNED/100000000/manual_time                -0.0060         -0.0060          3571          3549          3608          3586
COMPILED_BINARYOP<int64_t, int32_t, int16_t, cudf::binary_operator::BITWISE_AND>/BITWISE_AND/10000/manual_time                                      -0.0417         -0.0056             5             5            23            23
COMPILED_BINARYOP<int64_t, int32_t, int16_t, cudf::binary_operator::BITWISE_AND>/BITWISE_AND/100000/manual_time                                     +0.0985         -0.0176             8             8            25            25
COMPILED_BINARYOP<int64_t, int32_t, int16_t, cudf::binary_operator::BITWISE_AND>/BITWISE_AND/1000000/manual_time                                    +0.0226         +0.0016            33            33            46            46
COMPILED_BINARYOP<int64_t, int32_t, int16_t, cudf::binary_operator::BITWISE_AND>/BITWISE_AND/10000000/manual_time                                   +0.0010         -0.0023           258           258           273           272
COMPILED_BINARYOP<int64_t, int32_t, int16_t, cudf::binary_operator::BITWISE_AND>/BITWISE_AND/100000000/manual_time                                  -0.0006         -0.0001          2502          2501          2526          2526
COMPILED_BINARYOP<int16_t, int32_t, int64_t, cudf::binary_operator::BITWISE_OR>/BITWISE_OR/10000/manual_time                                        -0.0117         +0.0054             5             5            23            23
COMPILED_BINARYOP<int16_t, int32_t, int64_t, cudf::binary_operator::BITWISE_OR>/BITWISE_OR/100000/manual_time                                       +0.0383         -0.0134             7             7            24            24
COMPILED_BINARYOP<int16_t, int32_t, int64_t, cudf::binary_operator::BITWISE_OR>/BITWISE_OR/1000000/manual_time                                      +0.0288         -0.0154            29            30            46            45
COMPILED_BINARYOP<int16_t, int32_t, int64_t, cudf::binary_operator::BITWISE_OR>/BITWISE_OR/10000000/manual_time                                     -0.0050         -0.0094           264           262           283           280
COMPILED_BINARYOP<int16_t, int32_t, int64_t, cudf::binary_operator::BITWISE_OR>/BITWISE_OR/100000000/manual_time                                    -0.0097         -0.0099          2600          2575          2629          2603
COMPILED_BINARYOP<int16_t, int64_t, int32_t, cudf::binary_operator::BITWISE_XOR>/BITWISE_XOR/10000/manual_time                                      -0.0683         -0.0144             6             5            24            23
COMPILED_BINARYOP<int16_t, int64_t, int32_t, cudf::binary_operator::BITWISE_XOR>/BITWISE_XOR/100000/manual_time                                     +0.0915         -0.0168             8             8            25            25
COMPILED_BINARYOP<int16_t, int64_t, int32_t, cudf::binary_operator::BITWISE_XOR>/BITWISE_XOR/1000000/manual_time                                    +0.0157         -0.0012            32            33            46            46
COMPILED_BINARYOP<int16_t, int64_t, int32_t, cudf::binary_operator::BITWISE_XOR>/BITWISE_XOR/10000000/manual_time                                   +0.0010         -0.0032           254           254           271           270
COMPILED_BINARYOP<int16_t, int64_t, int32_t, cudf::binary_operator::BITWISE_XOR>/BITWISE_XOR/100000000/manual_time                                  -0.0033         -0.0027          2472          2464          2498          2491
COMPILED_BINARYOP<double, int8_t, bool, cudf::binary_operator::LOGICAL_AND>/LOGICAL_AND/10000/manual_time                                           +0.0030         +0.0196             6             6            24            25
COMPILED_BINARYOP<double, int8_t, bool, cudf::binary_operator::LOGICAL_AND>/LOGICAL_AND/100000/manual_time                                          +0.0889         +0.0074             8             8            25            25
COMPILED_BINARYOP<double, int8_t, bool, cudf::binary_operator::LOGICAL_AND>/LOGICAL_AND/1000000/manual_time                                         +0.0373         +0.0206            29            30            42            43
COMPILED_BINARYOP<double, int8_t, bool, cudf::binary_operator::LOGICAL_AND>/LOGICAL_AND/10000000/manual_time                                        +0.0228         +0.0185           215           220           229           233
COMPILED_BINARYOP<double, int8_t, bool, cudf::binary_operator::LOGICAL_AND>/LOGICAL_AND/100000000/manual_time                                       +0.0250         +0.0248          2080          2132          2100          2152
COMPILED_BINARYOP<int16_t, int64_t, bool, cudf::binary_operator::LOGICAL_OR>/LOGICAL_OR/10000/manual_time                                           -0.0419         +0.0032             5             5            23            24
COMPILED_BINARYOP<int16_t, int64_t, bool, cudf::binary_operator::LOGICAL_OR>/LOGICAL_OR/100000/manual_time                                          +0.0978         -0.0024             7             8            25            25
COMPILED_BINARYOP<int16_t, int64_t, bool, cudf::binary_operator::LOGICAL_OR>/LOGICAL_OR/1000000/manual_time                                         +0.0221         +0.0174            30            31            43            44
COMPILED_BINARYOP<int16_t, int64_t, bool, cudf::binary_operator::LOGICAL_OR>/LOGICAL_OR/10000000/manual_time                                        +0.0177         +0.0121           216           220           231           233
COMPILED_BINARYOP<int16_t, int64_t, bool, cudf::binary_operator::LOGICAL_OR>/LOGICAL_OR/100000000/manual_time                                       +0.0137         +0.0137          2071          2099          2091          2120
COMPILED_BINARYOP<duration_ms, duration_ns, bool, cudf::binary_operator::EQUAL>/EQUAL/10000/manual_time                                             -0.1215         -0.0094             6             5            24            24
COMPILED_BINARYOP<duration_ms, duration_ns, bool, cudf::binary_operator::EQUAL>/EQUAL/100000/manual_time                                            +0.1160         +0.0014             8             9            26            26
COMPILED_BINARYOP<duration_ms, duration_ns, bool, cudf::binary_operator::EQUAL>/EQUAL/1000000/manual_time                                           +0.0043         +0.0042            41            41            54            54
COMPILED_BINARYOP<duration_ms, duration_ns, bool, cudf::binary_operator::EQUAL>/EQUAL/10000000/manual_time                                          -0.0020         -0.0013           316           315           329           329
COMPILED_BINARYOP<duration_ms, duration_ns, bool, cudf::binary_operator::EQUAL>/EQUAL/100000000/manual_time                                         -0.0030         -0.0032          3055          3046          3083          3073
COMPILED_BINARYOP<decimal32, decimal32, bool, cudf::binary_operator::NOT_EQUAL>/NOT_EQUAL/10000/manual_time                                         -0.0725         -0.0050             5             5            23            23
COMPILED_BINARYOP<decimal32, decimal32, bool, cudf::binary_operator::NOT_EQUAL>/NOT_EQUAL/100000/manual_time                                        +0.0097         -0.0096             7             7            24            24
COMPILED_BINARYOP<decimal32, decimal32, bool, cudf::binary_operator::NOT_EQUAL>/NOT_EQUAL/1000000/manual_time                                       +0.0218         -0.0001            25            25            38            38
COMPILED_BINARYOP<decimal32, decimal32, bool, cudf::binary_operator::NOT_EQUAL>/NOT_EQUAL/10000000/manual_time                                      -0.0017         -0.0057           176           175           190           189
COMPILED_BINARYOP<decimal32, decimal32, bool, cudf::binary_operator::NOT_EQUAL>/NOT_EQUAL/100000000/manual_time                                     -0.0071         -0.0066          1674          1662          1693          1681
COMPILED_BINARYOP<timestamp_s, timestamp_s, bool, cudf::binary_operator::LESS>/LESS/10000/manual_time                                               -0.1031         -0.0035             5             5            23            23
COMPILED_BINARYOP<timestamp_s, timestamp_s, bool, cudf::binary_operator::LESS>/LESS/100000/manual_time                                              +0.0543         -0.0034             9             9            26            26
COMPILED_BINARYOP<timestamp_s, timestamp_s, bool, cudf::binary_operator::LESS>/LESS/1000000/manual_time                                             -0.0088         -0.0050            40            40            53            53
COMPILED_BINARYOP<timestamp_s, timestamp_s, bool, cudf::binary_operator::LESS>/LESS/10000000/manual_time                                            -0.0051         -0.0057           314           313           328           326
COMPILED_BINARYOP<timestamp_s, timestamp_s, bool, cudf::binary_operator::LESS>/LESS/100000000/manual_time                                           -0.0036         -0.0035          3047          3036          3075          3064
COMPILED_BINARYOP<timestamp_ms, timestamp_s, bool, cudf::binary_operator::GREATER>/GREATER/10000/manual_time                                        -0.0518         +0.0074             5             5            23            23
COMPILED_BINARYOP<timestamp_ms, timestamp_s, bool, cudf::binary_operator::GREATER>/GREATER/100000/manual_time                                       +0.0566         -0.0093             9             9            26            26
COMPILED_BINARYOP<timestamp_ms, timestamp_s, bool, cudf::binary_operator::GREATER>/GREATER/1000000/manual_time                                      -0.0052         -0.0033            40            40            53            53
COMPILED_BINARYOP<timestamp_ms, timestamp_s, bool, cudf::binary_operator::GREATER>/GREATER/10000000/manual_time                                     -0.0028         -0.0099           314           313           328           325
COMPILED_BINARYOP<timestamp_ms, timestamp_s, bool, cudf::binary_operator::GREATER>/GREATER/100000000/manual_time                                    -0.0023         -0.0026          3048          3041          3076          3068
COMPILED_BINARYOP<duration_ms, duration_ns, bool, cudf::binary_operator::NULL_EQUALS>/NULL_EQUALS/10000/manual_time                                 -0.1388         -0.0288             7             6            25            24
COMPILED_BINARYOP<duration_ms, duration_ns, bool, cudf::binary_operator::NULL_EQUALS>/NULL_EQUALS/100000/manual_time                                +0.1029         +0.0096            10            11            27            28
COMPILED_BINARYOP<duration_ms, duration_ns, bool, cudf::binary_operator::NULL_EQUALS>/NULL_EQUALS/1000000/manual_time                               +0.0052         -0.0041            42            42            55            55
COMPILED_BINARYOP<duration_ms, duration_ns, bool, cudf::binary_operator::NULL_EQUALS>/NULL_EQUALS/10000000/manual_time                              -0.0043         -0.0056           317           316           331           329
COMPILED_BINARYOP<duration_ms, duration_ns, bool, cudf::binary_operator::NULL_EQUALS>/NULL_EQUALS/100000000/manual_time                             -0.0040         -0.0042          3081          3069          3109          3096
COMPILED_BINARYOP<decimal32, decimal32, decimal32, cudf::binary_operator::NULL_MAX>/NULL_MAX/10000/manual_time                                      +0.0193         +0.0040            14            15            32            32
COMPILED_BINARYOP<decimal32, decimal32, decimal32, cudf::binary_operator::NULL_MAX>/NULL_MAX/100000/manual_time                                     +0.0503         +0.0198            39            41            55            56
COMPILED_BINARYOP<decimal32, decimal32, decimal32, cudf::binary_operator::NULL_MAX>/NULL_MAX/1000000/manual_time                                    +0.0091         +0.0068           379           382           396           399
COMPILED_BINARYOP<decimal32, decimal32, decimal32, cudf::binary_operator::NULL_MAX>/NULL_MAX/10000000/manual_time                                   -0.0028         -0.0029          3906          3895          3945          3934
COMPILED_BINARYOP<decimal32, decimal32, decimal32, cudf::binary_operator::NULL_MAX>/NULL_MAX/100000000/manual_time                                  -0.0044         -0.0122         39232         39060         41424         40918
COMPILED_BINARYOP<timestamp_D, timestamp_s, timestamp_s, cudf::binary_operator::NULL_MIN>/NULL_MIN/10000/manual_time                                +0.0210         +0.0047            14            14            32            32
COMPILED_BINARYOP<timestamp_D, timestamp_s, timestamp_s, cudf::binary_operator::NULL_MIN>/NULL_MIN/100000/manual_time                               +0.0575         +0.0262            38            40            54            55
COMPILED_BINARYOP<timestamp_D, timestamp_s, timestamp_s, cudf::binary_operator::NULL_MIN>/NULL_MIN/1000000/manual_time                              +0.0107         +0.0073           372           376           390           393
COMPILED_BINARYOP<timestamp_D, timestamp_s, timestamp_s, cudf::binary_operator::NULL_MIN>/NULL_MIN/10000000/manual_time                             -0.0029         -0.0030          3859          3847          3898          3887
COMPILED_BINARYOP<timestamp_D, timestamp_s, timestamp_s, cudf::binary_operator::NULL_MIN>/NULL_MIN/100000000/manual_time                            -0.0033         -0.0036         38735         38607         40919         40772
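
A note for reading these tables: the Time and CPU columns appear to be relative changes, (new - old) / |old|, as produced by Google Benchmark's comparison tooling, so a negative value means the new build is faster. The helper below is a minimal sketch of that calculation; the name `relative_change` is hypothetical and not part of any library. The printed Old/New values are rounded, so recomputing from them only approximates the listed change.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: relative change between two timings,
// computed as (new - old) / |old|. Negative means the new run is faster.
inline double relative_change(double old_value, double new_value)
{
  // Assumption of this sketch: the old timing is nonzero.
  assert(old_value != 0.0);
  return (new_value - old_value) / std::fabs(old_value);
}
```

For the LOGICAL_AND row at size 100000000 (2080 old, 2132 new), this gives 52 / 2080 = 0.025, which matches the +0.0250 listed in the Time column.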

@robertmaynard
Contributor Author

robertmaynard commented Dec 1, 2021

Binary size increases by about 80 KB (from 130084920 to 130164336 bytes).

@codecov

codecov bot commented Dec 2, 2021

Codecov Report

Merging #9816 (2796c66) into branch-21.12 (74ac6ed) will not change coverage.
The diff coverage is n/a.

@@              Coverage Diff              @@
##           branch-21.12    #9816   +/-   ##
=============================================
  Coverage         10.60%   10.60%           
=============================================
  Files               118      118           
  Lines             20081    20081           
=============================================
  Hits               2130     2130           
  Misses            17951    17951           


@karthikeyann
Contributor

Awesome. I'm surprised that the binary size didn't increase much!
Benchmark times are back to the original levels (close to the JIT kernels); see #8192 (comment).

The following operations still take much longer at the 100M size:

op          time
POW         39ms
LOG_BASE    29ms
ATAN2       17ms
NULL_MAX    39ms
NULL_MIN    38ms

JIT and the 21.08 compiled binops are in the 3ms range at the 100M size.

We now need to add benchmarks for both common-type and non-common-type combinations so that both kernels are covered.

bool is_rhs_scalar,
binary_operator op,
rmm::cuda_stream_view stream)
__forceinline__ void operator_dispatcher(mutable_column_device_view& out,
Member

Have you tested with inline instead of __forceinline__? We should prefer standard keywords wherever possible.

Contributor

I specifically requested this because the regression is caused by a lack of inlining once the code crosses a complexity threshold.

Contributor

Actually, this is the wrong place to put it because this is host code.

Contributor Author

I have moved the __forceinline__ to the correct place and am re-running the benchmarks.
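
To illustrate the distinction raised in this thread, here is a minimal sketch in plain C++ (standing in for CUDA) of a host-side dispatcher that picks the operator once per call, with the forced-inline hint placed on the per-element functors instead. Every name here (`SKETCH_FORCEINLINE`, `sketch_op`, `apply_elementwise`, `dispatch`) is hypothetical, the `int`-only vectors are a simplification, and this is not the code in this PR; where the attribute finally landed is not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical portability macro for this sketch only.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_FORCEINLINE inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#define SKETCH_FORCEINLINE __forceinline
#else
#define SKETCH_FORCEINLINE inline
#endif

enum class sketch_op { ADD, SUB, MUL };  // hypothetical subset of operators

// Per-element functors: the hot path, where a missed inline is costly.
struct add_fn {
  SKETCH_FORCEINLINE int operator()(int a, int b) const { return a + b; }
};
struct sub_fn {
  SKETCH_FORCEINLINE int operator()(int a, int b) const { return a - b; }
};
struct mul_fn {
  SKETCH_FORCEINLINE int operator()(int a, int b) const { return a * b; }
};

// Stand-in for a kernel: the operator is a compile-time type, so the loop
// body has no runtime branch on the operator.
template <typename Op>
void apply_elementwise(std::vector<int> const& lhs,
                       std::vector<int> const& rhs,
                       std::vector<int>& out,
                       Op op)
{
  for (std::size_t i = 0; i < out.size(); ++i) { out[i] = op(lhs[i], rhs[i]); }
}

// Host-side dispatcher: the switch runs once per call, not once per element,
// so forcing inlining here would buy nothing.
std::vector<int> dispatch(sketch_op op,
                          std::vector<int> const& lhs,
                          std::vector<int> const& rhs)
{
  assert(lhs.size() == rhs.size());
  std::vector<int> out(lhs.size());
  switch (op) {
    case sketch_op::ADD: apply_elementwise(lhs, rhs, out, add_fn{}); break;
    case sketch_op::SUB: apply_elementwise(lhs, rhs, out, sub_fn{}); break;
    case sketch_op::MUL: apply_elementwise(lhs, rhs, out, mul_fn{}); break;
  }
  return out;
}
```

Calling `dispatch(sketch_op::ADD, a, b)` evaluates the switch a single time and then runs a loop specialized for addition. In the sketch, the inline hint sits on the functors called inside that loop rather than on the dispatcher.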

@revans2
Contributor

revans2 commented Dec 2, 2021

@abellina and others ran through a large set of performance tests for the Spark plugin yesterday and last night. This patch improves the performance quite a bit vs the 21.10 release. We are seeing a median improvement in performance for the queries we ran of about 7.4%. This is great because the same tests run against #9802 showed about a 0.8% improvement. That is a great win. The tests we ran last night were mostly targeting the non-decimal use case, as we were concerned about regressions in performance there.

We will likely be doing more performance tests today, but overall this is looking really good.

@karthikeyann
Contributor

Compile time difference: only about +3.3s! 👍
new - touch binary_op.cuh
real 1m20.158s

old - touch binary_op.cuh
real 1m16.862s

@robertmaynard
Contributor Author

robertmaynard commented Dec 2, 2021

Updated performance numbers comparing this PR with and without __forceinline__. In short, the performance regression for NULL_MIN and NULL_MAX is resolved, with no significant improvement or adverse effect on the other binary operators.

For file size, using __forceinline__ reduces the binary by 20KB.

Comparing optimized.json to baseline.json
Benchmark                                                                                                                                  Time             CPU      Time Old      Time New       CPU Old       CPU New
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, false>/binaryop_int32_imbalanced_unique/100000/1/manual_time                               -0.0376         +0.0265             0             0             0             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, false>/binaryop_int32_imbalanced_unique/100000/2/manual_time                               -0.0603         +0.0139             0             0             0             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, false>/binaryop_int32_imbalanced_unique/100000/5/manual_time                               -0.0320         +0.0210             0             0             0             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, false>/binaryop_int32_imbalanced_unique/100000/10/manual_time                              -0.0019         +0.0163             0             0             0             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, false>/binaryop_int32_imbalanced_unique/1000000/1/manual_time                              -0.0316         +0.0100             0             0             0             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, false>/binaryop_int32_imbalanced_unique/1000000/2/manual_time                              -0.0122         +0.0124             0             0             0             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, false>/binaryop_int32_imbalanced_unique/1000000/5/manual_time                              -0.0030         +0.0081             0             0             0             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, false>/binaryop_int32_imbalanced_unique/1000000/10/manual_time                             +0.0047         +0.0097             0             0             0             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, false>/binaryop_int32_imbalanced_unique/10000000/1/manual_time                             -0.0017         +0.0061             0             0             0             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, false>/binaryop_int32_imbalanced_unique/10000000/2/manual_time                             +0.0008         +0.0057             0             0             0             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, false>/binaryop_int32_imbalanced_unique/10000000/5/manual_time                             +0.0034         +0.0052             1             1             1             1
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, false>/binaryop_int32_imbalanced_unique/10000000/10/manual_time                            +0.0054         +0.0063             2             2             2             2
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, false>/binaryop_int32_imbalanced_unique/100000000/1/manual_time                            +0.0046         +0.0053             2             2             2             2
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, false>/binaryop_int32_imbalanced_unique/100000000/2/manual_time                            +0.0045         +0.0049             4             4             4             4
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, false>/binaryop_int32_imbalanced_unique/100000000/5/manual_time                            +0.0059         +0.0061            11            11            11            11
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, false>/binaryop_int32_imbalanced_unique/100000000/10/manual_time                           +0.0054         +0.0054            21            21            21            21
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, true>/binaryop_int32_imbalanced_reuse/100000/1/manual_time                                 +0.0035         +0.0276             0             0             0             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, true>/binaryop_int32_imbalanced_reuse/100000/2/manual_time                                 -0.0139         +0.0516             0             0             0             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, true>/binaryop_int32_imbalanced_reuse/100000/5/manual_time                                 -0.0009         +0.0492             0             0             0             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, true>/binaryop_int32_imbalanced_reuse/100000/10/manual_time                                -0.0292         +0.0099             0             0             0             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, true>/binaryop_int32_imbalanced_reuse/1000000/1/manual_time                                -0.0237         +0.0189             0             0             0             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, true>/binaryop_int32_imbalanced_reuse/1000000/2/manual_time                                -0.0113         +0.0133             0             0             0             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, true>/binaryop_int32_imbalanced_reuse/1000000/5/manual_time                                +0.0004         +0.0089             0             0             0             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, true>/binaryop_int32_imbalanced_reuse/1000000/10/manual_time                               +0.0022         +0.0072             0             0             0             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, true>/binaryop_int32_imbalanced_reuse/10000000/1/manual_time                               -0.0008         +0.0076             0             0             0             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, true>/binaryop_int32_imbalanced_reuse/10000000/2/manual_time                               +0.0025         +0.0066             0             0             0             0
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, true>/binaryop_int32_imbalanced_reuse/10000000/5/manual_time                               +0.0043         +0.0053             1             1             1             1
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, true>/binaryop_int32_imbalanced_reuse/10000000/10/manual_time                              +0.0046         +0.0055             2             2             2             2
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, true>/binaryop_int32_imbalanced_reuse/100000000/1/manual_time                              +0.0064         +0.0603             1             2             2             2
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, true>/binaryop_int32_imbalanced_reuse/100000000/2/manual_time                              +0.0079         +0.0083             4             4             4             4
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, true>/binaryop_int32_imbalanced_reuse/100000000/5/manual_time                              +0.0058         +0.0058            10            10            10            10
BINARYOP<int32_t, TreeType::IMBALANCED_LEFT, true>/binaryop_int32_imbalanced_reuse/100000000/10/manual_time                             +0.0058         +0.0060            20            21            20            21
BINARYOP<double, TreeType::IMBALANCED_LEFT, false>/binaryop_double_imbalanced_unique/100000/1/manual_time                               -0.0439         +0.0193             0             0             0             0
BINARYOP<double, TreeType::IMBALANCED_LEFT, false>/binaryop_double_imbalanced_unique/100000/2/manual_time                               -0.0348         +0.0154             0             0             0             0
BINARYOP<double, TreeType::IMBALANCED_LEFT, false>/binaryop_double_imbalanced_unique/100000/5/manual_time                               -0.0093         +0.0150             0             0             0             0
BINARYOP<double, TreeType::IMBALANCED_LEFT, false>/binaryop_double_imbalanced_unique/100000/10/manual_time                              -0.0091         +0.0114             0             0             0             0
BINARYOP<double, TreeType::IMBALANCED_LEFT, false>/binaryop_double_imbalanced_unique/1000000/1/manual_time                              +0.0083         +0.0247             0             0             0             0
BINARYOP<double, TreeType::IMBALANCED_LEFT, false>/binaryop_double_imbalanced_unique/1000000/2/manual_time                              +0.0115         +0.0213             0             0             0             0
BINARYOP<double, TreeType::IMBALANCED_LEFT, false>/binaryop_double_imbalanced_unique/1000000/5/manual_time                              +0.0123         +0.0174             0             0             0             0
BINARYOP<double, TreeType::IMBALANCED_LEFT, false>/binaryop_double_imbalanced_unique/1000000/10/manual_time                             +0.0120         +0.0153             0             0             0             0
BINARYOP<double, TreeType::IMBALANCED_LEFT, false>/binaryop_double_imbalanced_unique/10000000/1/manual_time                             +0.0044         +0.0080             0             0             0             0
BINARYOP<double, TreeType::IMBALANCED_LEFT, false>/binaryop_double_imbalanced_unique/10000000/2/manual_time                             +0.0058         +0.0080             1             1             1             1
BINARYOP<double, TreeType::IMBALANCED_LEFT, false>/binaryop_double_imbalanced_unique/10000000/5/manual_time                             +0.0068         +0.0075             2             2             2             2
BINARYOP<double, TreeType::IMBALANCED_LEFT, false>/binaryop_double_imbalanced_unique/10000000/10/manual_time                            +0.0067         +0.0072             4             4             4             4
BINARYOP<double, TreeType::IMBALANCED_LEFT, false>/binaryop_double_imbalanced_unique/100000000/1/manual_time                            +0.0051         +0.0056             4             4             4             4
BINARYOP<double, TreeType::IMBALANCED_LEFT, false>/binaryop_double_imbalanced_unique/100000000/2/manual_time                            +0.0061         +0.0063             8             8             8             8
BINARYOP<double, TreeType::IMBALANCED_LEFT, false>/binaryop_double_imbalanced_unique/100000000/5/manual_time                            +0.0026         +0.0027            21            21            21            21
BINARYOP<double, TreeType::IMBALANCED_LEFT, false>/binaryop_double_imbalanced_unique/100000000/10/manual_time                           +0.0044         +0.0044            42            42            42            42
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::TRUE_DIV>/TRUE_DIV/10000/manual_time                                +0.0057         +0.0179            10            10            28            29
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::TRUE_DIV>/TRUE_DIV/100000/manual_time                               -0.0643         +0.0085            13            12            29            29
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::TRUE_DIV>/TRUE_DIV/1000000/manual_time                              -0.0015         +0.0159            54            54            69            70
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::TRUE_DIV>/TRUE_DIV/10000000/manual_time                             +0.0017         +0.0059           463           464           479           482
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::TRUE_DIV>/TRUE_DIV/100000000/manual_time                            +0.0009         +0.0013          4561          4565          4607          4613
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::FLOOR_DIV>/FLOOR_DIV/10000/manual_time                              -0.0120         +0.0110            11            11            29            29
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::FLOOR_DIV>/FLOOR_DIV/100000/manual_time                             -0.0559         +0.0093            14            13            30            30
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::FLOOR_DIV>/FLOOR_DIV/1000000/manual_time                            -0.0033         +0.0133            59            59            74            75
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::FLOOR_DIV>/FLOOR_DIV/10000000/manual_time                           +0.0041         +0.0083           508           510           524           529
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::FLOOR_DIV>/FLOOR_DIV/100000000/manual_time                          +0.0074         +0.0077          4998          5035          5051          5090
COMPILED_BINARYOP<int32_t, int64_t, double, cudf::binary_operator::PMOD>/PMOD/10000/manual_time                                         +0.0184         +0.0193             9             9            27            28
COMPILED_BINARYOP<int32_t, int64_t, double, cudf::binary_operator::PMOD>/PMOD/100000/manual_time                                        -0.0473         +0.0173            11            11            27            28
COMPILED_BINARYOP<int32_t, int64_t, double, cudf::binary_operator::PMOD>/PMOD/1000000/manual_time                                       -0.0119         +0.0151            48            48            64            65
COMPILED_BINARYOP<int32_t, int64_t, double, cudf::binary_operator::PMOD>/PMOD/10000000/manual_time                                      -0.0026         +0.0020           419           418           436           437
COMPILED_BINARYOP<int32_t, int64_t, double, cudf::binary_operator::PMOD>/PMOD/100000000/manual_time                                     -0.0034         -0.0029          4126          4112          4167          4155
COMPILED_BINARYOP<int32_t, uint8_t, int64_t, cudf::binary_operator::PYMOD>/PYMOD/10000/manual_time                                      -0.0253         +0.0087             6             6            24            24
COMPILED_BINARYOP<int32_t, uint8_t, int64_t, cudf::binary_operator::PYMOD>/PYMOD/100000/manual_time                                     -0.0103         +0.0353             7             7            24            25
COMPILED_BINARYOP<int32_t, uint8_t, int64_t, cudf::binary_operator::PYMOD>/PYMOD/1000000/manual_time                                    -0.0195         +0.0200            31            31            47            48
COMPILED_BINARYOP<int32_t, uint8_t, int64_t, cudf::binary_operator::PYMOD>/PYMOD/10000000/manual_time                                   +0.0005         +0.0069           266           266           283           285
COMPILED_BINARYOP<int32_t, uint8_t, int64_t, cudf::binary_operator::PYMOD>/PYMOD/100000000/manual_time                                  +0.0057         +0.0063          2551          2565          2577          2594
COMPILED_BINARYOP<int64_t, int64_t, double, cudf::binary_operator::POW>/POW/10000/manual_time                                           +0.0070         +0.0104            62            63            80            81
COMPILED_BINARYOP<int64_t, int64_t, double, cudf::binary_operator::POW>/POW/100000/manual_time                                          -0.0028         +0.0113            64            64            80            81
COMPILED_BINARYOP<int64_t, int64_t, double, cudf::binary_operator::POW>/POW/1000000/manual_time                                         +0.0067         +0.0092           420           423           435           439
COMPILED_BINARYOP<int64_t, int64_t, double, cudf::binary_operator::POW>/POW/10000000/manual_time                                        +0.0093         +0.0098          4014          4051          4054          4093
COMPILED_BINARYOP<int64_t, int64_t, double, cudf::binary_operator::POW>/POW/100000000/manual_time                                       +0.0084         +0.0116         39903         40238         42136         42625
COMPILED_BINARYOP<float, double, double, cudf::binary_operator::LOG_BASE>/LOG_BASE/10000/manual_time                                    +0.0109         +0.0085            46            47            65            65
COMPILED_BINARYOP<float, double, double, cudf::binary_operator::LOG_BASE>/LOG_BASE/100000/manual_time                                   +0.0046         +0.0193            48            48            64            65
COMPILED_BINARYOP<float, double, double, cudf::binary_operator::LOG_BASE>/LOG_BASE/1000000/manual_time                                  +0.0068         +0.0095           307           309           323           326
COMPILED_BINARYOP<float, double, double, cudf::binary_operator::LOG_BASE>/LOG_BASE/10000000/manual_time                                 +0.0092         +0.0091          2932          2959          2964          2991
COMPILED_BINARYOP<float, double, double, cudf::binary_operator::LOG_BASE>/LOG_BASE/100000000/manual_time                                +0.0083         +0.0081         29134         29376         30375         30621
COMPILED_BINARYOP<float, double, double, cudf::binary_operator::ATAN2>/ATAN2/10000/manual_time                                          +0.0001         +0.0058            29            29            47            48
COMPILED_BINARYOP<float, double, double, cudf::binary_operator::ATAN2>/ATAN2/100000/manual_time                                         -0.0054         +0.0199            30            30            46            47
COMPILED_BINARYOP<float, double, double, cudf::binary_operator::ATAN2>/ATAN2/1000000/manual_time                                        +0.0068         +0.0168           186           188           201           205
COMPILED_BINARYOP<float, double, double, cudf::binary_operator::ATAN2>/ATAN2/10000000/manual_time                                       +0.0081         +0.0085          1761          1776          1784          1799
COMPILED_BINARYOP<float, double, double, cudf::binary_operator::ATAN2>/ATAN2/100000000/manual_time                                      +0.0083         +0.0083         17470         17615         17926         18075
COMPILED_BINARYOP<int, int, int, cudf::binary_operator::SHIFT_LEFT>/SHIFT_LEFT/10000/manual_time                                        +0.0513         +0.0293             5             6            24            24
COMPILED_BINARYOP<int, int, int, cudf::binary_operator::SHIFT_LEFT>/SHIFT_LEFT/100000/manual_time                                       -0.0290         +0.0281             7             7            24            25
COMPILED_BINARYOP<int, int, int, cudf::binary_operator::SHIFT_LEFT>/SHIFT_LEFT/1000000/manual_time                                      -0.0327         +0.0039            27            27            42            42
COMPILED_BINARYOP<int, int, int, cudf::binary_operator::SHIFT_LEFT>/SHIFT_LEFT/10000000/manual_time                                     -0.0010         +0.0063           217           216           234           235
COMPILED_BINARYOP<int, int, int, cudf::binary_operator::SHIFT_LEFT>/SHIFT_LEFT/100000000/manual_time                                    +0.0056         +0.0141          2106          2118          2113          2143
COMPILED_BINARYOP<int16_t, int64_t, int, cudf::binary_operator::SHIFT_RIGHT>/SHIFT_RIGHT/10000/manual_time                              +0.0368         +0.0192             5             5            23            24
COMPILED_BINARYOP<int16_t, int64_t, int, cudf::binary_operator::SHIFT_RIGHT>/SHIFT_RIGHT/100000/manual_time                             -0.0616         +0.0537             8             8            25            26
COMPILED_BINARYOP<int16_t, int64_t, int, cudf::binary_operator::SHIFT_RIGHT>/SHIFT_RIGHT/1000000/manual_time                            -0.0137         +0.0209            33            32            46            47
COMPILED_BINARYOP<int16_t, int64_t, int, cudf::binary_operator::SHIFT_RIGHT>/SHIFT_RIGHT/10000000/manual_time                           -0.0005         +0.0048           254           254           270           272
COMPILED_BINARYOP<int16_t, int64_t, int, cudf::binary_operator::SHIFT_RIGHT>/SHIFT_RIGHT/100000000/manual_time                          +0.0082         +0.0075          2463          2483          2490          2508
COMPILED_BINARYOP<int64_t, int32_t, int16_t, cudf::binary_operator::BITWISE_AND>/BITWISE_AND/10000/manual_time                          +0.0241         +0.0187             5             5            23            24
COMPILED_BINARYOP<int64_t, int32_t, int16_t, cudf::binary_operator::BITWISE_AND>/BITWISE_AND/100000/manual_time                         -0.1059         +0.0153             8             8            25            25
COMPILED_BINARYOP<int64_t, int32_t, int16_t, cudf::binary_operator::BITWISE_AND>/BITWISE_AND/1000000/manual_time                        -0.0071         +0.0021            33            33            46            46
COMPILED_BINARYOP<int64_t, int32_t, int16_t, cudf::binary_operator::BITWISE_AND>/BITWISE_AND/10000000/manual_time                       -0.0031         +0.0031           258           257           272           273
COMPILED_BINARYOP<int64_t, int32_t, int16_t, cudf::binary_operator::BITWISE_AND>/BITWISE_AND/100000000/manual_time                      +0.0016         +0.0013          2501          2505          2526          2529
COMPILED_BINARYOP<int16_t, int32_t, int64_t, cudf::binary_operator::BITWISE_OR>/BITWISE_OR/10000/manual_time                            +0.0965         +0.0356             5             6            23            24
COMPILED_BINARYOP<int16_t, int32_t, int64_t, cudf::binary_operator::BITWISE_OR>/BITWISE_OR/100000/manual_time                           -0.0366         +0.0074             7             7            24            24
COMPILED_BINARYOP<int16_t, int32_t, int64_t, cudf::binary_operator::BITWISE_OR>/BITWISE_OR/1000000/manual_time                          -0.0236         +0.0239            30            29            45            47
COMPILED_BINARYOP<int16_t, int32_t, int64_t, cudf::binary_operator::BITWISE_OR>/BITWISE_OR/10000000/manual_time                         +0.0037         +0.0093           262           263           280           282
COMPILED_BINARYOP<int16_t, int32_t, int64_t, cudf::binary_operator::BITWISE_OR>/BITWISE_OR/100000000/manual_time                        +0.0082         +0.0084          2575          2596          2603          2625
COMPILED_BINARYOP<int16_t, int64_t, int32_t, cudf::binary_operator::BITWISE_XOR>/BITWISE_XOR/10000/manual_time                          +0.0587         +0.0273             5             5            23            24
COMPILED_BINARYOP<int16_t, int64_t, int32_t, cudf::binary_operator::BITWISE_XOR>/BITWISE_XOR/100000/manual_time                         -0.0616         +0.0259             8             8            25            25
COMPILED_BINARYOP<int16_t, int64_t, int32_t, cudf::binary_operator::BITWISE_XOR>/BITWISE_XOR/1000000/manual_time                        -0.0142         +0.0008            33            32            46            46
COMPILED_BINARYOP<int16_t, int64_t, int32_t, cudf::binary_operator::BITWISE_XOR>/BITWISE_XOR/10000000/manual_time                       +0.0003         +0.0068           254           254           270           272
COMPILED_BINARYOP<int16_t, int64_t, int32_t, cudf::binary_operator::BITWISE_XOR>/BITWISE_XOR/100000000/manual_time                      +0.0070         +0.0071          2464          2482          2491          2508
COMPILED_BINARYOP<double, int8_t, bool, cudf::binary_operator::LOGICAL_AND>/LOGICAL_AND/10000/manual_time                               -0.0108         +0.0119             6             6            25            25
COMPILED_BINARYOP<double, int8_t, bool, cudf::binary_operator::LOGICAL_AND>/LOGICAL_AND/100000/manual_time                              -0.0659         +0.0209             8             8            25            26
COMPILED_BINARYOP<double, int8_t, bool, cudf::binary_operator::LOGICAL_AND>/LOGICAL_AND/1000000/manual_time                             -0.0123         +0.0002            30            30            43            43
COMPILED_BINARYOP<double, int8_t, bool, cudf::binary_operator::LOGICAL_AND>/LOGICAL_AND/10000000/manual_time                            +0.0011         +0.0043           220           220           233           234
COMPILED_BINARYOP<double, int8_t, bool, cudf::binary_operator::LOGICAL_AND>/LOGICAL_AND/100000000/manual_time                           -0.0019         -0.0013          2132          2128          2152          2149
COMPILED_BINARYOP<int16_t, int64_t, bool, cudf::binary_operator::LOGICAL_OR>/LOGICAL_OR/10000/manual_time                               +0.0433         +0.0230             5             5            24            24
COMPILED_BINARYOP<int16_t, int64_t, bool, cudf::binary_operator::LOGICAL_OR>/LOGICAL_OR/100000/manual_time                              -0.0741         +0.0300             8             7            25            25
COMPILED_BINARYOP<int16_t, int64_t, bool, cudf::binary_operator::LOGICAL_OR>/LOGICAL_OR/1000000/manual_time                             -0.0126         -0.0020            31            30            44            44
COMPILED_BINARYOP<int16_t, int64_t, bool, cudf::binary_operator::LOGICAL_OR>/LOGICAL_OR/10000000/manual_time                            -0.0004         +0.0016           220           220           233           234
COMPILED_BINARYOP<int16_t, int64_t, bool, cudf::binary_operator::LOGICAL_OR>/LOGICAL_OR/100000000/manual_time                           +0.0086         +0.0087          2099          2117          2120          2139
COMPILED_BINARYOP<decimal32, decimal32, bool, cudf::binary_operator::NOT_EQUAL>/NOT_EQUAL/10000/manual_time                             +0.0273         +0.0224             5             5            23            24
COMPILED_BINARYOP<decimal32, decimal32, bool, cudf::binary_operator::NOT_EQUAL>/NOT_EQUAL/100000/manual_time                            -0.0469         +0.0252             7             7            24            25
COMPILED_BINARYOP<decimal32, decimal32, bool, cudf::binary_operator::NOT_EQUAL>/NOT_EQUAL/1000000/manual_time                           -0.0152         +0.0064            25            25            38            38
COMPILED_BINARYOP<decimal32, decimal32, bool, cudf::binary_operator::NOT_EQUAL>/NOT_EQUAL/10000000/manual_time                          -0.0011         +0.0038           175           175           189           190
COMPILED_BINARYOP<decimal32, decimal32, bool, cudf::binary_operator::NOT_EQUAL>/NOT_EQUAL/100000000/manual_time                         +0.0023         +0.0033          1662          1666          1681          1687
COMPILED_BINARYOP<timestamp_s, timestamp_s, bool, cudf::binary_operator::LESS>/LESS/10000/manual_time                                   +0.0638         +0.0303             5             5            23            24
COMPILED_BINARYOP<timestamp_s, timestamp_s, bool, cudf::binary_operator::LESS>/LESS/100000/manual_time                                  -0.0678         +0.0128             9             8            26            26
COMPILED_BINARYOP<timestamp_s, timestamp_s, bool, cudf::binary_operator::LESS>/LESS/1000000/manual_time                                 -0.0069         +0.0078            40            40            53            53
COMPILED_BINARYOP<timestamp_s, timestamp_s, bool, cudf::binary_operator::LESS>/LESS/10000000/manual_time                                +0.0029         +0.0056           313           314           326           328
COMPILED_BINARYOP<timestamp_s, timestamp_s, bool, cudf::binary_operator::LESS>/LESS/100000000/manual_time                               +0.0061         +0.0062          3036          3054          3064          3083
COMPILED_BINARYOP<timestamp_ms, timestamp_s, bool, cudf::binary_operator::GREATER>/GREATER/10000/manual_time                            +0.0353         +0.0596             5             5            23            25
COMPILED_BINARYOP<timestamp_ms, timestamp_s, bool, cudf::binary_operator::GREATER>/GREATER/100000/manual_time                           -0.0687         +0.0314             9             8            26            26
COMPILED_BINARYOP<timestamp_ms, timestamp_s, bool, cudf::binary_operator::GREATER>/GREATER/1000000/manual_time                          -0.0083         +0.0076            40            40            53            53
COMPILED_BINARYOP<timestamp_ms, timestamp_s, bool, cudf::binary_operator::GREATER>/GREATER/10000000/manual_time                         +0.0004         +0.0094           313           314           325           328
COMPILED_BINARYOP<timestamp_ms, timestamp_s, bool, cudf::binary_operator::GREATER>/GREATER/100000000/manual_time                        +0.0052         +0.0054          3041          3057          3068          3085
COMPILED_BINARYOP<duration_ms, duration_ns, bool, cudf::binary_operator::NULL_EQUALS>/NULL_EQUALS/10000/manual_time                     +0.0518         +0.0333             6             7            24            25
COMPILED_BINARYOP<duration_ms, duration_ns, bool, cudf::binary_operator::NULL_EQUALS>/NULL_EQUALS/100000/manual_time                    -0.0913         -0.0003            11            10            28            28
COMPILED_BINARYOP<duration_ms, duration_ns, bool, cudf::binary_operator::NULL_EQUALS>/NULL_EQUALS/1000000/manual_time                   -0.0075         +0.0118            42            42            55            56
COMPILED_BINARYOP<duration_ms, duration_ns, bool, cudf::binary_operator::NULL_EQUALS>/NULL_EQUALS/10000000/manual_time                  +0.0024         +0.0046           316           316           329           331
COMPILED_BINARYOP<duration_ms, duration_ns, bool, cudf::binary_operator::NULL_EQUALS>/NULL_EQUALS/100000000/manual_time                 +0.0061         +0.0066          3069          3088          3096          3117
COMPILED_BINARYOP<decimal32, decimal32, decimal32, cudf::binary_operator::NULL_MAX>/NULL_MAX/10000/manual_time                          -0.4987         -0.2008            15             7            32            26
COMPILED_BINARYOP<decimal32, decimal32, decimal32, cudf::binary_operator::NULL_MAX>/NULL_MAX/100000/manual_time                         -0.7805         -0.5282            41             9            56            27
COMPILED_BINARYOP<decimal32, decimal32, decimal32, cudf::binary_operator::NULL_MAX>/NULL_MAX/1000000/manual_time                        -0.9226         -0.8900           382            30           399            44
COMPILED_BINARYOP<decimal32, decimal32, decimal32, cudf::binary_operator::NULL_MAX>/NULL_MAX/10000000/manual_time                       -0.9430         -0.9389          3895           222          3934           240
COMPILED_BINARYOP<decimal32, decimal32, decimal32, cudf::binary_operator::NULL_MAX>/NULL_MAX/100000000/manual_time                      -0.9446         -0.9465         39060          2163         40918          2188
COMPILED_BINARYOP<timestamp_D, timestamp_s, timestamp_s, cudf::binary_operator::NULL_MIN>/NULL_MIN/10000/manual_time                    -0.5298         -0.2134            14             7            32            25
COMPILED_BINARYOP<timestamp_D, timestamp_s, timestamp_s, cudf::binary_operator::NULL_MIN>/NULL_MIN/100000/manual_time                   -0.7567         -0.5205            40            10            55            26
COMPILED_BINARYOP<timestamp_D, timestamp_s, timestamp_s, cudf::binary_operator::NULL_MIN>/NULL_MIN/1000000/manual_time                  -0.8917         -0.8534           376            41           393            58
COMPILED_BINARYOP<timestamp_D, timestamp_s, timestamp_s, cudf::binary_operator::NULL_MIN>/NULL_MIN/10000000/manual_time                 -0.9061         -0.9021          3847           361          3887           381
COMPILED_BINARYOP<timestamp_D, timestamp_s, timestamp_s, cudf::binary_operator::NULL_MIN>/NULL_MIN/100000000/manual_time                -0.9071         -0.9111         38607          3586         40772          3623
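
For reading the rows above: the two signed columns look like relative changes, and the four integer columns look like old/new time followed by old/new CPU time. That column order is an assumption based on the numbers themselves (for example, 39060 to 2163 gives -0.9446). A minimal sketch of that arithmetic, with hypothetical helper names that are not part of cudf or Google Benchmark:

```cpp
#include <cassert>
#include <cmath>

// Relative change between two timings; a negative value means the new run is faster.
// Assumption: this is how the signed comparison columns are derived.
inline double relative_change(double old_time, double new_time)
{
  assert(old_time != 0.0);
  return (new_time - old_time) / std::fabs(old_time);
}

// Speedup factor implied by the same pair of timings.
inline double speedup(double old_time, double new_time)
{
  assert(new_time != 0.0);
  return old_time / new_time;
}
```

Under that reading, the NULL_MAX row at 100000000 elements (39060 old, 2163 new) corresponds to roughly an 18x speedup, while rows with changes within about ±0.01 are effectively unchanged.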

@jrhemstad jrhemstad added the 5 - Ready to Merge Testing and reviews complete, ready to merge label Dec 2, 2021
@vyasr vyasr linked an issue Dec 2, 2021 that may be closed by this pull request
@karthikeyann
Contributor

__forceinline__ FTW! All the runtimes are back to their original values.
Forced inlining makes sense here because the dispatchers are just one-level or two-level switch statements (type_dispatcher and double_type_dispatcher).
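
To illustrate the point about one-level and two-level switch dispatch, here is a simplified sketch. It is not cudf's actual type_dispatcher or double_type_dispatcher; the names `SKETCH_FORCEINLINE`, `type_id`, `dispatch_one`, `dispatch_two`, `bind_lhs`, and `size_sum` are all hypothetical, and the type list is cut down to three entries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Portable stand-in for a forced-inline hint (hypothetical macro name).
#if defined(__CUDACC__)
#define SKETCH_FORCEINLINE __forceinline__
#elif defined(__GNUC__) || defined(__clang__)
#define SKETCH_FORCEINLINE inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#define SKETCH_FORCEINLINE __forceinline
#else
#define SKETCH_FORCEINLINE inline
#endif

enum class type_id { INT32, INT64, FLOAT64 };

// One-level dispatch: a single switch that maps a runtime id to a compile-time type.
template <typename Functor>
SKETCH_FORCEINLINE auto dispatch_one(type_id id, Functor f)
{
  switch (id) {
    case type_id::INT32: return f.template operator()<std::int32_t>();
    case type_id::INT64: return f.template operator()<std::int64_t>();
    default:
      assert(id == type_id::FLOAT64);  // only remaining id in this sketch
      return f.template operator()<double>();
  }
}

// Adapter that fixes the left-hand type so the inner dispatch resolves the right-hand one.
template <typename Lhs, typename Functor>
struct bind_lhs {
  Functor f;
  template <typename Rhs>
  SKETCH_FORCEINLINE auto operator()() const
  {
    return f.template operator()<Lhs, Rhs>();
  }
};

// Two-level dispatch: an outer switch on lhs, then the one-level dispatch on rhs.
template <typename Functor>
SKETCH_FORCEINLINE auto dispatch_two(type_id lhs, type_id rhs, Functor f)
{
  switch (lhs) {
    case type_id::INT32: return dispatch_one(rhs, bind_lhs<std::int32_t, Functor>{f});
    case type_id::INT64: return dispatch_one(rhs, bind_lhs<std::int64_t, Functor>{f});
    default:
      assert(lhs == type_id::FLOAT64);  // only remaining id in this sketch
      return dispatch_one(rhs, bind_lhs<double, Functor>{f});
  }
}

// Example functor: every instantiation returns the same type, as auto deduction requires.
struct size_sum {
  template <typename Lhs, typename Rhs>
  std::size_t operator()() const
  {
    return sizeof(Lhs) + sizeof(Rhs);
  }
};
```

Calling `dispatch_two(type_id::INT32, type_id::FLOAT64, size_sum{})` resolves both types through the nested switches. Because each dispatcher body is little more than a switch, forcing it inline lets the compiler fold the dispatch into the caller rather than paying for an extra call per level, which is the rationale the comment gives.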

Benchmark Details
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                                                             Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
COMPILED_BINARYOP<float, float, float, cudf::binary_operator::ADD>/ADD_1/10000/manual_time                                         7.60 us         26.9 us        93814
COMPILED_BINARYOP<float, float, float, cudf::binary_operator::ADD>/ADD_1/100000/manual_time                                        9.08 us         27.1 us        78654
COMPILED_BINARYOP<float, float, float, cudf::binary_operator::ADD>/ADD_1/1000000/manual_time                                       30.7 us         46.8 us        23073
COMPILED_BINARYOP<float, float, float, cudf::binary_operator::ADD>/ADD_1/10000000/manual_time                                       217 us          236 us         3218
COMPILED_BINARYOP<float, float, float, cudf::binary_operator::ADD>/ADD_1/100000000/manual_time                                     2134 us         2159 us          328
COMPILED_BINARYOP<timestamp_s, duration_s, timestamp_s, cudf::binary_operator::ADD>/ADD_2/10000/manual_time                        7.07 us         26.5 us        98532
COMPILED_BINARYOP<timestamp_s, duration_s, timestamp_s, cudf::binary_operator::ADD>/ADD_2/100000/manual_time                       10.5 us         28.4 us        66926
COMPILED_BINARYOP<timestamp_s, duration_s, timestamp_s, cudf::binary_operator::ADD>/ADD_2/1000000/manual_time                      44.5 us         62.0 us        15735
COMPILED_BINARYOP<timestamp_s, duration_s, timestamp_s, cudf::binary_operator::ADD>/ADD_2/10000000/manual_time                      399 us          419 us         1753
COMPILED_BINARYOP<timestamp_s, duration_s, timestamp_s, cudf::binary_operator::ADD>/ADD_2/100000000/manual_time                    3917 us         3958 us          179
COMPILED_BINARYOP<duration_s, duration_D, duration_ms, cudf::binary_operator::SUB>/SUB_1/10000/manual_time                         8.16 us         27.2 us        85388
COMPILED_BINARYOP<duration_s, duration_D, duration_ms, cudf::binary_operator::SUB>/SUB_1/100000/manual_time                        10.8 us         28.3 us        64764
COMPILED_BINARYOP<duration_s, duration_D, duration_ms, cudf::binary_operator::SUB>/SUB_1/1000000/manual_time                       40.8 us         58.2 us        17159
COMPILED_BINARYOP<duration_s, duration_D, duration_ms, cudf::binary_operator::SUB>/SUB_1/10000000/manual_time                       356 us          375 us         1967
COMPILED_BINARYOP<duration_s, duration_D, duration_ms, cudf::binary_operator::SUB>/SUB_1/100000000/manual_time                     3475 us         3512 us          201
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::SUB>/SUB_2/10000/manual_time                                   7.63 us         26.7 us        92563
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::SUB>/SUB_2/100000/manual_time                                  11.0 us         28.7 us        63200
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::SUB>/SUB_2/1000000/manual_time                                 45.7 us         62.9 us        15324
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::SUB>/SUB_2/10000000/manual_time                                 404 us          423 us         1731
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::SUB>/SUB_2/100000000/manual_time                               3969 us         4010 us          176
COMPILED_BINARYOP<float, float, int64_t, cudf::binary_operator::MUL>/MUL_1/10000/manual_time                                       7.34 us         26.6 us        95547
COMPILED_BINARYOP<float, float, int64_t, cudf::binary_operator::MUL>/MUL_1/100000/manual_time                                      9.08 us         26.5 us        77140
COMPILED_BINARYOP<float, float, int64_t, cudf::binary_operator::MUL>/MUL_1/1000000/manual_time                                     33.4 us         50.9 us        20945
COMPILED_BINARYOP<float, float, int64_t, cudf::binary_operator::MUL>/MUL_1/10000000/manual_time                                     286 us          305 us         2451
COMPILED_BINARYOP<float, float, int64_t, cudf::binary_operator::MUL>/MUL_1/100000000/manual_time                                   2806 us         2836 us          250
COMPILED_BINARYOP<duration_s, int64_t, duration_s, cudf::binary_operator::MUL>/MUL_2/10000/manual_time                             7.55 us         26.9 us        92540
COMPILED_BINARYOP<duration_s, int64_t, duration_s, cudf::binary_operator::MUL>/MUL_2/100000/manual_time                            10.7 us         28.5 us        65414
COMPILED_BINARYOP<duration_s, int64_t, duration_s, cudf::binary_operator::MUL>/MUL_2/1000000/manual_time                           44.7 us         62.1 us        15656
COMPILED_BINARYOP<duration_s, int64_t, duration_s, cudf::binary_operator::MUL>/MUL_2/10000000/manual_time                           400 us          419 us         1751
COMPILED_BINARYOP<duration_s, int64_t, duration_s, cudf::binary_operator::MUL>/MUL_2/100000000/manual_time                         3919 us         3961 us          179
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::DIV>/DIV_1/10000/manual_time                                   7.91 us         27.0 us        88915
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::DIV>/DIV_1/100000/manual_time                                  11.3 us         28.9 us        61636
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::DIV>/DIV_1/1000000/manual_time                                 46.0 us         63.4 us        15398
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::DIV>/DIV_1/10000000/manual_time                                 404 us          423 us         1734
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::DIV>/DIV_1/100000000/manual_time                               3949 us         3989 us          177
COMPILED_BINARYOP<duration_ms, int32_t, duration_ms, cudf::binary_operator::DIV>/DIV_2/10000/manual_time                           7.58 us         26.8 us        92176
COMPILED_BINARYOP<duration_ms, int32_t, duration_ms, cudf::binary_operator::DIV>/DIV_2/100000/manual_time                          10.1 us         27.6 us        69134
COMPILED_BINARYOP<duration_ms, int32_t, duration_ms, cudf::binary_operator::DIV>/DIV_2/1000000/manual_time                         38.2 us         55.9 us        18288
COMPILED_BINARYOP<duration_ms, int32_t, duration_ms, cudf::binary_operator::DIV>/DIV_2/10000000/manual_time                         338 us          357 us         2069
COMPILED_BINARYOP<duration_ms, int32_t, duration_ms, cudf::binary_operator::DIV>/DIV_2/100000000/manual_time                       3322 us         3356 us          211
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::TRUE_DIV>/TRUE_DIV/10000/manual_time                           7.96 us         27.1 us        88998
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::TRUE_DIV>/TRUE_DIV/100000/manual_time                          11.3 us         29.0 us        62280
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::TRUE_DIV>/TRUE_DIV/1000000/manual_time                         45.4 us         62.7 us        15440
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::TRUE_DIV>/TRUE_DIV/10000000/manual_time                         403 us          423 us         1735
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::TRUE_DIV>/TRUE_DIV/100000000/manual_time                       3948 us         3989 us          177
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::FLOOR_DIV>/FLOOR_DIV/10000/manual_time                         7.86 us         27.0 us        89265
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::FLOOR_DIV>/FLOOR_DIV/100000/manual_time                        11.2 us         28.8 us        62908
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::FLOOR_DIV>/FLOOR_DIV/1000000/manual_time                       45.2 us         62.7 us        15415
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::FLOOR_DIV>/FLOOR_DIV/10000000/manual_time                       403 us          423 us         1735
COMPILED_BINARYOP<int64_t, int64_t, int64_t, cudf::binary_operator::FLOOR_DIV>/FLOOR_DIV/100000000/manual_time                     3949 us         3990 us          177
COMPILED_BINARYOP<double, double, double, cudf::binary_operator::MOD>/MOD_1/10000/manual_time                                      8.46 us         27.5 us        75276
COMPILED_BINARYOP<double, double, double, cudf::binary_operator::MOD>/MOD_1/100000/manual_time                                     11.3 us         29.0 us        62100
COMPILED_BINARYOP<double, double, double, cudf::binary_operator::MOD>/MOD_1/1000000/manual_time                                    47.1 us         64.7 us        14930
COMPILED_BINARYOP<double, double, double, cudf::binary_operator::MOD>/MOD_1/10000000/manual_time                                    405 us          424 us         1731
COMPILED_BINARYOP<double, double, double, cudf::binary_operator::MOD>/MOD_1/100000000/manual_time                                  3955 us         3998 us          177
COMPILED_BINARYOP<duration_ms, int64_t, duration_ms, cudf::binary_operator::MOD>/MOD_2/10000/manual_time                           7.63 us         27.0 us        91551
COMPILED_BINARYOP<duration_ms, int64_t, duration_ms, cudf::binary_operator::MOD>/MOD_2/100000/manual_time                          10.8 us         28.6 us        65010
COMPILED_BINARYOP<duration_ms, int64_t, duration_ms, cudf::binary_operator::MOD>/MOD_2/1000000/manual_time                         44.8 us         62.2 us        15620
COMPILED_BINARYOP<duration_ms, int64_t, duration_ms, cudf::binary_operator::MOD>/MOD_2/10000000/manual_time                         400 us          419 us         1750
COMPILED_BINARYOP<duration_ms, int64_t, duration_ms, cudf::binary_operator::MOD>/MOD_2/100000000/manual_time                       3920 us         3961 us          179
COMPILED_BINARYOP<int32_t, int64_t, double, cudf::binary_operator::PMOD>/PMOD/10000/manual_time                                    8.99 us         28.0 us        78040
COMPILED_BINARYOP<int32_t, int64_t, double, cudf::binary_operator::PMOD>/PMOD/100000/manual_time                                   11.1 us         28.4 us        63161
COMPILED_BINARYOP<int32_t, int64_t, double, cudf::binary_operator::PMOD>/PMOD/1000000/manual_time                                  41.8 us         59.2 us        16810
COMPILED_BINARYOP<int32_t, int64_t, double, cudf::binary_operator::PMOD>/PMOD/10000000/manual_time                                  358 us          377 us         1960
COMPILED_BINARYOP<int32_t, int64_t, double, cudf::binary_operator::PMOD>/PMOD/100000000/manual_time                                3450 us         3484 us          203
COMPILED_BINARYOP<int32_t, uint8_t, int64_t, cudf::binary_operator::PYMOD>/PYMOD/10000/manual_time                                 8.43 us         27.6 us        82966
COMPILED_BINARYOP<int32_t, uint8_t, int64_t, cudf::binary_operator::PYMOD>/PYMOD/100000/manual_time                                9.88 us         27.6 us        70823
COMPILED_BINARYOP<int32_t, uint8_t, int64_t, cudf::binary_operator::PYMOD>/PYMOD/1000000/manual_time                               33.0 us         50.6 us        21265
COMPILED_BINARYOP<int32_t, uint8_t, int64_t, cudf::binary_operator::PYMOD>/PYMOD/10000000/manual_time                               279 us          298 us         2513
COMPILED_BINARYOP<int32_t, uint8_t, int64_t, cudf::binary_operator::PYMOD>/PYMOD/100000000/manual_time                             2718 us         2747 us          258
COMPILED_BINARYOP<int64_t, int64_t, double, cudf::binary_operator::POW>/POW/10000/manual_time                                      11.5 us         30.7 us        62355
COMPILED_BINARYOP<int64_t, int64_t, double, cudf::binary_operator::POW>/POW/100000/manual_time                                     13.4 us         31.1 us        52333
COMPILED_BINARYOP<int64_t, int64_t, double, cudf::binary_operator::POW>/POW/1000000/manual_time                                    57.2 us         73.8 us        12239
COMPILED_BINARYOP<int64_t, int64_t, double, cudf::binary_operator::POW>/POW/10000000/manual_time                                    480 us          498 us         1458
COMPILED_BINARYOP<int64_t, int64_t, double, cudf::binary_operator::POW>/POW/100000000/manual_time                                  4675 us         4724 us          150
COMPILED_BINARYOP<float, double, double, cudf::binary_operator::LOG_BASE>/LOG_BASE/10000/manual_time                               11.2 us         30.5 us        62561
COMPILED_BINARYOP<float, double, double, cudf::binary_operator::LOG_BASE>/LOG_BASE/100000/manual_time                              13.1 us         30.6 us        53520
COMPILED_BINARYOP<float, double, double, cudf::binary_operator::LOG_BASE>/LOG_BASE/1000000/manual_time                             45.0 us         62.3 us        15553
COMPILED_BINARYOP<float, double, double, cudf::binary_operator::LOG_BASE>/LOG_BASE/10000000/manual_time                             363 us          383 us         1926
COMPILED_BINARYOP<float, double, double, cudf::binary_operator::LOG_BASE>/LOG_BASE/100000000/manual_time                           3512 us         3548 us          199
COMPILED_BINARYOP<float, double, double, cudf::binary_operator::ATAN2>/ATAN2/10000/manual_time                                     9.90 us         29.1 us        70438
COMPILED_BINARYOP<float, double, double, cudf::binary_operator::ATAN2>/ATAN2/100000/manual_time                                    11.6 us         29.2 us        60091
COMPILED_BINARYOP<float, double, double, cudf::binary_operator::ATAN2>/ATAN2/1000000/manual_time                                   41.5 us         58.9 us        16880
COMPILED_BINARYOP<float, double, double, cudf::binary_operator::ATAN2>/ATAN2/10000000/manual_time                                   350 us          369 us         2002
COMPILED_BINARYOP<float, double, double, cudf::binary_operator::ATAN2>/ATAN2/100000000/manual_time                                 3385 us         3422 us          207
COMPILED_BINARYOP<int, int, int, cudf::binary_operator::SHIFT_LEFT>/SHIFT_LEFT/10000/manual_time                                   7.37 us         26.6 us        95919
COMPILED_BINARYOP<int, int, int, cudf::binary_operator::SHIFT_LEFT>/SHIFT_LEFT/100000/manual_time                                  8.67 us         26.7 us        80259
COMPILED_BINARYOP<int, int, int, cudf::binary_operator::SHIFT_LEFT>/SHIFT_LEFT/1000000/manual_time                                 29.3 us         45.2 us        23890
COMPILED_BINARYOP<int, int, int, cudf::binary_operator::SHIFT_LEFT>/SHIFT_LEFT/10000000/manual_time                                 211 us          230 us         3314
COMPILED_BINARYOP<int, int, int, cudf::binary_operator::SHIFT_LEFT>/SHIFT_LEFT/100000000/manual_time                               2066 us         2091 us          339
COMPILED_BINARYOP<int16_t, int64_t, int, cudf::binary_operator::SHIFT_RIGHT>/SHIFT_RIGHT/10000/manual_time                         7.93 us         27.1 us        88109
COMPILED_BINARYOP<int16_t, int64_t, int, cudf::binary_operator::SHIFT_RIGHT>/SHIFT_RIGHT/100000/manual_time                        9.97 us         27.7 us        69537
COMPILED_BINARYOP<int16_t, int64_t, int, cudf::binary_operator::SHIFT_RIGHT>/SHIFT_RIGHT/1000000/manual_time                       35.0 us         51.2 us        19859
COMPILED_BINARYOP<int16_t, int64_t, int, cudf::binary_operator::SHIFT_RIGHT>/SHIFT_RIGHT/10000000/manual_time                       269 us          286 us         2599
COMPILED_BINARYOP<int16_t, int64_t, int, cudf::binary_operator::SHIFT_RIGHT>/SHIFT_RIGHT/100000000/manual_time                     2629 us         2656 us          266
COMPILED_BINARYOP<int64_t, int32_t, int64_t, cudf::binary_operator::SHIFT_RIGHT_UNSIGNED>/USHIFT_RIGHT/10000/manual_time           7.53 us         26.6 us        92554
COMPILED_BINARYOP<int64_t, int32_t, int64_t, cudf::binary_operator::SHIFT_RIGHT_UNSIGNED>/USHIFT_RIGHT/100000/manual_time          10.3 us         27.7 us        67555
COMPILED_BINARYOP<int64_t, int32_t, int64_t, cudf::binary_operator::SHIFT_RIGHT_UNSIGNED>/USHIFT_RIGHT/1000000/manual_time         38.8 us         56.2 us        18019
COMPILED_BINARYOP<int64_t, int32_t, int64_t, cudf::binary_operator::SHIFT_RIGHT_UNSIGNED>/USHIFT_RIGHT/10000000/manual_time         345 us          364 us         2029
COMPILED_BINARYOP<int64_t, int32_t, int64_t, cudf::binary_operator::SHIFT_RIGHT_UNSIGNED>/USHIFT_RIGHT/100000000/manual_time       3377 us         3410 us          207
COMPILED_BINARYOP<int64_t, int32_t, int16_t, cudf::binary_operator::BITWISE_AND>/BITWISE_AND/10000/manual_time                     7.52 us         26.7 us        92786
COMPILED_BINARYOP<int64_t, int32_t, int16_t, cudf::binary_operator::BITWISE_AND>/BITWISE_AND/100000/manual_time                    9.93 us         28.0 us        70845
COMPILED_BINARYOP<int64_t, int32_t, int16_t, cudf::binary_operator::BITWISE_AND>/BITWISE_AND/1000000/manual_time                   35.8 us         51.3 us        19540
COMPILED_BINARYOP<int64_t, int32_t, int16_t, cudf::binary_operator::BITWISE_AND>/BITWISE_AND/10000000/manual_time                   253 us          269 us         2768
COMPILED_BINARYOP<int64_t, int32_t, int16_t, cudf::binary_operator::BITWISE_AND>/BITWISE_AND/100000000/manual_time                 2401 us         2426 us          292
COMPILED_BINARYOP<int16_t, int32_t, int64_t, cudf::binary_operator::BITWISE_OR>/BITWISE_OR/10000/manual_time                       7.96 us         27.2 us        87850
COMPILED_BINARYOP<int16_t, int32_t, int64_t, cudf::binary_operator::BITWISE_OR>/BITWISE_OR/100000/manual_time                      9.51 us         27.0 us        73590
COMPILED_BINARYOP<int16_t, int32_t, int64_t, cudf::binary_operator::BITWISE_OR>/BITWISE_OR/1000000/manual_time                     31.3 us         48.8 us        22278
COMPILED_BINARYOP<int16_t, int32_t, int64_t, cudf::binary_operator::BITWISE_OR>/BITWISE_OR/10000000/manual_time                     263 us          283 us         2657
COMPILED_BINARYOP<int16_t, int32_t, int64_t, cudf::binary_operator::BITWISE_OR>/BITWISE_OR/100000000/manual_time                   2542 us         2570 us          275
COMPILED_BINARYOP<int16_t, int64_t, int32_t, cudf::binary_operator::BITWISE_XOR>/BITWISE_XOR/10000/manual_time                     7.94 us         27.2 us        87897
COMPILED_BINARYOP<int16_t, int64_t, int32_t, cudf::binary_operator::BITWISE_XOR>/BITWISE_XOR/100000/manual_time                   10.00 us         27.7 us        69361
COMPILED_BINARYOP<int16_t, int64_t, int32_t, cudf::binary_operator::BITWISE_XOR>/BITWISE_XOR/1000000/manual_time                   35.0 us         51.2 us        19966
COMPILED_BINARYOP<int16_t, int64_t, int32_t, cudf::binary_operator::BITWISE_XOR>/BITWISE_XOR/10000000/manual_time                   269 us          287 us         2598
COMPILED_BINARYOP<int16_t, int64_t, int32_t, cudf::binary_operator::BITWISE_XOR>/BITWISE_XOR/100000000/manual_time                 2632 us         2659 us          266
COMPILED_BINARYOP<double, int8_t, bool, cudf::binary_operator::LOGICAL_AND>/LOGICAL_AND/10000/manual_time                          7.37 us         26.9 us        94916
COMPILED_BINARYOP<double, int8_t, bool, cudf::binary_operator::LOGICAL_AND>/LOGICAL_AND/100000/manual_time                         8.89 us         27.1 us        78970
COMPILED_BINARYOP<double, int8_t, bool, cudf::binary_operator::LOGICAL_AND>/LOGICAL_AND/1000000/manual_time                        26.9 us         42.5 us        26113
COMPILED_BINARYOP<double, int8_t, bool, cudf::binary_operator::LOGICAL_AND>/LOGICAL_AND/10000000/manual_time                        182 us          197 us         3854
COMPILED_BINARYOP<double, int8_t, bool, cudf::binary_operator::LOGICAL_AND>/LOGICAL_AND/100000000/manual_time                      1729 us         1750 us          405
COMPILED_BINARYOP<int16_t, int64_t, bool, cudf::binary_operator::LOGICAL_OR>/LOGICAL_OR/10000/manual_time                          8.18 us         27.6 us        85248
COMPILED_BINARYOP<int16_t, int64_t, bool, cudf::binary_operator::LOGICAL_OR>/LOGICAL_OR/100000/manual_time                         10.2 us         28.4 us        68412
COMPILED_BINARYOP<int16_t, int64_t, bool, cudf::binary_operator::LOGICAL_OR>/LOGICAL_OR/1000000/manual_time                        31.5 us         47.1 us        22214
COMPILED_BINARYOP<int16_t, int64_t, bool, cudf::binary_operator::LOGICAL_OR>/LOGICAL_OR/10000000/manual_time                        214 us          229 us         3276
COMPILED_BINARYOP<int16_t, int64_t, bool, cudf::binary_operator::LOGICAL_OR>/LOGICAL_OR/100000000/manual_time                      2036 us         2058 us          344
COMPILED_BINARYOP<int32_t, int64_t, bool, cudf::binary_operator::EQUAL>/EQUAL_1/10000/manual_time                                  7.98 us         27.1 us        87811
COMPILED_BINARYOP<int32_t, int64_t, bool, cudf::binary_operator::EQUAL>/EQUAL_1/100000/manual_time                                 9.87 us         28.1 us        71067
COMPILED_BINARYOP<int32_t, int64_t, bool, cudf::binary_operator::EQUAL>/EQUAL_1/1000000/manual_time                                37.1 us         52.7 us        18886
COMPILED_BINARYOP<int32_t, int64_t, bool, cudf::binary_operator::EQUAL>/EQUAL_1/10000000/manual_time                                256 us          272 us         2728
COMPILED_BINARYOP<int32_t, int64_t, bool, cudf::binary_operator::EQUAL>/EQUAL_1/100000000/manual_time                              2437 us         2461 us          287
COMPILED_BINARYOP<duration_ms, duration_ns, bool, cudf::binary_operator::EQUAL>/EQUAL_2/10000/manual_time                          8.11 us         27.3 us        86093
COMPILED_BINARYOP<duration_ms, duration_ns, bool, cudf::binary_operator::EQUAL>/EQUAL_2/100000/manual_time                         10.8 us         29.3 us        64431
COMPILED_BINARYOP<duration_ms, duration_ns, bool, cudf::binary_operator::EQUAL>/EQUAL_2/1000000/manual_time                        43.7 us         59.0 us        16083
COMPILED_BINARYOP<duration_ms, duration_ns, bool, cudf::binary_operator::EQUAL>/EQUAL_2/10000000/manual_time                        306 us          322 us         2284
COMPILED_BINARYOP<duration_ms, duration_ns, bool, cudf::binary_operator::EQUAL>/EQUAL_2/100000000/manual_time                      2912 us         2940 us          240
COMPILED_BINARYOP<decimal32, decimal32, bool, cudf::binary_operator::NOT_EQUAL>/NOT_EQUAL/10000/manual_time                        7.14 us         26.4 us        97828
COMPILED_BINARYOP<decimal32, decimal32, bool, cudf::binary_operator::NOT_EQUAL>/NOT_EQUAL/100000/manual_time                       8.46 us         26.7 us        82608
COMPILED_BINARYOP<decimal32, decimal32, bool, cudf::binary_operator::NOT_EQUAL>/NOT_EQUAL/1000000/manual_time                      29.3 us         44.6 us        23933
COMPILED_BINARYOP<decimal32, decimal32, bool, cudf::binary_operator::NOT_EQUAL>/NOT_EQUAL/10000000/manual_time                      187 us          202 us         3747
COMPILED_BINARYOP<decimal32, decimal32, bool, cudf::binary_operator::NOT_EQUAL>/NOT_EQUAL/100000000/manual_time                    1777 us         1796 us          394
COMPILED_BINARYOP<timestamp_s, timestamp_s, bool, cudf::binary_operator::LESS>/LESS/10000/manual_time                              6.87 us         26.0 us       101847
COMPILED_BINARYOP<timestamp_s, timestamp_s, bool, cudf::binary_operator::LESS>/LESS/100000/manual_time                             10.0 us         28.5 us        69837
COMPILED_BINARYOP<timestamp_s, timestamp_s, bool, cudf::binary_operator::LESS>/LESS/1000000/manual_time                            42.3 us         57.6 us        16563
COMPILED_BINARYOP<timestamp_s, timestamp_s, bool, cudf::binary_operator::LESS>/LESS/10000000/manual_time                            298 us          314 us         2348
COMPILED_BINARYOP<timestamp_s, timestamp_s, bool, cudf::binary_operator::LESS>/LESS/100000000/manual_time                          2864 us         2892 us          244
COMPILED_BINARYOP<timestamp_ms, timestamp_s, bool, cudf::binary_operator::GREATER>/GREATER/10000/manual_time                       6.92 us         26.1 us       101475
COMPILED_BINARYOP<timestamp_ms, timestamp_s, bool, cudf::binary_operator::GREATER>/GREATER/100000/manual_time                      10.0 us         28.4 us        69718
COMPILED_BINARYOP<timestamp_ms, timestamp_s, bool, cudf::binary_operator::GREATER>/GREATER/1000000/manual_time                     42.5 us         57.8 us        16498
COMPILED_BINARYOP<timestamp_ms, timestamp_s, bool, cudf::binary_operator::GREATER>/GREATER/10000000/manual_time                     298 us          314 us         2348
COMPILED_BINARYOP<timestamp_ms, timestamp_s, bool, cudf::binary_operator::GREATER>/GREATER/100000000/manual_time                   2865 us         2893 us          244
COMPILED_BINARYOP<duration_ms, duration_ns, bool, cudf::binary_operator::NULL_EQUALS>/NULL_EQUALS/10000/manual_time                9.75 us         28.9 us        71810
COMPILED_BINARYOP<duration_ms, duration_ns, bool, cudf::binary_operator::NULL_EQUALS>/NULL_EQUALS/100000/manual_time               13.0 us         31.2 us        53794
COMPILED_BINARYOP<duration_ms, duration_ns, bool, cudf::binary_operator::NULL_EQUALS>/NULL_EQUALS/1000000/manual_time              46.2 us         61.6 us        15158
COMPILED_BINARYOP<duration_ms, duration_ns, bool, cudf::binary_operator::NULL_EQUALS>/NULL_EQUALS/10000000/manual_time              317 us          332 us         2212
COMPILED_BINARYOP<duration_ms, duration_ns, bool, cudf::binary_operator::NULL_EQUALS>/NULL_EQUALS/100000000/manual_time            3021 us         3049 us          232
COMPILED_BINARYOP<decimal32, decimal32, decimal32, cudf::binary_operator::NULL_MAX>/NULL_MAX/10000/manual_time                     9.34 us         28.4 us        74803
COMPILED_BINARYOP<decimal32, decimal32, decimal32, cudf::binary_operator::NULL_MAX>/NULL_MAX/100000/manual_time                    11.4 us         29.1 us        61485
COMPILED_BINARYOP<decimal32, decimal32, decimal32, cudf::binary_operator::NULL_MAX>/NULL_MAX/1000000/manual_time                   33.6 us         49.3 us        20802
COMPILED_BINARYOP<decimal32, decimal32, decimal32, cudf::binary_operator::NULL_MAX>/NULL_MAX/10000000/manual_time                   228 us          246 us         3070
COMPILED_BINARYOP<decimal32, decimal32, decimal32, cudf::binary_operator::NULL_MAX>/NULL_MAX/100000000/manual_time                 2244 us         2269 us          312
COMPILED_BINARYOP<timestamp_D, timestamp_s, timestamp_s, cudf::binary_operator::NULL_MIN>/NULL_MIN/10000/manual_time               9.53 us         28.5 us        73805
COMPILED_BINARYOP<timestamp_D, timestamp_s, timestamp_s, cudf::binary_operator::NULL_MIN>/NULL_MIN/100000/manual_time              12.7 us         29.8 us        54605
COMPILED_BINARYOP<timestamp_D, timestamp_s, timestamp_s, cudf::binary_operator::NULL_MIN>/NULL_MIN/1000000/manual_time             43.1 us         60.3 us        16184
COMPILED_BINARYOP<timestamp_D, timestamp_s, timestamp_s, cudf::binary_operator::NULL_MIN>/NULL_MIN/10000000/manual_time             362 us          381 us         1934
COMPILED_BINARYOP<timestamp_D, timestamp_s, timestamp_s, cudf::binary_operator::NULL_MIN>/NULL_MIN/100000000/manual_time           3489 us         3526 us          201

@karthikeyann karthikeyann left a comment

LGTM 👍

I'll open a separate PR for the templated benchmark to keep things easier to review.

Labels
5 - Ready to Merge: Testing and reviews complete, ready to merge
! - Hotfix: Hotfix is a bug that affects the majority of users for which there is no reasonable workaround
CMake: CMake build issue
improvement: Improvement / enhancement to an existing function
libcudf: Affects libcudf (C++/CUDA) code.
non-breaking: Non-breaking change
Performance: Performance related issue
Development

Successfully merging this pull request may close these issues.

[FEA] Dispatch to different binary op kernels for common data types
8 participants