-
Notifications
You must be signed in to change notification settings - Fork 902
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEA] Convert Binary Operations to use device-side type dispatch rather than JIT #7801
Comments
For this, I think it'd make sense to make a cudf/cpp/include/cudf/utilities/type_dispatcher.hpp Lines 546 to 570 in 24f3016
|
This issue has been labeled |
Tried triple_dispatcher. Compile time for each triple_dispatcher adds 60 seconds. |
edit: I missed the link in the original post. |
@cwharris one important Spark use case runs on platforms where they can't write a JIT cache so every worker pays the JIT overhead. |
Add compiled binary operations (instead of jit). _(column_view lhs OP column_view rhs) => output_type_ _(scalar lhs OP column_view rhs) => output_type_ _(column_view lhs OP scalar rhs) => output_type_ This PR adds compiled binary operations for fixed_width types. string operations are already compiled. Partially addresses #7801 #### Approach: Each triple dispatcher compilation takes 60 seconds. So, the device side dispatch is done in 2 ways. 1. for types with `common_type<Out, Lhs, Rhs>`,single device side dispatch with nested individual type dispatch. 2. double side dispatch for non-common types, with nested output type dispatch. (compiled faster than device side triple dispatch). Each operation compile time will be close to 30 seconds. Adds ~4 minutes compilation time (ninja -j 12) (1.5 minutes for cu files, util.cpp takes 2.5 minutes) Identifying common_type is done in host. Operations functor is defined only for types supported. and for operation not supported, no operation is done in device code. - [x] Identifying supported operations and types in host. - [x] host side operation dispatch - different file for different operation to parallelize compile time. - [x] All jit supported operations support added as compiled. ~- [ ] ~Coalesce~, redirect Userdefined to jit.~ - [x] scalar support added. #### Unit tests: - [x] tests added for numeric types and chrono type combinations for all supported operators It can't test all types. compilation time would be huge. while, covering all valid combinations of types in all operations, while keeping it minimal. #### Benchmark: - [x] jit binops benchmark - [x] compiled binops benchmark Authors: - Karthikeyan (https://github.com/karthikeyann) Approvers: - Robert (Bobby) Evans (https://github.com/revans2) - Robert Maynard (https://github.com/robertmaynard) - GALI PREM SAGAR (https://github.com/galipremsagar) - Charles Blackmon-Luca (https://github.com/charlesbluca) - Nghia Truong (https://github.com/ttnghia) - Jake Hemstad (https://github.com/jrhemstad) URL: #8192
integration to sparks, python pending. |
closes #7801 `cudf::binary_operation` calls compiled binary ops. `cudf::jit::binary_operation` calls jit binary ops So, compiled binary ops is called in libcudf (groupby, rescale), python (binary ops) and java (binary ops) **Breaking change:** New: Logical and Comparison operators can have output type to be only bool type. Old: Logical operators can have integer or any other output type that can be constructed from bool type. Comparison operators required bool type only. In this release (21.10), `experimental` namespace is dropped, and compiled binary ops replaces jit binary ops in libcudf, except for user defined op. Authors: - Karthikeyan (https://github.com/karthikeyann) Approvers: - Robert (Bobby) Evans (https://github.com/revans2) - Charles Blackmon-Luca (https://github.com/charlesbluca) - https://github.com/nvdbaranec - Nghia Truong (https://github.com/ttnghia) - Ram (Ramakrishna Prabhu) (https://github.com/rgsl888prabhu) - GALI PREM SAGAR (https://github.com/galipremsagar) - Mark Harris (https://github.com/harrism) URL: #8741
JIT overhead for binary ops is a problem in some situations. @karthikeyann did an experiment a while back to use device-side dispatch rather than JIT. The compile time cost was low, and performance was good.
Compiled binops use a single compiled kernel (per operation) with device-side type dispatch, so the code should be simpler. The operation dispatch can be on the host, but there are only a small number of operations relative to the combinations of types and operations, so this should not be expensive.
The text was updated successfully, but these errors were encountered: