Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Convert Binary Operations to use device-side type dispatch rather than JIT #7801

Closed
harrism opened this issue Mar 31, 2021 · 6 comments · Fixed by #8192 or #8741
Closed

[FEA] Convert Binary Operations to use device-side type dispatch rather than JIT #7801

harrism opened this issue Mar 31, 2021 · 6 comments · Fixed by #8192 or #8741
Assignees
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code.

Comments

@harrism
Copy link
Member

harrism commented Mar 31, 2021

JIT overhead for binary ops is a problem in some situations. @karthikeyann did an experiment a while back to use device-side dispatch rather than JIT. The compile time cost was low, and performance was good.

Compiled binops use a single compiled kernel (per operation) with device-side type dispatch, so the code should be simpler. The operation dispatch can be on the host, but there are only a small number of operations relative to the combinations of types and operations, so this should not be expensive.

@harrism harrism added feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. labels Mar 31, 2021
@jrhemstad
Copy link
Contributor

For this, I think it'd make sense to make a triple_type_dispatcher, ala

/**
* @brief Dispatches two type template parameters to a callable.
*
* This function expects a callable `f` with an `operator()` template accepting
* two typename template parameters `T1` and `T2`.
*
* @param type1 The `data_type` used to dispatch a type for the first template
* parameter of the callable `F`
* @param type2 The `data_type` used to dispatch a type for the second template
* parameter of the callable `F`
* @param args Parameter pack forwarded to the `operator()` invocation `F`.
*/
#pragma nv_exec_check_disable
template <template <cudf::type_id> typename IdTypeMap = id_to_type_impl, typename F, typename... Ts>
CUDA_HOST_DEVICE_CALLABLE constexpr decltype(auto) double_type_dispatcher(cudf::data_type type1,
cudf::data_type type2,
F&& f,
Ts&&... args)
{
return type_dispatcher<IdTypeMap>(type1,
detail::double_type_dispatcher_first_type<IdTypeMap>{},
type2,
std::forward<F>(f),
std::forward<Ts>(args)...);
}

@github-actions
Copy link

github-actions bot commented May 1, 2021

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@harrism harrism changed the title [FEA] Convert Binary Options to use device-side type dispatch rather than JIT [FEA] Convert Binary Operations to use device-side type dispatch rather than JIT May 4, 2021
@karthikeyann
Copy link
Contributor

Tried triple_dispatcher. Compile time for each triple_dispatcher adds 60 seconds.
So, going with different approach. (common_type type_dispatcher, and double_type_dispatcher).

@harrism harrism linked a pull request May 24, 2021 that will close this issue
7 tasks
@cwharris
Copy link
Contributor

cwharris commented May 26, 2021

JIT overhead for binary ops is a problem in some situations.

Can we get more specific here? How we handle JIT compilation was recently changed with jitify2. Is it possible some of these problems are alleviated?

edit: I missed the link in the original post.

@harrism
Copy link
Member Author

harrism commented May 27, 2021

@cwharris one important Spark use case runs on platforms where they can't write a JIT cache so every worker pays the JIT overhead.

rapids-bot bot pushed a commit that referenced this issue Jul 13, 2021
Add compiled binary operations (instead of jit).
_(column_view lhs OP column_view rhs) => output_type_
_(scalar      lhs OP column_view rhs) => output_type_
_(column_view lhs OP scalar      rhs) => output_type_


This PR adds compiled binary operations for fixed_width types. 
string operations are already compiled.

Partially addresses #7801

#### Approach:
Each triple dispatcher compilation takes 60 seconds.
So, the device side dispatch is done in 2 ways.
1. for types with `common_type<Out, Lhs, Rhs>`,single device side dispatch with nested individual type dispatch.
2. double side dispatch for non-common types, with nested output type dispatch. (compiled faster than device side triple dispatch).

Each operation compile time will be close to 30 seconds.
Adds ~4 minutes compilation time (ninja -j 12) (1.5 minutes for cu files, util.cpp takes 2.5 minutes)

Identifying common_type is done in host. 
Operations functor is defined only for types supported.
and for operation not supported, no operation is done in device code.
- [x] Identifying supported operations and types in host.
- [x] host side operation dispatch - different file for different operation to parallelize compile time.
- [x] All jit supported operations support added as compiled. 
~- [ ] ~Coalesce~, redirect Userdefined to jit.~
- [x] scalar support added.

#### Unit tests:
- [x] tests added for numeric types and chrono type combinations for all supported operators
It can't test all types. compilation time would be huge.
while, covering all valid combinations of types in all operations, while keeping it minimal.

#### Benchmark:
- [x] jit binops benchmark
- [x] compiled binops benchmark

Authors:
  - Karthikeyan (https://github.com/karthikeyann)

Approvers:
  - Robert (Bobby) Evans (https://github.com/revans2)
  - Robert Maynard (https://github.com/robertmaynard)
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Charles Blackmon-Luca (https://github.com/charlesbluca)
  - Nghia Truong (https://github.com/ttnghia)
  - Jake Hemstad (https://github.com/jrhemstad)

URL: #8192
@karthikeyann
Copy link
Contributor

integration to sparks, python pending.

rapids-bot bot pushed a commit that referenced this issue Aug 23, 2021
closes #7801

`cudf::binary_operation` calls compiled binary ops.
`cudf::jit::binary_operation` calls  jit binary ops 
So, compiled binary ops is called in libcudf (groupby, rescale), python (binary ops) and java (binary ops)

**Breaking change:**
New: Logical and Comparison operators can have output type to be only bool type.
Old: Logical operators can have integer or any other output type that can be constructed from bool type. Comparison operators required bool type only.



In this release (21.10), `experimental` namespace is dropped, and compiled binary ops replaces jit binary ops in libcudf, except for user defined op.

Authors:
  - Karthikeyan (https://github.com/karthikeyann)

Approvers:
  - Robert (Bobby) Evans (https://github.com/revans2)
  - Charles Blackmon-Luca (https://github.com/charlesbluca)
  - https://github.com/nvdbaranec
  - Nghia Truong (https://github.com/ttnghia)
  - Ram (Ramakrishna Prabhu) (https://github.com/rgsl888prabhu)
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Mark Harris (https://github.com/harrism)

URL: #8741
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code.
Projects
None yet
4 participants