Expose expression base class publicly and simplify public AST API #9045

Merged
merged 25 commits into from
Aug 18, 2021
Changes from 23 commits
b32cf27
Move all node APIs requiring components of detail out of public heade…
vyasr Jul 19, 2021
f1983e5
Move public-facing operators to nodes.hpp and remove operators header.
vyasr Jul 19, 2021
dfb7494
Rename nodes to expressions.
vyasr Jul 19, 2021
02e5933
Remove detail header for expressions and move node class out of detai…
vyasr Aug 9, 2021
5a129bd
Move nodes.cpp to expressions.cpp.
vyasr Aug 9, 2021
338aaa1
Rename expression class to operation.
vyasr Aug 9, 2021
51de658
Add missing virtual destructor.
vyasr Aug 9, 2021
9e379f9
Change all APIs to accept arbitrary nodes.
vyasr Aug 9, 2021
a90540f
Add test that should fail for literal/col ref inputs.
vyasr Aug 9, 2021
bd80c4d
Wrap raw literal/column reference expressions in an IDENTITY to enabl…
vyasr Aug 10, 2021
a658f6b
Rename node class to expression.
vyasr Aug 10, 2021
40a81b3
Accept all table_view inputs by const ref.
vyasr Aug 10, 2021
12215b8
Update meta.yaml.
vyasr Aug 16, 2021
6924df0
Minor typo fix.
vyasr Aug 16, 2021
a2db4f0
Update Java AST API bindings.
vyasr Aug 16, 2021
1c7807b
Update all Java code.
vyasr Aug 17, 2021
1684194
Rename files to match classes, update imports, and fix tests.
vyasr Aug 17, 2021
ad4deb1
Update copyright.
vyasr Aug 17, 2021
6aee7a1
Merge remote-tracking branch 'origin/branch-21.10' into refactor/expr…
vyasr Aug 17, 2021
f930fe4
Update gather map size JNI APIs.
vyasr Aug 17, 2021
9fa3b22
Change compiled_expr to store a single list of expression.
vyasr Aug 17, 2021
443f3ad
Fix new tests using old BinaryExpression API.
vyasr Aug 17, 2021
c7d41f5
Move compile to AstExpression, remove Operation, and update all assoc…
vyasr Aug 17, 2021
57cef77
Remove explicit UnaryOperator.IDENTITY for col ref and literals, and …
vyasr Aug 18, 2021
fb05dd7
Call compile_expression from compile_serialized_ast to reduce duplica…
vyasr Aug 18, 2021
3 changes: 1 addition & 2 deletions conda/recipes/libcudf/meta.yaml
@@ -53,8 +53,7 @@ test:
- test -f $PREFIX/include/cudf/aggregation.hpp
- test -f $PREFIX/include/cudf/ast/detail/expression_parser.hpp
- test -f $PREFIX/include/cudf/ast/detail/operators.hpp
- test -f $PREFIX/include/cudf/ast/nodes.hpp
- test -f $PREFIX/include/cudf/ast/operators.hpp
- test -f $PREFIX/include/cudf/ast/expressions.hpp
- test -f $PREFIX/include/cudf/binaryop.hpp
- test -f $PREFIX/include/cudf/labeling/label_bins.hpp
- test -f $PREFIX/include/cudf/column/column_factories.hpp
1 change: 1 addition & 0 deletions cpp/CMakeLists.txt
@@ -153,6 +153,7 @@ add_library(cudf
src/aggregation/aggregation.cu
src/aggregation/result_cache.cpp
src/ast/expression_parser.cpp
src/ast/expressions.cpp
src/binaryop/binaryop.cpp
src/binaryop/compiled/binary_ops.cu
src/binaryop/compiled/Add.cu
10 changes: 5 additions & 5 deletions cpp/benchmarks/ast/transform_benchmark.cpp
@@ -95,22 +95,22 @@ static void BM_ast_transform(benchmark::State& state)
// Note that a std::list is required here because of its guarantees against reference invalidation
// when items are added or removed. References to items in a std::vector are not safe if the
// vector must re-allocate.
auto expressions = std::list<cudf::ast::expression>();
auto expressions = std::list<cudf::ast::operation>();

// Construct tree that chains additions like (((a + b) + c) + d)
auto const op = cudf::ast::ast_operator::ADD;
if (reuse_columns) {
expressions.push_back(cudf::ast::expression(op, column_refs.at(0), column_refs.at(0)));
expressions.push_back(cudf::ast::operation(op, column_refs.at(0), column_refs.at(0)));
for (cudf::size_type i = 0; i < tree_levels - 1; i++) {
expressions.push_back(cudf::ast::expression(op, expressions.back(), column_refs.at(0)));
expressions.push_back(cudf::ast::operation(op, expressions.back(), column_refs.at(0)));
}
} else {
expressions.push_back(cudf::ast::expression(op, column_refs.at(0), column_refs.at(1)));
expressions.push_back(cudf::ast::operation(op, column_refs.at(0), column_refs.at(1)));
std::transform(std::next(column_refs.cbegin(), 2),
column_refs.cend(),
std::back_inserter(expressions),
[&](auto const& column_ref) {
return cudf::ast::expression(op, expressions.back(), column_ref);
return cudf::ast::operation(op, expressions.back(), column_ref);
});
}
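
The comment in this hunk explains why the benchmark keeps its operations in a `std::list`: each new operation holds a reference to the previous one, so the container must never relocate elements that are already stored. Below is a minimal sketch of that constraint. It uses a hypothetical `toy_expr` struct and `build_chain` helper, not the real `cudf::ast` classes, and only assumes that operands are held by non-owning reference as the benchmark comment implies.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Hypothetical stand-in for an AST node (not the cudf class).
// Operands are non-owning pointers, so the pointed-to nodes must stay put.
struct toy_expr {
  std::string name;
  toy_expr const* lhs;
  toy_expr const* rhs;
};

// Render a tree as a fully parenthesized string, e.g. "((a + b) + c)".
std::string render(toy_expr const& e)
{
  if (e.lhs == nullptr) { return e.name; }
  return "(" + render(*e.lhs) + " + " + render(*e.rhs) + ")";
}

// Chain additions like (((a + b) + c) + d). Requires at least two leaves.
// std::list::push_back never invalidates references to existing elements,
// so the address of ops.back() stays valid after later insertions. A
// std::vector could reallocate and leave the stored pointers dangling.
std::string build_chain(std::vector<toy_expr> const& leaves)
{
  std::list<toy_expr> ops;
  ops.push_back(toy_expr{"+", &leaves.at(0), &leaves.at(1)});
  for (std::size_t i = 2; i < leaves.size(); ++i) {
    toy_expr const* prev = &ops.back();
    ops.push_back(toy_expr{"+", prev, &leaves.at(i)});
  }
  return render(ops.back());
}
```

Calling `build_chain` with leaves named `a`, `b`, `c`, `d` yields the same left-leaning shape the benchmark comment describes.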

15 changes: 5 additions & 10 deletions cpp/benchmarks/join/conditional_join_benchmark.cu
@@ -26,7 +26,7 @@ class ConditionalJoin : public cudf::benchmark {
{ \
auto join = [](cudf::table_view const& left, \
cudf::table_view const& right, \
cudf::ast::expression binary_pred, \
cudf::ast::operation binary_pred, \
cudf::null_equality compare_nulls) { \
return cudf::conditional_inner_join(left, right, binary_pred, compare_nulls); \
}; \
@@ -45,7 +45,7 @@ CONDITIONAL_INNER_JOIN_BENCHMARK_DEFINE(conditional_inner_join_64bit_nulls, int6
{ \
auto join = [](cudf::table_view const& left, \
cudf::table_view const& right, \
cudf::ast::expression binary_pred, \
cudf::ast::operation binary_pred, \
cudf::null_equality compare_nulls) { \
return cudf::conditional_left_join(left, right, binary_pred, compare_nulls); \
}; \
@@ -64,7 +64,7 @@ CONDITIONAL_LEFT_JOIN_BENCHMARK_DEFINE(conditional_left_join_64bit_nulls, int64_
{ \
auto join = [](cudf::table_view const& left, \
cudf::table_view const& right, \
cudf::ast::expression binary_pred, \
cudf::ast::operation binary_pred, \
cudf::null_equality compare_nulls) { \
return cudf::conditional_inner_join(left, right, binary_pred, compare_nulls); \
}; \
@@ -83,7 +83,7 @@ CONDITIONAL_FULL_JOIN_BENCHMARK_DEFINE(conditional_full_join_64bit_nulls, int64_
{ \
auto join = [](cudf::table_view const& left, \
cudf::table_view const& right, \
cudf::ast::expression binary_pred, \
cudf::ast::operation binary_pred, \
cudf::null_equality compare_nulls) { \
return cudf::conditional_left_anti_join(left, right, binary_pred, compare_nulls); \
}; \
@@ -114,7 +114,7 @@ CONDITIONAL_LEFT_ANTI_JOIN_BENCHMARK_DEFINE(conditional_left_anti_join_64bit_nul
{ \
auto join = [](cudf::table_view const& left, \
cudf::table_view const& right, \
cudf::ast::expression binary_pred, \
cudf::ast::operation binary_pred, \
cudf::null_equality compare_nulls) { \
return cudf::conditional_left_semi_join(left, right, binary_pred, compare_nulls); \
}; \
@@ -145,11 +145,6 @@ BENCHMARK_REGISTER_F(ConditionalJoin, conditional_inner_join_32bit)
->Args({100'000, 100'000})
->Args({100'000, 400'000})
->Args({100'000, 1'000'000})
// TODO: The below benchmark is slow, but can be useful to validate that the
// code works for large data sets. This benchmark was used to compare to the
// otherwise equivalent nullable benchmark below, which has memory errors for
// sufficiently large data sets.
//->Args({1'000'000, 1'000'000})
->UseManualTime();

BENCHMARK_REGISTER_F(ConditionalJoin, conditional_inner_join_64bit)
3 changes: 2 additions & 1 deletion cpp/benchmarks/join/join_benchmark_common.hpp
@@ -21,6 +21,7 @@

#include <thrust/iterator/counting_iterator.h>

#include <cudf/ast/expressions.hpp>
#include <cudf/join.hpp>
#include <cudf/table/table_view.hpp>
#include <cudf/utilities/error.hpp>
@@ -139,7 +140,7 @@ static void BM_join(state_type& state, Join JoinFunc)
const auto col_ref_left_0 = cudf::ast::column_reference(0);
const auto col_ref_right_0 = cudf::ast::column_reference(0, cudf::ast::table_reference::RIGHT);
auto left_zero_eq_right_zero =
cudf::ast::expression(cudf::ast::ast_operator::EQUAL, col_ref_left_0, col_ref_right_0);
cudf::ast::operation(cudf::ast::ast_operator::EQUAL, col_ref_left_0, col_ref_right_0);

for (auto _ : state) {
cuda_event_timer raii(state, true, rmm::cuda_stream_default);
3 changes: 1 addition & 2 deletions cpp/include/cudf/ast/detail/expression_evaluator.cuh
@@ -17,8 +17,7 @@

#include <cudf/ast/detail/expression_parser.hpp>
#include <cudf/ast/detail/operators.hpp>
#include <cudf/ast/nodes.hpp>
#include <cudf/ast/operators.hpp>
#include <cudf/ast/expressions.hpp>
#include <cudf/column/column_device_view.cuh>
#include <cudf/column/column_factories.hpp>
#include <cudf/detail/utilities/assert.cuh>
47 changes: 25 additions & 22 deletions cpp/include/cudf/ast/detail/expression_parser.hpp
@@ -15,8 +15,7 @@
*/
#pragma once

#include <cudf/ast/nodes.hpp>
#include <cudf/ast/operators.hpp>
#include <cudf/ast/expressions.hpp>
#include <cudf/scalar/scalar_device_view.cuh>
#include <cudf/table/table_view.hpp>
#include <cudf/types.hpp>
@@ -44,7 +43,7 @@ enum class device_data_reference_type {
};

/**
* @brief A device data reference describes a source of data used by a node.
* @brief A device data reference describes a source of data used by an expression.
*
* This is a POD class used to create references describing data type and locations for consumption
* by the `row_evaluator`.
@@ -115,11 +114,11 @@ struct expression_device_view {
* @brief The expression_parser traverses an expression and converts it into a form suitable for
* execution on the device.
*
* This class is part of a "visitor" pattern with the `node` class.
* This class is part of a "visitor" pattern with the `expression` class.
*
* This class does pre-processing work on the host, validating operators and operand data types. It
* traverses downward from a root node in a depth-first fashion, capturing information about
* the nodes and constructing vectors of information that are later used by the device for
* traverses downward from a root expression in a depth-first fashion, capturing information about
* the expressions and constructing vectors of information that are later used by the device for
* evaluating the abstract syntax tree as a "linear" list of operators whose input dependencies are
* resolved into intermediate data storage in shared memory.
*/
@@ -132,13 +131,17 @@ class expression_parser {
* @param left The left table used for evaluating the abstract syntax tree.
* @param right The right table used for evaluating the abstract syntax tree.
*/
expression_parser(node const& expr,
expression_parser(expression const& expr,
cudf::table_view const& left,
std::optional<std::reference_wrapper<cudf::table_view const>> right,
bool has_nulls,
rmm::cuda_stream_view stream,
rmm::mr::device_memory_resource* mr)
: _left{left}, _right{right}, _node_count{0}, _intermediate_counter{}, _has_nulls(has_nulls)
: _left{left},
_right{right},
_expression_count{0},
_intermediate_counter{},
_has_nulls(has_nulls)
{
expr.accept(*this);
move_to_device(stream, mr);
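
The constructor above starts the traversal with `expr.accept(*this)`, which is the "visitor" pattern the class comment mentions: the expression dispatches back to the matching `visit` overload, and the parser walks the tree depth-first while appending to a linear list. The following is a simplified sketch of that double dispatch. Every name in it (`toy_expression`, `toy_parser`, the string-based instruction list) is a hypothetical stand-in chosen for illustration; it is not the cudf implementation and omits tables, types, and device transfer.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

struct toy_parser;  // the visitor, defined below

// Hypothetical base class: anything that can be visited by the parser.
struct toy_expression {
  virtual int accept(toy_parser& visitor) const = 0;
  virtual ~toy_expression() = default;
};

struct toy_literal : toy_expression {
  explicit toy_literal(int v) : value{v} {}
  int accept(toy_parser& visitor) const override;
  int value;
};

struct toy_column_reference : toy_expression {
  explicit toy_column_reference(int idx) : index{idx} {}
  int accept(toy_parser& visitor) const override;
  int index;
};

// An operation accepts any expression as an operand, including leaves.
struct toy_operation : toy_expression {
  toy_operation(std::string o, toy_expression const& l, toy_expression const& r)
    : op{std::move(o)}, operands{l, r}
  {
  }
  int accept(toy_parser& visitor) const override;
  std::string op;
  std::vector<std::reference_wrapper<toy_expression const>> operands;
};

// Depth-first visitor that flattens the tree into a linear list.
// Each visit returns the index of the entry it appended.
struct toy_parser {
  std::vector<std::string> linearized;

  int visit(toy_literal const& e) { return emit("lit " + std::to_string(e.value)); }
  int visit(toy_column_reference const& e) { return emit("col " + std::to_string(e.index)); }
  int visit(toy_operation const& e)
  {
    std::string line = e.op;
    // Operands are visited first, so their indices exist before the operator's.
    for (toy_expression const& operand : e.operands) {
      line += " #" + std::to_string(operand.accept(*this));
    }
    return emit(line);
  }

 private:
  int emit(std::string line)
  {
    linearized.push_back(std::move(line));
    return static_cast<int>(linearized.size()) - 1;
  }
};

int toy_literal::accept(toy_parser& visitor) const { return visitor.visit(*this); }
int toy_column_reference::accept(toy_parser& visitor) const { return visitor.visit(*this); }
int toy_operation::accept(toy_parser& visitor) const { return visitor.visit(*this); }
```

Because `accept` is declared on the base class, a function taking `toy_expression const&` works for a bare literal or column reference as well as for an operation, which is the shape of API the commits in this PR move toward.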
@@ -150,7 +153,7 @@ class expression_parser {
* @param expr The expression to create an evaluable expression_parser for.
* @param table The table used for evaluating the abstract syntax tree.
*/
expression_parser(node const& expr,
expression_parser(expression const& expr,
cudf::table_view const& table,
bool has_nulls,
rmm::cuda_stream_view stream,
@@ -167,33 +170,33 @@ class expression_parser {
cudf::data_type output_type() const;

/**
* @brief Visit a literal node.
* @brief Visit a literal expression.
*
* @param expr Literal node.
* @return cudf::size_type Index of device data reference for the node.
* @param expr Literal expression.
* @return cudf::size_type Index of device data reference for the expression.
*/
cudf::size_type visit(literal const& expr);

/**
* @brief Visit a column reference node.
* @brief Visit a column reference expression.
*
* @param expr Column reference node.
* @return cudf::size_type Index of device data reference for the node.
* @param expr Column reference expression.
* @return cudf::size_type Index of device data reference for the expression.
*/
cudf::size_type visit(column_reference const& expr);

/**
* @brief Visit an expression node.
* @brief Visit an operation expression.
*
* @param expr Expression node.
* @return cudf::size_type Index of device data reference for the node.
* @param expr Operation expression.
* @return cudf::size_type Index of device data reference for the expression.
*/
cudf::size_type visit(expression const& expr);
cudf::size_type visit(operation const& expr);

/**
* @brief Internal class used to track the utilization of intermediate storage locations.
*
* As nodes are being evaluated, they may generate "intermediate" data that is immediately
* As expressions are being evaluated, they may generate "intermediate" data that is immediately
* consumed. Rather than manifesting this data in global memory, we can store intermediates of any
* fixed width type (up to 8 bytes) by placing them in shared memory. This class helps to track
* the number and indices of intermediate data in shared memory using a give-take model. Locations
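
The give-take model in the comment above can be pictured as a small slot allocator: `take` hands out the lowest free index, `give` returns an index for reuse, and the high-water mark says how many slots must be reserved. The sketch below is one possible way to implement that idea. The class name `slot_counter` and its members are hypothetical, and treating the high-water mark as the shared-memory sizing input is an assumption drawn from the comment, not from the cudf source.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical give-take tracker for intermediate storage slots.
class slot_counter {
 public:
  // Reserve the lowest free slot index and return it.
  int take()
  {
    int slot = 0;
    // used_ is kept sorted, so the first gap is the lowest free index.
    for (int used : used_) {
      if (used != slot) { break; }
      ++slot;
    }
    // Exactly `slot` entries are smaller than `slot`, so this keeps used_ sorted.
    used_.insert(used_.begin() + slot, slot);
    max_used_ = std::max(max_used_, slot + 1);
    return slot;
  }

  // Release a slot so that a later intermediate can reuse it.
  void give(int slot)
  {
    auto it = std::lower_bound(used_.begin(), used_.end(), slot);
    if (it != used_.end() && *it == slot) { used_.erase(it); }
  }

  // Largest number of slots that were ever in use at the same time.
  int max_used() const { return max_used_; }

 private:
  std::vector<int> used_;
  int max_used_ = 0;
};
```

Reusing released slots keeps the peak count low: a long chain such as `(((a + b) + c) + d)` only ever needs a small, fixed number of live intermediates, regardless of its depth.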
@@ -308,7 +311,7 @@ class expression_parser {
* @return The indices of the operands stored in the data references.
*/
std::vector<cudf::size_type> visit_operands(
std::vector<std::reference_wrapper<node const>> operands);
std::vector<std::reference_wrapper<expression const>> operands);

/**
* @brief Add a data reference to the internal list.
@@ -325,7 +328,7 @@ class expression_parser {

cudf::table_view const& _left;
std::optional<std::reference_wrapper<cudf::table_view const>> _right;
cudf::size_type _node_count;
cudf::size_type _expression_count;
intermediate_counter _intermediate_counter;
bool _has_nulls;
std::vector<detail::device_data_reference> _data_references;
2 changes: 1 addition & 1 deletion cpp/include/cudf/ast/detail/operators.hpp
@@ -15,7 +15,7 @@
*/
#pragma once

#include <cudf/ast/operators.hpp>
#include <cudf/ast/expressions.hpp>
#include <cudf/types.hpp>
#include <cudf/utilities/error.hpp>
#include <cudf/utilities/type_dispatcher.hpp>