ARROW-15582: [C++] Add support for registering tricky functions with the Substrait consumer (or add a bunch of substrait meta functions) #13285
I think we can make a pass at simplifying some things. A lot of these lambdas seem to follow a consistent pattern. This is a good start, however! Excited to see it.
Status FunctionMapping::AddArrowToSubstrait(std::string arrow_function_name, ArrowToSubstrait conversion_func){
  if (arrow_to_substrait.find(arrow_function_name) != arrow_to_substrait.end()){
    arrow_to_substrait[arrow_function_name] = conversion_func;
  }
Should we return an invalid status as an else clause here?
Made the change.
Returning an `AlreadyExists` status.
Status FunctionMapping::AddSubstraitToArrow(std::string substrait_function_name, SubstraitToArrow conversion_func){
  if (substrait_to_arrow.find(substrait_function_name) != substrait_to_arrow.end()){
    substrait_to_arrow[substrait_function_name] = conversion_func;
  }
Same, perhaps the else should be an invalid status.
Returning an `AlreadyExists` status here; would that be better?
Some suggestions from a quick scan.
  return compute::call(func_name, std::move(arguments), std::move(cast_options));
}
case substrait::Expression::kEnum: {
  auto enum_expr = expr.enum_();
Does this convert to the string value of the enum? Can you add a small comment here explaining that?
@@ -204,6 +209,521 @@ const int* GetIndex(const KeyToIndex& key_to_index, const Key& key) {
  return &it->second;
}

Status FunctionMapping::AddArrowToSubstrait(std::string arrow_function_name, ArrowToSubstrait conversion_func){
  if (arrow_to_substrait.find(arrow_function_name) != arrow_to_substrait.end()){
This logic seems backwards to me... wouldn't `umap.find(...) != umap.end()` mean the item already existed?
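The point of this comment can be illustrated with a minimal, self-contained sketch of the non-inverted check: insert only when the name is absent, and report a conflict otherwise. `SketchStatus`, `SketchConversion`, and `SketchFunctionMapping` are hypothetical stand-ins invented for this illustration (they are not Arrow types), and the conversion signature is simplified to strings.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Stand-in for arrow::Status, used only to keep this sketch self-contained.
enum class SketchStatus { kOk, kAlreadyExists };

// Hypothetical, simplified conversion signature. The PR's ArrowToSubstrait
// takes a compute call and an ExtensionSet instead.
using SketchConversion = std::function<std::string(const std::string&)>;

class SketchFunctionMapping {
 public:
  // Registers `func` under `name` only if the name is not yet present.
  SketchStatus AddArrowToSubstrait(const std::string& name, SketchConversion func) {
    // emplace leaves the map unchanged and reports false if the key exists.
    bool inserted = arrow_to_substrait_.emplace(name, std::move(func)).second;
    return inserted ? SketchStatus::kOk : SketchStatus::kAlreadyExists;
  }

  // True if a conversion has been registered under `name`.
  bool Contains(const std::string& name) const {
    return arrow_to_substrait_.find(name) != arrow_to_substrait_.end();
  }

 private:
  std::unordered_map<std::string, SketchConversion> arrow_to_substrait_;
};
```

Using `emplace` folds the lookup and the insertion into one call, so the "already registered" case falls out of the return value rather than a separate `find`.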
}

Status FunctionMapping::AddSubstraitToArrow(std::string substrait_function_name, SubstraitToArrow conversion_func){
  if (substrait_to_arrow.find(substrait_function_name) != substrait_to_arrow.end()){
Same as above. This seems backwards (but maybe I'm just not thinking right).
}
}

std::vector<arrow::compute::Expression> substrait_convert_arguments(const substrait::Expression::ScalarFunction& call){
- std::vector<arrow::compute::Expression> substrait_convert_arguments(const substrait::Expression::ScalarFunction& call){
+ std::vector<arrow::compute::Expression> ConvertSubstraitArguments(const substrait::Expression::ScalarFunction& call){
std::vector<arrow::compute::Expression> substrait_convert_arguments(const substrait::Expression::ScalarFunction& call){
  substrait::Expression value;
  ExtensionSet ext_set_;
- ExtensionSet ext_set_;
+ ExtensionSet ext_set;
This seems strange. Wouldn't this function take in an extension set as an argument?
substrait::Expression::ScalarFunction arrow_convert_enum_arguments(const arrow::compute::Expression::Call& call, substrait::Expression::ScalarFunction& substrait_call, ExtensionSet* ext_set_, std::string overflow_handling){
  substrait::Expression::Enum options;
  options.set_specified(overflow_handling);
`overflow_handling` seems like an odd name given this is a generic function.
  return arrow::compute::call("abs", substrait_convert_arguments(call));
};

ArrowToSubstrait arrow_add_to_substrait = [] (const arrow::compute::Expression::Call& call, ExtensionSet* ext_set_) -> Result<substrait::Expression::ScalarFunction> {
There are a lot of places where you have `ext_set_` and it should probably be `ext_set`. For the sake of brevity I'm not going to mark them all.
substrait::Expression::ScalarFunction substrait_call;
ARROW_ASSIGN_OR_RAISE(auto function_reference, ext_set_->EncodeFunction("extract"));
substrait_call.set_function_reference(function_reference);
All of these calls to `EncodeFunction` seem pretty repetitive. Is there any way we can move this into the part that calls `GetArrowToSubstrait`? Also, I don't see anything today that calls `GetArrowToSubstrait`.
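One way to read this suggestion, as a rough sketch only: each mapping declares its Substrait function name and a callback that fills in arguments, while a single call site performs the encoding once. Everything here is hypothetical and invented for illustration (`SketchScalarFunction`, `SketchExtensionSet`, `SketchMapping`, `ConvertCall`); the real `ExtensionSet::EncodeFunction` returns a `Result` and is not modeled faithfully.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in for substrait::Expression::ScalarFunction (arguments kept as strings).
struct SketchScalarFunction {
  int function_reference = -1;
  std::vector<std::string> arguments;
};

// Stand-in for ExtensionSet: hands out a stable integer anchor per function name.
class SketchExtensionSet {
 public:
  int EncodeFunction(const std::string& name) {
    auto it = anchors_.find(name);
    if (it != anchors_.end()) return it->second;
    int anchor = static_cast<int>(anchors_.size());
    anchors_.emplace(name, anchor);
    return anchor;
  }

 private:
  std::unordered_map<std::string, int> anchors_;
};

// A mapping only states its Substrait name and how to fill in the arguments.
struct SketchMapping {
  std::string substrait_name;
  std::function<void(const std::vector<std::string>&, SketchScalarFunction*)> fill_arguments;
};

// The single call site does the encoding that each lambda would otherwise repeat.
SketchScalarFunction ConvertCall(const SketchMapping& mapping,
                                 const std::vector<std::string>& args,
                                 SketchExtensionSet* ext_set) {
  SketchScalarFunction out;
  out.function_reference = ext_set->EncodeFunction(mapping.substrait_name);
  mapping.fill_arguments(args, &out);
  return out;
}
```

With this shape, a mapping such as the `extract` one would shrink to its name plus the logic for its enum and value arguments.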
}
};

ArrowToSubstrait arrow_year_to_arrow = [] (const arrow::compute::Expression::Call& call, ExtensionSet* ext_set_) -> Result<substrait::Expression::ScalarFunction> {
`arrow_...to_arrow`?
DCHECK_OK(functions_map.AddSubstraitToArrow(id.name.to_string(), conversion_func));
return RegisterFunction(id, id.name.to_string());
It seems a little odd that we need two maps. What happens if two functions exist with the same name but different URIs? Thinking on this longer, maybe `substrait_to_arrow` should replace the map in the extension id registry (that gets updated by the call to `RegisterFunction`?)
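The name-collision question can be made concrete with a small sketch that keys a single map on the (URI, name) pair rather than on the name alone. The types `SketchFunctionId` and `SketchIdRegistry` and the `urn:example:` URIs in the usage note are hypothetical and exist only for this illustration; this is not the registry's actual layout.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical identifier: a Substrait function name scoped by its extension URI.
struct SketchFunctionId {
  std::string uri;
  std::string name;

  // Order by URI first, then by name, so the pair acts as a composite key.
  bool operator<(const SketchFunctionId& other) const {
    return std::tie(uri, name) < std::tie(other.uri, other.name);
  }
};

// A single map from (uri, name) to an Arrow function name.
class SketchIdRegistry {
 public:
  // Returns false, leaving the map unchanged, if the id is already registered.
  bool Register(const SketchFunctionId& id, std::string arrow_name) {
    return mapping_.emplace(id, std::move(arrow_name)).second;
  }

  // Returns nullptr when the id is unknown.
  const std::string* Lookup(const SketchFunctionId& id) const {
    auto it = mapping_.find(id);
    return it == mapping_.end() ? nullptr : &it->second;
  }

 private:
  std::map<SketchFunctionId, std::string> mapping_;
};
```

Under this keying, registering `add` under `urn:example:ext_a` and again under `urn:example:ext_b` produces two independent entries, whereas a map keyed only on the name would treat the second registration as a conflict.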
…ctions (#13613)

This picks up where #13285 has left off. It mostly focuses on the Substrait->Arrow direction at the moment.

In addition, basic support is added for named tables. This makes it possible to create unit tests that read from in-memory tables instead of requiring unit tests to do a scan.

The PR creates some utilities in `test_plan_builder.h` which allow for the construction of simple Substrait plans programmatically. This is used to create unit tests for the function mapping.

The PR extracts id "ownership" out of the `ExtensionIdRegistry` and into its own `IdStorage` class.

The PR gets rid of `NestedExtensionIdRegistryImpl` and instead makes `ExtensionIdRegistryImpl` nested if `parent_ != nullptr`.

Authored-by: Weston Pace <[email protected]>
Signed-off-by: Weston Pace <[email protected]>
[WIP]
This PR adds function mappings for compute functions from Substrait to Arrow and vice versa. It introduces a `FunctionMapping` class to register and store the mappings and supply them when required. Registering a function includes encoding the various options and arguments in the respective mapping function's definition.