[Kernel] Return unsupported expression in partition pruning in remaining predicate #2491

Closed
wants to merge 1 commit into from

vkorukanti
Collaborator

Description

Currently the DefaultExpressionHandler only supports a few expressions. If a user passes an unsupported expression, Kernel partition pruning fails with an unsupported operation exception. Instead, this PR changes it to return the unsupported part of the expression in the remaining filter of Scan. It makes use of ExpressionHandler.getPredicateEvaluator to decide whether an expression is supported or not.
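As a rough illustration of the behavior described above, the following sketch splits a conjunction into a part that can be evaluated and a part that is handed back as the remaining filter. Everything in it (`SplitSketch`, `Pred`, `Split`, and the set of supported names) is a hypothetical stand-in, not the Kernel's `Predicate` or `ExpressionHandler` API, and it only splits across top-level AND.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for illustration; these are not Kernel classes.
final class SplitSketch {
    /** Name-based predicate: a name plus child predicates (leaves have none). */
    static final class Pred {
        final String name;
        final List<Pred> children;

        Pred(String name, Pred... children) {
            this.name = name;
            this.children = Collections.unmodifiableList(Arrays.asList(children));
        }
    }

    static final Pred ALWAYS_TRUE = new Pred("ALWAYS_TRUE");

    /** The part used for pruning and the part handed back to the caller. */
    static final class Split {
        final Pred supported;
        final Pred remaining;

        Split(Pred supported, Pred remaining) {
            this.supported = supported;
            this.remaining = remaining;
        }
    }

    /** True only if the predicate and all of its children have supported names. */
    static boolean isSupported(Pred p, Set<String> supportedNames) {
        if (!supportedNames.contains(p.name)) {
            return false;
        }
        for (Pred child : p.children) {
            if (!isSupported(child, supportedNames)) {
                return false;
            }
        }
        return true;
    }

    /** AND that drops ALWAYS_TRUE operands. */
    static Pred and(Pred left, Pred right) {
        if (left == ALWAYS_TRUE) {
            return right;
        }
        if (right == ALWAYS_TRUE) {
            return left;
        }
        return new Pred("AND", left, right);
    }

    /** Splits conjuncts; anything unsupported ends up in `remaining`. */
    static Split split(Pred p, Set<String> supportedNames) {
        if (p.name.equals("AND") && p.children.size() == 2) {
            Split l = split(p.children.get(0), supportedNames);
            Split r = split(p.children.get(1), supportedNames);
            return new Split(and(l.supported, r.supported), and(l.remaining, r.remaining));
        }
        return isSupported(p, supportedNames)
            ? new Split(p, ALWAYS_TRUE)
            : new Split(ALWAYS_TRUE, p);
    }
}
```

Returning an unsupported conjunct in `remaining` is safe in this sketch because the caller can still apply it after reading the data; pruning with only the supported part never drops rows that the full predicate would keep.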

How was this patch tested?

Unit tests and integration tests.

try {
    exprHandler.getPredicateEvaluator(tableSchema, predicate);
    return false;
} catch (UnsupportedOperationException ex) {
Collaborator Author

wondering if we should throw a defined exception like UnsupportedExpressionException?

Contributor

@tdas tdas Jan 12, 2024

Let's leave it as is for now. I think it's better to think through all the different exceptions and then take a call on whether you want to make custom exceptions.

Contributor

But using exceptions for control flow is a bad pattern. I think we should consider defining a method in ExpressionHandler that validates whether an expression is supported or not.

Collaborator Author

I agree, also worried about the cost of creating an evaluator in some engines. Is this API ok?

    /**
     * Is the given expression evaluation supported on the data with given schema?
     *
     * @param inputSchema Schema of input data on which the expression is evaluated.
     * @param expression Expression to evaluate.
     * @param outputType Expected result data type.
     * @return true if supported and false otherwise.
     */
    boolean isSupported(StructType inputSchema, Expression expression, DataType outputType);

Contributor

Let's make that a separate PR. I think we need to think about whether we want the schema as well.

Collaborator Author

Sure, I can make a separate PR. I think we need the schema because the connector may support an expression, but only on selected input data types.
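To make the point about the schema concrete, here is a minimal sketch in which support for an operator depends on the type of the column it is applied to. The `Type` enum, the `Comparison` class, and the rule that `<` is unsupported on binary columns are all assumptions made up for this example; they do not describe what any real expression handler supports.

```java
import java.util.Map;

// Hypothetical types for illustration; not the Kernel's StructType/Expression.
final class SupportCheckSketch {
    enum Type { INTEGER, STRING, BINARY }

    /** A comparison of the form: column <op> literal. */
    static final class Comparison {
        final String op;
        final String column;

        Comparison(String op, String column) {
            this.op = op;
            this.column = column;
        }
    }

    /**
     * Answers "is this expression supported on this input schema?" without
     * building an evaluator or throwing an exception.
     */
    static boolean isSupported(Map<String, Type> inputSchema, Comparison expr) {
        Type columnType = inputSchema.get(expr.column);
        if (columnType == null) {
            return false; // unknown column
        }
        switch (expr.op) {
            case "=":
                return true; // assumed to work on every type
            case "<":
                // assumed: ordering is only implemented for these types
                return columnType == Type.INTEGER || columnType == Type.STRING;
            default:
                return false; // operator not known to this handler
        }
    }
}
```

With a check that ignores the schema, `<` would have to be reported as either always supported or never supported, and both answers are wrong for one of the columns in this example.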

Collaborator Author

@tdas here is the PR: #2492

new Predicate(name, children.asJava)
}
private def and(left: Predicate, right: Predicate): Predicate = predicate("AND", left, right)
Collaborator Author

these will be removed once #2492 is merged

Comment on lines 71 to 72
// Subset of the given predicate that Kernel can't guarantee that it can complete satisfy
// It could be predicate on the data columns and/or unsupported predicate on partition columns
Collaborator

A few questions

  1. What does satisfy mean? Can we be more specific and clear?
  2. Why can't we handle a predicate on the data columns? We have column stats and data skipping?

Collaborator

I think a comment would help here, explaining that this is all in relation to what we can evaluate using just input X (scan file path or whatever), without reading data.

Collaborator Author

This is a best-effort predicate. Updated the comment.

Comment on lines +72 to +85
public static Predicate combineWithAndOp(Predicate left, Predicate right) {
    String leftName = left.getName().toUpperCase();
    String rightName = right.getName().toUpperCase();
    if (leftName.equals("ALWAYS_FALSE") || rightName.equals("ALWAYS_FALSE")) {
        return ALWAYS_FALSE;
    }
    if (leftName.equals("ALWAYS_TRUE")) {
        return right;
    }
    if (rightName.equals("ALWAYS_TRUE")) {
        return left;
    }
    return new And(left, right);
}
Collaborator

Something feels off here. Why are we doing string comparisons? Isn't everything strongly typed?

Collaborator Author

Our expressions are name-based (=, like, etc.). We don't enforce any restrictions on the expression name, so that the connector can pass through Kernel any arbitrary expression that its own expression handler can support.
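A small sketch of why a type check alone may not be enough when expressions are name-based: a predicate built generically from a name is not an instance of a dedicated subclass, even when the name matches. The `Pred` and `AlwaysTrue` classes here are hypothetical and only mimic the shape of the discussion; they are not the Kernel classes.

```java
import java.util.Locale;

// Hypothetical classes for illustration only.
final class NameBasedSketch {
    /** Generic predicate identified by its name. */
    static class Pred {
        final String name;

        Pred(String name) {
            this.name = name;
        }
    }

    /** Dedicated subclass for the constant-true predicate. */
    static final class AlwaysTrue extends Pred {
        AlwaysTrue() {
            super("ALWAYS_TRUE");
        }
    }

    /** Misses predicates that were constructed from a name alone. */
    static boolean isAlwaysTrueByType(Pred p) {
        return p instanceof AlwaysTrue;
    }

    /** Matches on the name, ignoring case, however the predicate was built. */
    static boolean isAlwaysTrueByName(Pred p) {
        return p.name.toUpperCase(Locale.ROOT).equals("ALWAYS_TRUE");
    }
}
```

For a predicate created as `new Pred("always_true")`, the type-based check returns false while the name-based check returns true, which is the case the string comparison is guarding against in this sketch.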

* Utility method to check whether the given predicate has any expressions that are not
* supported by the given expression handler.
*/
private static boolean hasUnsupportedExpr(
Collaborator

What about this method is specific to PartitionUtils? Why isn't it part of ExpressionUtils?

Collaborator Author

This is removed as part of PR #2492, on which this PR is based.

@vkorukanti vkorukanti force-pushed the fixPruningExpr branch 2 times, most recently from 2b24241 to 33c45d2 Compare March 8, 2024 17:03
if (areFiltersSplit) {
    return;
}
filter.map(predicate -> {
Collaborator

nit: filter.ifPresent, that's in JDK 8 right?
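For reference, `java.util.Optional.ifPresent` has been available since Java 8 and takes a `Consumer`, so it fits code that runs only for its side effects, whereas `map` produces a new `Optional` that would be discarded here. A minimal sketch, with a hypothetical `collect` helper and a `String` standing in for the predicate:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

final class IfPresentSketch {
    /** Records the filter if one is present; does nothing otherwise. */
    static List<String> collect(Optional<String> filter) {
        List<String> seen = new ArrayList<>();
        // Side effect only: no result value is produced or ignored.
        filter.ifPresent(predicate -> seen.add(predicate));
        return seen;
    }
}
```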


3 participants