Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use object[] instead of string[] for vector expressions to be consistent with vector object selectors #13209

Merged

Conversation

clintropolis
Copy link
Member

Description

Changes vectorized string expressions (such as concat) to deal in Object[] instead of String[], to be consistent with the behavior of VectorObjectSelector, as well as with a similar change I did recently to array expressions in #12914.

This fixes a class of bugs such as in the added test example, which would fail with an error of the form:

    Caused by: java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [Ljava.lang.String;
	at org.apache.druid.math.expr.IdentifierExpr$4.evalVector(IdentifierExpr.java:198) ~[classes/:?]
	at org.apache.druid.math.expr.vector.StringOutMultiStringInVectorProcessor.evalVector(StringOutMultiStringInVectorProcessor.java:58) ~[classes/:?]
	at org.apache.druid.segment.virtual.ExpressionVectorObjectSelector.getObjectVector(ExpressionVectorObjectSelector.java:49) ~[classes/:?]

before the changes in this patch.


This PR has:
  • been self-reviewed.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

Copy link
Contributor

@cheddar cheddar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks fine. I left a few comments where I think we could maybe do things to be a bit more defensive. In the worst case, though, these changes won't break new things, they just might not fix all of the things that they wanted to (i.e. in some cases, the exception might've just moved to be whack-a-moled a different day).

@@ -118,7 +119,7 @@ public int getCurrentVectorSize()
*/
private static final class StringLookupVectorInputBindings implements Expr.VectorInputBinding
{
private final String[] currentValue = new String[1];
private final Object[] currentValue = new Object[1];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitish: The naming of this private static class doesn't really line up with its implementation anymore?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it is still used exclusively for 'deferred' single string expressions, so maybe is ok?

{
// in sql compatible mode, nulls are handled by super class and never make it here...
return leftVal + rightVal;
return leftVal + (String) rightVal;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could call .toString() and avoid ClassCastExceptions here? Or, if it's important to do the cast, we should really do an instanceof check first so that we can generate a better error message than just ClassCastException

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as #13209 (comment) about why is safe

output[i] = Evals.asLong(bool);
outputNulls[i] = bool;
return;
} else if (rightNull) {
final boolean bool = Evals.asBoolean(leftInput[i]);
final boolean bool = Evals.asBoolean((String) leftInput[i]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too, why is it safe to cast? do we need to check the type first? Or maybe check the type of the array that we are gotten, so we can amortize the cost of the check?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this safe because (for better or for worse) the base classes explicitly wrap their input processors with CastToTypeVectorProcessor.cast (which is a no-op if the processor is already the correct type) to ensure that it is being used with the correct type. I don't remember exactly why I did this, presumably because its important for vector processing that everything is the right type so we don't get ugly class cast exceptions, but I can look into if this is actually necessary or not.

@clintropolis clintropolis merged commit 59e2afc into apache:master Oct 12, 2022
@clintropolis clintropolis deleted the better-vector-expr-string-processing branch October 12, 2022 09:53
clintropolis added a commit to clintropolis/druid that referenced this pull request Oct 28, 2022
…ent with vector object selectors (apache#13209)

* use object[] instead of string[] for vector expressions to be consistent with vector object selectors

* simplify
@kfaraz kfaraz added this to the 24.0.1 milestone Nov 1, 2022
kfaraz pushed a commit that referenced this pull request Nov 1, 2022
* use object[] instead of string[] for vector expressions to be consistent with vector object selectors (#13209)

* fix issue with nested column null value index incorrectly matching non-null values (#13211)

* fix json_value sql planning with decimal type, fix vectorized expression math null value handling in default mode (#13214)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants