Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

various fixes and improvements to vectorization fallback #17098

Merged

Conversation

clintropolis
Copy link
Member

changes:

  • add ApplyFunction support to vectorization fallback, allowing many of the remaining expressions to be vectorized
  • add CastToObjectVectorProcessor so that vector engine can correctly cast any type
  • add support for array and complex vector constants
  • reduce number of cases which can block vectorization in expression planner to be unknown inputs (such as unknown multi-valuedness)
  • fix array constructor expression, apply map expression to make actual evaluated type match the output type inference
  • fix bug in array_contains where something like array_contains([null], 'hello') would return true if the array was a numeric array since the non-null string value would cast to a null numeric
  • fix isNull/isNotNull to correctly handle any type of input argument

changes:
* add `ApplyFunction` support to vectorization fallback, allowing many of the remaining expressions to be vectorized
* add `CastToObjectVectorProcessor` so that vector engine can correctly cast any type
* add support for array and complex vector constants
* reduce number of cases which can block vectorization in expression planner to be unknown inputs (such as unknown multi-valuedness)
* fix array constructor expression, apply map expression to make actual evaluated type match the output type inference
* fix bug in array_contains where something like array_contains([null], 'hello') would return true if the array was a numeric array since the non-null string value would cast to a null numeric
* fix isNull/isNotNull to correctly handle any type of input argument
@clintropolis clintropolis added this to the 31.0.0 milestone Sep 18, 2024
final ExprEvalVector<?> delegateOutput = delegate.evalVector(bindings);
final Object[] toCast = delegateOutput.getObjectVector();
for (int i = 0; i < bindings.getCurrentVectorSize(); i++) {
ExprEval<?> cast = ExprEval.ofType(delegateType, toCast[i]).castTo(outputType);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC from work on #16994, ExprEval.ofType(inType, o).castTo(outType) and ExprEval.ofType(outType, o) have subtly different behavior and I ended up preferring the latter. I think part of the reason was that the former could throw exceptions in cases where the cast was invalid, and the latter was more "forgiving" (like, returning null instead of throwing an exception). It happened that forgiving is what I wanted in that particular patch.

Here, what's the right thing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the vectorized cast operator for objects, so it should totally use castTo to be consistent with non-vectorized cast

}

@Override
public ExprEvalVector<Object[]> evalVector(Expr.VectorInputBinding bindings)
{
ExprEvalVector<?> result = delegate.evalVector(bindings);
final Object[] objects = result.getObjectVector();
final Object[] output = new String[objects.length];
for (int i = 0; i < objects.length; i++) {
for (int i = 0; i < bindings.getCurrentVectorSize(); i++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was this a bug?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, previously it would just cast the whole vector size instead of only what was needed

final ExprEvalObjectVector eval = new ExprEvalObjectVector(strings, ExpressionType.STRING);
final Object[] objects = new Object[maxVectorSize];
if (type.isNumeric()) {
return constant((Long) null, maxVectorSize);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does this use null instead of looking at what constant is?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, idk why this is here, i think this is from a bad merge between some stashes with experiments i had going on. It should just be removed, and wasn't failing any tests because numeric types use their dedicated constant functions

@@ -85,11 +85,14 @@ public static <T> ExprVectorProcessor<T> makeSymmetricalProcessor(
*
* @see org.apache.druid.math.expr.ConstantExpr
*/
public static <T> ExprVectorProcessor<T> constant(@Nullable String constant, int maxVectorSize)
public static <T> ExprVectorProcessor<T> constant(@Nullable Object constant, int maxVectorSize, ExpressionType type)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did this change to Object? what sorts of non-String objects might be passed in? could you please update the javadoc?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, i needed a way to make vector constants for basically anything not a numeric primitive, so this method was updated to make all of the object constants, will update javadoc

final Object[] strings = new Object[maxVectorSize];
Arrays.fill(strings, constant);
final ExprEvalObjectVector eval = new ExprEvalObjectVector(strings, ExpressionType.STRING);
final Object[] objects = new Object[maxVectorSize];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are creating this unnecessarily if type is numeric.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numeric types shouldn't ever be calling this, added a defensive check

final ExprEvalObjectVector eval = new ExprEvalObjectVector(strings, ExpressionType.STRING);
final Object[] objects = new Object[maxVectorSize];
if (type.isNumeric()) {
return constant((Long) null, maxVectorSize);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why null though?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clintropolis clintropolis merged commit 77a362c into apache:master Sep 24, 2024
90 checks passed
@clintropolis clintropolis deleted the various-expression-vector-fixes branch September 24, 2024 11:29
clintropolis added a commit to clintropolis/druid that referenced this pull request Sep 24, 2024
changes:
* add `ApplyFunction` support to vectorization fallback, allowing many of the remaining expressions to be vectorized
* add `CastToObjectVectorProcessor` so that vector engine can correctly cast any type
* add support for array and complex vector constants
* reduce number of cases which can block vectorization in expression planner to be unknown inputs (such as unknown multi-valuedness)
* fix array constructor expression, apply map expression to make actual evaluated type match the output type inference
* fix bug in array_contains where something like array_contains([null], 'hello') would return true if the array was a numeric array since the non-null string value would cast to a null numeric
* fix isNull/isNotNull to correctly handle any type of input argument
clintropolis added a commit that referenced this pull request Sep 24, 2024
…7142)

changes:
* add `ApplyFunction` support to vectorization fallback, allowing many of the remaining expressions to be vectorized
* add `CastToObjectVectorProcessor` so that vector engine can correctly cast any type
* add support for array and complex vector constants
* reduce number of cases which can block vectorization in expression planner to be unknown inputs (such as unknown multi-valuedness)
* fix array constructor expression, apply map expression to make actual evaluated type match the output type inference
* fix bug in array_contains where something like array_contains([null], 'hello') would return true if the array was a numeric array since the non-null string value would cast to a null numeric
* fix isNull/isNotNull to correctly handle any type of input argument
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants