Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(spark): bitwise functions #309

Merged
merged 1 commit into from
Oct 25, 2024
Merged

Conversation

andrew-coleman
Copy link
Contributor

Adds support in the spark module for 8-bit and 16-bit integer types and for some bitwise functions. The catalyst optimizer generates expressions using these for certain query types.

Note that shift_right (and other bit shifting functions) might want to be considered for the core substrait function catalog, but it has been added here (temporarily?) as spark extension pending a longer term discussion/decision on their wider utility.

@@ -29,6 +29,8 @@ import scala.collection.JavaConverters.asScalaBufferConverter
private class ToSparkType
extends TypeVisitor.TypeThrowsVisitor[DataType, RuntimeException]("Unknown expression type.") {

override def visit(expr: Type.I8): DataType = ByteType
override def visit(expr: Type.I16): DataType = ShortType
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while you're at it, mind adding these also to ToSparkType?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in ToSparkType. The conversion in ToSubstraitType is already there further down this file. It converts both ways.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, all good then!

}

override def visit(expr: SExpression.I16Literal): Expression = {
Literal(expr.value().asInstanceOf[Short], ToSubstraitType.convert(expr.getType))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here also the other direction for the conversion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's already covered in the other direction in ToSubstraitExpression on the following line:
case SubstraitLiteral(substraitLiteral) => Some(substraitLiteral)
The unapply method in this object invokes the conversion.

@@ -144,6 +144,7 @@ abstract class ToSubstraitExpression extends HasOutputStack[Seq[Attribute]] {
"org.apache.spark.sql.catalyst.expressions.PromotePrecision") =>
translateUp(p.children.head)
case CaseWhen(branches, elseValue) => translateCaseWhen(branches, elseValue)
case InSet(value, set) => translateIn(value, set.toSeq.map(v => Literal(v)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: mind moving this next to the case In(..)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, although InSet needs to come before ScalarFunction otherwise it matches the latter (since it's a UnaryExpression). I've moved In up the list so they are together.

Params:
base – the base number to shift.
shift – number of bits to right shift.
impls:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like in Spark the base can be either int or long, and return type is set accordingly. I we should add both options?

Quickly testing this, it works for select shiftright(col, 2) from (values (bigint(1234)) as table(col)) but not for select shiftright(col, 2) from (values (1234) as table(col)), so yep I think we need to list both versions here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think it's fine to add the function here initially, but it'd be good to also file the PR/Issue on the core functions since this seems general enough, I think :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, although that might take a bit longer ;)

@andrew-coleman
Copy link
Contributor Author

I'll fixup the conflict :)

Adds support in the spark module for 8-bit and 16-bit integer types and for some bitwise functions.
The catalyst optimizer generates expressions using these for certain query types.

Note that `shift_right` (and other bit shifting functions) might want to be considered for the
core substrait function catalog, but it has been added here (temporarily?) as spark extension
pending a longer term discussion/decision on their wider utility.

Signed-off-by: Andrew Coleman <[email protected]>
Copy link
Contributor

@Blizzara Blizzara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@Blizzara Blizzara merged commit b8ccd8b into substrait-io:main Oct 25, 2024
13 checks passed
@andrew-coleman andrew-coleman deleted the bitwise branch October 25, 2024 08:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants