-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(spark): bitwise functions #309
Conversation
3bf4782
to
89a79ba
Compare
@@ -29,6 +29,8 @@ import scala.collection.JavaConverters.asScalaBufferConverter | |||
private class ToSparkType | |||
extends TypeVisitor.TypeThrowsVisitor[DataType, RuntimeException]("Unknown expression type.") { | |||
|
|||
override def visit(expr: Type.I8): DataType = ByteType | |||
override def visit(expr: Type.I16): DataType = ShortType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while you're at it, mind adding these also to ToSparkType?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is in ToSparkType
. The conversion in ToSubstraitType
is already there further down this file. It converts both ways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right, all good then!
} | ||
|
||
override def visit(expr: SExpression.I16Literal): Expression = { | ||
Literal(expr.value().asInstanceOf[Short], ToSubstraitType.convert(expr.getType)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and here also the other direction for the conversion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's already covered in the other direction in ToSubstraitExpression
on the following line:
case SubstraitLiteral(substraitLiteral) => Some(substraitLiteral)
The unapply
method in this object invokes the conversion.
@@ -144,6 +144,7 @@ abstract class ToSubstraitExpression extends HasOutputStack[Seq[Attribute]] { | |||
"org.apache.spark.sql.catalyst.expressions.PromotePrecision") => | |||
translateUp(p.children.head) | |||
case CaseWhen(branches, elseValue) => translateCaseWhen(branches, elseValue) | |||
case InSet(value, set) => translateIn(value, set.toSeq.map(v => Literal(v))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: mind moving this next to the case In(..)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, although InSet
needs to come before ScalarFunction
otherwise it matches the latter (since it's a UnaryExpression
). I've moved In
up the list so they are together.
Params: | ||
base – the base number to shift. | ||
shift – number of bits to right shift. | ||
impls: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like in Spark the base can be either int or long, and return type is set accordingly. I we should add both options?
Quickly testing this, it works for select shiftright(col, 2) from (values (bigint(1234)) as table(col))
but not for select shiftright(col, 2) from (values (1234) as table(col))
, so yep I think we need to list both versions here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I think it's fine to add the function here initially, but it'd be good to also file the PR/Issue on the core functions since this seems general enough, I think :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, although that might take a bit longer ;)
89a79ba
to
8544c81
Compare
I'll fixup the conflict :) |
8544c81
to
024b800
Compare
Adds support in the spark module for 8-bit and 16-bit integer types and for some bitwise functions. The catalyst optimizer generates expressions using these for certain query types. Note that `shift_right` (and other bit shifting functions) might want to be considered for the core substrait function catalog, but it has been added here (temporarily?) as spark extension pending a longer term discussion/decision on their wider utility. Signed-off-by: Andrew Coleman <[email protected]>
024b800
to
dda5582
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Adds support in the spark module for 8-bit and 16-bit integer types and for some bitwise functions. The catalyst optimizer generates expressions using these for certain query types.
Note that
shift_right
(and other bit shifting functions) might want to be considered for the core substrait function catalog, but it has been added here (temporarily?) as spark extension pending a longer term discussion/decision on their wider utility.