[SPARK-12506][SQL] Push down WHERE clause arithmetic operator to JDBC #10750
Conversation
Can one of the admins verify this patch?

@viirya
```scala
private def translateArithemiticOPFilter(predicate: Expression): Option[Filter] = {
  predicate match {
    case expressions.EqualTo(Add(left, right), Literal(v, t)) =>
      Some(sources.ArithmeticOPEqualTo(Add(left, right), convertToScala(v, t)))
    // ... analogous cases for the other operators, with a default of None
  }
}
```
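ArithmeticOPEqualTo itself is not shown in this excerpt; from its usage above, a definition along these lines is implied (a hypothetical reconstruction, not the PR's actual code):

```scala
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.sources.Filter

// Hypothetical reconstruction: an external filter embedding a Catalyst
// expression on the left-hand side and an already-converted Scala value on
// the right. Note that this puts expressions.Expression into the public
// sources API, which is the concern raised below.
case class ArithmeticOPEqualTo(expr: Expression, value: Any) extends Filter
```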
As described in SPARK-10195, it looks like the data sources API now exposes Catalyst's internal types through its Filter interfaces. I think this might have to be hidden.
I took a look at SPARK-10195. It deals with the issue of exposing internal data types, and it uses convertToScala to convert those internal data types to their Scala versions. Since convertToScala is used here to convert the values, I think this is not the same problem.
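For reference, a minimal sketch of what convertToScala does in this context, using the standard CatalystTypeConverters helper (the UTF8String example is illustrative, not taken from the PR):

```scala
import org.apache.spark.sql.catalyst.CatalystTypeConverters.convertToScala
import org.apache.spark.sql.types.StringType
import org.apache.spark.unsafe.types.UTF8String

// Catalyst stores strings internally as UTF8String; convertToScala maps that
// internal representation back to a plain Scala/Java value, so external data
// sources only ever see ordinary types.
val internal: Any = UTF8String.fromString("abc")
val external: Any = convertToScala(internal, StringType) // "abc": java.lang.String
```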
Hm.. I see, but it still looks to me like expression._ might have to be hidden. With this approach, expression._ can be accessed at the data source level. I believe the reason sources._ was implemented is to hide expression._, which has been changing rapidly from version to version.

Also, AFAIK Parquet and ORC do not support arithmetic operators, so they might anyway have to convert them back on the Spark side in the future if we support this in this way. So, for this case, I think the operators might have to be resolved on the Spark side; I believe we might better resolve this issue by …
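For context, the existing filters in sources._ carry only attribute names and already-converted Scala values, which is what keeps expression._ hidden from data sources. Their shape in the Spark 1.x sources API is roughly the following (simplified; see org.apache.spark.sql.sources for the full set):

```scala
// Simplified shape of the existing external filters in
// org.apache.spark.sql.sources: each is a small, stable case class over a
// column name and a plain Scala value; no Catalyst Expression appears anywhere.
case class EqualTo(attribute: String, value: Any) extends Filter
case class GreaterThan(attribute: String, value: Any) extends Filter
```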
If we keep going to solve it in …
Actually, I also suggested a similar approach before, in SPARK-9182. If we keep adding filters in this way, we could end up converting all Catalyst expressions into external filters one by one. And this is why I raised all the issues above in the JIRA issue, SPARK-12506.
@huaxingao please change the title to one not ending with …
Yes, I think using the internal expression API makes more sense. We don't want to add too many expressions to the external data source API.
Indeed, continuing to add more filters will be a problem. If we can directly pass Catalyst expressions to the JDBC data source, that would be better.
@viirya Yes, I think so. But the reason why I did not give this a try is … So, maybe we should try to find a better solution for this.
I think most expressions commonly used in filters (such as >, >=, <, <=, ==, string ops, arithmetic ops) are relatively stable now. Maybe we can let the JDBC data source implement …
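One existing hook along these lines is the CatalystScan trait in the sources API, which hands a data source raw Catalyst expressions instead of translated Filters (it is marked experimental precisely because expression._ is unstable). Its rough shape, shown here for context rather than as this comment's confirmed suggestion:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression}

// Approximate shape of org.apache.spark.sql.sources.CatalystScan: unlike
// PrunedFilteredScan, the data source receives untranslated Catalyst
// expressions, so a predicate like c1 + c2 > 10 arrives intact and can be
// compiled to SQL without adding new external Filter classes.
trait CatalystScan {
  def buildScan(requiredColumns: Seq[Attribute], filters: Seq[Expression]): RDD[Row]
}
```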
@viirya @HyukjinKwon @rxin
…layer

For an arithmetic operator in a WHERE clause, such as

select * from table where c1 + c2 > 10

the predicate c1 + c2 > 10 is currently evaluated at the Spark layer. This change pushes it down to the JDBC layer so that it is evaluated in the database.
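A rough sketch of the intended translation step, with hypothetical names (compileToSql is illustrative, not Spark's actual API; the real change would plug into the JDBC data source's filter-to-SQL translation):

```scala
import org.apache.spark.sql.catalyst.expressions._

// Hypothetical sketch: compile a supported Catalyst predicate into a SQL
// string for the remote database, returning None for anything unsupported so
// that Spark can still evaluate it locally.
def compileToSql(e: Expression): Option[String] = e match {
  case AttributeReference(name, _, _, _) => Some(name)
  case Literal(v, _)                     => Some(v.toString)
  case Add(l, r) =>
    for (ls <- compileToSql(l); rs <- compileToSql(r)) yield s"($ls + $rs)"
  case GreaterThan(l, r) =>
    for (ls <- compileToSql(l); rs <- compileToSql(r)) yield s"$ls > $rs"
  case _ => None
}

// c1 + c2 > 10  ==>  Some("(c1 + c2) > 10"), appended to the generated JDBC
// query: SELECT ... FROM table WHERE (c1 + c2) > 10
```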