
[SPARK-12506][SQL]push down WHERE clause arithmetic operator to JDBC #10750

Closed · wants to merge 1 commit

Conversation

huaxingao (Contributor)

push down WHERE clause arithmetic operator to JDBC layer

For an arithmetic operator in a WHERE clause such as

```sql
select * from table where c1 + c2 > 10
```

the predicate `c1 + c2 > 10` is currently evaluated at the Spark layer. This change pushes it down to the JDBC layer so that it is evaluated in the database.
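To illustrate the intent, here is a minimal, self-contained sketch of what such a pushdown amounts to: compiling an arithmetic comparison into a SQL fragment that the database evaluates. All names here are illustrative; this is not the PR's code.

```scala
// Illustrative sketch only, not the PR's code: a tiny expression model and a
// compiler that turns `c1 + c2 > 10` into a WHERE fragment for the database.
object ArithmeticPushdownSketch {
  sealed trait Expr
  case class Col(name: String) extends Expr
  case class Lit(value: Int) extends Expr
  case class Add(left: Expr, right: Expr) extends Expr
  case class GreaterThan(left: Expr, right: Expr) extends Expr

  def compile(e: Expr): String = e match {
    case Col(n)            => n
    case Lit(v)            => v.toString
    case Add(l, r)         => s"(${compile(l)} + ${compile(r)})"
    case GreaterThan(l, r) => s"${compile(l)} > ${compile(r)}"
  }

  def main(args: Array[String]): Unit = {
    val predicate = GreaterThan(Add(Col("c1"), Col("c2")), Lit(10))
    // Prints: SELECT * FROM table WHERE (c1 + c2) > 10
    println(s"SELECT * FROM table WHERE ${compile(predicate)}")
  }
}
```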

@AmplabJenkins

Can one of the admins verify this patch?

@huaxingao (Contributor, Author)

@viirya I changed the code based on your suggestion. Could you please review it again? Thanks a lot for your help!

```scala
private def translateArithemiticOPFilter(predicate: Expression): Option[Filter] = {
  predicate match {
    case expressions.EqualTo(Add(left, right), Literal(v, t)) =>
      Some(sources.ArithmeticOPEqualTo(Add(left, right), convertToScala(v, t)))
    // ... remaining arithmetic cases elided in this excerpt
    case _ => None
  }
}
```
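For readers following along: `sources.ArithmeticOPEqualTo` is a new public filter this PR introduces; it is not part of the existing `org.apache.spark.sql.sources` filter set. Judging only from the call site above, its shape would be roughly as follows (a guess, not the PR's actual definition):

```scala
// Guessed from the call site above; the PR's actual definition may differ.
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.sources.Filter

case class ArithmeticOPEqualTo(expression: Expression, value: Any) extends Filter
```

Note that the Catalyst `Expression` appears in the public signature, which is exactly the leakage of internal types that the review comments below object to.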
(Member)

As described in SPARK-10195, it looks like the data sources API now exposes Catalyst's internal types through its Filter interfaces. I think these might have to be hidden.

(Member)
I took a look at SPARK-10195. It deals with the issue of exposing internal data types, and it uses convertToScala to convert those internal types to their Scala versions. Since convertToScala is used here only to convert the values, I don't think it is the same problem.
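For reference, `convertToScala` comes from `org.apache.spark.sql.catalyst.CatalystTypeConverters` and maps Catalyst's internal representations back to plain Scala/Java values, for example:

```scala
import org.apache.spark.sql.catalyst.CatalystTypeConverters.convertToScala
import org.apache.spark.sql.types.{IntegerType, StringType}
import org.apache.spark.unsafe.types.UTF8String

object ConvertToScalaExample extends App {
  // Catalyst stores strings internally as UTF8String; convertToScala maps the
  // internal value back to an ordinary java.lang.String.
  println(convertToScala(UTF8String.fromString("spark"), StringType)) // spark

  // Primitive types such as Int pass through unchanged.
  println(convertToScala(10, IntegerType)) // 10
}
```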

(Member)
Hm.. I see, but it still looks to me like expressions._ should be hidden. With this approach, expressions._ becomes accessible at the datasource level. I believe the reason sources._ exists is to hide expressions._, which has been changing rapidly from version to version.

@HyukjinKwon (Member)

Since sources.Filter is shared with Parquet, ORC, and other datasources, I think the arithmetic operators might have to be resolved in DataSourceStrategy itself.

AFAIK, Parquet and ORC do not support arithmetic operators, so if we support this in this way they would anyway have to convert them back on the Spark side in the future. So, for this case, I think the operators might have to be resolved in DataSourceStrategy.

I believe we might better resolve this issue by implementing CatalystScan, as suggested by @liancheng in SPARK-9182 and filed in SPARK-12126.

@HyukjinKwon (Member)

If we keep solving this in DataSourceStrategy in this way, I think we should also resolve the operators for the other datasources in DataSourceStrategy. For that, dealing with Cast (SPARK-9182) might have to be done first.

@HyukjinKwon (Member)

Actually, I suggested an approach similar to this one before, in SPARK-9182. If we keep adding filters in this way, we could end up converting every Catalyst expression to a sources.Filter. That would effectively mean writing a new expression library, which might not be worth the effort.

And this is why I raised all of the issues above in the JIRA issue, SPARK-12506.

@HyukjinKwon (Member)

@huaxingao please change the title so that it does not end with …

@rxin (Contributor) commented on Jan 14, 2016

Yes, I think using the internal expression API makes more sense. We don't want to add too many expressions to the external data source API.

@viirya (Member) commented on Jan 14, 2016

Indeed, continuing to add more filters will be a problem. It would be better if we could pass Catalyst expressions directly to the JDBC datasource.

@HyukjinKwon (Member)

@viirya Yes, I think so. But the reason I did not try that is that expressions._ changes rapidly, which could break datasource code implemented against CatalystScan from version to version. I believe this is also why the Parquet datasource moved from its CatalystScan implementation to another one.

So maybe we should try to find a better solution for this.

@huaxingao changed the title from "[SPARK-12506][SQL]push down WHERE clause arithmetic operator to JDBC …" to "[SPARK-12506][SQL]push down WHERE clause arithmetic operator to JDBC" on Jan 14, 2016
@viirya (Member) commented on Jan 27, 2016

I think most of the expressions commonly used in filters (such as >, >=, <, <=, ==, string ops, and arithmetic ops) are relatively stable now. Maybe we can let the JDBC datasource implement CatalystScan and process these expressions.

@huaxingao (Contributor, Author)

@viirya @HyukjinKwon @rxin Thank you all very much for your comments. I will change JDBCRelation to implement CatalystScan, and then access the Catalyst expressions directly in JDBCRDD. I will close this PR and submit a new one.
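For context on that plan: CatalystScan (in org.apache.spark.sql.sources) hands the relation the raw Catalyst expressions, so the JDBC side can translate what it understands and leave the rest to Spark. Below is a rough sketch of that direction; the buildScan signature is CatalystScan's real one, while JdbcCatalystScanSketch, compileExpression, and scanWithWhereClause are hypothetical names invented for illustration.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.sources.{BaseRelation, CatalystScan}

// Rough sketch of the direction discussed above: JDBCRelation implementing
// CatalystScan so it receives Catalyst expressions rather than sources.Filter.
trait JdbcCatalystScanSketch extends BaseRelation with CatalystScan {

  override def buildScan(requiredColumns: Seq[Attribute],
                         filters: Seq[Expression]): RDD[Row] = {
    // Push down whatever we can translate; Spark re-evaluates the rest.
    val pushed = filters.flatMap(compileExpression)
    scanWithWhereClause(requiredColumns.map(_.name), pushed)
  }

  // Hypothetical helper: translate a Catalyst expression into a SQL fragment,
  // returning None for anything this datasource cannot handle.
  private def compileExpression(e: Expression): Option[String] = e match {
    case GreaterThan(l, r) =>
      for (ls <- compileExpression(l); rs <- compileExpression(r))
        yield s"$ls > $rs"
    case Add(l, r) =>
      for (ls <- compileExpression(l); rs <- compileExpression(r))
        yield s"($ls + $rs)"
    case Literal(v, _)                     => Some(v.toString)
    case AttributeReference(name, _, _, _) => Some(name)
    case _                                 => None // unsupported: left to Spark
  }

  // Hypothetical helper: run the JDBC query with the pushed-down predicates.
  protected def scanWithWhereClause(columns: Seq[String],
                                    predicates: Seq[String]): RDD[Row]
}
```

With this shape, `c1 + c2 > 10` compiles to `(c1 + c2) > 10` and ends up in the database's WHERE clause, without adding any new classes to the public sources.Filter API.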
