[SPARK-9078] [SQL] Allow jdbc dialects to override the query used to check the table. #8676
Conversation
…ialect implementations to specify the query.
FWIW, we dropped JVM 1.6 support in Spark 1.5. Would that make this easier?
Even if Spark is running on JDK 1.7, customers using older versions of drivers will run into an AbstractMethodError exception. I think requiring customers to use new drivers that implement the getSchema() function is unnecessary.

After implementing the current approach, I got curious about how the JDBC read functionality finds the metadata, and learned that org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.resolveTable also uses s"SELECT * FROM $table WHERE 1=0" to get column information. An alternative approach is to add getMetadataQuery(table: String) to the JdbcDialect interface, instead of the getTableExistsQuery() implemented in the current pull request: it would determine whether the table exists in the write case, and provide column type information in the read case. It might be a millisecond slower for the write call in dialects that would otherwise specify "select 1 from $table limit 1" instead of "select * from $table limit 1". The advantage is that one method on the interface addresses both cases. Any comments?
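The alternative proposed above could be sketched as follows. This is a hypothetical illustration only: `JdbcDialectAlt`, `getMetadataQuery`, and `LimitDialect` are made-up names, not part of Spark's actual `JdbcDialect`.

```scala
// Hypothetical sketch of the alternative proposal: a single metadata
// query that serves both the write path (table-exists check) and the
// read path (column type resolution). Names are illustrative only.
abstract class JdbcDialectAlt {
  // Default: an empty result set whose metadata still describes all
  // columns, so it works for existence checks and schema reads alike.
  def getMetadataQuery(table: String): String =
    s"SELECT * FROM $table WHERE 1=0"
}

// A dialect preferring LIMIT would have to keep SELECT * so the read
// path can still see every column (hence the slight cost, compared to
// SELECT 1 ... LIMIT 1, for the write-only case noted above).
object LimitDialect extends JdbcDialectAlt {
  override def getMetadataQuery(table: String): String =
    s"SELECT * FROM $table LIMIT 1"
}
```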
```scala
 * @return The SQL query to use for checking the table.
 */
def getTableExistsQuery(table: String): String = {
  s"SELECT * FROM $table WHERE 1=0"
}
```
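A minimal, self-contained sketch of the override mechanism this patch adds. `DialectSketch` is a simplified stand-in for Spark's `JdbcDialect` abstract class, and `MySQLLikeDialect` mirrors the `LIMIT 1` form discussed in this thread; neither name is from the actual codebase.

```scala
// Simplified stand-in for Spark's JdbcDialect abstract class,
// showing how a dialect overrides the table-existence query.
abstract class DialectSketch {
  // Default query: likely to work on most databases, returns no rows.
  def getTableExistsQuery(table: String): String =
    s"SELECT * FROM $table WHERE 1=0"
}

// A MySQL-style dialect can substitute a LIMIT-based probe instead.
object MySQLLikeDialect extends DialectSketch {
  override def getTableExistsQuery(table: String): String =
    s"SELECT 1 FROM $table LIMIT 1"
}
```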
maybe we should quote the table here actually
actually never mind we cannot quote it.
@rxin What's the specific reason the table name cannot be quoted? We happen to have a table with dots and parentheses in its name, and plan to add surrounding backticks before passing it to Spark.
next() will return false because the result set will be empty when the query is where 1!=0. executeQuery() will throw an exception if the table is not found, so the next() call is not really required to find out whether the table exists.
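The point above can be sketched like this. `runQuery` stands in for JDBC's `Statement.executeQuery`, and `fakeExecute` is a purely illustrative stub, not real driver behavior:

```scala
import scala.util.Try

// Existence check as described above: if executeQuery on the probe
// throws, the table is missing; if it returns (even with an empty
// result set), the table exists. No next() call is required.
def tableExists(table: String, runQuery: String => Unit): Boolean =
  Try(runQuery(s"SELECT * FROM $table WHERE 1=0")).isSuccess

// Illustrative stand-in for a JDBC statement that knows one table
// named "people" and throws for anything else (crude check, demo only).
def fakeExecute(sql: String): Unit =
  if (!sql.contains("FROM people "))
    throw new RuntimeException("table or view does not exist")
```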
Typo in my previous comment, I meant when query is where 1=0.
@rxin Thank you for reviewing the patch. Just to make sure, I tested without the next() call on MySQL, Postgres, and DB2, and it worked fine. Updated the pull request.
Thanks. Merging this in master.
(Oops spoke too soon - I will merge after tests pass) |
Test build #1760 has finished for PR 8676 at commit
retest this please |
Test build #42504 has finished for PR 8676 at commit
@vanzin do you know what's going on with the tests?

`[error] Execution of test test.org.apache.spark.sql.JavaApplySchemaSuite failed: java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.ExtendedYarnTest`
I've merged this. |
@rxin I reverted the patch that caused those. |
It has been merged to master. |
The current implementation uses a query with a LIMIT clause to find out whether the table already exists. This syntax works only in some database systems. This patch changes the default query to one that is likely to work on most databases, and adds a new method to the JdbcDialect abstract class to allow dialects to override the default query.
I looked at using the JDBC metadata calls; it turns out there is no common way to find the current schema, catalog, etc. There is a new method, Connection.getSchema(), but that is available only starting with JDK 1.7, and existing JDBC drivers may not have implemented it. Another option was to use the JDBC escape syntax clause for LIMIT, but I am not sure how well that is supported across databases either. After looking at all the JDBC metadata options, my conclusion was that the most common approach is to use a simple select query with 'WHERE 1=0', and to allow dialects to customize it as needed.