
Troubleshooting: ATK to spark-tk

This page covers issues and resolutions you may encounter when moving from the Analytics Toolkit (ATK) to the spark-tk library.

Q: I am seeing the error message below when running my spark-tk application, which uses a PostgreSQL database. The application worked fine with the Analytics Toolkit. How do I get my spark-tk application to work?

Error message:

java.sql.SQLException: No suitable driver found for <jdbcUrl>  

Summary: The Analytics Toolkit bundled a driver for the PostgreSQL database it used, so compatibility was ensured. spark-tk doesn't bundle any drivers, so each JDBC connection needs its own driver.

Details: If this error is encountered while running your application, the node running the application cannot find your JDBC driver library. If you're running in local mode, make sure you pass the driver's `.jar` via the --driver-class-path parameter. If a Spark cluster is involved, make sure each cluster member has a copy of the JDBC library on its classpath (typically configured in the spark-defaults.conf file), and that each node of the cluster has been restarted since you modified spark-defaults.conf. See the Spark configuration documentation for details.
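For a cluster deployment, one way to make the driver visible everywhere is through spark-defaults.conf on each node. A minimal sketch, assuming the PostgreSQL driver jar has been copied to /opt/jars/ on every node (the path and file name below are placeholders; adjust them to your installation):

# spark-defaults.conf -- hypothetical jar path; adjust to your cluster
spark.driver.extraClassPath     /opt/jars/postgresql-9.4.1212.jar
spark.executor.extraClassPath   /opt/jars/postgresql-9.4.1212.jar

Remember to restart the Spark services on each node after editing this file.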

Then locate the `.jar` file containing a driver compatible with your JDBC data sink, and specify it when creating the TkContext instance:

>>> tc = sparktk.TkContext(pyspark_submit_args='--jars myJDBCDriver.jar')
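Putting it together, a minimal end-to-end sketch: the connection URL, database, table, credentials, and jar path below are placeholders, and the table is read into a frame with spark-tk's tc.frame.import_jdbc:

>>> import sparktk
>>> # Point the context at the JDBC driver jar (placeholder path)
>>> tc = sparktk.TkContext(pyspark_submit_args='--jars /path/to/postgresql-9.4.1212.jar')
>>> # Read a table into a spark-tk frame over JDBC (placeholder URL and table name)
>>> url = 'jdbc:postgresql://localhost/mydb?user=myuser&password=mypassword'
>>> frame = tc.frame.import_jdbc(connection_url=url, table_name='mytable')
>>> frame.inspect()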