Troubleshooting ATK to spark-tk
This page covers issues you may encounter when switching from the Analytics Toolkit (ATK) to the spark-tk library, and how to resolve them.
Q: I am seeing the error message below when running my spark-tk application, which includes a PostgreSQL database. The application worked fine with the Analytics Toolkit. How do I get my spark-tk application to work?
Error message:
java.sql.SQLException: No suitable driver found for <jdbcUrl>
Summary: The Analytics Toolkit included a driver for the PostgreSQL database it used, so compatibility was ensured. Since spark-tk doesn't include any drivers, each JDBC connection will need its own driver.
Details: If this error is encountered while running your application, the JDBC driver library cannot be found by the node running the application. If you're running in local mode, make sure that you have passed the driver jar via the --driver-class-path parameter. If a Spark cluster is involved, make sure that each cluster member has a copy of the JDBC library, and that each node of the cluster has been restarted since you modified the spark-defaults.conf file (see the Spark configuration documentation for details). A sketch of the relevant spark-defaults.conf entries follows.
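The following is a minimal sketch of the cluster-side configuration. The two properties are standard Spark settings; the path /opt/jdbc/postgresql-driver.jar is a hypothetical location, so substitute wherever the driver jar actually lives on each node:

# spark-defaults.conf (edit on every node, then restart the node)
# Hypothetical driver location; use the real path on your cluster.
spark.driver.extraClassPath     /opt/jdbc/postgresql-driver.jar
spark.executor.extraClassPath   /opt/jdbc/postgresql-driver.jar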
Then locate the .jar file containing a driver compatible with the JDBC data sink, and specify it when creating the TkContext instance:
>>> import sparktk
>>> tc = sparktk.TkContext(pyspark_submit_args='--jars myJDBCDriver.jar')
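In local mode, --jars alone may not put the driver on the driver's own classpath, which is where java.sql.DriverManager looks for it. A minimal sketch, assuming pyspark_submit_args passes these flags through to spark-submit unchanged, supplies the same jar to both flags:

>>> import sparktk
>>> tc = sparktk.TkContext(pyspark_submit_args='--jars myJDBCDriver.jar '
...                                            '--driver-class-path myJDBCDriver.jar')

Here --jars ships the driver to the executors, while --driver-class-path makes it visible to the driver process itself.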