Troubleshooting ATK to spark-tk
This page provides resolutions to issues you may encounter when switching from the Analytics Toolkit (ATK) to the spark-tk library.
Q: I am seeing the error message below when running my spark-tk application, which connects to a PostgreSQL database. The application worked fine with the Analytics Toolkit. How do I get my spark-tk application to work?
Error message:
java.sql.SQLException: No suitable driver found for <jdbcUrl>
This troubleshooting tip applies to any JDBC database connection.
The Analytics Toolkit included a driver for the PostgreSQL database it used, so compatibility was ensured. Since spark-tk doesn't include any drivers, each JDBC connection needs its own driver.
If you encounter this error while running your application, the node(s) running the application cannot find your JDBC driver library. The fix is described below.
You need to locate the `.jar` file containing a compatible JDBC driver and specify it when creating the TkContext instance:
>>> tc = sparktk.TkContext(pyspark_submit_args='--jars myJDBCDriver.jar')
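Once the driver jar is visible to the nodes, the database table can be read into a frame. Below is a minimal sketch, assuming a PostgreSQL database and spark-tk's import_jdbc frame constructor; the jar name, connection URL, and table name are placeholders, so substitute your own:

```python
import sparktk

# Make the JDBC driver jar available to every node running the application.
# "postgresql-42.2.5.jar" is a placeholder for your actual driver jar.
tc = sparktk.TkContext(pyspark_submit_args='--jars postgresql-42.2.5.jar')

# Placeholder connection details -- use your own host, database, credentials,
# and table name.
url = "jdbc:postgresql://localhost:5432/mydb?user=me&password=secret"
frame = tc.frame.import_jdbc(connection_url=url, table_name="my_table")
frame.inspect()
```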
Q: I am using spark-tk and want to save files/export models to my local file system instead of HDFS. How do I do that?
The SparkContext created by TkContext follows the system's current Spark configuration. If your system defaults to HDFS but you want to use a local file system instead, include use_local_fs=True when creating your TkContext, as follows:
>>> tc = sparktk.TkContext(use_local_fs=True)
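With use_local_fs=True, save and load paths resolve against the local file system rather than HDFS. A short sketch; the sample data, schema, and paths below are placeholders:

```python
import sparktk

# Route file operations to the local file system instead of HDFS.
tc = sparktk.TkContext(use_local_fs=True)

# A tiny illustrative frame; the schema and rows are placeholders.
frame = tc.frame.create([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
                        [("x", float), ("y", float)])

# This path now refers to the local file system, e.g. ./sandbox/my_frame.
frame.save("sandbox/my_frame")

# Saved frames (and models) can be loaded back with tc.load.
restored = tc.load("sandbox/my_frame")
```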