ODD Spark Adapter is a Spark Listener that sends metadata and dependencies of Spark 3.3.1 jobs to platforms based on the OpenDataDiscovery specification.
To learn more about OpenDataDiscovery and the ODD Platform, please refer to the project's landing and documentation pages.
ODD Spark adapter v0.0.1 supports:
- RDD low level jobs
- Read/write from/to JDBC data sources
- Read/write from/to Kafka topics (batch only)
- Read/write from/to Snowflake tables
- Read/write from/to S3 Delta tables
Current limitations:
- Spark Structured Streaming is not supported yet (on the roadmap)
- Only Spark 3.3.1 is supported (support for other versions is on the roadmap)
ODD Spark Adapter is essentially a simple Spark Listener that gathers metadata and the inputs/outputs of Spark jobs and sends them to the ODD Platform or any ODD-based backend.
Available JAR files can be found in Releases.
- spark.odd.host.url: URL of the ODD Platform deployment.
- spark.odd.oddrn.key: Unique identifier of the Spark cluster. Can be any string that uniquely identifies the target Spark cluster within the user's data infrastructure.
./spark-submit \
--packages <needed packages for the Spark jobs> \
--jars <path to the ODD Spark adapter JAR> \
--conf "spark.odd.host.url=http://odd-platform:8080" \
--conf "spark.odd.oddrn.key=unique_spark_cluster_key" \
/jobs/simple-delta-lake.py
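
When jobs are launched from a script rather than by hand, the same command line can be assembled programmatically. The following is a minimal Python sketch, not part of the adapter: the helper name build_submit_args and the JAR path are hypothetical, and the check that spark.odd.host.url is an http(s) URL is an assumption based on the example above, not a documented requirement.

```python
from urllib.parse import urlparse


def build_submit_args(host_url, oddrn_key, adapter_jar, job_path, packages=None):
    """Assemble a spark-submit argument list carrying the two ODD properties.

    Hypothetical helper for illustration; it only builds the argument list
    and does not launch Spark.
    """
    parsed = urlparse(host_url)
    # Assumption: the ODD Platform is reached over http(s), as in the example above.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"spark.odd.host.url must be an http(s) URL, got {host_url!r}")
    if not oddrn_key.strip():
        raise ValueError("spark.odd.oddrn.key must be a non-empty string")

    args = ["spark-submit"]
    if packages:
        # Packages required by the Spark job itself, joined as a comma-separated list.
        args += ["--packages", ",".join(packages)]
    args += [
        "--jars", adapter_jar,
        "--conf", f"spark.odd.host.url={host_url}",
        "--conf", f"spark.odd.oddrn.key={oddrn_key}",
        job_path,
    ]
    return args


if __name__ == "__main__":
    import shlex

    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(build_submit_args(
        "http://odd-platform:8080",
        "unique_spark_cluster_key",
        "/path/to/odd-spark-adapter.jar",  # placeholder path, not a real release artifact
        "/jobs/simple-delta-lake.py",
    )))
```

The returned list can be passed to subprocess.run as-is. Validating the two values before submission surfaces a missing or malformed setting before the job starts rather than at runtime.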