- Make sure the Spline Producer instance is running (see instructions)
- Download the Spline source code from GitHub and switch to the `examples` directory:

  ```shell
  git clone git@github.com:AbsaOSS/spline-spark-agent.git
  cd spline-spark-agent
  mvn install -DskipTests
  cd examples
  ```
- Execute `pyspark` with a Spline Spark Agent Bundle corresponding to the Spark and Scala versions in use:

  ```shell
  pyspark \
    --packages za.co.absa.spline.agent.spark:spark-3.1-spline-agent-bundle_2.12:0.6.1 \
    --conf spark.sql.queryExecutionListeners=za.co.absa.spline.harvester.listener.SplineQueryExecutionListener \
    --conf spark.spline.producer.url=http://localhost:8080/producer
  ```

  This example uses the so-called codeless initialization method, i.e. the one that requires no changes to your Spark application code. Alternatively, you can enable Spline manually by calling the `SparkLineageInitializer.enableLineageTracking()` method. See python_example.py.
- Execute your PySpark code as normal.
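
The bundle coordinate passed to `--packages` encodes the Spark minor version, the Scala binary version, and the agent version. As a minimal sketch, assuming other bundles follow the same naming pattern as the one shown above, the coordinate could be assembled as follows. The helper name `spline_bundle_coord` is hypothetical and not part of Spline.

```shell
#!/bin/sh
# Hypothetical helper (not part of Spline): assembles the agent bundle
# coordinate from the Spark minor version, Scala binary version and agent version.
# Assumption: the naming pattern of the bundle shown above holds for other versions.
spline_bundle_coord() {
  spark_version="$1"   # e.g. 3.1
  scala_version="$2"   # e.g. 2.12
  agent_version="$3"   # e.g. 0.6.1
  printf 'za.co.absa.spline.agent.spark:spark-%s-spline-agent-bundle_%s:%s\n' \
    "$spark_version" "$scala_version" "$agent_version"
}

# Prints the coordinate used in the pyspark command above.
spline_bundle_coord 3.1 2.12 0.6.1
```

The result can be passed directly, for example `pyspark --packages "$(spline_bundle_coord 3.1 2.12 0.6.1)"` followed by the two `--conf` options. Check that a bundle for your version combination is actually published before relying on the generated name.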
Same as the `pyspark` example above, but use the `spark-shell` command instead.
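
As a sketch of what that looks like, the helper below prints (rather than runs) a launcher command carrying the same three options as the `pyspark` example. The function name `spline_shell_cmd` is hypothetical. The package coordinate and Producer URL are copied from the example above and should be adjusted to your environment.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) a launcher command with the
# Spline options from the pyspark example.
# $1 is the launcher (default: spark-shell), $2 the Producer URL.
spline_shell_cmd() {
  launcher="${1:-spark-shell}"
  producer_url="${2:-http://localhost:8080/producer}"
  printf '%s' "$launcher"
  printf ' --packages %s' 'za.co.absa.spline.agent.spark:spark-3.1-spline-agent-bundle_2.12:0.6.1'
  printf ' --conf %s' 'spark.sql.queryExecutionListeners=za.co.absa.spline.harvester.listener.SplineQueryExecutionListener'
  printf ' --conf spark.spline.producer.url=%s\n' "$producer_url"
}

# Show the spark-shell variant of the command.
spline_shell_cmd spark-shell
```

Printing the command first makes it easy to review the options before pasting them into a terminal.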
```shell
# To build with the default Spark version, simply run
mvn install

# or, to target a specific Spark version from the 3.x series (e.g. `3.0`, `3.1`)
mvn install -Pspark-3.1

# or, to target a specific Spark version from the 2.x series (`2.2`, `2.3`, `2.4` only):
# 1. switch the project to Scala 2.11 mode
mvn scala-cross-build:change-version -Pscala-2.11
# 2. then run the Maven build with the `-Pspark-xxx` profile as above
mvn install -Pspark-2.4
```
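
The choice between these build variants depends only on the Spark version, so it can be captured in a small helper. The sketch below prints the Maven commands described above for a given version without running them. The function name `spline_build_cmds` is hypothetical, and the accepted versions are limited to those named above.

```shell
#!/bin/sh
# Hypothetical helper: prints the Maven commands described above for a given
# Spark version, without running them.
spline_build_cmds() {
  spark_version="${1:-}"   # e.g. 3.1 or 2.4; empty means the default Spark version
  case "$spark_version" in
    "")
      echo "mvn install" ;;
    2.2|2.3|2.4)
      # Spark 2.x requires switching the project to Scala 2.11 mode first.
      echo "mvn scala-cross-build:change-version -Pscala-2.11"
      echo "mvn install -Pspark-$spark_version" ;;
    3.*)
      echo "mvn install -Pspark-$spark_version" ;;
    *)
      echo "unsupported Spark version: $spark_version" >&2
      return 1 ;;
  esac
}

# Show the two-step build for Spark 2.4.
spline_build_cmds 2.4
```

Whether a given `3.x` profile exists in the project is not checked here; the helper only mirrors the pattern from the comments above.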

```shell
# Execute all available examples
./run.sh --all -Dspline.producer.url=http://localhost:8080/producer

# Execute an individual example class
./run.sh -Dspline.producer.url=http://localhost:8080/producer za.co.absa.spline.example.XXX
```

To add JVM options:

```shell
./run.sh -jvm_opt1=xxx -jvm_opt2=yyy ... class.to.run.ClassName

# or, if you run all examples, the '--all' argument should go first.
./run.sh --all -jvm_opt1=xxx -jvm_opt2=yyy ...
```
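
The argument order differs between the two forms: `--all` goes before the JVM options, while a class name goes after them. The sketch below encodes that rule in a hypothetical helper, `spline_run_cmd`, which only prints the resulting `./run.sh` command line. It assumes that no option value contains whitespace.

```shell
#!/bin/sh
# Hypothetical helper: prints a ./run.sh command line in the argument order
# described above. $1 is either --all or an example class name;
# the remaining arguments are JVM options (assumed to contain no whitespace).
spline_run_cmd() {
  target="$1"
  shift
  opts="$*"
  if [ "$target" = "--all" ]; then
    # --all goes first, JVM options follow.
    echo "./run.sh --all${opts:+ $opts}"
  else
    # JVM options go first, the class to run goes last.
    echo "./run.sh${opts:+ $opts} $target"
  fi
}

spline_run_cmd --all -Dspline.producer.url=http://localhost:8080/producer
spline_run_cmd za.co.absa.spline.example.XXX -Dspline.producer.url=http://localhost:8080/producer
```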
- Scala
- Java
- Python
- Shell script - custom, non-Spark example, using REST API
Recommended Docker settings: `cpu=2`, `memory=4096M`

```shell
docker run --rm -e "SPLINE_PRODUCER_URL=http://localhost:8080/producer" absaoss/spline-spark-agent
```

Available environment variables:

| Variable name | Description |
|---|---|
| SPLINE_PRODUCER_URL | Spline Producer REST API endpoint URL |
| SPLINE_MODE | (see Spline mode) |
| DISABLE_SSL_VALIDATION | If `true`, disables validation of the server SSL certificate in the HttpLineageDispatcher |
| HTTP_PROXY_HOST | (see Java Networking and Proxies) |
| HTTP_PROXY_PORT | (see Java Networking and Proxies) |
| HTTP_NON_PROXY_HOSTS | (see Java Networking and Proxies) |

(The default values can be seen in the respective Dockerfile)
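
One way to apply the recommended resource settings is through the standard `docker run` options `--cpus` and `--memory`. The sketch below prints such a command instead of running it. Mapping `cpu=2` and `memory=4096M` to these two flags is an interpretation rather than something the project prescribes, and the helper name `spline_docker_cmd` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) a docker command that applies the
# recommended resource settings and passes the Producer URL to the container.
# Assumption: cpu=2 / memory=4096M correspond to --cpus=2 / --memory=4096m.
spline_docker_cmd() {
  producer_url="${1:-http://localhost:8080/producer}"
  printf 'docker run --rm --cpus=2 --memory=4096m -e "SPLINE_PRODUCER_URL=%s" absaoss/spline-spark-agent\n' \
    "$producer_url"
}

# Show the command with the default Producer URL.
spline_docker_cmd
```

Other variables from the table can be passed the same way with additional `-e` options.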
Copyright 2019 ABSA Group Limited
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.