This demo is composed of 3 parts:
- WriteDemo: reads the input JSON files as Spark DataFrames, applies conversions to map the data to Spark data types, and writes the records into ArangoDB collections
- ReadDemo: reads the ArangoDB collections created above as Spark DataFrames, specifying column selection and record filter predicates or custom AQL queries
- ReadWriteDemo: reads the ArangoDB collections created above as Spark DataFrames, applies projections and filtering, and writes the result to a new ArangoDB collection
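The three steps differ mainly in which data source options they pass. The sketch below is a hypothetical helper, not code from the demo. It assumes the option names `table` (collection name) and `query` (custom AQL read query) as described in the ArangoDB Spark Datasource documentation, where a query takes precedence over a collection name. Check these names against the connector version you use.

```python
from typing import Dict, Optional


# Hypothetical helpers: option names "table" and "query" are assumed from the
# ArangoDB Spark Datasource documentation.
def read_options(table: Optional[str] = None, query: Optional[str] = None) -> Dict[str, str]:
    """Options for reading either a whole collection or a custom AQL query."""
    if table is None and query is None:
        raise ValueError("either table or query is required")
    if query is not None:
        # A custom AQL query takes precedence over the collection name.
        return {"query": query}
    return {"table": table}


def write_options(table: str) -> Dict[str, str]:
    """Options for writing a DataFrame into the given collection."""
    return {"table": table}
```

In a PySpark session these would be combined with the connection options, e.g. `spark.read.format("com.arangodb.spark").options(**conn, **read_options(table="users")).load()`, where `conn` and the collection name `users` are placeholders. Column selection and filter predicates are then expressed with the usual DataFrame `select` and `filter` calls.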
Demos are available in Scala and in Python (using PySpark), as outlined below.
This demo requires:
- JDK 8, 11 or 17
- Maven
- Docker
For the Python demo, you will also need:
- Python
Set environment variables:
export ARANGO_SPARK_VERSION=1.8.0
Start ArangoDB cluster with docker:
SSL=true STARTER_MODE=cluster ./docker/start_db.sh
The deployed cluster will be accessible at https://172.28.0.1:8529 with username root and password test.
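Before starting Spark, it can help to confirm that the cluster answers. The following minimal sketch (using only the Python standard library) builds an authenticated request for ArangoDB's `GET /_api/version` endpoint with the credentials above. The function name `build_version_request` is hypothetical, and the request is only constructed here, not sent.

```python
import base64
import urllib.request


def build_version_request(base_url: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET request for /_api/version using HTTP basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/_api/version",
        headers={"Authorization": f"Basic {token}"},
    )


# Credentials and endpoint of the local cluster started above.
req = build_version_request("https://172.28.0.1:8529", "root", "test")
```

To actually send it, call `urllib.request.urlopen(req, context=ctx)`. If the cluster's certificate is not trusted by your system, `ctx` must be an `ssl.SSLContext` that trusts it, for example `ssl.create_default_context(cafile="ca.pem")`, where `ca.pem` is a placeholder path.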
Start Spark cluster:
./docker/start_spark.sh
NB: the following step, which installs the project defined in ../pom.xml into the local Maven repository, is only needed for SNAPSHOT versions.
mvn -f ../pom.xml install -Dmaven.test.skip=true -Dgpg.skip=true -Dmaven.javadoc.skip=true -Pscala-2.12 -Pspark-3.5
Test the Spark application in embedded mode:
mvn \
-Pscala-2.12 -Pspark-3.5 \
test
Test the Spark application against an ArangoDB Oasis deployment:
mvn \
-Pscala-2.12 -Pspark-3.5 \
-Dpassword=<root-password> \
-Dendpoints=<endpoint> \
-Dssl.cert.value=<base64-encoded-cert> \
test
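The `<base64-encoded-cert>` placeholder expects the deployment's CA certificate encoded as base64. If you only have the certificate as a local file, a minimal sketch for producing that value follows. It assumes the connector accepts the base64 encoding of the certificate file as-is. The function names and the `ca.pem` file name are hypothetical.

```python
import base64
from pathlib import Path


def encode_cert_bytes(data: bytes) -> str:
    """Base64-encode raw certificate bytes into a single ASCII string."""
    return base64.b64encode(data).decode("ascii")


def encode_cert_file(path: str) -> str:
    """Read a certificate file (e.g. a PEM file) and base64-encode its contents."""
    return encode_cert_bytes(Path(path).read_bytes())
```

For example, `encode_cert_file("ca.pem")` yields a string that can be passed as `-Dssl.cert.value=...` here, or as `--ssl-cert-value=...` to the Python demo.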
Package the application:
mvn package -Dmaven.test.skip=true -Pscala-2.12 -Pspark-3.5
Submit the demo program:
docker run -it --rm \
-v $(pwd):/demo \
-v $(pwd)/docker/.ivy2:/opt/bitnami/spark/.ivy2 \
-v $HOME/.m2/repository:/opt/bitnami/spark/.m2/repository \
--network arangodb \
docker.io/bitnami/spark:3.5.2 \
./bin/spark-submit --master spark://spark-master:7077 \
--packages="com.arangodb:arangodb-spark-datasource-3.5_2.12:$ARANGO_SPARK_VERSION" \
--class Demo /demo/target/demo-$ARANGO_SPARK_VERSION.jar
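The `--packages` coordinate encodes the Spark and Scala versions, so it has to match the Maven profiles used to build the demo (`-Pspark-3.5`, `-Pscala-2.12`). The sketch below shows how the command arguments are assembled from those versions and `ARANGO_SPARK_VERSION`. The helper names are hypothetical, and the `1.8.0` fallback simply mirrors the value exported earlier.

```python
import os
from typing import List


def package_coordinate(spark: str, scala: str, version: str) -> str:
    """Maven coordinate of the connector for a given Spark/Scala combination."""
    return f"com.arangodb:arangodb-spark-datasource-{spark}_{scala}:{version}"


def spark_submit_args(version: str, spark: str = "3.5", scala: str = "2.12") -> List[str]:
    """Argument list equivalent to the spark-submit invocation above."""
    return [
        "./bin/spark-submit",
        "--master", "spark://spark-master:7077",
        f"--packages={package_coordinate(spark, scala, version)}",
        "--class", "Demo",
        f"/demo/target/demo-{version}.jar",
    ]


version = os.environ.get("ARANGO_SPARK_VERSION", "1.8.0")
args = spark_submit_args(version)
```

Switching to another Spark or Scala line would mean changing both the Maven profiles and these two values together.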
The Python demo requires the same environment setup as outlined above. Additionally, install the Python requirements as follows:
pip install -r ./python-demo/requirements.txt
To run the PySpark demo against the local cluster:
python ./python-demo/demo.py \
--ssl-enabled=true \
--endpoints=172.28.0.1:8529,172.28.0.1:8539,172.28.0.1:8549
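The command-line flags correspond to connection settings of the data source. The sketch below is a hypothetical illustration of how such flags could be parsed and turned into connector options. It is not the actual `demo.py`, and its defaults are assumptions. The option names `user`, `password`, `endpoints`, `ssl.enabled` and `ssl.cert.value` are taken from the ArangoDB Spark Datasource documentation and should be verified for your version.

```python
import argparse
from typing import Dict, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Hypothetical flags and defaults mirroring the commands in this README.
    parser = argparse.ArgumentParser()
    parser.add_argument("--user", default="root")
    parser.add_argument("--password", default="test")
    parser.add_argument("--endpoints", default="172.28.0.1:8529")
    parser.add_argument("--ssl-enabled", default="false")
    parser.add_argument("--ssl-cert-value", default="")
    return parser.parse_args(argv)


def connection_options(args: argparse.Namespace) -> Dict[str, str]:
    """Map parsed flags to (assumed) ArangoDB Spark Datasource option names."""
    opts = {
        "user": args.user,
        "password": args.password,
        "endpoints": args.endpoints,  # comma-separated host:port list
        "ssl.enabled": args.ssl_enabled,
    }
    if args.ssl_cert_value:
        # Only set when a base64-encoded certificate was supplied.
        opts["ssl.cert.value"] = args.ssl_cert_value
    return opts
```

The resulting dictionary would be passed to `spark.read.format("com.arangodb.spark").options(**opts)` together with a `table` or `query` option.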
To run it against an Oasis deployment:
python ./python-demo/demo.py \
--password=<root-password> \
--endpoints=<endpoint> \
--ssl-enabled=true \
--ssl-cert-value=<base64-encoded-cert>