This is an example Apache Spark Application that can access S3 storage.
It uses Hadoop 3.2.0 to enable role-based authentication with Spark 2.4.7, and it should work with other Spark/Hadoop combinations as well.
This example is designed to run in local mode. You don't need a Spark cluster or AWS EMR to run this example.
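Under the hood, role-based access goes through Hadoop's S3A connector. As a minimal sketch (the credentials provider setting and role ARN below are illustrative assumptions, not values taken from the example source), a local-mode Spark session can be configured for assumed-role authentication like this:

import org.apache.spark.sql.SparkSession

// Minimal sketch: point the S3A connector at an IAM role to assume.
// The role ARN is a placeholder, not part of this example project.
val spark = SparkSession.builder()
  .appName("SparkCobolS3App")
  .master("local[*]")
  .config("spark.hadoop.fs.s3a.aws.credentials.provider",
    "org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider")
  .config("spark.hadoop.fs.s3a.assumed.role.arn",
    "arn:aws:iam::123456789012:role/example-role")
  .getOrCreate()

Assumed-role support in S3A is only available in Hadoop 3.1+, which is why the project pins a newer Hadoop version than the one bundled with Spark 2.4.7.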
The example application is in the com.example.spark.cobol.app package.
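Given the spark-cobol naming, the application presumably reads mainframe (COBOL) records from S3 via the Cobrix data source. A rough, hypothetical sketch of such a read (the bucket, data path, and copybook location are placeholders):

// Hypothetical sketch: read a COBOL-encoded file from S3 with spark-cobol.
// All s3a:// paths below are placeholders.
val df = spark.read
  .format("cobol")
  .option("copybook", "s3a://my-bucket/copybooks/example.cob")
  .load("s3a://my-bucket/data/example_file")

df.show()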
The project contains build files for Maven.
To build an uber jar that can run on any local computer, use:

mvn package
After the project is packaged, you can copy target/spark-cobol-s3-sa-0.0.1-SNAPSHOT.jar to the machine where you want to run it.
The uber jar contains all the dependencies needed to run the application, including the Hadoop, AWS SDK, and Spark libraries.
Use the following command to run it:
java -cp spark-cobol-s3-sa-0.0.1-SNAPSHOT.jar com.example.spark.cobol.app.SparkCobolS3App
or, since the main class is registered in the manifest, you can run the job even more simply:
java -jar spark-cobol-s3-sa-0.0.1-SNAPSHOT.jar
Alternatively, you can run the application from an IDE.