Quick start Spark
You can use the aws-serverless-java-container library to run a Spark application in AWS Lambda. You use the library within your Lambda handler to proxy events to the Spark instance.
In the repository we have included a sample Spark application to get you started.
You can quickly create a new serverless Spark application using our Maven archetype. First, make sure Maven is installed in your environment and available in your PATH. Next, use a terminal or your favorite IDE to create a new application. The archetype groupId is com.amazonaws.serverless.archetypes and the artifactId is aws-serverless-spark-archetype:
mvn archetype:generate -DgroupId=my.service -DartifactId=my-service -Dversion=1.0-SNAPSHOT \
-DarchetypeGroupId=com.amazonaws.serverless.archetypes \
-DarchetypeArtifactId=aws-serverless-spark-archetype \
-DarchetypeVersion=1.1.2
The archetype sets up a new Maven project. The pom.xml includes the dependencies you need to build a basic Spark API that can consume and produce JSON data. The generated code includes a StreamLambdaHandler class, the main entry point for AWS Lambda; a SparkResources class that defines a /ping resource; and a set of unit tests that exercise the application.
The project also includes a file called sam.yaml. This is a SAM template that you can use to quickly test your application locally or deploy it to AWS. Open the README.md file in the project root folder for instructions on how to use SAM Local to run your serverless API or deploy it to AWS.
To set up a project manually instead, the first step is to import the Spark implementation of the library:
<dependency>
<groupId>com.amazonaws.serverless</groupId>
<artifactId>aws-serverless-java-container-spark</artifactId>
<version>1.1</version>
</dependency>
This dependency also transitively imports the aws-serverless-java-container-core and aws-lambda-java-core libraries.
In your application package, declare a new class that implements Lambda's RequestStreamHandler interface. If you have configured API Gateway with a proxy integration, you can use the built-in POJOs AwsProxyRequest and AwsProxyResponse.
The next step is to declare the container handler object. The library exposes a static utility method, SparkLambdaContainerHandler.getAwsProxyHandler(), that configures a SparkLambdaContainerHandler object for AWS proxy events. Declare the handler object as a static field of the class. This way, Lambda reuses the same instance for subsequent requests served by the same warm container.
The handleRequest method of the class uses the handler object declared in the previous step to send requests to the Spark application.
The handler object is created in a static initializer block, which runs once, when Lambda loads the class during a cold start. The same block then configures the Spark routes. It's important to configure the Spark routes only after the handler is initialized.
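import com.amazonaws.serverless.exceptions.ContainerInitializationException;
import com.amazonaws.serverless.proxy.model.AwsProxyRequest;
import com.amazonaws.serverless.proxy.model.AwsProxyResponse;
import com.amazonaws.serverless.proxy.spark.SparkLambdaContainerHandler;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestStreamHandler;
import spark.Spark;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamLambdaHandler implements RequestStreamHandler {
    // Static so that warm invocations in the same container reuse the handler
    private static SparkLambdaContainerHandler<AwsProxyRequest, AwsProxyResponse> handler;

    static {
        try {
            // Initialize the container handler first...
            handler = SparkLambdaContainerHandler.getAwsProxyHandler();
            // ...and only then define the Spark routes
            SparkResources.defineResources();
            Spark.awaitInitialization();
        } catch (ContainerInitializationException e) {
            // If initialization fails, re-throw the exception to force another cold start
            e.printStackTrace();
            throw new RuntimeException("Could not initialize Spark container", e);
        }
    }

    @Override
    public void handleRequest(InputStream inputStream, OutputStream outputStream, Context context)
            throws IOException {
        // Proxy the raw API Gateway event through Spark and write the response
        handler.proxyStream(inputStream, outputStream, context);
        // Just in case it wasn't closed by the mapper
        outputStream.close();
    }
}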
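The static field and static initializer in StreamLambdaHandler rely on standard JVM class-initialization semantics: a static initializer runs once per class load, and every later call sees the same static state. The sketch below demonstrates only that behavior, without any AWS or Spark dependencies. ColdStartDemoHandler and ColdStartDemo are hypothetical names; the plain Object stands in for the SparkLambdaContainerHandler, and the loop imitates several requests reaching one warm container.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for StreamLambdaHandler; no AWS or Spark classes are used.
class ColdStartDemoHandler {
    // Counts how many times the "expensive" setup ran in this JVM.
    static final AtomicInteger initCount = new AtomicInteger();

    // Stand-in for the static SparkLambdaContainerHandler field.
    private static final Object handler;

    static {
        // Runs exactly once, when the JVM first initializes this class (the cold start).
        initCount.incrementAndGet();
        handler = new Object();
    }

    // Stand-in for handleRequest: every call sees the same handler object.
    Object handleRequest() {
        return handler;
    }
}

public class ColdStartDemo {
    public static void main(String[] args) {
        Object first = null;
        for (int i = 0; i < 3; i++) {
            // Even separate handler instances share the static field.
            Object h = new ColdStartDemoHandler().handleRequest();
            if (first == null) {
                first = h;
            }
            System.out.println("invocation " + i + ", same handler: " + (h == first));
        }
        System.out.println("initializations: " + ColdStartDemoHandler.initCount.get());
    }
}
```

Running ColdStartDemo prints "same handler: true" for each simulated invocation and "initializations: 1". A new execution environment, which is a new cold start, loads the class again and repeats the setup once.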
In our sample application, the Spark routes are defined in a static method of a separate SparkResources class.
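import static spark.Spark.before;
import static spark.Spark.post;
import com.amazonaws.serverless.proxy.internal.LambdaContainerHandler;
import java.util.UUID;

// Pet, Error, and JsonTransformer are the sample application's own classes
public class SparkResources {
    public static void defineResources() {
        // Every response from these routes is JSON
        before((request, response) -> response.type("application/json"));

        post("/pets", (req, res) -> {
            Pet newPet = LambdaContainerHandler.getObjectMapper().readValue(req.body(), Pet.class);
            if (newPet.getName() == null || newPet.getBreed() == null) {
                // Spark has no JAX-RS Response builder: set the status, then return the body
                res.status(400);
                return new Error("Invalid name or breed");
            }
            Pet dbPet = newPet;
            dbPet.setId(UUID.randomUUID().toString());
            res.status(200);
            return dbPet;
        }, new JsonTransformer());
    }
}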
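Because a Spark route body is plain Java, its validation rule can be pulled out and unit tested without starting Spark or Lambda. The sketch below is a hypothetical refactoring, not part of the sample: PetRouteLogic, Result, and the trimmed-down Pet class (with assumed fields id, name, and breed) are illustrative names only.

```java
import java.util.UUID;

// Hypothetical, framework-free extraction of the /pets validation rule.
public class PetRouteLogic {

    // Trimmed-down stand-in for the sample's Pet model (assumed fields).
    static class Pet {
        String id;
        final String name;
        final String breed;

        Pet(String name, String breed) {
            this.name = name;
            this.breed = breed;
        }
    }

    // Status code plus body: what the route would pass to res.status(...) and return.
    static final class Result {
        final int status;
        final Object body;

        Result(int status, Object body) {
            this.status = status;
            this.body = body;
        }
    }

    static Result createPet(Pet newPet) {
        if (newPet.name == null || newPet.breed == null) {
            // Same rule as the route: both name and breed are required.
            return new Result(400, "Invalid name or breed");
        }
        // Assign a random ID, as the sample route does.
        newPet.id = UUID.randomUUID().toString();
        return new Result(200, newPet);
    }

    public static void main(String[] args) {
        Result bad = createPet(new Pet(null, "Beagle"));
        Result good = createPet(new Pet("Rex", "Beagle"));
        System.out.println(bad.status + " " + bad.body);
        System.out.println(good.status + " " + ((Pet) good.body).id);
    }
}
```

With this split, the Spark route would only call res.status(result.status) and return result.body for the JsonTransformer to serialize, while an ordinary unit test exercises createPet directly.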
By default, Spark includes an embedded Jetty web server with WebSocket support. Because the aws-serverless-java-container library takes the place of the web server, we do not need the Jetty WebSocket classes in our deployment jar. You can configure the Maven Shade plugin to exclude the Jetty WebSocket artifacts (groupId org.eclipse.jetty.websocket) from the deployment package.
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.3</version>
<configuration>
<createDependencyReducedPom>false</createDependencyReducedPom>
</configuration>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<artifactSet>
<excludes>
<exclude>org.eclipse.jetty.websocket:*</exclude>
</excludes>
</artifactSet>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
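To confirm that the exclusion took effect, you can inspect the shaded jar after mvn package. The utility below is a hypothetical build check, not part of the library: it uses only java.util.jar to list any entries under org/eclipse/jetty/websocket/, the package path where classes from the excluded artifacts are assumed to live. The jar path in the usage comment assumes the archetype example's artifactId and version.

```java
import java.io.IOException;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

// Hypothetical build-check utility: lists entries in a shaded jar that the
// org.eclipse.jetty.websocket:* exclusion should have removed.
public class ShadedJarCheck {

    // Assumed path prefix of classes in Jetty's WebSocket packages.
    static final String WEBSOCKET_PREFIX = "org/eclipse/jetty/websocket/";

    static List<String> findWebSocketEntries(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream()
                    .map(JarEntry::getName)
                    .filter(name -> name.startsWith(WEBSOCKET_PREFIX))
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage (assumed path): java ShadedJarCheck target/my-service-1.0-SNAPSHOT.jar
        List<String> leftovers = findWebSocketEntries(args[0]);
        if (leftovers.isEmpty()) {
            System.out.println("OK: no Jetty WebSocket classes found");
        } else {
            System.out.println("Found " + leftovers.size()
                    + " WebSocket entries, e.g. " + leftovers.get(0));
            System.exit(1);
        }
    }
}
```

A non-zero exit code makes the check easy to wire into a CI step after packaging.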
You can follow the instructions in AWS Lambda's documentation on how to package your function for deployment.