Quick start Spark
⚠️ The Spark framework is no longer supported starting with version 2.0. To continue using Spark, you need to use Serverless Java Container 1.x.
You can use the aws-serverless-java-container library to run a Spark application in AWS Lambda. You can use the library within your Lambda handler to proxy events to the Spark instance.
In the repository we have included a sample Spark application to get you started.
Serverless Java Container 1.x is tested against Spark version 2.8.0 and above.
You can quickly create a new serverless Spark application using our Maven archetype. First, make sure Maven is installed in your environment and available in your PATH. Next, using a terminal or your favorite IDE, create a new application. The archetype groupId is com.amazonaws.serverless.archetypes and the artifactId is aws-serverless-spark-archetype:
mvn archetype:generate -DgroupId=my.service -DartifactId=my-service -Dversion=1.0-SNAPSHOT \
-DarchetypeGroupId=com.amazonaws.serverless.archetypes \
-DarchetypeArtifactId=aws-serverless-spark-archetype \
-DarchetypeVersion=1.9.4
The archetype sets up a new project that includes a pom.xml file as well as a build.gradle file. The generated code includes a StreamLambdaHandler class, the main entry point for AWS Lambda; a resource package with a /ping resource; and a set of unit tests that exercise the application.
The project also includes a file called template.yml. This is a SAM template that you can use to quickly test your application locally or deploy it to AWS. Open the README.md file in the project root folder for instructions on how to use the SAM CLI to run your Serverless API or deploy it to AWS.
The first step is to import the Spark implementation of the library:
<dependency>
    <groupId>com.amazonaws.serverless</groupId>
    <artifactId>aws-serverless-java-container-spark</artifactId>
    <version>1.9.4</version>
</dependency>
This also automatically imports the aws-serverless-java-container-core and aws-lambda-java-core libraries.
In your application package, declare a new class that implements Lambda's RequestStreamHandler interface. If you have configured API Gateway with a proxy integration, you can use the built-in POJOs AwsProxyRequest and AwsProxyResponse.
The next step is to declare the container handler object. The library exposes a static utility method that configures a SparkLambdaContainerHandler object for AWS proxy events. Declare the handler object as a static class property so that Lambda reuses the instance for subsequent requests.
The handleRequest method of the class can use the handler object declared in the previous step to send requests to the Spark application.
On a cold start, the static initializer block creates the handler object and then configures the Spark routes. It's important to configure the Spark routes only after the handler is initialized.
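import com.amazonaws.serverless.exceptions.ContainerInitializationException;
import com.amazonaws.serverless.proxy.model.AwsProxyRequest;
import com.amazonaws.serverless.proxy.model.AwsProxyResponse;
import com.amazonaws.serverless.proxy.spark.SparkLambdaContainerHandler;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestStreamHandler;
import spark.Spark;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamLambdaHandler implements RequestStreamHandler {
    // Static so the same handler instance is reused across invocations
    private static SparkLambdaContainerHandler<AwsProxyRequest, AwsProxyResponse> handler;

    static {
        try {
            handler = SparkLambdaContainerHandler.getAwsProxyHandler();
            // If you are using HTTP APIs with version 2.0 of the proxy model, use the getHttpApiV2ProxyHandler
            // method: handler = SparkLambdaContainerHandler.getHttpApiV2ProxyHandler();

            // Define the routes only after the handler has been created
            SparkResources.defineResources();
            Spark.awaitInitialization();
        } catch (ContainerInitializationException e) {
            // If initialization fails, re-throw the exception to force another cold start
            e.printStackTrace();
            throw new RuntimeException("Could not initialize Spark container", e);
        }
    }

    @Override
    public void handleRequest(InputStream inputStream, OutputStream outputStream, Context context)
            throws IOException {
        handler.proxyStream(inputStream, outputStream, context);
    }
}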
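The effect of keeping the handler in a static field can be illustrated without any AWS or Spark dependency. The following is a minimal sketch, not library code: FakeContainerHandler, Function, and INIT_COUNT are hypothetical stand-ins introduced only for this example. It shows that a static field is built once per JVM, when the class is first initialized, no matter how many requests are served afterwards.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: none of these types belong to the serverless-java-container library.
class StaticHandlerDemo {

    // Counts how many times the (pretend) expensive handler is constructed.
    static final AtomicInteger INIT_COUNT = new AtomicInteger();

    /** Hypothetical stand-in for a container handler that is costly to build. */
    static final class FakeContainerHandler {
        FakeContainerHandler() {
            INIT_COUNT.incrementAndGet(); // simulate one-time startup work
        }

        String proxy(String event) {
            return "handled:" + event;
        }
    }

    /** Hypothetical stand-in for the Lambda entry point class. */
    static final class Function {
        // Built once, when the JVM first initializes this class.
        private static final FakeContainerHandler HANDLER = new FakeContainerHandler();

        String handleRequest(String event) {
            return HANDLER.proxy(event);
        }
    }

    public static void main(String[] args) {
        // Two requests served in the same JVM share the one static handler.
        System.out.println(new Function().handleRequest("first"));
        System.out.println(new Function().handleRequest("second"));
        System.out.println("handler constructions: " + INIT_COUNT.get());
    }
}
```

Running main prints the two handled events followed by a construction count of 1. In the real handler the same reasoning applies to the static initializer block: the container and routes are set up once per execution environment rather than on every request.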
In our sample application, the Spark routes are defined in a static method of the separate SparkResources class.
public static void defineResources() {
    // Return JSON from every route
    before((request, response) -> response.type("application/json"));

    post("/pets", (req, res) -> {
        Pet newPet = LambdaContainerHandler.getObjectMapper().readValue(req.body(), Pet.class);
        if (newPet.getName() == null || newPet.getBreed() == null) {
            // Set the status on the Spark response; JsonTransformer serializes the returned object
            res.status(400);
            return new Error("Invalid name or breed");
        }

        Pet dbPet = newPet;
        dbPet.setId(UUID.randomUUID().toString());

        res.status(200);
        return dbPet;
    }, new JsonTransformer());
}
By default, Spark includes an embedded Jetty web server with WebSocket support. Because the serverless-java-container library acts as the web server, the Jetty WebSocket files are not needed in the deployment jar. You can configure the Maven Shade plugin to exclude the Jetty WebSocket artifacts from the deployment package.
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>2.3</version>
            <configuration>
                <createDependencyReducedPom>false</createDependencyReducedPom>
            </configuration>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <artifactSet>
                            <excludes>
                                <exclude>org.eclipse.jetty.websocket:*</exclude>
                            </excludes>
                        </artifactSet>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
You can follow the instructions in AWS Lambda's documentation on how to package your function for deployment.