Quick start Spark

sapessi edited this page Feb 13, 2018 · 24 revisions

You can use the aws-serverless-java-container library to run a Spark application in AWS Lambda. Within your Lambda handler, the library proxies incoming events to the Spark instance.

In the repository we have included a sample Spark application to get you started.

Import dependencies

The first step is to import the Spark implementation of the library:

<dependency>
    <groupId>com.amazonaws.serverless</groupId>
    <artifactId>aws-serverless-java-container-spark</artifactId>
    <version>[0.8,)</version>
</dependency>

This also imports the aws-serverless-java-container-core and aws-lambda-java-core libraries as transitive dependencies.

Create the Lambda handler

In your application package declare a new class that implements Lambda's RequestStreamHandler interface. If you have configured API Gateway with a proxy integration, you can use the built-in POJOs AwsProxyRequest and AwsProxyResponse.

The next step is to declare the container handler object. The library exposes a static utility method that configures a SparkLambdaContainerHandler object for AWS proxy events. Declare the handler object as a static class property so that Lambda re-uses the instance for subsequent requests.

The handleRequest method of the class can use the handler object we declared in the previous step to send requests to the Spark application.

In the example below, the handler object is initialized in a static block so that the work happens only once, during the function's cold start. It's important to define the Spark routes only after the handler is initialized.

public class StreamLambdaHandler implements RequestStreamHandler {
    private static SparkLambdaContainerHandler<AwsProxyRequest, AwsProxyResponse> handler;
    static {
        try {
            handler = SparkLambdaContainerHandler.getAwsProxyHandler();
            SparkResources.defineResources();
            Spark.awaitInitialization();
        } catch (ContainerInitializationException e) {
            // if we fail here, we re-throw the exception to force another cold start
            e.printStackTrace();
            throw new RuntimeException("Could not initialize Spark container", e);
        }
    }

    @Override
    public void handleRequest(InputStream inputStream, OutputStream outputStream, Context context)
            throws IOException {
        handler.proxyStream(inputStream, outputStream, context);

        // just in case it wasn't closed by the mapper
        outputStream.close();
    }
}

In our sample application, the Spark routes are defined in a static method of the separate SparkResources class.

public static void defineResources() {
    before((request, response) -> response.type("application/json"));

    post("/pets", (req, res) -> {
        Pet newPet = LambdaContainerHandler.getObjectMapper().readValue(req.body(), Pet.class);
        if (newPet.getName() == null || newPet.getBreed() == null) {
            // Spark routes signal errors by setting the response status
            // and returning the error body directly
            res.status(400);
            return new Error("Invalid name or breed");
        }

        Pet dbPet = newPet;
        dbPet.setId(UUID.randomUUID().toString());

        res.status(200);
        return dbPet;
    }, new JsonTransformer());
}
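The routes above hand their return values to a JsonTransformer for serialization. Its implementation is not shown on this page; a minimal sketch, assuming it reuses the library's shared Jackson ObjectMapper (the same one used in the POST route above), could look like this:

```java
import com.amazonaws.serverless.proxy.internal.LambdaContainerHandler;
import spark.ResponseTransformer;

// Sketch of a Spark ResponseTransformer that serializes route return
// values to JSON using the library's shared Jackson ObjectMapper.
public class JsonTransformer implements ResponseTransformer {
    @Override
    public String render(Object model) throws Exception {
        return LambdaContainerHandler.getObjectMapper().writeValueAsString(model);
    }
}
```

Combined with the before filter that sets the application/json content type, this keeps all JSON serialization in one place instead of in each route.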

Packaging the application

By default, Spark includes an embedded Jetty web server with WebSocket support. Because the aws-serverless-java-container library acts as the web server, we do not need the Jetty WebSocket artifacts in our deployment jar. You can configure the Maven Shade plugin to exclude them from the deployment package.

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>2.3</version>
            <configuration>
                <createDependencyReducedPom>false</createDependencyReducedPom>
            </configuration>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <artifactSet>
                            <excludes>
                                <exclude>org.eclipse.jetty.websocket:*</exclude>
                            </excludes>
                        </artifactSet>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
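With the Shade plugin configured, a standard Maven build produces the shaded (uber) jar for deployment. The artifact name below is an assumption; it depends on your project's artifactId and version:

```shell
# Build the shaded jar; the Shade plugin runs during the package phase
mvn clean package

# The deployable artifact is written to target/
# (the exact jar name depends on your artifactId and version)
ls target/*.jar
```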

Publish your Lambda function

You can follow the instructions in AWS Lambda's documentation on how to package your function for deployment.
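As one option, you can create the function with the AWS CLI. The function name, role ARN, package name, and jar path below are placeholders, and the handler string assumes the StreamLambdaHandler class shown earlier:

```shell
# Create the Lambda function from the shaded jar built above.
# Replace the function name, role ARN, handler package, and jar path
# with your own values.
aws lambda create-function \
  --function-name spark-sample \
  --runtime java8 \
  --role arn:aws:iam::123456789012:role/lambda-execution-role \
  --handler com.example.StreamLambdaHandler::handleRequest \
  --memory-size 512 \
  --timeout 30 \
  --zip-file fileb://target/spark-sample-1.0-SNAPSHOT.jar
```

After the function is created, attach it to an API Gateway proxy resource so that all paths are forwarded to the Spark routes.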
