Streaming data use cases follow a similar pattern, in which data flows from data producers through streaming storage and data consumers to storage destinations. Sources continuously generate data, which the ingest stage delivers to the stream storage layer, where it's durably captured and made available for stream processing. The stream processing layer processes the data in the stream storage layer and sends the processed information to a specified destination.
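The flow described above can be illustrated with a small in-memory model. This is a hypothetical sketch, not code from this solution, and it makes no AWS calls: `StreamStorage` stands in for a Kinesis data stream or an MSK topic, and `Consumer` stands in for a stream processing application that tracks its own read position (checkpoint) and delivers results to a destination.

```typescript
// Hypothetical in-memory model of the streaming pattern; no AWS calls are made.
// StreamStorage stands in for a Kinesis data stream or an MSK topic.

interface StreamRecord<T> {
  sequenceNumber: number; // position assigned when the record is ingested
  data: T;
}

class StreamStorage<T> {
  private records: StreamRecord<T>[] = [];

  // Ingest stage: append the record and return its sequence number.
  put(data: T): number {
    const sequenceNumber = this.records.length;
    this.records.push({ sequenceNumber, data });
    return sequenceNumber;
  }

  // Reading does not remove records, so several consumers can process
  // the same stream independently, each from its own position.
  readFrom(position: number, limit: number): StreamRecord<T>[] {
    return this.records.slice(position, position + limit);
  }
}

// Stream processing layer: reads a batch, transforms each record, and
// delivers the result to a destination (here just an array).
class Consumer<T, R> {
  private position = 0; // checkpoint: next sequence number to read
  private storage: StreamStorage<T>;
  private transform: (data: T) => R;
  private destination: R[];

  constructor(storage: StreamStorage<T>, transform: (data: T) => R, destination: R[]) {
    this.storage = storage;
    this.transform = transform;
    this.destination = destination;
  }

  // Processes up to batchSize records and returns how many were handled.
  poll(batchSize = 10): number {
    const batch = this.storage.readFrom(this.position, batchSize);
    for (const record of batch) {
      this.destination.push(this.transform(record.data));
      this.position = record.sequenceNumber + 1;
    }
    return batch.length;
  }
}

// Usage: a producer emits Celsius readings; the consumer converts them to
// Fahrenheit strings and delivers them to a sink.
const stream = new StreamStorage<number>();
const sink: string[] = [];
const consumer = new Consumer(stream, (c: number) => `${(c * 9) / 5 + 32}F`, sink);

[20, 25, 30].forEach((c) => stream.put(c));
consumer.poll(2); // processes 20 and 25
consumer.poll(2); // processes 30
```

Because reads are non-destructive and each consumer keeps its own position, a second consumer could replay the same records from the beginning, which mirrors how multiple applications can read one stream.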
The challenge with these use cases is the setup time and effort that developers need to create the resources and establish the best practices required by streaming data services (such as access control, logging capabilities, and data integrations).
The AWS Streaming Data Solution for Amazon Kinesis and AWS Streaming Data Solution for Amazon MSK automatically configure the AWS services necessary to easily capture, store, process, and deliver streaming data. They provide common streaming data patterns that you can choose from as a starting point for solving your use case or for improving existing applications. You can try out new service combinations to implement common streaming data use cases, or use the solutions as the basis for your production environment.
- Architecture for AWS Streaming Data Solution for Amazon Kinesis
- Architecture for AWS Streaming Data Solution for Amazon MSK
- AWS CDK Constructs
- Project structure
- Deployment
- Creating a custom build
- Collection of operational metrics
- Known issues
- Additional Resources
AWS CDK Solutions Constructs make it easier to consistently create well-architected applications. All AWS Solutions Constructs are reviewed by AWS and use best practices established by the AWS Well-Architected Framework. This solution uses the following AWS CDK Constructs:
- aws-apigateway-kinesisstreams
- aws-apigateway-lambda
- aws-kinesisfirehose-s3
- aws-kinesisstreams-lambda
```
├── deployment
│ └── cdk-solution-helper [Lightweight helper that cleans up synthesized templates from the CDK]
├── source
│ ├── bin [Entrypoint of the CDK application]
│ ├── docs [Architecture diagrams for each solution]
│ ├── labs [Templates for the Amazon MSK Labs]
│ ├── kinesis [Demo applications for the KPL and Apache Flink]
│ ├── lambda [Custom resources for features not supported by CloudFormation]
│ ├── lib [Constructs for the components of the solution]
│ ├── patterns [Stack definitions]
│ └── test [Unit tests]
```
You can launch this solution with one click from the solution home pages:
- AWS Streaming Data Solution for Amazon Kinesis
- AWS Streaming Data Solution for Amazon MSK
Please ensure you test the templates before updating any production deployments.
To customize the solution, follow the steps below. Before you start, install the following prerequisites:
- AWS Command Line Interface
- Node.js 14.x or later
- Python 3.8 or later
- Java 11 (only required if using Apache Flink)
- Apache Maven 3.1 (only required if using Apache Flink)
Note: The commands listed below build all patterns. To include only one, modify the CDK entrypoint file at `source/bin/streaming-data-solution.ts`.
1. Clone the repository:

```bash
git clone https://github.com/awslabs/aws-streaming-data-solution-for-amazon-kinesis-and-amazon-msk
```
2. After introducing changes, run the unit tests to make sure the customizations don't break existing functionality:

```bash
cd ./source
chmod +x ./run-all-tests.sh
./run-all-tests.sh
```
3. Build the solution for deployment.

Note: To compile the solution, the `build-s3-dist.sh` script installs the AWS CDK.

```bash
ARTIFACT_BUCKET=my-bucket-name # S3 bucket name where customized code will reside
SOLUTION_NAME=my-solution-name # customized solution name
VERSION=my-version # version number for the customized code

cd ./deployment
chmod +x ./build-s3-dist.sh
./build-s3-dist.sh $ARTIFACT_BUCKET $SOLUTION_NAME $VERSION
```
Why doesn't the solution use `cdk deploy`? This solution includes a few Lambda functions, and by default `cdk deploy` does not install their dependencies; it only zips the contents of the path specified in `fromAsset`. In future releases, we'll look into bundling assets using Docker.
In addition, some extra components (such as the demo applications for the KPL and Kinesis Data Analytics) are implemented in Java, and the `build-s3-dist.sh` script takes care of packaging them.
4. Upload the deployment assets to your Amazon S3 bucket. When creating the bucket for solution assets, it is recommended to:
- Use randomized names as part of your bucket naming strategy.
- Ensure buckets are not public.
- Verify bucket ownership prior to uploading templates or code artifacts.
Note: The created bucket name must have the region where the solution is being deployed as a suffix (for example, mybucket-name-us-east-1).
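The naming recommendations above (a randomized name plus the Region suffix) can be sketched as a small helper. This is hypothetical code, not part of `build-s3-dist.sh`; it checks only a subset of the S3 general purpose bucket naming rules, and the `streaming-data-assets` prefix is an arbitrary example.

```typescript
// Hypothetical helper, not part of build-s3-dist.sh. It checks a subset of the
// S3 general purpose bucket naming rules: 3-63 characters; lowercase letters,
// digits, dots, and hyphens; starts and ends with a letter or digit; no ".."
// and not formatted like an IP address.
const BUCKET_NAME_PATTERN = /^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$/;
const IP_ADDRESS_PATTERN = /^\d{1,3}(\.\d{1,3}){3}$/;

function isValidBucketName(bucketName: string): boolean {
  return (
    BUCKET_NAME_PATTERN.test(bucketName) &&
    bucketName.indexOf("..") === -1 &&
    !IP_ADDRESS_PATTERN.test(bucketName)
  );
}

// Random lowercase alphanumeric token. Math.random is enough to make the name
// hard to predict, but it is not a cryptographically secure source.
function randomToken(length: number): string {
  const alphabet = "abcdefghijklmnopqrstuvwxyz0123456789";
  let token = "";
  for (let i = 0; i < length; i++) {
    token += alphabet.charAt(Math.floor(Math.random() * alphabet.length));
  }
  return token;
}

// Returns the base name (the value for ARTIFACT_BUCKET) and the full bucket
// name with the Region suffix that the upload commands expect.
function makeArtifactBucketName(prefix: string, region: string): { base: string; full: string } {
  const base = `${prefix}-${randomToken(8)}`;
  const full = `${base}-${region}`;
  if (!isValidBucketName(full)) {
    throw new Error(`Invalid bucket name: ${full}`);
  }
  return { base, full };
}

// Usage: the random part of the name differs on every run.
const names = makeArtifactBucketName("streaming-data-assets", "us-east-1");
```

The `base` value is what you would assign to `ARTIFACT_BUCKET`, while `full` is the bucket you actually create in the target Region. Ownership and public-access settings still have to be verified on the bucket itself.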
```bash
aws s3 sync ./global-s3-assets s3://$ARTIFACT_BUCKET-us-east-1/$SOLUTION_NAME/$VERSION --acl bucket-owner-full-control
aws s3 sync ./regional-s3-assets s3://$ARTIFACT_BUCKET-us-east-1/$SOLUTION_NAME/$VERSION --acl bucket-owner-full-control
```
5. Launch the AWS CloudFormation template:
- Get the link to the template uploaded to your Amazon S3 bucket (the `$ARTIFACT_BUCKET-<region>` bucket from the previous step).
- Deploy the solution to your account by launching a new AWS CloudFormation stack.
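The upload and launch steps can be sketched as a small path-building helper. This is hypothetical code, not part of the solution's scripts: it mirrors the `s3://$ARTIFACT_BUCKET-<region>/$SOLUTION_NAME/$VERSION` layout used by the sync commands and builds a virtual-hosted-style HTTPS URL of the kind you could paste into the CloudFormation console's Amazon S3 URL field. The file name `example-pattern.template` is a placeholder; use the name of a template that `build-s3-dist.sh` actually placed in `global-s3-assets`.

```typescript
// Hypothetical helper mirroring the upload step: the sync commands copy the
// build output to s3://<ARTIFACT_BUCKET>-<region>/<SOLUTION_NAME>/<VERSION>.
interface BuildSettings {
  artifactBucket: string; // value of ARTIFACT_BUCKET, without the Region suffix
  solutionName: string; // value of SOLUTION_NAME
  version: string; // value of VERSION
  region: string; // Region suffix of the bucket, e.g. "us-east-1"
}

// Key prefix under which both global and regional assets are synced.
function assetPrefix(s: BuildSettings): string {
  return `${s.solutionName}/${s.version}`;
}

// Destination used by `aws s3 sync`.
function syncDestination(s: BuildSettings): string {
  return `s3://${s.artifactBucket}-${s.region}/${assetPrefix(s)}`;
}

// Virtual-hosted-style HTTPS URL of a template synced from global-s3-assets.
// templateFile is a placeholder; pick a file the build actually produced.
function templateUrl(s: BuildSettings, templateFile: string): string {
  const bucket = `${s.artifactBucket}-${s.region}`;
  return `https://${bucket}.s3.${s.region}.amazonaws.com/${assetPrefix(s)}/${templateFile}`;
}

// Usage with the example values from the build step.
const settings: BuildSettings = {
  artifactBucket: "my-bucket-name",
  solutionName: "my-solution-name",
  version: "my-version",
  region: "us-east-1",
};
const url = templateUrl(settings, "example-pattern.template");
```

Keeping the prefix logic in one function ensures the sync destination and the template link cannot drift apart when the version changes.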
This solution collects anonymous operational metrics to help AWS improve the quality and features of the solution. For more information, including how to disable this capability, please see the implementation guide for the AWS Streaming Data Solution for Amazon Kinesis and the implementation guide for the AWS Streaming Data Solution for Amazon MSK.
- For the options that use Amazon Kinesis Data Analytics, we recommend stopping the application or Studio notebook before you delete the stack. If it is running during the stack deletion, its status will change to `Updating`, and you might see some errors when CloudFormation tries to delete resources such as `AWS::KinesisAnalyticsV2::ApplicationCloudWatchLoggingOption` and `Custom::VpcConfiguration` (a custom resource that configures the application to connect to a virtual private cloud).
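One way to follow this recommendation in a cleanup script is to stop the application (for example, with the Kinesis Data Analytics `StopApplication` API) and then wait until it reports the `READY` status, which is the state of a stopped application, before deleting the stack. The sketch below is hypothetical and dependency-free: `describeStatus` is an injected function that in a real script might wrap `DescribeApplication` and return `ApplicationDetail.ApplicationStatus`; that wiring is an assumption and is not shown.

```typescript
// Hypothetical pre-deletion check for the Kinesis Data Analytics known issue.
// describeStatus is injected so the sketch runs without the AWS SDK; a real
// script might wrap DescribeApplication (assumption, not wired up here).

// A stopped Kinesis Data Analytics application reports the READY status.
function isSafeToDeleteStack(appStatus: string): boolean {
  return appStatus === "READY";
}

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => {
    setTimeout(() => resolve(), ms);
  });
}

// Polls until the application is stopped, or gives up after maxAttempts checks.
async function waitUntilStopped(
  describeStatus: () => Promise<string>,
  { intervalMs = 5000, maxAttempts = 60 } = {},
): Promise<void> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const appStatus = await describeStatus();
    if (isSafeToDeleteStack(appStatus)) {
      return;
    }
    await sleep(intervalMs);
  }
  throw new Error(`Application did not reach READY after ${maxAttempts} checks`);
}
```

Injecting the status lookup keeps the polling logic testable and lets the same loop serve both regular applications and Studio notebooks.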
- Amazon Kinesis Data Streams
- Amazon Kinesis Data Firehose
- Amazon Kinesis Data Analytics
- Amazon Managed Streaming for Apache Kafka (Amazon MSK)
- AWS Lambda
- Kinesis Producer Library
- Amazon Kinesis Replay
- Amazon Kinesis Data Analytics Java Examples
- Flink: Hands-on Training
- Streaming Analytics Workshop
- Kinesis Scaling Utility
- Amazon MSK Data Generator
- Amazon MSK Labs
- Using Amazon MSK as an event source for AWS Lambda
- Query your Amazon MSK topics interactively using Amazon Kinesis Data Analytics Studio
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.