This sub-folder contains code examples that demonstrate how to implement real-time processing applications using Kafka Streams, which is a new stream processing library included with the Apache Kafka open source project.
Table of Contents
- Available examples
- Requirements
- Packaging and running the examples
- Development
- Version Compatibility Matrix
Note: See Version Compatibility Matrix below for an overview of which examples are available for which versions of Apache Kafka and Confluent Platform.
Note: We use the label "Lambda" to denote examples that make use of lambda expressions and thus require Java 8+.
- WordCountLambdaExample -- demonstrates, using the Kafka Streams DSL, how to implement the WordCount program that computes a simple word occurrence histogram from an input text.
- MapFunctionLambdaExample -- demonstrates how to perform stateless transformations via map functions, using the Kafka Streams DSL (see also the Scala variant MapFunctionScalaExample)
- SumLambdaExample
-- demonstrates how to perform stateful transformations via
reduceByKey
, using the Kafka Streams DSL - PageViewRegionLambdaExample
-- demonstrates how to perform a join between a
KStream
and aKTable
, i.e. an example of a stateful computation- Variant: PageViewRegionExample, which implements the same example but without lambda expressions and thus works with Java 7+.
- Working with data in Apache Avro format (see also the end-to-end demos under integration tests below):
- Generic Avro: PageViewRegionLambdaExample (Java 8+) and PageViewRegionExample (Java 7+)
- Specific Avro: WikipediaFeedAvroLambdaExample (Java 8+) and WikipediaFeedAvroExample (Java 7+)
- And further examples.
There are also a few integration tests, which demonstrate end-to-end data pipelines. Here, we spawn embedded Kafka clusters and the Confluent Schema Registry, feed input data to them, process the data using Kafka Streams, and finally verify the output results.
Tip: Run
mvn test
to launch the integration tests.
- WordCountLambdaIntegrationTest
- JoinLambdaIntegrationTest
- MapFunctionLambdaIntegrationTest
- PassThroughIntegrationTest
- SumLambdaIntegrationTest
- GenericAvroIntegrationTest.java
- SpecificAvroIntegrationTest.java
- MapFunctionScalaExample -- demonstrates how to perform simple, state-less transformations via map functions, using the Kafka Streams DSL (see also the Java variant MapFunctionLambdaExample)
There is also an integration test, which demonstrates end-to-end data pipelines. Here, we spawn embedded Kafka clusters, feed input data to them, process the data using Kafka Streams, and finally verify the output results.
Tip: Run
mvn test
to launch the integration tests.
The code in this repository requires Apache Kafka 0.10.0+ because from this point onwards Kafka includes its Kafka Streams library.
The code in this repository requires Confluent Platform 3.0.x.
Some code examples require Java 8, primarily because of the usage of lambda expressions.
IntelliJ IDEA users:
- Open File > Project structure
- Select "Project" on the left.
- Set "Project SDK" to Java 1.8.
- Set "Project language level" to "8 - Lambdas, type annotations, etc."
Scala is required only for the Scala examples in this repository. If you are a Java developer you can safely ignore this section.
If you want to experiment with the Scala examples in this repository, you need a version of Scala that supports Java 8
and SAM / Java lambda (e.g. Scala 2.11 with * -Xexperimental
compiler flag, or 2.12).
Tip: You can also run
mvn test
, which executes the included integration tests. These tests spawn embedded Kafka clusters to showcase the Kafka Streams functionality end-to-end. The benefit of the integration tests is that you don't need to install and run a Kafka cluster yourself.
If you want to run the examples against a Kafka cluster, you may want to create a standalone jar ("fat jar") of the Kafka Streams examples via:
# Create a standalone jar
$ mvn clean package
# >>> Creates target/streams-examples-3.0.0-standalone.jar
You can now run the example applications as follows:
# Run an example application from the standalone jar.
# Here: `WordCountLambdaExample`
$ java -cp target/streams-examples-3.0.0-standalone.jar \
io.confluent.examples.streams.WordCountLambdaExample
Keep in mind that the machine on which you run the command above must have access to the Kafka/ZK clusters you
configured in the code examples. By default, the code examples assume the Kafka cluster is accessible via
localhost:9092
(Kafka broker) and the ZooKeeper ensemble via localhost:2181
.
This project uses the standard maven lifecycle and commands such as:
$ mvn compile # This also generates Java classes from the Avro schemas
$ mvn test # But no tests yet!
Branch (this repo) | Apache Kafka | Confluent Platform |
---|---|---|
kafka-0.10.0.0-cp-3.0.0 | 0.10.0.0 | 3.0.0 |
The master
branch of this repository represents active development, and may require additional steps on your side to
make it compile. Check this README as well as pom.xml for any such information.