
Akka Integration 0708

Simon Souter edited this page Jul 28, 2016 · 1 revision

The scala-kafka-client-akka module provides configurable Akka Actor components to support asynchronous and non-blocking streaming of data to and from an Apache Kafka cluster. The primary components provided are the net.cakesolutions.akka.KafkaConsumerActor and the net.cakesolutions.akka.KafkaProducerActor, which can be easily created and integrated into an Akka based application.

Resolve

Artifacts are published to Bintray here: Bintray Repo. To resolve using sbt, add the following resolver to your build.sbt:

resolvers += Resolver.bintrayRepo("cakesolutions", "maven")

And add the dependency:

// Latest release:
libraryDependencies += "net.cakesolutions" %% "scala-kafka-client-akka" % "0.8.0"

Motivation

Apache Kafka is a real-time, fault-tolerant and highly scalable message broker and commit log. Huge volumes of data can be streamed through a Kafka cluster, which can provide high availability (HA) and delivery guarantees. Akka is an open source toolkit written in Scala for building highly concurrent, distributed and message-driven applications. In combination, these two technologies provide an excellent platform for building custom high-throughput, scalable data pipelines. They also form a key part of the hugely popular SMACK Stack.

This module provides Akka Actor components that simplify the integration of these technologies using the Apache Kafka Java client under the hood.

The Java driver can be used directly. However, the basic KafkaConsumer provided by the Kafka Java client is not thread safe and must be driven by a client poll thread, typically from a blocking style poll loop. While this approach may be adequate for many applications, it has some clear drawbacks:

  1. Threading code must be implemented to facilitate the poll loop.
  2. One thread is required per consumer.
  3. Network IO and message processing occur on the same thread, increasing round-trip latency.
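
For comparison, the blocking approach described above might look roughly like the following when the Kafka Java client is used directly from Scala. This is a minimal sketch, not code from this module: the broker address, group id, and topic name are hypothetical placeholders, and `poll(100L)` assumes a Kafka client from the era of this page (later clients take a `java.time.Duration` instead).

```scala
import java.util.{Collections, Properties}
import java.util.concurrent.atomic.AtomicBoolean

import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.errors.WakeupException

import scala.collection.JavaConverters._

object BlockingPollLoop {
  // Hypothetical connection settings; adjust for a real cluster.
  private def consumerProps: Properties = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("group.id", "example-group")
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props
  }

  def main(args: Array[String]): Unit = {
    val consumer = new KafkaConsumer[String, String](consumerProps)
    val running = new AtomicBoolean(true)

    // Drawbacks 1 and 2: a dedicated thread has to be written and managed per consumer.
    val pollThread = new Thread(new Runnable {
      override def run(): Unit =
        try {
          consumer.subscribe(Collections.singletonList("example-topic"))
          while (running.get()) {
            // Drawback 3: network IO (poll) and processing share this thread.
            val records = consumer.poll(100L)
            records.asScala.foreach { record =>
              println(s"${record.key} -> ${record.value}")
            }
          }
        } catch {
          // wakeup() during shutdown interrupts a blocked poll with this exception.
          case _: WakeupException if !running.get() => ()
        } finally consumer.close()
    }, "kafka-poll-loop")

    pollThread.start()

    sys.addShutdownHook {
      running.set(false)
      consumer.wakeup() // the only KafkaConsumer method that is safe to call from another thread
      pollThread.join()
    }
  }
}
```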

The KafkaConsumerActor provided by this module uses Akka's message dispatch architecture to implement an asynchronous consumer. An Akka Scheduler driven poll loop requests and buffers records from Kafka, and the records are dispatched asynchronously to a receiving Actor on a separate thread (analogous to the Reactor Pattern).
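
The general idea of a scheduler-driven poll loop can be illustrated with a small Akka actor. This is only a sketch of the pattern under stated assumptions and is not the KafkaConsumerActor implementation or its API: `PollingActor`, `Records`, and the `fetch` function (standing in for a non-blocking read from Kafka) are hypothetical names introduced here for illustration.

```scala
import akka.actor.{Actor, ActorRef, Cancellable, Props}

import scala.concurrent.duration.FiniteDuration

object PollingActor {
  // Internal tick message that drives the loop.
  private case object Poll

  // Batch handed to the receiving actor (hypothetical message type).
  final case class Records(values: Seq[String])

  def props(fetch: () => Seq[String], receiver: ActorRef, interval: FiniteDuration): Props =
    Props(new PollingActor(fetch, receiver, interval))
}

class PollingActor(fetch: () => Seq[String], receiver: ActorRef, interval: FiniteDuration)
    extends Actor {
  import PollingActor._
  import context.dispatcher // ExecutionContext required by the scheduler

  private var nextPoll: Option[Cancellable] = None

  // Kick off the loop as soon as the actor starts.
  override def preStart(): Unit = self ! Poll

  // Cancel any pending tick so a stopped or restarted actor does not keep polling.
  override def postStop(): Unit = nextPoll.foreach(_.cancel())

  def receive: Receive = {
    case Poll =>
      val batch = fetch()
      // Hand records to the receiver; it processes them on its own dispatcher thread.
      if (batch.nonEmpty) receiver ! Records(batch)
      // Schedule the next tick instead of blocking in a while loop.
      nextPoll = Some(context.system.scheduler.scheduleOnce(interval, self, Poll))
  }
}
```

No thread is parked between polls: the actor only occupies a dispatcher thread while it handles a `Poll` message, and the receiver consumes each `Records` batch independently.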

Other approaches to building streaming applications with Kafka include:

  1. Reactive Kafka - A Reactive Streams API maintained by the Akka core team to stream data using Scala or Java.
  2. Kafka Streams - A Java based stream processing library.