A Puppet module for installing and managing Apache Kafka brokers.
This module is currently being maintained by The Wikimedia Foundation in Gerrit at operations/puppet/kafka and mirrored here on GitHub. It was originally developed for Kafka 0.7.2 at https://github.com/wikimedia/puppet-kafka-0.7.2.
Requirements:

- Java
- A Kafka 0.8 package. You can build a .deb package using the operations/debs/kafka debian branch, or just install using this prebuilt .deb.
- A running Zookeeper cluster. You can set one up using WMF's puppet-zookeeper module.
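If you use WMF's puppet-zookeeper module, the declaration might look roughly like this. This is a minimal sketch only: the `hosts` parameter shape and the `zookeeper::server` class are assumptions modeled on that module's conventions, so consult its README for the real interface.

```puppet
# Hypothetical sketch using WMF's puppet-zookeeper module.
# Parameter names here are assumptions; see that module's docs.
class { '::zookeeper':
    # Zookeeper hostname => myid mapping for the ensemble.
    hosts => {
        'zk-node01.example.com' => 1,
        'zk-node02.example.com' => 2,
        'zk-node03.example.com' => 3,
    },
}

# Run a Zookeeper server process on this node.
class { '::zookeeper::server': }
```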
```puppet
# Install the Kafka libraries and client packages.
class { 'kafka': }
```

This will install the `kafka-common` and `kafka-cli` packages, which include `/usr/bin/kafka`, useful for running client commands (console-consumer, console-producer, etc.).
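Once installed, you could exercise the client tools from the shell. A minimal sketch, assuming the `/usr/bin/kafka` wrapper dispatches to the standard Kafka 0.8 tools and that a topic named `test` exists (both assumptions):

```bash
# Produce a message to the (hypothetical) 'test' topic.
echo 'hello kafka' | kafka console-producer \
    --broker-list kafka-node01.example.com:9092 --topic test

# Consume it back from the beginning of the topic.
kafka console-consumer \
    --zookeeper zk-node01:2181/kafka/cluster_name --topic test --from-beginning
```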
```puppet
# Include Kafka Broker Server.
class { 'kafka::server':
    log_dirs         => ['/var/spool/kafka/a', '/var/spool/kafka/b'],
    brokers          => {
        'kafka-node01.example.com' => { 'id' => 1, 'port' => 12345 },
        'kafka-node02.example.com' => { 'id' => 2 },
    },
    zookeeper_hosts  => ['zk-node01:2181', 'zk-node02:2181', 'zk-node03:2181'],
    zookeeper_chroot => '/kafka/cluster_name',
}
```
`log_dirs` defaults to a single `['/var/spool/kafka']`, but you may specify multiple Kafka log data directories here. This is useful for spreading your topic partitions across multiple disks.
The `brokers` parameter is a Hash keyed by `$::fqdn`. Each value is another Hash that contains config settings for that Kafka host. `id` is required and must be unique for each Kafka Broker Server host. `port` is optional, and defaults to 9092.
Each Kafka Broker Server's `broker.id` and `port` properties in `server.properties` will be set by looking up the node's `$::fqdn` in the `brokers` Hash passed into the `kafka::server` class.
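For instance, given the `brokers` Hash in the example above, the rendered `server.properties` on kafka-node01.example.com would contain roughly the following (an illustrative excerpt, not the module's exact template output):

```properties
# Illustrative excerpt: values looked up from the brokers Hash
# for this node's $::fqdn, plus the log_dirs setting.
broker.id=1
port=12345
log.dirs=/var/spool/kafka/a,/var/spool/kafka/b
```

kafka-node02.example.com would get `broker.id=2` and the default port 9092.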
`zookeeper_hosts` is an array of Zookeeper host:port pairs.
`zookeeper_chroot` is optional, and allows you to specify a znode under which Kafka will store its metadata in Zookeeper. This is useful if you want to use a single Zookeeper cluster to manage multiple Kafka clusters.
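With the example parameters above, the broker's Zookeeper connection string would have the chroot appended after the host:port list, roughly like this (illustrative):

```properties
# Illustrative zookeeper.connect built from zookeeper_hosts
# plus the optional zookeeper_chroot suffix.
zookeeper.connect=zk-node01:2181,zk-node02:2181,zk-node03:2181/kafka/cluster_name
```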
Kafka MirrorMaker will allow you to mirror data from multiple Kafka clusters into a single cluster. This is useful for cross-DC replication and for aggregation.
```puppet
# Mirror the 'main' and 'secondary' Kafka clusters
# to the 'aggregate' Kafka cluster.
kafka::mirror::consumer { 'main':
    mirror_name   => 'aggregate',
    zookeeper_url => 'zk:2181/kafka/main',
}
kafka::mirror::consumer { 'secondary':
    mirror_name   => 'aggregate',
    zookeeper_url => 'zk:2181/kafka/secondary',
}
kafka::mirror { 'aggregate':
    destination_brokers => ['ka01:9092', 'ka02:9092'],
    whitelist           => 'these_topics_only.*',
}
```
Note that the kafka-mirror service does not subscribe to its config files. If you make changes, you will have to restart the service manually.
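A restart might look like this (a sketch; the exact service name installed by this module may vary per mirror, so check the init scripts on your host):

```bash
# Manually restart MirrorMaker after editing its config files.
# 'kafka-mirror' is the service name suggested above; per-mirror
# instances may be suffixed with the mirror title (an assumption).
sudo service kafka-mirror restart
```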
`kafka::server::jmxtrans` and `kafka::mirror::jmxtrans` configure useful jmxtrans JSON config objects that can be used to tell jmxtrans to send to any output writer (Ganglia, Graphite, etc.). To use this, you will need the puppet-jmxtrans module.
```puppet
# Include this class on each of your Kafka Broker Servers.
class { '::kafka::server::jmxtrans':
    ganglia => 'ganglia.example.com:8649',
}
```
This will install jmxtrans and render JSON config files for sending JVM and Kafka Broker stats to Ganglia. See kafka-jmxtrans.json.md for a fully rendered jmxtrans Kafka Broker JSON config file.
```puppet
# Declare this define on hosts where you run Kafka MirrorMaker.
kafka::mirror::jmxtrans { 'aggregate':
    statsd => 'statsd.example.org:8125',
}
```
This will install jmxtrans and render JSON config files for sending JVM and Kafka MirrorMaker (consumers and producer) stats to statsd.