Skip to content
Roberto Rodriguez edited this page Feb 2, 2018 · 17 revisions

Design

Kafka Ecosystem

Producers

Producers publish data to the topics of their choice. The producer is responsible for choosing which record to assign to which partition within the topic.Kafka HELK currently works with machines (producers) running Winlogbeat as their log shipper.

The following Winlogbeat config is recommended:

winlogbeat.event_logs:
  - name: Application
    ignore_older: 72h
  - name: Security
  - name: System
  - name: Microsoft-windows-sysmon/operational
  - name: Microsoft-windows-PowerShell/Operational
    event_id: 4103, 4104

output.kafka:
  hosts: ["<HELK-IP>:9092","<HELK-IP>:9093","<HELK-IP>:9094"]
  topic: "winlogbeat"
  max_retries: 2
  max_message_bytes: 1000000

You can check how your logs are being sent to the HELK by running the following command in your systems (producers):

winlogbeat.exe -e

Producers send data directly to the broker that is the leader for the partition without any intervening routing tier. To help the producer do this all Kafka nodes can answer a request for metadata about which servers are alive and where the leaders for the partitions of a topic are at any given time to allow the producer to appropriately direct its requests.

Kafka Cluster (Brokers)

HELK uses a kafka custer conformed of 3 brokers. Each broker has its own ID number and topic log partitions. Connecting to one broker bootstraps a client to the entire Kafka cluster.

Each HELK broker has the following custom configurations:

Name Description Type Value
broker.id The broker id for this server. If unset, a unique broker id will be generated.To avoid conflicts between zookeeper generated broker id's and user configured broker id's, generated broker ids start from reserved.broker.max.id + 1. int 0,1,2
listeners Listener List - Comma-separated list of URIs we will listen on and the listener names. If the listener name is not a security protocol, listener.security.protocol.map must also be set. Specify hostname as 0.0.0.0 to bind to all interfaces. Leave hostname empty to bind to default interface string PLAINTEXT://:9092 PLAINTEXT://:9093 PLAINTEXT://:9094
advertised.listeners Listeners to publish to ZooKeeper for clients to use, if different than the listeners config property. In IaaS environments, this may need to be different from the interface to which the broker binds. string PLAINTEXT://HELKIP:9092 PLAINTEXT://HELKIP:9093 PLAINTEXT://HELKIP:9094
log.dirs The directories in which the log data is kept. If not set, the value in log.dir is used string /tmp/kafka-logs /tmp/kafka-logs-1 /tmp/kafka-logs-2
auto.create.topics.enable Enable auto creation of topic on the server boolean false
num.replica.fetchers Number of fetcher threads used to replicate messages from a source broker. Increasing this value can increase the degree of I/O parallelism in the follower broker. int 2
unclean.leader.election.enable Indicates whether to enable replicas not in the ISR set to be elected as leader as a last resort, even though doing so may result in data loss boolean true
num.recovery.threads.per.data.dir The number of threads per data directory to be used for log recovery at startup and flushing at shutdown int 1

Zookeeper

Kafka needs ZooKeeper to work efficiently in the cluster. Kafka uses Zookeeper to do leadership election of Kafka Broker and Topic Partition pairs. Kafka uses Zookeeper to manage service discovery for Kafka Brokers that form the cluster. Zookeeper sends changes of the topology to Kafka, so each node in the cluster knows when a new broker joined, a Broker died, a topic was removed or a topic was added, etc. Zookeeper provides an in-sync view of Kafka Cluster configuration.Cloudurable