Kafka is an event streaming platform used to collect, store, and process real-time data streams at scale. Common uses include:
- Distributed logging
- Stream processing
- Pub-sub messaging
A topic is Kafka's primary unit of storage.
- Container for events.
- Also holds filtered and transformed events
- Log of events (not really a queue, although it may feel queue-y)
- not indexed; a reader must seek to an offset and then scan
- immutable
- durable
- append-only
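The log properties above can be modelled in a few lines. This is a minimal in-memory sketch, not Kafka's storage code: `TopicLog`, `append`, and `read_from` are hypothetical names chosen for illustration, and durability is not modelled at all.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class TopicLog:
    """Toy model of a topic as an append-only log (hypothetical, in-memory only)."""

    _records: List[Tuple[bytes, bytes]] = field(default_factory=list)

    def append(self, key: bytes, value: bytes) -> int:
        """Add a record at the end of the log and return its offset."""
        self._records.append((key, value))
        return len(self._records) - 1

    def read_from(self, offset: int) -> Iterator[Tuple[int, bytes, bytes]]:
        """Seek to an offset, then scan forward; there is no lookup by key."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        for position in range(offset, len(self._records)):
            key, value = self._records[position]
            yield position, key, value


log = TopicLog()
log.append(b"sensor-1", b"21.5")
log.append(b"sensor-2", b"19.0")
log.append(b"sensor-1", b"22.1")

# Reading does not remove anything, so two readers can scan independently.
from_start = list(log.read_from(0))
from_middle = list(log.read_from(1))
```

The class deliberately has no update or delete method, which is the "immutable, append-only" part. The fact that `read_from` leaves records in place is what makes this a log and not a queue.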
An event is a thing that has happened, together with its description.
- any kind of thing
- combination of notification and state
- key-value pairs
- keys can be complex domain objects, serialized
- keys are often just strings or integers
- a key is probably not a unique identifier for the event
The state (the value) is serialized in some format, for example JSON, Avro, or Protobuf.
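As a rough illustration of an event as a serialized key-value pair, the sketch below turns a domain object into a `(key, value)` pair of bytes. The `OrderPlaced` type, its fields, and the choice of JSON are all assumptions made for the example; Kafka itself only sees bytes.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class OrderPlaced:
    """Hypothetical event: something that happened, plus the state describing it."""

    order_id: int
    customer: str
    total_cents: int


def to_record(event: OrderPlaced) -> Tuple[bytes, bytes]:
    """Serialize an event into a (key, value) pair of bytes."""
    # The key is the customer name, so it is NOT unique per event.
    key = event.customer.encode("utf-8")
    value = json.dumps(asdict(event), sort_keys=True).encode("utf-8")
    return key, value


def from_record(value: bytes) -> OrderPlaced:
    """Rebuild the event from its serialized value."""
    return OrderPlaced(**json.loads(value.decode("utf-8")))


first = OrderPlaced(order_id=1, customer="alice", total_cents=1250)
second = OrderPlaced(order_id=2, customer="alice", total_cents=300)

key_1, value_1 = to_record(first)
key_2, value_2 = to_record(second)
```

Both events share the key `b"alice"` while carrying different values, which is the sense in which a key is usually not a unique identifier.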
- Kafka Connect is an ecosystem of pluggable connectors.
- From Kafka's point of view it is a client application
- It runs as a server process independent of the brokers
- Designed to be scalable. (You can have a cluster)
- Abstracts a lot of the code away from the user
- You don't need to write code to manage connectors to resources like an Elasticsearch service.
Kafka Connect integrates Kafka with external systems such as databases, message queues, and other data sources and sinks, so you do not have to build and manage that import and export code yourself.
You either configure a pre-built connector or develop a custom one to stream data into Kafka topics or out of them into other systems. Connect is designed to be scalable and fault-tolerant, which makes it a practical base for real-time data pipelines.
- A connector is packaged as a JAR file containing the JVM connection code
- Connect itself acts like a runtime for those connectors
- A source connector acts as a producer
- A sink connector acts as a consumer
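The producer and consumer roles of connectors can be sketched with a toy pipeline. This is a plain Python analogy and not the Kafka Connect API (real connectors are JVM plugins): `ListSource`, `DictSink`, and the list standing in for a topic are all hypothetical.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (key, value)


class ListSource:
    """Toy source connector: pulls rows from an external system (here, a list)."""

    def __init__(self, rows: List[Record]) -> None:
        self._rows = list(rows)

    def poll(self) -> List[Record]:
        """Return every row not yet handed over, then forget them."""
        batch, self._rows = self._rows, []
        return batch


class DictSink:
    """Toy sink connector: writes records to an external system (here, a dict)."""

    def __init__(self) -> None:
        self.store: Dict[str, str] = {}

    def put(self, records: List[Record]) -> None:
        """Upsert each record by key; the last value for a key wins."""
        for key, value in records:
            self.store[key] = value


topic: List[Record] = []  # stands in for a Kafka topic

source = ListSource([("user-1", "alice"), ("user-2", "bob"), ("user-1", "alice b.")])
sink = DictSink()

topic.extend(source.poll())  # the source side produces into the topic
sink.put(topic)              # the sink side consumes from the topic
```

Neither side knows about the other; they only share the topic, which is why sources and sinks can be mixed freely.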
Another fundamental component in the Kafka ecosystem is Kafka Streams, a client library for building real-time streaming applications on top of Kafka. Processing runs inside your own application, which reads from and writes to Kafka topics, so there is no separate processing cluster to set up.
With Kafka Streams, you can perform tasks like data transformation, aggregation, filtering, and joining on streams of data. It's a powerful tool for building event-driven microservices, real-time analytics, and other stream processing applications.
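To make filtering, transformation, and aggregation concrete, here is a small sketch in plain Python. It is only an analogy: Kafka Streams is a Java library that processes unbounded streams continuously, while this example runs once over a finite list. The function names and the sample purchases are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[str, int]  # (customer key, amount in cents)


def drop_refunds(events: Iterable[Event]) -> Iterator[Event]:
    """Filter: keep only events with a positive amount."""
    return (event for event in events if event[1] > 0)


def normalize_keys(events: Iterable[Event]) -> Iterator[Event]:
    """Transform: trim and lower-case the key so equal customers group together."""
    return ((key.strip().lower(), amount) for key, amount in events)


def total_by_key(events: Iterable[Event]) -> Dict[str, int]:
    """Aggregate: sum the amounts per key."""
    totals: Dict[str, int] = defaultdict(int)
    for key, amount in events:
        totals[key] += amount
    return dict(totals)


purchases = [("Alice", 500), ("bob", 250), ("alice", -100), ("ALICE ", 300)]

# Each stage consumes the previous one lazily, one event at a time.
totals = total_by_key(normalize_keys(drop_refunds(purchases)))
```

Here `totals` ends up as `{"alice": 800, "bob": 250}`: the refund is filtered out and the three spellings of the same customer are merged before summing.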
Together, Kafka Connect and Kafka Streams extend Apache Kafka: Connect moves data between Kafka and other systems, and Streams processes the data held in Kafka topics.
Control Center provides a graphical user interface (GUI) for managing Kafka clusters and monitoring their performance. It works with clusters and the services they rely on, such as ZooKeeper for distributed coordination of the brokers, and gives administrators and developers a way to visualize cluster health and status and to perform administrative tasks.
- What's an event?
- What is a topic?
- How does it scale?
- Writing messages in (producers)
- Reading messages out (consumers)
- Connect framework for integrating with other services
- Streams for advanced stream processing
- ksqlDB for real-time processing in SQL