A streaming data pipeline typically consists of data transformation, wrangling, and (time-based window) aggregation. On top of that, we must also guarantee data integrity. Kafka Streams can solve all of these challenges, and it is definitely a good choice. In many cases, however, ksqlDB queries are simpler, faster to implement, and work just as well.
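As a sketch of how concise such a pipeline can be in ksqlDB, a tumbling-window aggregation over a stream might look like the following. The stream, topic, and column names here are illustrative assumptions, not taken from this repo:

```sql
-- Hypothetical source stream of order events (names are illustrative)
CREATE STREAM orders (
  order_id VARCHAR KEY,
  customer VARCHAR,
  amount   DOUBLE
) WITH (KAFKA_TOPIC='orders', VALUE_FORMAT='JSON');

-- Transformation plus a time-based window aggregation in one persistent query
CREATE TABLE revenue_per_minute AS
  SELECT customer,
         SUM(amount) AS total_amount,
         COUNT(*)    AS order_count
  FROM orders
  WINDOW TUMBLING (SIZE 1 MINUTE)
  GROUP BY customer
  EMIT CHANGES;
```

A comparable Kafka Streams topology would need a full application (builder, serdes, packaging, deployment), which is the trade-off the paragraph above alludes to.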
This repository was used in a Confluent meetup. You can watch the recording in the Community Forum.
Note
In the meantime, this repo has become a playground for different ways of deployment as well as for exploring features such as Cluster Linking or enabling metrics. You can find them in separate branches.
| Branch | Additional Features |
|---|---|
| local | with Metrics (C3, JMX) for broker and Kafka Streams |
| local_security | with Encryption (SSL), Authentication (SASL and mTLS), Authorization (ACLs) |
| local_health+ | with Health+ (including CC Metrics API) |
| local_c3_reduced | with C3 in reduced infrastructure mode |
| cfk_minikube | deployed with CFK (including Health+) |
| ccloud | deployed in CC, testing the Metrics API (/query and /export into Grafana Cloud), Audit Logs consuming, Cluster Linking via UI (CC to CC), Schema Linking via CLI (CC to CC), RBAC via Confluent CLI |
| ccloud_stream_designer | deployed in CC using Stream Designer to deploy the pipeline |
| hybrid | deployed locally but mirroring all topics to CC via Cluster Linking |