surgeventures/flink-cdc-pg-ch-demo

Flink CDC PostgreSQL to ClickHouse

Real-time data replication from PostgreSQL to ClickHouse using Flink CDC.

Architecture

  • Source: Two PostgreSQL databases (customers and orders)
  • Processing: Flink CDC for Change Data Capture
  • Target: ClickHouse

Prerequisites

  • Docker and Docker Compose
  • JVM (a Java runtime)
  • Scala
  • sbt
  • curl (for deployment scripts)
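
Before running the deployment script, it can help to confirm that the tools listed above are on your PATH. The helper below is a minimal sketch, not part of this repository: the function name `missing_tools` is hypothetical, and the binary names you pass to it (for example `java` for the JVM) are assumptions about how the prerequisites are installed on your machine.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): print every tool from the
# argument list that cannot be found on PATH, one name per line.
missing_tools() {
  for tool in "$@"; do
    # command -v succeeds when the tool resolves to an executable, alias or builtin.
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

For example, `missing_tools docker docker-compose java sbt curl` prints nothing when every listed binary is available, and prints the missing names otherwise.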

Quick Start

  1. Start services and deploy job:
./deploy.sh up
  2. Add sample data:
./deploy.sh sample
  3. Check job status:
./deploy.sh status

Available Commands

./deploy.sh up       # Start everything, deploy job and seed data
./deploy.sh services # Start Flink and databases only
./deploy.sh job      # Deploy the CDC job
./deploy.sh sample   # Seed databases with sample data
./deploy.sh stop     # Stop all services
./deploy.sh status   # Check job status

Components

PostgreSQL

  • postgres1: Orders database (order_id, customer_id, order_date, total_amount, status)
  • postgres2: Customers database (customer_id, name, email, created_at)

Apache Flink

  • JobManager: Flink cluster management
  • TaskManager: Flink task execution
  • CDC Connectors: PostgreSQL CDC source

ClickHouse

  • Analytical database
  • Enriched orders table with customer data

To check the results of the real-time job:

docker-compose exec clickhouse clickhouse-client -q "SELECT * FROM cdc_demo.enriched_orders"
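
Because the job runs continuously, rows may take a moment to reach ClickHouse after the source tables are seeded. The following is a minimal sketch of a polling helper, not something shipped with this repository: the function name `wait_for_rows`, its default of 30 attempts and its 2-second delay are all assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): run a command that prints a
# row count until the count is greater than zero or the attempts run out.
# Arguments: command string, max attempts (default 30), delay in seconds (default 2).
wait_for_rows() {
  count_cmd=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # A failing command or empty output is treated as zero rows.
    rows=$(sh -c "$count_cmd" 2>/dev/null) || rows=0
    rows=${rows:-0}
    # Non-numeric output makes the comparison fail, which counts as "not ready".
    if [ "$rows" -gt 0 ] 2>/dev/null; then
      echo "$rows"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

One way to use it with the table queried above is `wait_for_rows 'docker-compose exec -T clickhouse clickhouse-client -q "SELECT count() FROM cdc_demo.enriched_orders"'`. The `-T` flag disables pseudo-TTY allocation so the output can be captured cleanly. The function prints the row count and returns 0 once data has arrived, or returns 1 if the table is still empty after the last attempt.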

Development

Build the Flink job:

sbt "project flinkJob" clean assembly

Alternatively, run sbt from the repository root, since this is a multi-module sbt project.
