Real-time data replication from PostgreSQL to ClickHouse using Flink CDC.
- Source: Two PostgreSQL databases (customers and orders)
- Processing: Flink CDC for Change Data Capture
- Target: ClickHouse
- Docker and Docker Compose
- JVM
- Scala
- sbt
- curl (for deployment scripts)
- Start services and deploy job:
./deploy.sh up
- Add sample data:
./deploy.sh sample
- Check job status:
./deploy.sh status
./deploy.sh up # Start everything, deploy job and seed data
./deploy.sh services # Start Flink and databases only
./deploy.sh job # Deploy the CDC job
./deploy.sh sample # Seed databases with sample data
./deploy.sh stop # Stop all services
./deploy.sh status # Check job status
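The `status` command presumably queries the Flink JobManager's REST API (port 8081 by default). A minimal Python sketch of parsing the `GET /jobs` response; the endpoint and the way `deploy.sh` actually checks status are assumptions, not the script's real implementation.

```python
import json
from urllib.request import urlopen

FLINK_URL = "http://localhost:8081"  # default JobManager REST port (assumed)

def job_statuses(payload: dict) -> dict:
    """Map job id -> status from Flink's GET /jobs response shape."""
    return {j["id"]: j["status"] for j in payload.get("jobs", [])}

def fetch_statuses(base_url: str = FLINK_URL) -> dict:
    """Query the running JobManager; requires the services to be up."""
    with urlopen(f"{base_url}/jobs") as resp:
        return job_statuses(json.load(resp))

# Example payload in the shape Flink's REST API returns:
sample = {"jobs": [{"id": "a1b2c3", "status": "RUNNING"}]}
print(job_statuses(sample))  # {'a1b2c3': 'RUNNING'}
```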
- postgres1: Orders database (order_id, customer_id, order_date, total_amount, status)
- postgres2: Customers database (customer_id, name, email, created_at)
- JobManager: Flink cluster management
- TaskManager: Flink task execution
- CDC Connectors: PostgreSQL CDC source
- Analytical database
- Enriched orders table with customer data
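The core of the CDC job is joining the orders stream against customer records. A minimal Python sketch of that enrichment, using the column names listed above; the real job does this with Flink CDC sources and a streaming join, so this only illustrates the resulting row shape, and the output column names are assumptions.

```python
# Orders enriched with customer data, mirroring the schemas above.
# In the Flink job this is a streaming join over two CDC sources;
# this plain-Python version only illustrates the data flow.

customers: dict = {}  # customer_id -> latest customer row (from postgres2's stream)

def on_customer_change(row: dict) -> None:
    """Apply an upsert from the customers CDC stream."""
    customers[row["customer_id"]] = row

def enrich_order(order: dict) -> dict:
    """Join one order change event with the latest known customer state."""
    customer = customers.get(order["customer_id"], {})
    return {
        "order_id": order["order_id"],
        "order_date": order["order_date"],
        "total_amount": order["total_amount"],
        "status": order["status"],
        "customer_id": order["customer_id"],
        # Null if the customer event has not arrived yet
        "customer_name": customer.get("name"),
        "customer_email": customer.get("email"),
    }

on_customer_change({"customer_id": 1, "name": "Ada", "email": "ada@example.com",
                    "created_at": "2024-01-01"})
row = enrich_order({"order_id": 10, "customer_id": 1, "order_date": "2024-02-01",
                    "total_amount": 99.5, "status": "NEW"})
print(row["customer_name"])  # Ada
```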
Command to check the results of the real-time job:
docker-compose exec clickhouse clickhouse-client -q "SELECT * FROM cdc_demo.enriched_orders"
Build the Flink job:
sbt "project flinkJob" clean assembly
Or just run sbt from the repository root, since this is a multi-module sbt project :)
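For reference, a multi-module build of this shape typically looks like the sketch below; the module path, aggregation, and assembly settings are assumptions, not the project's actual build.sbt.

```scala
// Hypothetical sketch of a multi-module build.sbt of this shape;
// module names and settings are assumptions, not the actual build.
lazy val root = (project in file("."))
  .aggregate(flinkJob)

lazy val flinkJob = (project in file("flink-job"))
  .settings(
    name := "flink-job",
    // sbt-assembly produces the fat jar deployed to the Flink cluster
    assembly / assemblyJarName := "flink-job.jar"
  )
```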