Skip to content

vzakaznikov/clickhouse-sink-connector

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Altinity Sink Connector for ClickHouse

Sink connector is used to transfer data from Kafka to Clickhouse using the Kafka connect framework. The connector is tested with the following converters

Features

  • Inserts, Updates and Deletes using ReplacingMergeTree/CollapsingMergeTree - Updates/Deletes
  • Deduplication logic to dedupe records from Kafka topic.(Based on Primary Key)
  • Exactly once semantics
  • Bulk insert to Clickhouse.
  • Store Kafka metadata Kafka Metadata
  • Kafka topic to ClickHouse table mapping, use case where MySQL table can be mapped to a different CH table name.
  • Store raw data in JSON(For Auditing purposes)
  • Monitoring(Using Grafana/Prometheus) Dashboard to monitor lag.
  • Kafka Offset management in ClickHouse
  • Increased Parallelism(Customize thread pool for JDBC connections)

Source Databases

  • MySQL (Debezium)
  • PostgreSQL (Debezium) (Testing in progress)
Component Version(Tested)
Redpanda 22.1.3
Kafka-connect 1.9.5.Final
Debezium 1.9.5.Final
MySQL 8.0
ClickHouse 22.9

Quick Start (Docker-compose)

Docker image for Sink connector altinity/clickhouse-sink-connector:latest https://hub.docker.com/r/altinity/clickhouse-sink-connector

MySQL:

cd deploy/docker
./start-docker-compose.sh 

For Detailed setup instructions - Setup

Development:

Requirements

mvn install -DskipTests=true

DataTypes

MySQL Kafka
Connect
ClickHouse
Bigint INT64_SCHEMA Int64
Bigint Unsigned INT64_SCHEMA UInt64
Blob String + hex
Char String String / LowCardinality(String)
Date Schema: INT64
Name:
debezium.Date
Date(6)
DateTime(6) Schema: INT64
Name: debezium.Timestamp
DateTime64(6)
Decimal(30,12) Schema: Bytes
Name:
kafka.connect.data.Decimal
Decimal(30,12)
Double Float64
Int INT32 Int32
Int Unsigned INT64 UInt32
Longblob String + hex
Mediumblob String + hex
Mediumint INT32 Int32
Mediumint Unsigned INT32 UInt32
Smallint INT16 Int16
Smallint Unsigned INT32 UInt16
Text String String
Time String
Time(6) String
Timestamp DateTime64
Tinyint INT16 Int8
Tinyint Unsigned INT16 UInt8
varbinary(*) String + hex
varchar(*) String
JSON String
BYTES BYTES, io.debezium.bits String
YEAR INT32 INT32
GEOMETRY Binary of WKB String

ClickHouse Loader(Load Data from MySQL to CH for Initial Load)

Clickhouse Loader is a program that loads data dumped in MySQL into a CH database compatible the sink connector (ReplacingMergeTree with virtual columns _version and _sign)

Grafana Dashboard

Documentation

About

Altinity Sink Connector for ClickHouse

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Python 45.8%
  • Java 39.1%
  • Shell 14.7%
  • Dockerfile 0.4%