Flink ClickHouse Connector

Flink SQL connector for ClickHouse database, this project Powered by ClickHouse JDBC.

Currently, the project supports Source/Sink Table and Flink Catalog.
Please create issues if you encounter bugs and any help for the project is greatly appreciated.

Connector Options

Option	Required	Default	Type	Description
url	required	none	String	The ClickHouse jdbc url in format `clickhouse://<host>:<port>`.
username	optional	none	String	The 'username' and 'password' must both be specified if any of them is specified.
password	optional	none	String	The ClickHouse password.
database-name	optional	default	String	The ClickHouse database name.
table-name	required	none	String	The ClickHouse table name.
use-local	optional	false	Boolean	Directly read/write local tables in case of distributed table engine.
sink.batch-size	optional	1000	Integer	The max flush size, over this will flush data.
sink.flush-interval	optional	1s	Duration	Over this flush interval mills, asynchronous threads will flush data.
sink.max-retries	optional	3	Integer	The max retry times when writing records to the database failed.
~~sink.write-local~~	optional	false	Boolean	Removed from version 1.15, use `use-local` instead.
sink.update-strategy	optional	update	String	Convert a record of type UPDATE_AFTER to update/insert statement or just discard it, available: update, insert, discard.
sink.partition-strategy	optional	balanced	String	Partition strategy: balanced(round-robin), hash(partition key), shuffle(random).
sink.partition-key	optional	none	String	Partition key used for hash strategy.
sink.sharding.use-table-definition	optional	false	Boolean	Sharding strategy consistent with definition of distributed table, if set to true, the configuration of `sink.partition-strategy` and `sink.partition-key` will be overwritten.
sink.ignore-delete	optional	true	Boolean	Whether to ignore delete statements.
sink.parallelism	optional	none	Integer	Defines a custom parallelism for the sink.
scan.partition.column	optional	none	String	The column name used for partitioning the input.
scan.partition.num	optional	none	Integer	The number of partitions.
scan.partition.lower-bound	optional	none	Long	The smallest value of the first partition.
scan.partition.upper-bound	optional	none	Long	The largest value of the last partition.
catalog.ignore-primary-key	optional	true	Boolean	Whether to ignore primary keys when using ClickHouseCatalog to create table.
properties.*	optional	none	String	This can set and pass `clickhouse-jdbc` configurations.
lookup.cache	optional	NONE	String	The caching strategy for this lookup table, including NONE and PARTIAL(not support FULL yet)
lookup.partial-cache.expire-after-access	optional	none	Duration	Duration to expire an entry in the cache after accessing, over this time, the oldest rows will be expired.
lookup.partial-cache.expire-after-write	optional	none	Duration	Duration to expire an entry in the cache after writing, over this time, the oldest rows will be expired.
lookup.partial-cache.max-rows	optional	none	Long	The max number of rows of lookup cache, over this value, the oldest rows will be expired.
lookup.partial-cache.caching-missing-key	optional	true	Boolean	Flag to cache missing key, true by default
lookup.max-retries	optional	3	Integer	The max retry times if lookup database failed.

Update/Delete Data Considerations:

Distributed table don't support the update/delete statements, if you want to use the update/delete statements, please be sure to write records to local table or set use-local to true.
The data is updated and deleted by the primary key, please be aware of this when using it in the partition table.

breaking

Since version 1.16, we have taken shard weight into consideration, this may affect which shard the data is distributed to.

Data Type Mapping

Flink Type	ClickHouse Type
CHAR	String
VARCHAR	String / IP / UUID
STRING	String / Enum
BOOLEAN	UInt8
BYTES	FixedString
DECIMAL	Decimal / Int128 / Int256 / UInt64 / UInt128 / UInt256
TINYINT	Int8
SMALLINT	Int16 / UInt8
INTEGER	Int32 / UInt16 / Interval
BIGINT	Int64 / UInt32
FLOAT	Float32
DOUBLE	Float64
DATE	Date
TIME	DateTime
TIMESTAMP	DateTime
TIMESTAMP_LTZ	DateTime
INTERVAL_YEAR_MONTH	Int32
INTERVAL_DAY_TIME	Int64
ARRAY	Array
MAP	Map
ROW	Not supported
MULTISET	Not supported
RAW	Not supported

Maven Dependency

The project isn't published to the maven central repository, we need to deploy/install to our own repository before use it, step as follows:

# clone the project
git clone https://github.com/itinycheng/flink-connector-clickhouse.git

# enter the project directory
cd flink-connector-clickhouse/

# display remote branches
git branch -r

# checkout the branch you need
git checkout $branch_name

# install or deploy the project to our own repository
mvn clean install -DskipTests
mvn clean deploy -DskipTests

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-clickhouse</artifactId>
    <version>1.16.0-SNAPSHOT</version>
</dependency>

How to use

Create and read/write table

-- register a clickhouse table `t_user` in flink sql.
CREATE TABLE t_user (
    `user_id` BIGINT,
    `user_type` INTEGER,
    `language` STRING,
    `country` STRING,
    `gender` STRING,
    `score` DOUBLE,
    `list` ARRAY<STRING>,
    `map` Map<STRING, BIGINT>,
    PRIMARY KEY (`user_id`) NOT ENFORCED
) WITH (
    'connector' = 'clickhouse',
    'url' = 'clickhouse://{ip}:{port}',
    'database-name' = 'tutorial',
    'table-name' = 'users',
    'sink.batch-size' = '500',
    'sink.flush-interval' = '1000',
    'sink.max-retries' = '3'
);

-- read data from clickhouse 
SELECT user_id, user_type from t_user;

-- write data into the clickhouse table from the table `T`
INSERT INTO t_user
SELECT cast(`user_id` as BIGINT), `user_type`, `lang`, `country`, `gender`, `score`, ARRAY['CODER', 'SPORTSMAN'], CAST(MAP['BABA', cast(10 as BIGINT), 'NIO', cast(8 as BIGINT)] AS MAP<STRING, BIGINT>) FROM T;

Create and use ClickHouseCatalog

Scala

val tEnv = TableEnvironment.create(setting)

val props = new util.HashMap[String, String]()
props.put(ClickHouseConfig.DATABASE_NAME, "default")
props.put(ClickHouseConfig.URL, "clickhouse://127.0.0.1:8123")
props.put(ClickHouseConfig.USERNAME, "username")
props.put(ClickHouseConfig.PASSWORD, "password")
props.put(ClickHouseConfig.SINK_FLUSH_INTERVAL, "30s")
val cHcatalog = new ClickHouseCatalog("clickhouse", props)
tEnv.registerCatalog("clickhouse", cHcatalog)
tEnv.useCatalog("clickhouse")

tEnv.executeSql("insert into `clickhouse`.`default`.`t_table` select...");

Java

TableEnvironment tEnv = TableEnvironment.create(setting);

Map<String, String> props = new HashMap<>();
props.put(ClickHouseConfig.DATABASE_NAME, "default")
props.put(ClickHouseConfig.URL, "clickhouse://127.0.0.1:8123")
props.put(ClickHouseConfig.USERNAME, "username")
props.put(ClickHouseConfig.PASSWORD, "password")
props.put(ClickHouseConfig.SINK_FLUSH_INTERVAL, "30s");
Catalog cHcatalog = new ClickHouseCatalog("clickhouse", props);
tEnv.registerCatalog("clickhouse", cHcatalog);
tEnv.useCatalog("clickhouse");

tEnv.executeSql("insert into `clickhouse`.`default`.`t_table` select...");

SQL

> CREATE CATALOG clickhouse WITH (
    'type' = 'clickhouse',
    'url' = 'clickhouse://127.0.0.1:8123',
    'username' = 'username',
    'password' = 'password',
    'database-name' = 'default',
    'use-local' = 'false',
    ...
);

> USE CATALOG clickhouse;
> SELECT user_id, user_type FROM `default`.`t_user` limit 10;
> INSERT INTO `default`.`t_user` SELECT ...;

Roadmap

The main branch is currently unstable and should not be used in production

Flink Clickhouse Connector donated to Apache Flink #102 @czy006
Perfect Junit Tests for Connector

Name		Name	Last commit message	Last commit date
Latest commit History 119 Commits
.github		.github
.mvn/wrapper		.mvn/wrapper
docs		docs
flink-connector-clickhouse-e2e-test		flink-connector-clickhouse-e2e-test
flink-connector-clickhouse		flink-connector-clickhouse
flink-sql-connector-clickhouse		flink-sql-connector-clickhouse
tools		tools
.editorconfig		.editorconfig
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
mvnw		mvnw
mvnw.cmd		mvnw.cmd
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Flink ClickHouse Connector

Connector Options

Data Type Mapping

Maven Dependency

How to use

Create and read/write table

Create and use ClickHouseCatalog

Scala

Java

SQL

Roadmap

About

Releases

Packages

Contributors 7

Languages

License

itinycheng/flink-connector-clickhouse

Folders and files

Latest commit

History

Repository files navigation

Flink ClickHouse Connector

Connector Options

Data Type Mapping

Maven Dependency

How to use

Create and read/write table

Create and use ClickHouseCatalog

Scala

Java

SQL

Roadmap

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 7

Languages

Packages