This sink-only Apache Kafka Connect connector streams messages from Apache Kafka to Bigtable, the wide-column store on Google Cloud Platform (GCP).
Apache Kafka is an open-source stream processing platform developed by the Apache Software Foundation and written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. For more details, please refer to the Apache Kafka home page.
Bigtable is a compressed, high-performance, proprietary data storage system built on Google File System, Chubby Lock Service, SSTable and a few other Google technologies. On May 6, 2015, a public version of Bigtable was made available as a service in the Google Cloud Platform. For more details, please refer to the GCP Bigtable home page.
This project uses the bigtable-client-core library (not the HBase client) to stream data to GCP Bigtable. Internally, bigtable-client-core uses the gRPC framework to talk to GCP Bigtable.
Apache ZooKeeper and Apache Kafka must be installed and running on your machine. Please refer to their respective sites to download and start ZooKeeper and Kafka. You also need Java version 8 or above.
Software | Version | Note |
---|---|---|
Java | 11 | Tested using Java 11. |
Kafka | 3.3.1 | Please refer. Tested using kafka_2.13-3.3.1; should also work with older versions. |
bigtable-client-core | 1.27.1 | Please refer. |
Kafka connect-api | 3.3.1 | Please refer. |
grpc-netty-shaded | 1.51.0 | Please refer. |
Please refer to the project Wiki.
The current configuration system supports streaming messages from a given topic to a given table. You can subscribe to any number of topics, but each topic can point to one and only one table. For example, if you subscribe to a topic named demo-topic, you need a yml file named demo-topic.yml. That yml file contains all the configuration required to transform the data and write it into Bigtable.
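The topic-to-file naming rule can be sketched in a few lines of Java. This is only an illustration, not the connector's actual code: the class `TopicConfigResolver`, its methods, the `config` directory, and the second topic `orders` are all hypothetical. The one assumption borrowed from Kafka Connect is that subscribed topics arrive as a comma-separated list, as in the standard `topics` sink property.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of this project: shows the
// "one topic -> one <topic>.yml file" convention described above.
public class TopicConfigResolver {

    // Derives the expected config file name for a single topic.
    public static String configFileName(String topic) {
        return topic + ".yml";
    }

    // Maps every topic in a comma-separated list to <configDir>/<topic>.yml.
    // The directory is an assumption; the real lookup location may differ.
    public static Map<String, Path> resolve(String topicsCsv, Path configDir) {
        Map<String, Path> result = new LinkedHashMap<>();
        for (String raw : topicsCsv.split(",")) {
            String topic = raw.trim();
            if (topic.isEmpty()) {
                continue; // ignore stray commas and blanks
            }
            // A topic appears once as a key, so it maps to exactly one file.
            result.put(topic, configDir.resolve(configFileName(topic)));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Path> files = resolve("demo-topic, orders", Paths.get("config"));
        files.forEach((topic, file) -> System.out.println(topic + " -> " + file));
    }
}
```

Because the map is keyed by topic name, a topic listed twice still resolves to a single file, which matches the one-topic, one-table rule.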
Please refer to project Wiki
Either create an issue in this project or send it to [email protected]. Thanks!