This repository contains the `PostgresSync` function, a Pulsar function designed to synchronize data from a Debezium source to a PostgreSQL database.
`PostgresSync` listens to events produced by Debezium, processes the incoming records, and writes the transformed data to a PostgreSQL database. It's designed to be efficient, robust, and scalable.
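To make the "process and write" step concrete, here is a minimal sketch of how a single Debezium change event could be turned into a parameterized SQL statement. It is not the repository's actual implementation. It assumes a relational source (such as MySQL) whose events use Debezium's standard envelope (`op`, `before`, `after`), a single-column primary key, and `%s` placeholders as used by drivers like psycopg2. The function name `event_to_sql` and its `table` and `key_column` arguments are hypothetical.

```python
import json

# Debezium operation codes: c = create, r = snapshot read, u = update, d = delete.
UPSERT_OPS = {"c", "r", "u"}


def event_to_sql(raw_event, table, key_column):
    """Translate one Debezium change event into (sql, params).

    Hypothetical helper for illustration. `table` and `key_column` are
    assumed to come from trusted configuration, since identifiers are
    interpolated into the SQL text rather than passed as parameters.
    """
    if not raw_event:
        # Debezium follows a delete with a null-value tombstone; nothing to write.
        return None

    event = json.loads(raw_event)
    # When schemas are enabled, the JSON converter wraps the change in "payload".
    payload = event.get("payload", event)
    op = payload["op"]

    if op == "d":
        # Deletes carry the old row in "before"; "after" is null.
        return f'DELETE FROM "{table}" WHERE "{key_column}" = %s', [payload["before"][key_column]]

    if op in UPSERT_OPS:
        row = payload["after"]
        columns = list(row)
        col_list = ", ".join(f'"{c}"' for c in columns)
        placeholders = ", ".join(["%s"] * len(columns))
        updates = ", ".join(f'"{c}" = EXCLUDED."{c}"' for c in columns if c != key_column)
        # An upsert keeps the write idempotent if the same event is redelivered.
        action = f"DO UPDATE SET {updates}" if updates else "DO NOTHING"
        sql = (
            f'INSERT INTO "{table}" ({col_list}) VALUES ({placeholders}) '
            f'ON CONFLICT ("{key_column}") {action}'
        )
        return sql, [row[c] for c in columns]

    raise ValueError(f"unsupported Debezium operation: {op!r}")
```

The returned pair can be passed to a database cursor's `execute(sql, params)` call. Using an upsert instead of a plain insert is a design choice made for this sketch, so that replayed events do not fail on duplicate keys.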
- Apache Pulsar set up and running
- PostgreSQL database set up and running
- Debezium connector set up with a source (e.g., MySQL or MongoDB)
- Clone the Repository:

      git clone <repository-url>
      cd <repository-directory>

- Install Dependencies: Ensure you have `pip` installed:

      pip install -r requirements.txt

- Configure Debezium: Ensure your Debezium connector is correctly configured and is publishing events to a Pulsar topic.

- Run the `PostgresSync` Function:

      ./bin/pulsar-admin functions localrun \
        --classname PostgresSync \
        --py test_postgres_sync.py \
        --inputs <YOUR-DEBEZIUM-TOPIC> \
        --output <YOUR-OUTPUT-TOPIC> \
        --tenant public \
        --namespace default \
        --name PostgresSyncFunction

- Monitor Logs: Monitor the function logs to ensure data is being processed and inserted into PostgreSQL.
- Connection Issues: Ensure PostgreSQL and Debezium are both running and accessible.
- Schema Issues: Make sure the schema of the incoming data matches the expected schema.
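When chasing a schema issue, it can help to compare one incoming row against the columns you expect before it reaches PostgreSQL. The following is a rough sketch of such a check, not part of `PostgresSync` itself. The function name `find_schema_mismatches` and the shape of `expected_columns` (column name mapped to a Python type) are assumptions made for illustration.

```python
def find_schema_mismatches(row, expected_columns):
    """Return a list of human-readable problems for one incoming row.

    Hypothetical helper: `row` is the decoded "after" image of a change
    event, and `expected_columns` maps each column name to the Python
    type its values should have.
    """
    problems = []

    # Check every expected column for presence and value type.
    for name, expected_type in expected_columns.items():
        if name not in row:
            problems.append(f"missing column: {name}")
        elif row[name] is not None and not isinstance(row[name], expected_type):
            problems.append(
                f"column {name}: expected {expected_type.__name__}, "
                f"got {type(row[name]).__name__}"
            )

    # Flag columns the target table does not know about.
    for name in row:
        if name not in expected_columns:
            problems.append(f"unexpected column: {name}")

    return problems


if __name__ == "__main__":
    expected = {"id": int, "name": str}
    for problem in find_schema_mismatches({"id": "1", "extra": 2}, expected):
        print(problem)
```

Logging these messages alongside the raw event makes it easier to tell a source-side schema change from a misconfigured target table. Null values are accepted for every column here, which may be too lenient if the target columns are declared `NOT NULL`.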
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.