Kafka Producer for Efficient Data Streaming to Kafka

This Python Kafka producer streams data from Ray DataFrames and Pandas DataFrames to Kafka at high throughput. It is optimized to deliver an approximately 3-4x performance improvement over standard Kafka producers.

Installation

Install the package using pip:

pip3 install ray_kafka_producer@git+https://github.com/ujjawal-khare-27/ray-kafka-producer@main --force-reinstall

Usage

Import the package

from ray_kafka_producer.producer_manager import KafkaProducerManager

Create an instance of KafkaProducerManager

# actor_pool_size: number of actors created to send data to Kafka
# num_cpu: number of CPUs allocated to each actor
kafka_producer_manager = KafkaProducerManager(
    bootstrap_servers="localhost:9092",
    topic="test",
    actor_pool_size=12,
    num_cpu=0.25,
)
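As a rough sizing aid, the settings above imply a total CPU reservation of actor_pool_size times num_cpu, assuming each actor reserves exactly num_cpu logical CPUs as the comments describe. The helper below is hypothetical and not part of ray_kafka_producer; it only does the arithmetic.

```python
def total_cpu_reservation(actor_pool_size: int, num_cpu: float) -> float:
    """Return the logical CPUs reserved by a pool of identical actors.

    Hypothetical helper, not part of ray_kafka_producer. Assumes every
    actor in the pool reserves exactly `num_cpu` logical CPUs.
    """
    if actor_pool_size < 1:
        raise ValueError("actor_pool_size must be at least 1")
    if num_cpu <= 0:
        raise ValueError("num_cpu must be positive")
    return actor_pool_size * num_cpu


# The settings from the example above: 12 actors at 0.25 CPU each.
print(total_cpu_reservation(actor_pool_size=12, num_cpu=0.25))  # 3.0
```

Under this assumption the example pool reserves 3 logical CPUs, so the Ray cluster needs at least that many free for all 12 actors to be scheduled.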

Send messages to Kafka (Ray DataFrame)

kafka_producer_manager.flush_ray_df(df=ray_df, is_actor=True)
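The package's internals are not described here, but one common way for a pool of workers to share a dataset is to split the rows into roughly equal chunks, one per worker. The sketch below illustrates that idea with plain Python lists. The name split_into_chunks is hypothetical, and this is not necessarily how flush_ray_df distributes data.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_chunks(rows: Sequence[T], num_chunks: int) -> List[List[T]]:
    """Split rows into contiguous chunks whose sizes differ by at most one.

    Hypothetical illustration only, not the library's implementation.
    """
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    base, extra = divmod(len(rows), num_chunks)
    chunks: List[List[T]] = []
    start = 0
    for i in range(num_chunks):
        # The first `extra` chunks take one additional row.
        size = base + (1 if i < extra else 0)
        chunks.append(list(rows[start:start + size]))
        start += size
    return chunks


# Ten rows shared among four workers.
print(split_into_chunks(list(range(10)), 4))
# [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
```

Balanced chunks keep every worker busy for about the same time, which is one reason a pool can outperform a single producer.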

Send messages to Kafka (Pandas DataFrame)

kafka_producer_manager.flush_pandas_df(df=pandas_df)
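Kafka message values are byte strings, so each DataFrame row has to be serialized before it is sent. The wire format used by this package is not documented here. The sketch below assumes one JSON message per row, with rows shaped like the output of pandas `DataFrame.to_dict(orient="records")`. The function rows_to_messages is hypothetical.

```python
import json
from typing import Any, Dict, Iterable, List


def rows_to_messages(records: Iterable[Dict[str, Any]]) -> List[bytes]:
    """Encode each record as one UTF-8 JSON message.

    Hypothetical illustration: the real package may use a different
    format. Keys are sorted so the output is deterministic.
    """
    return [json.dumps(record, sort_keys=True).encode("utf-8") for record in records]


# Records as pandas would return them from df.to_dict(orient="records").
records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
messages = rows_to_messages(records)
print(messages[0])  # b'{"id": 1, "name": "a"}'
```

A consumer would reverse the step with `json.loads(message.decode("utf-8"))`.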
