Warning: the package ‘rkafka’ was removed from the CRAN repository. It was archived on 2023-05-23 as issues were not corrected in time. Formerly available versions can be obtained from the CRAN archive.
This small tutorial creates a data pipeline from Apache Kafka into R using the rkafka package. It focuses on simplicity and can be seen as a baseline for similar projects. You can read more about it in my blog article: Create a Data Analysis Pipeline with Apache Kafka and RStudio.
Start all services with:
docker-compose up -d
This starts:
- Zookeeper
- Kafka Broker
- Kafka Producer
  - built Docker image executing a fat JAR
- RStudio
The Kafka Producer writes fake events of a driving truck to the topic truck-topic in JSON format every two seconds.
Verify that data is produced correctly:
docker-compose exec broker bash
kafka-console-consumer --bootstrap-server broker:9092 --topic truck-topic
Open RStudio via:
localhost:8787
The username is user and the password is password.
Under /home you can run Data.R. It first creates a simpleConsumer, then requests all data from the beginning of the topic, and finally converts the JSON strings into a data frame with jsonlite.
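The three steps can be sketched roughly as follows. This is not the contents of Data.R, only a minimal illustration under stated assumptions: the helper names fetch_messages and parse_events are hypothetical, the host name broker and port 9092 are taken from the console-consumer command above and assumed to resolve from the RStudio container, the timeout, buffer size, read size and client id are placeholder values, and the sample event fields (truckId, speed) are invented because the real event schema is not shown here. The sketch also assumes that an empty result from rkafka.readFromSimpleConsumer marks the end of the fetched batch.

```r
library(jsonlite)

# Steps 1 and 2: create a simple consumer and request data from the
# beginning of the topic. Connection values are assumptions, not
# values taken from Data.R. rkafka passes all arguments as strings.
fetch_messages <- function(topic = "truck-topic", host = "broker",
                           port = "9092", max_messages = 1000) {
  consumer <- rkafka::rkafka.createSimpleConsumer(
    host, port, "10000", "100000", "r-client"
  )
  on.exit(rkafka::rkafka.closeSimpleConsumer(consumer))

  # Partition "0", offset "0": start at the beginning of the topic.
  rkafka::rkafka.receiveFromSimpleConsumer(consumer, topic, "0", "0", "1000000")

  messages <- character(0)
  while (length(messages) < max_messages) {
    msg <- rkafka::rkafka.readFromSimpleConsumer(consumer)
    # Assumption: an empty result means the fetched batch is exhausted.
    if (length(msg) == 0 || is.na(msg[1]) || !nzchar(msg[1])) break
    messages <- c(messages, msg[1])
  }
  messages
}

# Step 3: convert JSON strings into one data frame.
# Assumes every message is a flat JSON object with the same fields.
parse_events <- function(messages) {
  rows <- lapply(messages, function(m) {
    as.data.frame(fromJSON(m), stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Hypothetical sample messages; the real events may have other fields.
sample_messages <- c(
  '{"truckId": 1, "speed": 62.5}',
  '{"truckId": 1, "speed": 64.0}'
)
events <- parse_events(sample_messages)
print(events)
```

With the stack running, parse_events(fetch_messages()) would apply the same conversion to the messages read from truck-topic. The parsing step is kept separate from the consumer so it can be tried without a broker.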