
Tutorial for data pipeline: Apache Kafka -> R


pneff93/Kafka-R


Kafka-R



Warning

The ‘rkafka’ package was removed from the CRAN repository; it was archived on 2023-05-23 because issues were not corrected in time. Formerly available versions can be obtained from the CRAN archive, and a summary of the most recent check results is available in the check results archive. Use the canonical form https://CRAN.R-project.org/package=rkafka to link to the CRAN page.


This small tutorial creates a data pipeline from Apache Kafka into R using the rkafka package. It focuses on simplicity and can be seen as a baseline for similar projects. You can read more about it in my blog article: Create a Data Analysis Pipeline with Apache Kafka and RStudio.

Prerequisites

Set up

docker-compose up -d

It starts:

  • Zookeeper
  • Kafka Broker
  • Kafka Producer
    • a built Docker image that executes the producer fat JAR
  • RStudio
    • a built Docker image of RStudio with rJava installed, which is required by rkafka
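
The services above might be declared in a docker-compose.yml along these lines. This is a sketch only: image names, build contexts, ports, and environment variables are assumptions, not taken from the repository.

```yaml
version: "3"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:latest
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181

  broker:
    image: confluentinc/cp-kafka:latest
    depends_on: [zookeeper]
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1

  producer:
    build: ./producer        # image that runs the producer fat JAR
    depends_on: [broker]

  rstudio:
    build: ./rstudio         # RStudio image with rJava for rkafka
    ports:
      - "8787:8787"
```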

Kafka Producer

The Kafka Producer writes fake events from a driving truck into the topic truck-topic as JSON, one every two seconds. Verify that the data is produced correctly:

docker-compose exec broker bash
kafka-console-consumer --bootstrap-server broker:9092 --topic truck-topic
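
Each message is a JSON string. The exact schema is defined by the producer; an event might look roughly like the following, where every field name and value is purely illustrative:

```json
{
  "timestamp": "2023-05-23T10:15:04Z",
  "truckId": 7,
  "latitude": 48.137,
  "longitude": 11.575,
  "speed": 83.5
}
```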

RStudio

Open RStudio via:

localhost:8787

The username is user and the password is password.

Under /home you can run Data.R. It first creates a simpleConsumer, then requests all data from the beginning of the topic, and finally converts the JSON strings into a data frame with jsonlite.
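
The steps above can be sketched in R along these lines. Host names, ports, and the exact rkafka argument lists are assumptions; verify them against the archived rkafka documentation before use.

```r
library(rkafka)    # requires rJava
library(jsonlite)

# Create a SimpleConsumer pointed at the broker container
# (arguments assumed: host, port, timeout, buffer size, client id).
consumer <- rkafka.createSimpleConsumer("broker", "9092",
                                        "1000", "100000", "truck-consumer")

# Read messages from the beginning of truck-topic
# (arguments assumed: topic, partition, offset, max read size).
messages <- rkafka.readFromSimpleConsumer(consumer, "truck-topic",
                                          "0", "0", "100000")

rkafka.closeSimpleConsumer(consumer)

# Each message is a JSON object; wrap the batch in a JSON array
# so jsonlite parses it into a single data frame.
df <- fromJSON(paste0("[", paste(messages, collapse = ","), "]"))
```

Wrapping the individual messages into one JSON array lets fromJSON build the data frame in a single pass instead of binding rows message by message.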

Sources
