Produce mock meetup.com events to Kafka and perform real-time stream processing on them using Flink.
Requirements:
- Docker with Docker Compose
To start all required containers, run `make up`. To stop them, run `make down` (this will destroy all volumes to preserve space). Logs from the producer and the consumer are accessible via `make logs`.
The dataset is automatically downloaded from a remote server when the container is built.
Ports `8081` (Flink) and `9000` (Kafdrop) are exposed to the host.
Containers come with sensible defaults. If you wish to tinker with any of the parameters, the following can be changed in `docker-compose.yml` (a sketch of how they might be consumed follows the list):
- `ZOOKEEPER_URI` specifies the Zookeeper instance that is used
- `KAFKA_URI` specifies the Kafka instance that is used
- `KAFKA_EVENTS_TOPIC` specifies the topic to which the events are pushed
- `KAFKA_TOP_CITIES_TOPIC` specifies the topic to which the top cities are output
- `KAFKA_MUNICH_HOTSPOTS_TOPIC` specifies the topic for hotspots in Munich
- `KAFKA_PRODUCER_ITERATIONS` is the number of times the input file will be pushed to Kafka (note: after EOF is reached, the file is re-pushed from the start, which will not work with windowed operators)
- `KAFKA_EVENT_CONSUMER_GROUP` is the name of Flink's consumer group
- `INPUT_FILE` is the path to the input file inside the container
- `FLINK_PARALLEL` is the parallelism level of the pipeline
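
For illustration, here is a minimal sketch of how the Flink job might read these parameters, assuming a Java consumer built on the `FlinkKafkaConsumer` connector. The class name and overall structure are hypothetical, not taken from the repository:

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class ConfigSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // FLINK_PARALLEL controls the parallelism of every operator in the job.
        env.setParallelism(Integer.parseInt(System.getenv("FLINK_PARALLEL")));

        // KAFKA_URI and KAFKA_EVENT_CONSUMER_GROUP configure the Kafka consumer.
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", System.getenv("KAFKA_URI"));
        props.setProperty("group.id", System.getenv("KAFKA_EVENT_CONSUMER_GROUP"));

        // KAFKA_EVENTS_TOPIC is the topic the producer pushes the mock events to.
        DataStream<String> rawEvents = env.addSource(
                new FlinkKafkaConsumer<>(
                        System.getenv("KAFKA_EVENTS_TOPIC"),
                        new SimpleStringSchema(),
                        props));

        rawEvents.print(); // stand-in for the actual processing pipeline
        env.execute("meetup-event-processing");
    }
}
```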
The dataset is generated based on an actual distribution of meetups on meetup.com.
In the early stages of the pipeline, records with missing fields are filtered out and the remaining data is converted into a form suitable for further processing.
Later, two distinct processing branches are created (sketched after this list):
- The first keeps only the events in Munich, Germany. These are then assigned to 1 km x 1 km buckets, and hotspots in Munich are reported based on the event count in each bucket (note: aggregation is done in memory over the entire stream so that we don't have to implement the downstream pipeline).
- The second reports the most frequent cities for meetups in Europe. Data is aggregated separately for each day and immediately pushed to Kafka.
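
For illustration, here is a minimal sketch of what the two branches might look like with Flink's Java DataStream API. The event representation (a `(lat, lon, city)` tuple), the bounding boxes, the grid-cell function, and the use of processing-time windows are all simplifying assumptions for this sketch, not details taken from the repository:

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class BranchesSketch {

    // Rough bounding box around Munich (hypothetical constants).
    private static final double MUC_MIN_LAT = 48.06, MUC_MAX_LAT = 48.25;
    private static final double MUC_MIN_LON = 11.36, MUC_MAX_LON = 11.72;

    // `events` carries (lat, lon, city) tuples that already passed the
    // missing-field filter earlier in the pipeline.
    static void buildBranches(DataStream<Tuple3<Double, Double, String>> events) {

        // Branch 1: hotspots in Munich. Events are assigned to ~1 km x 1 km grid
        // cells and counted per cell over the whole stream (in-memory aggregation).
        DataStream<Tuple2<String, Long>> munichHotspots = events
                .filter(e -> e.f0 >= MUC_MIN_LAT && e.f0 <= MUC_MAX_LAT
                        && e.f1 >= MUC_MIN_LON && e.f1 <= MUC_MAX_LON)
                .map(e -> Tuple2.of(gridCell(e.f0, e.f1), 1L))
                .returns(Types.TUPLE(Types.STRING, Types.LONG))
                .keyBy(t -> t.f0)
                .sum(1);

        // Branch 2: meetup counts per city in Europe, aggregated per day
        // (a subsequent top-N selection per day is omitted here).
        DataStream<Tuple2<String, Long>> cityCountsPerDay = events
                .filter(e -> isInEurope(e.f0, e.f1))
                .map(e -> Tuple2.of(e.f2, 1L))
                .returns(Types.TUPLE(Types.STRING, Types.LONG))
                .keyBy(t -> t.f0)
                .window(TumblingProcessingTimeWindows.of(Time.days(1)))
                .sum(1);

        // Both streams would then be serialized and written to
        // KAFKA_MUNICH_HOTSPOTS_TOPIC and KAFKA_TOP_CITIES_TOPIC respectively.
    }

    // ~1 km grid cell id; 1 degree of latitude is ~111 km, longitude scaled by cos(lat).
    private static String gridCell(double lat, double lon) {
        long row = (long) Math.floor(lat * 111.0);
        long col = (long) Math.floor(lon * 111.0 * Math.cos(Math.toRadians(lat)));
        return row + "_" + col;
    }

    // Very coarse Europe bounding box (hypothetical, for illustration only).
    private static boolean isInEurope(double lat, double lon) {
        return lat >= 35.0 && lat <= 71.0 && lon >= -10.0 && lon <= 40.0;
    }
}
```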