Dockerized Apache Druid cluster

This project demonstrates how to set up a Dockerized Apache Druid cluster for example and development use.

The cluster consists of the following components:

MinIO (S3-compatible object storage) for deep storage
PostgreSQL for metadata storage
ZooKeeper for internal service discovery, coordination, and leader election
Apache Druid platform:
- Middle Manager to handle the ingestion of data into the cluster
- Historical to handle the storage and querying of “historical” data
- Broker to receive queries from external clients
- Coordinator to assign segments to Historical nodes
- Overlord to assign ingestion tasks to Middle Managers and to coordinate segment publishing
- Router to provide a unified API gateway in front of Brokers, Overlords, and Coordinators

Instructions to build Druid image

make image

or by using docker-compose

docker-compose build

You can also specify the version of Druid to build, for example:

make DRUID_VERSION=0.14.1-incubating image

or by using docker-compose

docker-compose build --build-arg ARG_DRUID_VERSION=0.14.1-incubating

Run the cluster

docker-compose up

or to run in the background:

docker-compose up -d

After a while, the Druid console should be available at http://localhost:8888
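
If you want to script the wait instead of refreshing the browser, a small retry helper is one option. The sketch below is only an illustration: the function name wait_until and its arguments are made up for this example and are not part of the project, and the usage line assumes the Router is published on localhost:8888 and answers on Druid's /status/health endpoint.

```shell
#!/bin/sh
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# (hypothetical helper, not shipped with this repository)
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Run the given command quietly; stop as soon as it succeeds.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    # All attempts failed.
    return 1
}
```

For example, to poll every 5 seconds for up to 5 minutes (requires curl):

    wait_until 60 5 curl -fsS http://localhost:8888/status/health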

Load example data

The example data is a subset of the NYC Taxi & Limousine Commission Trip Record Data, covering the months 2015-01 through 2015-03.

cd dataset
./03-load_to_druid.sh

Note that you can download data for different months and change the sample size by adjusting the parameters of ./dataset/01-download.sh and ./dataset/02-create_sample_tripdata.sh.

The schema of the dataset and the indexing task are defined in ./dataset/yellow_tripdata-index.json
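
The task spec can also be submitted by hand through Druid's task API, which may help when experimenting with changes to the spec. The sketch below is not necessarily what ./03-load_to_druid.sh does. It assumes the Router on localhost:8888 forwards Overlord requests (that is, the Router's management proxy is enabled), which may not hold for this setup. The DRUID_URL variable and the function names are illustrative only.

```shell
#!/bin/sh
# Assumption: Router address; override DRUID_URL if your setup differs.
DRUID_URL="${DRUID_URL:-http://localhost:8888}"

# Extract the task id from the Overlord's JSON reply, e.g. {"task":"<id>"}.
# Prints nothing if the reply has no "task" field.
extract_task_id() {
    printf '%s\n' "$1" |
        sed -n 's/.*"task"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# POST a task spec file to the task endpoint and print the returned task id.
submit_task() {
    response=$(curl -fsS -X POST -H 'Content-Type: application/json' \
        -d @"$1" "$DRUID_URL/druid/indexer/v1/task") || return 1
    extract_task_id "$response"
}

# Print the status JSON for the given task id.
task_status() {
    curl -fsS "$DRUID_URL/druid/indexer/v1/task/$1/status"
}
```

From the dataset directory, usage might look like this:

    task_id=$(submit_task yellow_tripdata-index.json)
    task_status "$task_id"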

...enjoy :)
