- {crd-docs-base-url}/druid-operator/{crd-docs-version}/[CRD documentation {external-link-icon}^]
The Stackable operator for Apache Druid deploys and manages Apache Druid clusters on Kubernetes. Apache Druid is an open-source, distributed data store designed to ingest, store and query large amounts of data in real time, which makes it well suited for high-volume data processing and analysis. This operator provides several resources and features to manage Druid clusters efficiently.
To get started with the Stackable operator for Apache Druid, follow the getting started guide. It walks you through installing the operator and its dependencies (ZooKeeper, HDFS and an SQL database), and through querying your first sample data.
The operator is installed along with the DruidCluster CustomResourceDefinition, which supports five roles: Router, Coordinator, Broker, MiddleManager and Historical. These roles correspond to Druid processes.
The operator watches DruidCluster objects and creates multiple Kubernetes resources for each DruidCluster based on its configuration.
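As an illustrative sketch, a minimal DruidCluster with one replica per role might look roughly like the manifest below. The resource names (`simple-druid`, `simple-druid-znode`, `simple-hdfs`, the PostgreSQL connection details) are placeholders, and the exact field names and supported product version should be verified against the CRD documentation linked above.

[source,yaml]
----
apiVersion: druid.stackable.tech/v1alpha1
kind: DruidCluster
metadata:
  name: simple-druid
spec:
  image:
    productVersion: 30.0.0  # placeholder; use a supported version
  clusterConfig:
    zookeeperConfigMapName: simple-druid-znode  # discovery ConfigMap of a ZookeeperZnode
    deepStorage:
      hdfs:
        configMapName: simple-hdfs  # discovery ConfigMap of an HDFS cluster
        directory: /druid
    metadataStorageDatabase:
      dbType: postgresql
      connString: jdbc:postgresql://druid-postgresql/druid  # placeholder database
      host: druid-postgresql
      port: 5432
  brokers:
    roleGroups:
      default:
        replicas: 1
  coordinators:
    roleGroups:
      default:
        replicas: 1
  historicals:
    roleGroups:
      default:
        replicas: 1
  middleManagers:
    roleGroups:
      default:
        replicas: 1
  routers:
    roleGroups:
      default:
        replicas: 1
----

Each of the five role sections can define multiple role groups with their own replica counts and configuration overrides.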
For every RoleGroup a StatefulSet is created. Each StatefulSet can contain multiple replicas (Pods). Each Pod has at least two containers: the main Druid container and a preparation container which runs once at startup. If xref:usage-guide/logging.adoc[logging] is enabled, there is a sidecar container for logging as well. For every Role and RoleGroup the operator creates a Service.
A ConfigMap is created for each RoleGroup, containing three files: the `jvm.config` and `runtime.properties` files generated from the DruidCluster configuration (see the xref:usage-guide/index.adoc[usage guide] for more information), plus a `log4j2.properties` file used for xref:usage-guide/logging.adoc[logging].
For the whole DruidCluster a discovery ConfigMap is created which contains information on how to connect to the Druid cluster.
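For illustration, a client Pod could mount this discovery ConfigMap to read the connection details as files. This is a sketch: it assumes the discovery ConfigMap shares the DruidCluster's name (`simple-druid` here), and the mount path and client image are arbitrary.

[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: druid-client
spec:
  containers:
    - name: client
      image: alpine
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: druid-discovery
          mountPath: /stackable/druid-discovery  # connection info appears as files here
  volumes:
    - name: druid-discovery
      configMap:
        name: simple-druid  # assumed: discovery ConfigMap named after the DruidCluster
----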
The Druid operator has the following dependencies:

- A deep storage backend to persist data. Use either HDFS via the xref:hdfs:index.adoc[Stackable operator for Apache HDFS], or S3.
- An SQL database to store metadata.
- Apache ZooKeeper via the xref:zookeeper:index.adoc[Stackable operator for Apache ZooKeeper]. ZooKeeper is used by Druid for internal communication between processes.
- The xref:commons-operator:index.adoc[Stackable Commons operator], which provides common CRDs such as the xref:concepts:s3.adoc[S3] CRDs.
- The xref:secret-operator:index.adoc[Stackable Secret operator], required for things like S3 access credentials or LDAP integration.
- The xref:listener-operator:index.adoc[Stackable Listener operator], which exposes the Pods to the outside network.
Have a look at the getting started guide for an example of a minimal working setup.
The getting started guide sets up a fully working Druid cluster. For a production setup, however, you need to provide the S3 deep storage backend and the metadata SQL database yourself: both are required external components and are not managed by the operator.
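The two external components are referenced from the cluster configuration. The fragment below is a sketch of what this can look like with S3 deep storage; the bucket name, endpoint, SecretClass and database connection details are placeholders, and the field names should be verified against the CRD documentation.

[source,yaml]
----
spec:
  clusterConfig:
    deepStorage:
      s3:
        bucket:
          inline:
            bucketName: druid-deep-storage  # placeholder bucket
            connection:
              inline:
                host: s3.example.com  # placeholder S3 endpoint
                credentials:
                  secretClass: druid-s3-credentials  # placeholder SecretClass
    metadataStorageDatabase:
      dbType: postgresql
      connString: jdbc:postgresql://postgres.example.svc.cluster.local/druid  # placeholder
      host: postgres.example.svc.cluster.local
      port: 5432
----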
Druid works well with other products supported by Stackable, such as Apache Kafka for data ingestion, Trino for data processing, or Apache Superset for data visualization. OPA can be connected to create authorization policies. Have a look at the xref:usage-guide/index.adoc[usage guide] for more configuration options, and at the demos for complete data pipelines you can install with a single command.
xref:management:stackablectl:index.adoc[stackablectl] supports installing xref:demos:index.adoc[demos] with a single command. The demos are complete data pipelines which showcase multiple components of the Stackable platform working together and which you can try out interactively. Both demos below include Druid as part of the data pipeline:
The xref:demos:nifi-kafka-druid-water-level-data.adoc[water level data] demo uses data from PEGELONLINE to visualize water levels in rivers and coastal regions of Germany, based on both historic and real-time data.
The xref:demos:nifi-kafka-druid-earthquake-data.adoc[earthquake data] demo ingests earthquake data into a pipeline similar to the one used in the water level demo.
The Stackable operator for Apache Druid currently supports the Druid versions listed below. To use a specific Druid version in your Druid Stacklet, you have to specify an image; this is explained in the xref:concepts:product-image-selection.adoc[product image selection] documentation. The operator also supports running images from a custom registry or running entirely customized images; both cases are covered there as well.
- The druid-operator {external-link-icon} GitHub repository
- The operator feature overview in the feature tracker {external-link-icon}
- The {crd-docs}/druid.stackable.tech/druidcluster/v1alpha1/[DruidCluster {external-link-icon}^] CRD documentation