Merge pull request #42 from mishmash-io/publish-druid
Improve druid input format usability, Add README
arusevm authored Oct 15, 2024
2 parents eaa76f9 + 96af17d commit ff52156
Showing 10 changed files with 709 additions and 36 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -73,7 +73,7 @@ If the above sounds convincing - keep reading through this document and explore

# OpenTelemetry for Developers, Data Engineers and Data Scientists

We have prepared a few Jupyter notebooks that visually explore OpenTelemetry data that we collected from [a demo Astronomy webshop app](https://github.com/mishmash-io/opentelemetry-demo-to-parquet)
We have prepared a few Jupyter notebooks that visually explore OpenTelemetry data that we collected from [a demo Astronomy webshop app](https://github.com/mishmash-io/opentelemetry-demos)
using the [Apache Parquet Stand-alone server](./server-parquet) contained in this repository.

If you are the sort of person who prefers to learn by looking at **actual data** - start with the [OpenTelemetry Basics Notebook.](./examples/notebooks/basics.ipynb)
@@ -26,12 +26,8 @@

import org.junit.jupiter.api.Test;

import io.opentelemetry.proto.collector.logs.v1.ExportLogsServiceRequest;
import io.opentelemetry.proto.collector.metrics.v1.ExportMetricsServiceRequest;
import io.opentelemetry.proto.common.v1.KeyValue;
import io.opentelemetry.proto.logs.v1.LogRecord;
import io.opentelemetry.proto.logs.v1.ResourceLogs;
import io.opentelemetry.proto.logs.v1.ScopeLogs;
import io.opentelemetry.proto.metrics.v1.Gauge;
import io.opentelemetry.proto.metrics.v1.Metric;
import io.opentelemetry.proto.metrics.v1.NumberDataPoint;
298 changes: 296 additions & 2 deletions druid-otlp-format/README.md
@@ -1,7 +1,301 @@
# Apache Druid Input Format for OpenTelemetry signals
# Apache Druid extension for OpenTelemetry signals ingestion

Coming soon!
This artifact implements an Apache Druid extension that you can use to ingest
[OpenTelemetry](https://opentelemetry.io) signals - `logs`, `metrics`, `traces` and `profiles` - into Druid, and then explore them through interactive charts and dashboards.

## What is OpenTelemetry?

OpenTelemetry provides high-quality, ubiquitous, and portable telemetry that enables effective observability.

It's a newer generation of telemetry systems that builds on the experience gained with earlier
implementations and offers a number of improvements. We at [mishmash io](https://mishmash.io) find
OpenTelemetry particularly useful - see
[the reasons here](../README.md#why-you-should-switch-to-opentelemetry), and
[how we use it in our development process here.](https://mishmash.io/open_source/opentelemetry)

Also, make sure you check [the OpenTelemetry official docs.](https://opentelemetry.io/docs/)

## Why Apache Druid?

Apache Druid is a high-performance, real-time analytics database that delivers sub-second queries on streaming and batch data at scale and under load. It performs particularly well on timestamped
data, and OpenTelemetry signals are heavily timestamped. See [Apache Druid Use Cases](https://druid.apache.org/use-cases) for more.

# Quick introduction

To get an idea of why and when to use this Druid extension - here is an example setup:

1. Set up [Apache Kafka.](https://kafka.apache.org/)
2. Get the [OpenTelemetry collector](https://opentelemetry.io/docs/collector/) and configure it
to export data to Kafka. Use the [OpenTelemetry Kafka Exporter](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/kafkaexporter/README.md) provided by OpenTelemetry (see the sketch after this list).
3. Set up Apache Druid with this artifact as an extension.
4. Inside Druid, configure [Kafka Streaming Ingestions](https://druid.apache.org/docs/latest/ingestion/streaming/) for `logs`, `metrics` and `traces`.

   In the ingestion specs, set the `InputFormat` to this extension (more on this below).

   ***Note:*** the OpenTelemetry `profiles` signal is still in development and is not supported
   by the Kafka Exporter.
5. Set up [Apache Superset](https://superset.apache.org/) with a [Druid database driver.](https://superset.apache.org/docs/configuration/databases#apache-druid)
6. Explore your telemetry in Superset!
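
A minimal collector configuration for step 2 might look like the sketch below. This is an
illustration only - the OTLP receiver, the broker address and the pipeline layout are
assumptions; check the Kafka Exporter README for all options:

```yaml
# Sketch of an OpenTelemetry collector config exporting to Kafka.
# Assumptions: an OTLP receiver and a local Kafka broker on port 9092.
receivers:
  otlp:
    protocols:
      grpc:
exporters:
  kafka:
    brokers: ["localhost:9092"]
    # with the default otlp_proto encoding, recent exporter versions
    # default to per-signal topics: otlp_spans, otlp_metrics, otlp_logs
service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [kafka]
    metrics:
      receivers: [otlp]
      exporters: [kafka]
    traces:
      receivers: [otlp]
      exporters: [kafka]
```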

> [!TIP]
> We have prepared a clone of [OpenTelemetry's demo app](https://opentelemetry.io/docs/demo/) with
> the exact same setup as above.
>
> It's quicker to run (no configuration needed) and will also generate telemetry to populate your
> Druid tables.
>
> Get our [Druid ingestion demo app fork here.](https://github.com/mishmash-io/opentelemetry-demos)

> [!NOTE]
> There are more ways of ingesting OpenTelemetry data into Apache Druid.
>
> Watch [this repository](https://github.com/mishmash-io/opentelemetry-server-embedded) for updates
> as we continue to publish our internal OpenTelemetry-related code and add more examples!

# Installation

To use this extension you need to install it on your Druid nodes. This is done
through the typical [extension-loading process](https://druid.apache.org/docs/latest/configuration/extensions/):

1. Pull the extension:

   ```bash
   cd /opt/druid && \
     bin/run-java \
     -classpath "lib/*" org.apache.druid.cli.Main tools pull-deps \
     --no-default-hadoop -c "io.mishmash.opentelemetry:druid-otlp-format:1.1.3"
   ```
2. Enable it in the Druid configuration (typically in `common.runtime.properties`):

   ```
   druid.extensions.loadList=[<existing extensions>, "druid-otlp-format"]
   ```
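
   For example, if your cluster already loads the Kafka indexing service, the line
   might become (a hypothetical list - keep whatever extensions you already load):

   ```
   druid.extensions.loadList=["druid-kafka-indexing-service", "druid-otlp-format"]
   ```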

That's all! Now launch your Druid instances and you're ready to ingest OpenTelemetry data!

## Using a Docker image

We've prepared an [Apache Druid for OpenTelemetry Docker image](https://hub.docker.com/repository/docker/mishmashio/druid-for-opentelemetry) that lets you skip the installation steps above and just
launch a Druid cluster with our OpenTelemetry extension preinstalled.

It is based on the [official image](https://druid.apache.org/docs/latest/tutorials/docker/) and is
configured and launched the same way. Just replace the image references with `mishmashio/druid-for-opentelemetry` in your `docker-compose.yaml`.

Don't forget to enable the extension when launching!
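
As an illustration, one service from the compose file might end up looking like the
sketch below (the service name, tag and `environment` file follow the layout of the
official Druid compose file and are assumptions - pick a real tag from Docker Hub):

```yaml
# docker-compose.yaml fragment - a sketch, not a definitive setup
services:
  broker:
    image: mishmashio/druid-for-opentelemetry:latest
    env_file:
      - environment   # add "druid-otlp-format" to druid_extensions_loadList in this file
```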

## Building your own Docker image

If you're building your own custom Druid image - just include the `pull-deps` installation step
above, as in the sketch below.
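
A Dockerfile for that might look like this sketch (the base image tag is an assumption,
and depending on the base image you may need to switch users around the `RUN` step):

```dockerfile
# A sketch, not a definitive build - pin a real apache/druid tag
FROM apache/druid:30.0.0

# pull the OTLP extension into the image, same command as above
RUN bin/run-java \
      -classpath "lib/*" org.apache.druid.cli.Main tools pull-deps \
      --no-default-hadoop -c "io.mishmash.opentelemetry:druid-otlp-format:1.1.3"
```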

***Note:*** The `pull-deps` command needs to open a number of files simultaneously, and the maximum
number of open files inside a container might be limited on your system. On some Linux distributions
`podman`, for example, has limits that are too low.

If this turns out to be the case on your computer, raise the `number of open files` limit during the
build:

```sh
podman build --ulimit nofile=65535:65535 -f Dockerfile ..
```

# Configuring ingestion jobs

Once the `druid-otlp-format` extension is loaded, you can set Druid to start loading OpenTelemetry
data.

To create a data import task you need to provide Druid with an `ingestion spec`, detailing, among
other things, where it should look for data and what format it is in.

To apply this extension to your data source, write an `ingestion spec` the way you normally would,
and set the ***inputFormat*** section as follows:

- when ingesting `logs` signals:
  - when produced by the **OpenTelemetry Kafka Exporter:**
    ```json
    {
      ...
      "spec": {
        "ioConfig": {
          ...
          "inputFormat": {
            "type": "otlp",
            "otlpInputSignal": "logsRaw"
          },
          ...
        },
        ...
      }
      ...
    }
    ```
  - when produced by other modules published by **mishmash io:**
    ```json
    {
      ...
      "spec": {
        "ioConfig": {
          ...
          "inputFormat": {
            "type": "otlp",
            "otlpInputSignal": "logsFlat"
          },
          ...
        },
        ...
      }
      ...
    }
    ```
- when ingesting `metrics` signals:
  - when produced by the **OpenTelemetry Kafka Exporter:**
    ```json
    {
      ...
      "spec": {
        "ioConfig": {
          ...
          "inputFormat": {
            "type": "otlp",
            "otlpInputSignal": "metricsRaw"
          },
          ...
        },
        ...
      }
      ...
    }
    ```
  - when produced by other modules published by **mishmash io:**
    ```json
    {
      ...
      "spec": {
        "ioConfig": {
          ...
          "inputFormat": {
            "type": "otlp",
            "otlpInputSignal": "metricsFlat"
          },
          ...
        },
        ...
      }
      ...
    }
    ```
- when ingesting `traces` signals:
  - when produced by the **OpenTelemetry Kafka Exporter:**
    ```json
    {
      ...
      "spec": {
        "ioConfig": {
          ...
          "inputFormat": {
            "type": "otlp",
            "otlpInputSignal": "tracesRaw"
          },
          ...
        },
        ...
      }
      ...
    }
    ```
  - when produced by other modules published by **mishmash io:**
    ```json
    {
      ...
      "spec": {
        "ioConfig": {
          ...
          "inputFormat": {
            "type": "otlp",
            "otlpInputSignal": "tracesFlat"
          },
          ...
        },
        ...
      }
      ...
    }
    ```
- when ingesting `profiles` signals:
  - when produced by the **OpenTelemetry Kafka Exporter:**

    > `Profiles` are not yet supported by the Kafka Exporter
  - when produced by other modules published by **mishmash io:**
    ```json
    {
      ...
      "spec": {
        "ioConfig": {
          ...
          "inputFormat": {
            "type": "otlp",
            "otlpInputSignal": "profilesFlat"
          },
          ...
        },
        ...
      }
      ...
    }
    ```

An example for a `logs` ingestion spec loading data produced by the OpenTelemetry Kafka Exporter might look like this:

```json
{
  "type": "kafka",
  "spec": {
    "ioConfig": {
      "type": "kafka",
      "consumerProperties": {
        "bootstrap.servers": "localhost:9092"
      },
      "topic": "otlp_logs",
      "inputFormat": {
        "type": "otlp",
        "otlpInputSignal": "logsRaw"
      },
      "useEarliestOffset": true
    },
    "tuningConfig": {
      "type": "kafka"
    },
    "dataSchema": {
      ...
      "granularitySpec": {
        ...
      }
    }
  }
}
```

If you're not very familiar with `ingestion specs` - take a look at the [Apache Druid Kafka tutorial](https://druid.apache.org/docs/latest/tutorials/tutorial-kafka) to get started.
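
Once your ingestion task is running, a quick sanity check from Druid's query view might
look like this sketch (the `otlp_logs` dataSource name is hypothetical - use whatever
`dataSource` you set in your spec's `dataSchema`):

```sql
-- a sketch: "otlp_logs" is a hypothetical dataSource name
SELECT *
FROM "otlp_logs"
LIMIT 10
```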

## Using Druid GUI console

Druid's GUI console provides an interactive way for you to configure your ingestion jobs, without
actually writing JSON specs.

Unfortunately, at the moment we do not support configuring the OTLP Input Format extension through
the GUI console.

If you feel a bit brave - you can still use the GUI, with a little 'hack': when you get to the
`Parse data` step of the wizard, switch to the `Edit spec` step, where you'll see the JSON
being prepared. It has the same format as above, so paste in the correct `inputFormat`
parameters (taken from the configs above), then switch back to the `Parse data` tab - and voila!

# OpenTelemetry at mishmash io

OpenTelemetry's main intent is the observability of production environments, but at [mishmash io](https://mishmash.io) it is also part of our software development process. By saving telemetry from **experiments** and **tests** of
our own algorithms, we track things like **performance** and **resource usage** of our distributed database, continuously and across releases.

We believe that adopting OpenTelemetry as a software development tool might be useful to you too, which is why we decided to open-source the tools we've built.

Learn more about the broader set of [OpenTelemetry-related activities](https://mishmash.io/open_source/opentelemetry) at
[mishmash io](https://mishmash.io/) and `follow` [our GitHub profile](https://github.com/mishmash-io) for updates and new releases.