
Update druid otlp docs #46

Merged (6 commits) on Oct 17, 2024
28 changes: 25 additions & 3 deletions README.md
@@ -1,18 +1,18 @@
# OpenTelemetry Data Sources for Java
# OpenTelemetry Data Sources for Java (and others)

This repository contains [OpenTelemetry](https://opentelemetry.io/) servers that can be embedded into other Java-based systems to act as data sources for `logs`, `metrics`, `traces` and `profiles` signals.

Here you can also find implementations of such data sources for a few popular open source systems, as well as additional tools to use when working with OpenTelemetry data.

You will also find additional tools, examples and demos that might be of service on your own OpenTelemetry journey.

> [!TIP]
> This is a public release of code we have accumulated internally over time and so far contains only a limited subset of what we intend to share.
>
> Examples of internal software that will be published here in the near future include:
>
> - A small OTLP server based on [Apache BookKeeper](https://bookkeeper.apache.org/) for improved
> data ingestion reliability, even across node failures
> - [Apache Superset](https://superset.apache.org/) charts and dashboards for OpenTelemetry
> visualizations
> - OpenTelemetry Data Sources for [Apache Pulsar](https://pulsar.apache.org/) for when more
>   complex preprocessing is needed
> - Our [Testcontainers](https://testcontainers.com/) implementations that you can use to
@@ -29,6 +29,7 @@ Here you can also find implementations of such data sources for a few popular op
- [Embed OTLP collectors in Java systems](#embeddable-collectors)
- [Save OpenTelemetry to Apache Parquet files](#apache-parquet-stand-alone-server)
- [Ingest OpenTelemetry into Apache Druid](#apache-druid-otlp-input-format)
- [Visualize OpenTelemetry with Apache Superset](#apache-superset-charts-and-dashboards)
- [More about OpenTelemetry at mishmash io](#opentelemetry-at-mishmash-io)

# Why you should switch to OpenTelemetry
@@ -78,6 +79,13 @@ using the [Apache Parquet Stand-alone server](./server-parquet) contained in thi

If you are the sort of person who prefers to learn by looking at **actual data** - start with the [OpenTelemetry Basics Notebook.](./examples/notebooks/basics.ipynb)

> [!TIP]
> If you're wondering how to get your first OpenTelemetry data sets - check out [our fork of OpenTelemetry's Demo app.](https://github.com/mishmash-io/opentelemetry-demos)
>
> In there you will find complete deployments that will generate signals, save them and let you play with the data - by writing your own notebooks or creating
> Apache Superset dashboards.
>

# Artifacts

## Embeddable collectors
@@ -94,6 +102,7 @@ It is not intended for production use, but rather as a quick tool to save and ex
Parquet files as saved by this Stand-alone server.
- [README](./server-parquet)
- [Javadoc on javadoc.io](https://javadoc.io/doc/io.mishmash.opentelemetry/server-parquet)
- [Quick deployment with a demo app](https://github.com/mishmash-io/opentelemetry-demos)

## Apache Druid OTLP Input Format

@@ -112,7 +121,20 @@ like with [Apache BookKeeper](https://bookkeeper.apache.org) or [Apache Pulsar](
Find out more about the OTLP Input Format for Apache Druid:
- [README](./druid-otlp-format)
- [Javadoc on javadoc.io](https://javadoc.io/doc/io.mishmash.opentelemetry/druid-otlp-format)
- [Quick deployment with a demo app and Apache Superset](https://github.com/mishmash-io/opentelemetry-demos)

## Apache Superset charts and dashboards

![superset-dashboard](https://github.com/user-attachments/assets/8dba1e13-bcb3-41c9-ac40-0c023a3825c8)

[Apache Superset](https://superset.apache.org/) is an open-source modern data exploration and visualization platform.

You can use its rich visualizations, no-code viz builder and its powerful SQL IDE to build your own OpenTelemetry analytics.

To get you started, we're publishing [data sources and visualizations](./superset-visualizations) that you can import into Apache Superset.

- [Quick deployment with a demo app](https://github.com/mishmash-io/opentelemetry-demos)

# OpenTelemetry at mishmash io

OpenTelemetry's main intent is the observability of production environments, but at [mishmash io](https://mishmash.io) it is part of our software development process. By saving telemetry from **experiments** and **tests** of
163 changes: 163 additions & 0 deletions druid-otlp-format/DRUID_GUI_SETUP.md
@@ -0,0 +1,163 @@
# Using the Apache Druid GUI to set up OpenTelemetry ingestion

This is a short, visual guide on how to use the [Apache Druid](https://druid.apache.org/) GUI console to set up OpenTelemetry ingestion jobs using the
[druid-otlp-format extension](./README.md), developed by [mishmash io](https://mishmash.io).

To find more about the open source software contained in this repository - [click here.](../)

# Intro

At the time of writing, the Druid GUI console does not directly allow configuring OTLP ingestion jobs. However, with a little 'hack',
it is possible.

Follow the steps below.

> [!WARNING]
> The guide here assumes you will be ingesting OpenTelemetry data published to Apache Kafka by the [Kafka Exporter.](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/kafkaexporter/README.md)
>
> In other setups the steps might be different.
>

# Walk-through

#### 1. Open the console

Open the Druid GUI, select `Load data` from the top menu and then click `Streaming` in the dropdown:

![druid-step-1](https://github.com/user-attachments/assets/e0e6ed38-18da-4d99-b4f9-5c22f2510386)

#### 2. Start a new streaming spec

Click on the `Start a new streaming spec`:

![druid-step-2](https://github.com/user-attachments/assets/38b54817-d126-4bab-92c6-5dbae6c169e7)

#### 3. Select the source

When you start a new streaming spec you're given a choice to select where the data should come from. Click `Apache Kafka`:

![druid-step-3](https://github.com/user-attachments/assets/c9c16edc-b1b0-4478-be76-78b9377c1623)

#### 4. Connect to Kafka

To connect Druid to Kafka - enter your Kafka brokers and topic name in the right panel:

![druid-step-4](https://github.com/user-attachments/assets/06d63477-e188-44b2-be39-699c8e4bede0)

Click `Apply` when done. This will trigger Druid to connect and, if the connection was successful, after a little while Druid will
show some samples of what it read from the configured Kafka topic:

![druid-step-5](https://github.com/user-attachments/assets/c3280ab8-6372-46a3-aa81-f8b6c0b6a845)

At this step the data will look messy - Druid doesn't yet know how to interpret it. That's okay; click the `Next: Parse data`
button in the right panel.

#### 5. Apply the OTLP format

Now, you're given a choice of what `Input format` to apply on the incoming data:

![druid-step-6](https://github.com/user-attachments/assets/c870c29a-2199-4b03-801b-ffd38dbea8d4)

Here's the tricky part - the `otlp` format is not available inside the `Input format` dropdown. You'll have to edit a bit of JSON
in order to get past this step.

On the bar just under the top menu, click on the final step - `Edit spec`. A JSON representation of the ingestion spec will be
shown to you. Edit the `inputFormat` JSON object:

![druid-step-7](https://github.com/user-attachments/assets/3429a010-5fa3-407d-ae3e-02f0eff1b8a7)

Use one of the following JSON snippets:
- when ingesting `logs`:
```json
...
"inputFormat": {
"type": "otlp",
"otlpInputSignal": "logsRaw"
}
...
```
- when ingesting `metrics`:
```json
...
"inputFormat": {
"type": "otlp",
"otlpInputSignal": "metricsRaw"
}
...
```
- when ingesting `traces`:
```json
...
"inputFormat": {
"type": "otlp",
"otlpInputSignal": "tracesRaw"
}
...
```
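For orientation: the `inputFormat` object you are editing lives inside the spec's `ioConfig` section. Here is a minimal sketch of the surrounding context, assuming the `logs` signal and placeholder Kafka settings (your broker address and topic name will differ):

```json
"ioConfig": {
  "type": "kafka",
  "consumerProperties": {
    "bootstrap.servers": "my-kafka-broker:9092"
  },
  "topic": "otlp_logs",
  "inputFormat": {
    "type": "otlp",
    "otlpInputSignal": "logsRaw"
  }
}
```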

When done editing, simply go back to the `Parse data` step; you don't need to click anywhere to confirm the changes you made.

Now, Druid will re-read the sampled Kafka messages and will parse them:

![druid-step-8](https://github.com/user-attachments/assets/d015b715-da30-4e15-bd06-861e1879350d)

At this point - click the `Next: Parse time` button in the right panel to continue.

#### 6. Configure the data schema

Now, Druid needs to know a few more things about how you'd like your data to be organized.

Select which column will be used for its **time-partitioning.** For example, click on the `time_unix_nano` column header,
make sure it shows in the right bar and that `Format` says `nano`. Then click `Apply`:

![druid-step-9](https://github.com/user-attachments/assets/016d6c6d-69d9-4fce-b63c-3b9319bb8103)
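In the generated spec JSON, this choice maps to the `timestampSpec` inside `dataSchema` - roughly:

```json
"timestampSpec": {
  "column": "time_unix_nano",
  "format": "nano"
}
```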

When done, click `Next: Transform`:

![druid-step-10](https://github.com/user-attachments/assets/6c166d4a-65b8-41cb-b69d-492a17a665a8)

Here you can configure optional transformations of the data before saving it inside Druid. Continue to the next step - `Filter`.

![druid-step-11](https://github.com/user-attachments/assets/879df0e6-834a-4cb2-b95a-61d94a5b1947)

Filtering is also an optional step. Continue to the `Configure schema` step.

Here you can edit the table schema. In the bar on the right - turn off the `Explicitly specify schema` radio button:

![druid-step-12pre](https://github.com/user-attachments/assets/7c10f38d-4ae9-4e46-82bd-51dd25ac5645)

Doing this will trigger a dialog to pop up; just confirm by clicking `Yes - auto detect schema`:

![druid-step-12](https://github.com/user-attachments/assets/828a8c9a-ff25-4821-bb7c-fe789eb04000)
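With auto-detection confirmed, the spec's `dimensionsSpec` typically ends up relying on Druid's schema discovery, along the lines of:

```json
"dimensionsSpec": {
  "useSchemaDiscovery": true
}
```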

Then proceed to the `Partition` step. Select a `Primary partitioning (by time)`. In this example, we're setting it to `hour`
for hourly partitions:

![druid-step-13](https://github.com/user-attachments/assets/6c6d1492-63fb-4188-baf4-b9b00eec53ce)
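In the spec JSON, hourly partitioning appears in the `granularitySpec` of `dataSchema` - for example:

```json
"granularitySpec": {
  "segmentGranularity": "hour"
}
```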

Continue to `Tune`:

![druid-step-14](https://github.com/user-attachments/assets/a67669ec-b7c4-4d02-bba5-3a1039de48ff)

Set `Use earliest offset` to `True` and continue to `Publish`:

![druid-step-15](https://github.com/user-attachments/assets/1d49b0c0-02e0-487a-8bef-1bc75e0b126f)
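The `Use earliest offset` toggle corresponds to the `useEarliestOffset` flag in the spec's `ioConfig` (other `ioConfig` fields omitted here):

```json
"ioConfig": {
  "useEarliestOffset": true
}
```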

Check whether you would like to change the `Datasource name` or any of the other options, but the defaults should be okay. Move to the
next step - `Edit spec`:

![druid-step-16](https://github.com/user-attachments/assets/e8628c83-ec73-4cc7-9b90-c62d76ab8fba)

This is the same step we visited earlier to set the `inputFormat` to `otlp`. This time there's nothing to do here,
so just click `Submit` in the right panel and that's it!

At the end, you should get a screen similar to this:

![druid-step-17](https://github.com/user-attachments/assets/330b2629-9cbc-419c-b29c-99fdf8e6aaff)

##### Congratulations! :)

You can now start querying OpenTelemetry data! :)



6 changes: 6 additions & 0 deletions druid-otlp-format/README.md
@@ -1,5 +1,7 @@
# Apache Druid extension for OpenTelemetry signals ingestion

![druid-otlp-ingestion](https://github.com/user-attachments/assets/1b6d064e-7335-4365-a694-8cc2eebf1348)

This artifact implements an Apache Druid extension that you can use to ingest
[OpenTelemetry](https://opentelemetry.io) signals - `logs`, `metrics`, `traces` and `profiles` - into Apache Druid, and then query Druid through interactive charts and dashboards.

@@ -37,6 +39,8 @@ To get an idea of why and when to use this Druid extension - here is an example
5. Setup [Apache Superset](https://superset.apache.org/) with a [Druid database driver.](https://superset.apache.org/docs/configuration/databases#apache-druid)
6. Explore your telemetry in Superset!

![superset-dashboard](https://github.com/user-attachments/assets/8dba1e13-bcb3-41c9-ac40-0c023a3825c8)

> [!TIP]
> We have prepared a clone of [OpenTelemetry's demo app](https://opentelemetry.io/docs/demo/) with
> the exact same setup as above.
@@ -290,6 +294,8 @@ that is being prepared. It has the same format as above and you can paste the co
parameters (take the correct config from above). Once you do that - switch back to the `Parse data`
tab and voila!

See a [step-by-step Druid GUI console guide here.](./DRUID_GUI_SETUP.md)

# OpenTelemetry at mishmash io

OpenTelemetry's main intent is the observability of production environments, but at [mishmash io](https://mishmash.io) it is part of our software development process. By saving telemetry from **experiments** and **tests** of