diff --git a/README.md b/README.md
index b0dac04..c482f9a 100644
--- a/README.md
+++ b/README.md
@@ -4,6 +4,11 @@ This repository contains [OpenTelemetry](https://opentelemetry.io/) servers that
 
 Here you can also find implementations of such data sources for a few popular open source softwares and additional tools to use when working with OpenTelemetry data.
 
+**Blend** and **bundle** them to build your own **Observability backends:**
+- for batch processing with Apache Spark or Hive
+- for real-time analytics with Apache Druid and Apache Superset
+- for Machine Learning and AI
+
 You will also find additional tools, examples and demos that might be of service on your own OpenTelemetry journey.
 
 > [!TIP]
@@ -25,6 +30,7 @@ You will also find additional tools, examples and demos that might be of service
 
 - [How OpenTelemetry compares to other telemetry software](#why-you-should-switch-to-opentelemetry)
 - [Introduction to OpenTelemetry for Developers, Data Engineers and Data Scientists](#opentelemetry-for-developers-data-engineers-and-data-scientists)
+- [When and where should you use the code here](#when-and-where-should-you-use-the-software-in-this-repository)
 - [Software artifacts to:](#artifacts)
 - [Embed OTLP collectors in Java systems](#embeddable-collectors)
 - [Save OpenTelemetry to Apache Parquet files](#apache-parquet-stand-alone-server)
@@ -77,7 +83,38 @@ If the above sounds convincing - keep reading through this document and explore
 
 We have prepared a few Jupyter notebooks that visually explore OpenTelemetry data that we collected from [a demo Astronomy webshop app](https://github.com/mishmash-io/opentelemetry-demos) using the [Apache Parquet Stand-alone server](./server-parquet) contained in this repository.
 
-If you are the sort of person who prefers to learn by looking at **actual data** - start with the [OpenTelemetry Basics Notebook.](./examples/notebooks/basics.ipynb)
+> [!TIP]
+> If you are the sort of person who prefers to learn by looking at **actual data** - start with the [OpenTelemetry Basics Notebook.](./examples/notebooks/basics.ipynb)
+
+# When and where should you use the software in this repository
+
+We, at [mishmash io,](https://mishmash.io/) have been using OpenTelemetry for quite some time - recording telemetry from experiments, unit and integration tests - to ensure every new release
+of software we develop is performing better than the last, and within reasonable computing-resource usage. (More on this [here.](https://mishmash.io/open_source/opentelemetry))
+
+> [!TIP]
+> OpenTelemetry is great for **monitoring software in production,** but we believe you should adopt it within your **software development process** too.
+
+Having been through that journey ourselves, we've realised that success depends on strong analytics. OpenTelemetry provides a number of tools to [instrument your code](https://opentelemetry.io/docs/concepts/instrumentation/) to emit signals, and then to compose data transmission pipelines for these signals. It leaves it to you to decide what you ultimately want to do with your signals: where you want to store them depends on how you will work with them.
+
+You can compose such pipelines for signal transmission using the [OpenTelemetry Collector,](https://opentelemetry.io/docs/collector/) which in turn uses a network protocol called [OTLP.](https://opentelemetry.io/docs/specs/otel/protocol/) At the end - you have to `terminate` the pipelines in an `observability (or OTLP) backend.`
+
+As a network protocol, OTLP is great at reducing the number of bytes transmitted, keeping the throughput high with minimum overhead. It does this by heavily `nesting` its messages - to avoid
+data duplication and take maximum advantage of `dictionary encodings` and data compression.
+
+On the **analytics side** though - heavily nested structures are not optimal. A simple `count(*)` or
+`sum()` query, done over millions of OTLP messages, will have to `unnest` each one of them. Every time you run that query.
+
+And this is the second reason why we believe you might find the software here useful:
+
+> [!TIP]
+> When doing analytics on your observability data - you need a suitable data schema.
+>
+> The tools in this repository convert OTLP messages into a 'flatter' schema that's more suitable
+> for analytics.
+>
+> They perform transformations **only once** - on **OTLP packet reception,** to minimize the overhead that would otherwise be incurred every time you run an analytics job or query.
+
+The following are quick introductions to the individual software packages, where you can find more information.
 
 > [!TIP]
 > If you're wondering how to get your first OpenTelemetry data sets - check out [our fork of OpenTelemetry's Demo app.](https://github.com/mishmash-io/opentelemetry-demos)
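
The "flatten once on reception" idea in the added README section can be illustrated with a minimal Python sketch. It is not the repository's implementation: the `message` dictionary is a heavily simplified stand-in for an OTLP trace export (real OTLP encodes attributes and timestamps differently), and `flatten_spans` and the flat column names (`service_name`, `scope_name`, `span_name`, `duration_ns`) are hypothetical names chosen for illustration.

```python
from typing import Any, Dict, Iterator

# Simplified, assumed stand-in for one nested OTLP trace export message.
# Resource and scope data are stored once and shared by all spans below them.
message: Dict[str, Any] = {
    "resourceSpans": [
        {
            "resource": {"service.name": "checkout"},
            "scopeSpans": [
                {
                    "scope": {"name": "http-client"},
                    "spans": [
                        {"name": "GET /cart", "startTimeUnixNano": 1_000, "endTimeUnixNano": 4_000},
                        {"name": "POST /pay", "startTimeUnixNano": 2_000, "endTimeUnixNano": 9_000},
                    ],
                }
            ],
        }
    ]
}


def flatten_spans(msg: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Unnest a message into one flat row per span (hypothetical schema)."""
    for resource_spans in msg.get("resourceSpans", []):
        service = resource_spans.get("resource", {}).get("service.name")
        for scope_spans in resource_spans.get("scopeSpans", []):
            scope = scope_spans.get("scope", {}).get("name")
            for span in scope_spans.get("spans", []):
                # Copy the shared parent fields down onto every row.
                yield {
                    "service_name": service,
                    "scope_name": scope,
                    "span_name": span["name"],
                    "duration_ns": span["endTimeUnixNano"] - span["startTimeUnixNano"],
                }


# Unnesting happens once, when the message is received...
rows = list(flatten_spans(message))

# ...so later aggregate queries scan flat rows without walking the nesting again.
span_count = len(rows)
total_duration_ns = sum(row["duration_ns"] for row in rows)
```

The trade-off is the one the README text describes: the flat rows repeat the resource and scope fields on every span, which costs storage, but aggregates such as `count(*)` or `sum()` no longer need to unnest each message on every query.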