Updating docs (#50)

* Updating main README
mishmash-io · Oct 22, 2024 · 952b258
1 parent 5398d46
Showing 1 changed file with 38 additions and 1 deletion.
diff --git a/README.md b/README.md
@@ -4,6 +4,11 @@ This repository contains [OpenTelemetry](https://opentelemetry.io/) servers that
 
 Here you can also find implementations of such data sources for a few popular open source software projects and additional tools to use when working with OpenTelemetry data.
 
+**Blend** and **bundle** them to build your own **Observability backends:**
+- for batch processing with Apache Spark or Hive
+- for real-time analytics with Apache Druid and Apache Superset
+- for Machine Learning and AI
+
 You will also find additional tools, examples and demos that might be of service on your own OpenTelemetry journey.
 
 > [!TIP]
@@ -25,6 +30,7 @@ You will also find additional tools, examples and demos that might be of service
 
 - [How OpenTelemetry compares to other telemetry software](#why-you-should-switch-to-opentelemetry)
 - [Introduction to OpenTelemetry for Developers, Data Engineers and Data Scientists](#opentelemetry-for-developers-data-engineers-and-data-scientists)
+- [When and where should you use the code here](#when-and-where-should-you-use-the-software-in-this-repository)
 - [Software artifacts to:](#artifacts)
   - [Embed OTLP collectors in Java systems](#embeddable-collectors)
   - [Save OpenTelemetry to Apache Parquet files](#apache-parquet-stand-alone-server)
@@ -77,7 +83,38 @@ If the above sounds convincing - keep reading through this document and explore
 We have prepared a few Jupyter notebooks that visually explore OpenTelemetry data that we collected from [a demo Astronomy webshop app](https://github.com/mishmash-io/opentelemetry-demos)
 using the [Apache Parquet Stand-alone server](./server-parquet) contained in this repository.
 
-If you are the sort of person who prefers to learn by looking at **actual data** - start with the [OpenTelemetry Basics Notebook.](./examples/notebooks/basics.ipynb)
+> [!TIP]
+> If you are the sort of person who prefers to learn by looking at **actual data** - start with the [OpenTelemetry Basics Notebook.](./examples/notebooks/basics.ipynb)
+
+# When and where should you use the software in this repository
+
+We at [mishmash io](https://mishmash.io/) have been using OpenTelemetry for quite some time, recording telemetry from experiments, unit tests and integration tests to ensure that every new release
+of the software we develop performs better than the last one while keeping computing-resource usage reasonable. (More on this [here.](https://mishmash.io/open_source/opentelemetry))
+
+> [!TIP]
+> OpenTelemetry is great for **monitoring software in production,** but we believe you should adopt it within your **software development process** too.
+
+Having been through that journey ourselves, we've realised that success depends on strong analytics. OpenTelemetry provides a number of tools to [instrument your code](https://opentelemetry.io/docs/concepts/instrumentation/) so that it emits signals, and to compose data transmission pipelines for these signals. It then leaves it to you to decide what you ultimately want to do with your signals: where you store them depends on how you will work with them.
+
+You can compose such signal transmission pipelines using the [OpenTelemetry Collector,](https://opentelemetry.io/docs/collector/) which in turn uses a network protocol called [OTLP.](https://opentelemetry.io/docs/specs/otel/protocol/) At the end of a pipeline, you have to `terminate` it into an `observability (or OTLP) backend`.
+
+As a network protocol, OTLP is great at reducing the number of bytes transmitted, keeping throughput high with minimal overhead. It does this by heavily `nesting` its messages, to avoid
+data duplication and to take maximum advantage of `dictionary encodings` and data compression.
+
+On the **analytics side**, though, heavily nested structures are not optimal. A simple `count(*)` or
+`sum()` query over millions of OTLP messages has to `unnest` each one of them, every time you run that query.
+
+And this is the second reason why we believe you might find the software here useful:
+
+> [!TIP]
+> When doing analytics on your observability data - you need a suitable data schema.
+>
+> The tools in this repository convert OTLP messages into a 'flatter' schema that is more suitable
+> for analytics.
+>
+> They perform these transformations **only once,** on **OTLP packet reception,** to minimize the overhead that would otherwise be incurred every time you run an analytics job or query.
+
+The following sections give quick introductions to the individual software packages and point to where you can find more information.
 
 > [!TIP]
 > If you're wondering how to get your first OpenTelemetry data sets - check out [our fork of OpenTelemetry's Demo app.](https://github.com/mishmash-io/opentelemetry-demos)
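The README text in this commit argues that nested OTLP messages should be flattened once, on reception, instead of being unnested by every analytics query. A minimal Python sketch of that idea is shown here. It is not this repository's code: the dict layout is a simplified, hypothetical stand-in for an OTLP trace export (real OTLP uses protobuf messages, and attributes are lists of key-value pairs rather than plain dicts), and the row fields `service_name`, `scope_name`, `span_name` and `duration_ns` are illustrative names, not the actual schema produced by the servers here.

```python
from typing import Any, Dict, List

# Hypothetical, simplified stand-in for an OTLP trace export request.
# Plain dicts mimic the nesting: request -> resource_spans -> scope_spans -> spans.
Request = Dict[str, Any]
Row = Dict[str, Any]


def flatten_request(request: Request) -> List[Row]:
    """Flatten one nested request into one row per span.

    Resource and scope fields are copied onto every span row, so later
    queries never need to unnest anything. Row field names are illustrative.
    """
    rows: List[Row] = []
    for resource_spans in request.get("resource_spans", []):
        resource_attrs = resource_spans.get("resource", {}).get("attributes", {})
        for scope_spans in resource_spans.get("scope_spans", []):
            scope = scope_spans.get("scope", {})
            for span in scope_spans.get("spans", []):
                rows.append({
                    "service_name": resource_attrs.get("service.name"),
                    "scope_name": scope.get("name"),
                    "span_name": span["name"],
                    "duration_ns": span["end_time_unix_nano"] - span["start_time_unix_nano"],
                })
    return rows


def count_spans_nested(requests: List[Request]) -> int:
    # Querying the raw nested form: every call walks all three nesting levels.
    return sum(
        len(ss.get("spans", []))
        for req in requests
        for rs in req.get("resource_spans", [])
        for ss in rs.get("scope_spans", [])
    )


example_request: Request = {
    "resource_spans": [
        {
            "resource": {"attributes": {"service.name": "checkout"}},
            "scope_spans": [{
                "scope": {"name": "http-server"},
                "spans": [
                    {"name": "GET /cart", "start_time_unix_nano": 1_000, "end_time_unix_nano": 4_000},
                    {"name": "POST /pay", "start_time_unix_nano": 2_000, "end_time_unix_nano": 9_000},
                ],
            }],
        },
        {
            "resource": {"attributes": {"service.name": "frontend"}},
            "scope_spans": [{
                "scope": {"name": "ui"},
                "spans": [
                    {"name": "render", "start_time_unix_nano": 0, "end_time_unix_nano": 500},
                ],
            }],
        },
    ],
}

rows = flatten_request(example_request)          # done once, on reception
print(len(rows))                                 # count(*): 3
print(sum(r["duration_ns"] for r in rows))       # sum(duration_ns): 10500
print(count_spans_nested([example_request]))     # same count, re-walking the nesting: 3
```

Once the rows are flat, `count(*)` is just the number of rows and `sum()` is a single scan over one column, whereas `count_spans_nested` has to re-walk every nesting level each time it is called. The cost of flattening is paid once per received message rather than once per query.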