[RFC] Observability indices naming standards & routing #1405

YANG-DB · 2023-02-14T00:54:11Z

Problem statement

This document describes the index naming standard for ingestion of Observability signals - Traces, Metrics, Logs.
Currently there is no single coherent pattern to use for all Observability signals and potential data sources.

For example, data-prepper uses its own index naming and structure to ingest Observability signals.

data-prepper Indices:

Traces data: otel-v1-apm-span-* (Observability Trace mapping)
Supplement: otel-v1-apm-service-map (Proprietary Index Mapping)

The same goes for jaeger trace data type:

Traces data: jaeger-span* (Observability Trace mapping)

This convention is also harder to manage when it comes to index rollover for lifecycle management. This would be optimized by using the data_stream layer supported by the OpenSearch API.

Today, because of the different index structures and non-standard naming patterns, we cannot create crosscutting queries that correlate or aggregate information across different Observability data providers.

Proposal

We would use the following structure and naming patterns, based on these conventions:

Add data_stream support for all Observability based standard indices
Use a standard Observability signals naming index conventions
Create customer domain naming degree of freedom to allow arbitrary names for specific customer use-cases
Move the Observability Indices Template creation into Observability Plugin bootstrap

Using data_stream will encourage simple physical index management and querying. Each Observability index would actually be a data_stream:

A typical workflow to manage time-series data involves multiple steps, such as creating a rollover index alias, defining a write index, and defining common mappings and settings for the backing indices.

Data streams simplify this process and enforce a setup that best suits time-series data, such as being designed primarily for append-only data and ensuring that each document has a timestamp field.

A data stream is internally composed of multiple backing indices. Search requests are routed to all the backing indices, while indexing requests are routed to the latest write index.
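To make the data_stream idea concrete, here is a minimal Python sketch that builds the body of an index template with a `data_stream` section for one signal type. The helper name, the template name in the comments, the `sso_{type}-*-*` index pattern, and the priority value are assumptions made for illustration, not something this RFC prescribes. The request paths in the comments follow the OpenSearch data streams documentation linked under additional context. Nothing is sent over the network.

```python
import json


def data_stream_template(signal_type: str, priority: int = 100) -> dict:
    """Build an index template body that turns matching indices into data streams.

    `signal_type` is one of "logs", "metrics", "traces". The priority is an
    arbitrary placeholder value (assumption).
    """
    if signal_type not in ("logs", "metrics", "traces"):
        raise ValueError(f"unsupported signal type: {signal_type}")
    return {
        # Assumed pattern: every {dataset}-{namespace} pair for this signal type.
        "index_patterns": [f"sso_{signal_type}-*-*"],
        # An empty object marks matching indices as data streams; the
        # timestamp field then defaults to "@timestamp".
        "data_stream": {},
        "priority": priority,
    }


body = data_stream_template("logs")
# Would be sent as: PUT _index_template/sso_logs_template   (hypothetical name)
# followed by:      PUT _data_stream/sso_logs-nginx-us
print(json.dumps(body, indent=2))
```

Keeping the template body in a plain function makes it easy to create one template per signal type during plugin bootstrap, which is one possible reading of the last convention listed above.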

The data is consolidated using the data_stream concepts, patterns, and catalog. The following Observability index pattern will be used:

The index pattern follows the naming structure {type}-{dataset}-{namespace}:

type - indicates the high-level observability type: "logs", "metrics", or "traces" (prefixed by the sso_ schema convention)
dataset - can contain anything that classifies the source of the data, such as nginx.access
namespace - a user-defined namespace, mainly useful for grouping data, for example by production grade or geography

The sso_{type}-{dataset}-{namespace} pattern makes it possible to split similarly structured information across different indices according to the customer's strategy.

This strategy is defined by the two degrees of naming freedom: dataset and namespace.

For example, a customer may want to route the nginx logs from two geographical areas into two different indices:

sso_logs-nginx-us
sso_logs-nginx-eu

This type of distinction also allows crosscutting queries, either with the index query pattern sso_logs-nginx-* or with a geography-based crosscutting query such as sso_logs-*-eu.
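The naming structure and the two crosscutting patterns can be sketched in a few lines of Python. This is only an illustration: the `index_name` and `matching` helpers are hypothetical, the `sso_logs-mysql-eu` index is an invented third example, and the rule that dataset and namespace must not contain a hyphen is an assumption added here so that a name can be split back into its parts. Python's `fnmatchcase` stands in for the index pattern matching that OpenSearch would do.

```python
from fnmatch import fnmatchcase

SIGNAL_TYPES = ("logs", "metrics", "traces")


def index_name(signal_type: str, dataset: str, namespace: str) -> str:
    """Compose sso_{type}-{dataset}-{namespace} from its three parts."""
    if signal_type not in SIGNAL_TYPES:
        raise ValueError(f"unsupported signal type: {signal_type}")
    for label, value in (("dataset", dataset), ("namespace", namespace)):
        # Assumption: no hyphens or wildcards inside a part, so the name
        # stays unambiguous when split on "-".
        if not value or "-" in value or "*" in value:
            raise ValueError(f"invalid {label}: {value!r}")
    return f"sso_{signal_type}-{dataset}-{namespace}"


names = [
    index_name("logs", "nginx", "us"),
    index_name("logs", "nginx", "eu"),
    index_name("logs", "mysql", "eu"),  # invented example
]


def matching(pattern: str) -> list:
    """Return the known index names that a wildcard pattern selects."""
    return [name for name in names if fnmatchcase(name, pattern)]


print(matching("sso_logs-nginx-*"))  # all nginx logs, any namespace
print(matching("sso_logs-*-eu"))     # all logs in the eu namespace
```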

Data index routing

The ingestion component responsible for ingesting the Observability signals should route the data into the relevant indices.
The sso_{type}-{dataset}-{namespace} combination dictates the target index, where {type}, together with the sso_ prefix, is one of the supported types:

Traces - sso_traces
Metrics - sso_metrics
Logs - sso_logs

For example, if the ingested signal contains the following section:

{
  ...
  "attributes": {
    "data_stream": {
      "type": "span",
      "dataset": "mysql",
      "namespace": "prod"
    }
  }
}

This indicates that the target index for this observability signal should be sso_traces-mysql-prod, an index that uses the traces schema mapping.
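A minimal Python sketch of this routing step, under stated assumptions: only the "span" to sso_traces mapping is taken from the example above, while the "metric" and "log" keys and the `target_index` helper itself are hypothetical.

```python
# Maps the data_stream "type" value found in a signal to its index prefix.
# Only "span" comes from the example; the other two keys are assumed.
TYPE_TO_INDEX_PREFIX = {
    "span": "sso_traces",
    "metric": "sso_metrics",  # assumed key
    "log": "sso_logs",        # assumed key
}


def target_index(signal: dict) -> str:
    """Derive the target index from the signal's attributes.data_stream section."""
    stream = signal.get("attributes", {}).get("data_stream")
    if not stream:
        raise ValueError("signal has no attributes.data_stream section")
    missing = [key for key in ("type", "dataset", "namespace") if key not in stream]
    if missing:
        raise ValueError(f"data_stream is missing: {', '.join(missing)}")
    prefix = TYPE_TO_INDEX_PREFIX.get(stream["type"])
    if prefix is None:
        raise ValueError(f"unknown data_stream type: {stream['type']!r}")
    return f"{prefix}-{stream['dataset']}-{stream['namespace']}"


signal = {
    "attributes": {
        "data_stream": {"type": "span", "dataset": "mysql", "namespace": "prod"}
    }
}
print(target_index(signal))
```

Rejecting signals with a missing or unknown data_stream section, rather than falling back to a default index, is a design choice of this sketch. An ingestion component could just as well route such signals to a catch-all index.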

Observability Index templates

With the expectation of multiple Observability data providers and the need to consolidate them all into a single common schema, the Observability plugin will take on the following responsibilities:

Define and create all the signal index templates upon loading
Publish a versioned schema file (JSON Schema) for each signal type, for general validation use by any third party

Note

It is important to mention that these new capabilities would not change or prevent existing customer usage of the system and would continue to allow proprietary usage.

In detail

Logs Schema
see - #1403

Traces Schema
see - #1395

Metrics Schema
see - #1397

What alternatives have you considered?

Note

It is important to mention that this new suggestion would not change or prevent existing customer usage of the system and would continue to allow proprietary usage.

Do you have any additional context?
see opensearch-project/OpenSearch-Dashboards#3412
see https://opensearch.org/docs/latest/opensearch/data-streams/
see https://github.com/opensearch-project/data-prepper


ryn9 · 2023-03-01T23:36:44Z

As I also mentioned in opensearch-project/OpenSearch-Dashboards#3412:

Regarding index naming, I would like to see the naming structure updated to account for 'tenant' and 'version'.
Where:
tenant - name of the client/tenant/usergroup who should have access to the data *
version - version of the schema in use **

* RBAC rules are often written so that clients, tenants, or usergroups get access to a set of indexes matching NAME-*. namespace could be used this way, but given where it is placed in the index name it would make writing RBAC rules harder, and I believe it serves a slightly different function from what I am proposing.

** As the schema evolves, it will be important to have versioning in the naming scheme, coupled with standard mapping definitions. As the schema evolves, so too would the mappings.

As such, a complete index (or, per the proposal, data stream) name would be:
{tenant}-sso_{type}-{dataset}-{namespace}-{version}
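
To illustrate why a leading tenant makes NAME-* rules simple, here is a small Python sketch. The `tenant_index_name` helper and the values "acme" and "v1" are hypothetical, and `fnmatchcase` only approximates how an RBAC index pattern would be evaluated.

```python
from fnmatch import fnmatchcase


def tenant_index_name(tenant, signal_type, dataset, namespace, version):
    """Compose {tenant}-sso_{type}-{dataset}-{namespace}-{version}."""
    return f"{tenant}-sso_{signal_type}-{dataset}-{namespace}-{version}"


name = tenant_index_name("acme", "logs", "nginx", "prod", "v1")
print(name)

# With the tenant first, a single prefix pattern covers all of its indices.
print(fnmatchcase(name, "acme-*"))
print(fnmatchcase(name, "other-*"))
```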

## Description:

Implementation of exporter to OpenSearch using opensearch-go library.

As of now, this PR was heavily inspired by https://github.com/dbason/opentelemetry-collector-contrib/tree/opensearch-exporter/exporter/opensearchexporter.

By default, requests sent adhere to the OpenSearch Catalog [schema for logs](https://github.com/opensearch-project/opensearch-catalog/tree/main/schema/observability/logs), but allows users to export using the Elastic Common Schema as well.

This PR also:
- enables users to define the `bulk_action` between `create` and `index`
- enables users to define the logs index without necessarily adhering to the new [index naming conventions](opensearch-project/observability#1405) through the `LogsIndex` config.

## Tracking Issue: [23611](#23611)

## Testing:

### Integration
- Successful round-trip to HTTP endpoint,
- Permanent error during round-trip,
- Retryable error response for first request, followed by successful response on retry,
- Two retriable error responses, followed by successful response on second retry.

### Manual
- Authentication using `configtls.TLSSetting` (`ca_file`, `cert_file`, `key_file`)
- Tested in EKS and K3s clusters running [opni](https://github.com/rancher/opni).

Signed-off-by: João Henri <[email protected]>


YANG-DB added enhancement New feature or request untriaged labels Feb 14, 2023

This was referenced Feb 14, 2023

[RFC] Integrations Design opensearch-project/OpenSearch-Dashboards#3412

Open

Support otel metrics mapping #1397

Closed

YANG-DB added documentation Improvements or additions to documentation design and removed untriaged labels Feb 15, 2023

This was referenced Feb 16, 2023

[FEATURE] Add First Integration into Observability #1411

Closed

Add documentation for Simple Schema opensearch-project/documentation-website#2940

Merged

Support sso metrics & traces schema #1427

Merged

Metrics traces sso schema support #1429

Closed

YANG-DB added the integration Integration project label Mar 7, 2023

YANG-DB added this to Observability 2023 and Integration Mar 7, 2023

YANG-DB moved this to In Progress in Integration Mar 7, 2023

YANG-DB added this to the 2.7 milestone Mar 7, 2023

YANG-DB self-assigned this Mar 7, 2023

YANG-DB moved this from In Progress to InReview in Integration Mar 15, 2023

YANG-DB modified the milestones: 2.7, 2.8 Apr 3, 2023

jaehnri mentioned this issue Sep 6, 2023

[exporter/opensearch] Send logs to Opensearch open-telemetry/opentelemetry-collector-contrib#26475

Merged
