[docs] rename query language to transformation language #6139

Merged · 1 commit · Sep 23, 2022
docs/processing.md (24 changes: 12 additions & 12 deletions)
@@ -88,15 +88,15 @@ environment specific data but the collector is commonly used to fill gaps in cov

The processors implementing this use case are `k8sattributesprocessor`, `resourcedetectionprocessor`.

## Telemetry query language
## OpenTelemetry Transformation Language

Looking at the use cases, certain common features emerge for telemetry mutation and metric generation.

- Identify the type of signal (`span`, `metric`, `log`).
- Navigate to a path within the telemetry to operate on it.
- Define an operation, and possibly operation arguments.

We can try to model these into a query language, in particular allowing the first two points to be shared among all
We can try to model these into a transformation language, in particular allowing the first two points to be shared among all
processing operations, so that only the implementations of individual types of processing need to provide the operators
that the user can use within an expression.
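
As a rough sketch of how these three pieces could line up in configuration (the `limit` function and its argument are hypothetical, used only for illustration; the `transform`/`traces`/`queries` keys follow the examples later in this document):

```yaml
processors:
  transform:
    # 1. The signal type is identified by the section the query is placed under.
    traces:
      queries:
        # 2. `attributes` is the path navigated to within each span.
        # 3. `limit(...)` is the operation, here with a single argument.
        - limit(attributes, 100)
```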

@@ -106,7 +106,7 @@ This data can be navigated using field expressions, which are fields within the
the status message of a span is `status.message`. A map lookup can include the key as a string, for example `attributes["http.status_code"]`.
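
For example, a query could navigate to a span's status message via `status.message` and filter on the map lookup `attributes["http.status_code"]`. A minimal sketch, assuming a hypothetical `set` function and the `where` clause referenced later in this document:

```yaml
processors:
  transform:
    traces:
      queries:
        # Navigate to the span's status message and overwrite it for spans
        # whose http.status_code attribute indicates a server error.
        - set(status.message, "server error") where attributes["http.status_code"] == 500
```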

Operations are scoped to the type of a signal (`span`, `metric`, `log`), with all of the flattened points of that
signal being part of a query space. Virtual fields are added to access data from a higher level before flattening, for
signal being part of a transformation space. Virtual fields are added to access data from a higher level before flattening, for
`resource`, `library_info`. For metrics, the structure presented for processing is actual data points, e.g. `NumberDataPoint`,
`HistogramDataPoint`, with the information from higher levels like `Metric` or the data type available as virtual fields.
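
A sketch of how virtual fields might be used when processing metric data points; the `set` function and the exact virtual field names (`metric.name`, `resource.attributes`) are assumptions used to illustrate the idea:

```yaml
processors:
  transform:
    metrics:
      queries:
        # Each query sees an individual data point; `metric.name` and
        # `resource.attributes` are virtual fields carried in from the
        # enclosing Metric and Resource.
        - set(attributes["host.name"], resource.attributes["host.name"]) where metric.name == "http.server.duration"
```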

@@ -140,7 +140,7 @@ contrib components, and in the future can even allow user plugins possibly throu
[HTTP proxies](https://github.com/proxy-wasm/spec). The arguments to operations will primarily be field expressions,
allowing the operation to mutate telemetry as needed.

There are times when the query language input and the underlying telemetry model do not translate cleanly. For example, a span ID is represented in pdata as a SpanID struct, but in the query language it is more natural to represent the span ID as a string or a byte array. The solution to this problem is Factories. Factories are functions that help translate between the query language input into the underlying pdata structure. These types of functions do not change the telemetry in any way. Instead, they manipulate the query language input into a form that will make working with the telemetry easier or more efficient.
There are times when the transformation language input and the underlying telemetry model do not translate cleanly. For example, a span ID is represented in pdata as a SpanID struct, but in the transformation language it is more natural to represent the span ID as a string or a byte array. The solution to this problem is Factories. Factories are functions that help translate the transformation language input into the underlying pdata structure. These types of functions do not change the telemetry in any way. Instead, they manipulate the transformation language input into a form that will make working with the telemetry easier or more efficient.
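
A sketch of how a Factory might appear in a query; the `SpanID` factory and `set` function names are hypothetical. The factory converts the literal written in the query into the pdata representation before the comparison and never touches the telemetry itself:

```yaml
processors:
  transform:
    traces:
      queries:
        # SpanID(...) translates the hex literal into the pdata SpanID struct
        # so it can be compared against the span's parent_span_id field.
        - set(attributes["critical.path"], true) where parent_span_id == SpanID(0x0000000000000001)
```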

### Examples

@@ -240,7 +240,7 @@ metrics:
```

This is a lot of processing. Queries are executed in order. While initially performance may degrade compared to more specialized
processors, the expectation is that over time, the query processor's engine would improve to be able to apply optimizations
processors, the expectation is that over time, the transform processor's engine would improve to be able to apply optimizations
across queries, compile into machine code, etc.

```yaml
@@ -252,7 +252,7 @@ exporters:

processors:
transform:
# Assuming group_by is defined in a contrib extension module, not baked into the "query" processor
# Assuming group_by is defined in a contrib extension module, not baked into the "transform" processor
extensions: [group_by]
traces:
queries:
@@ -275,18 +275,18 @@ processors:
pipelines:
- receivers: [otlp]
exporters: [otlp]
processors: [query]
processors: [transform]
```

The expressions would be executed in order, with each expression either mutating input telemetry, dropping input
telemetry, or adding additional telemetry (usually for stateful processors like the batch processor, which will drop telemetry
for a window and then add it all back at the same time). One caveat to note is that we would like to implement optimizations
in the query engine, for example to only apply filtering once for multiple operations with a shared filter. Functions
in the transform engine, for example to only apply filtering once for multiple operations with a shared filter. Functions
with unknown side effects may cause issues with such optimizations, which we will need to explore.
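
For instance, an optimizing engine could notice that consecutive queries share an identical `where` clause and evaluate the filter only once. A sketch with hypothetical function names:

```yaml
processors:
  transform:
    traces:
      queries:
        # Both queries share the same filter; the engine could match
        # qualifying spans once and then apply both mutations to them.
        - set(attributes["environment"], "production") where resource.attributes["service.name"] == "checkout"
        - set(status.code, 1) where resource.attributes["service.name"] == "checkout"
```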

## Declarative configuration

The telemetry query language presents an SQL-like experience for defining telemetry transformations - it is made up of
The telemetry transformation language presents an SQL-like experience for defining telemetry transformations - it is made up of
the three primary components described above, however, and can instead be presented declaratively, depending on what makes
sense as a user experience.

@@ -305,7 +305,7 @@ sense as a user experience.
- attributes["http.request.header.authorization"]
```

An implementation of the query language would likely parse expressions into this sort of structure so given an SQL-like
An implementation of the transformation language would likely parse expressions into this sort of structure, so given an SQL-like
implementation, it would likely be little overhead to support a YAML approach in addition.
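
As a sketch of that equivalence (the `delete` operation name and the declarative keys below are assumptions, not a defined schema), the same redaction could be expressed either way:

```yaml
# SQL-like expression form:
#   delete(attributes["http.request.header.authorization"])
#
# One possible declarative rendering of the same operation:
- operation: delete
  paths:
    - attributes["http.request.header.authorization"]
```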

## Function syntax
@@ -355,7 +355,7 @@ the `where ...` clause, as that will be handled by the framework before passing
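
For instance (a sketch with a hypothetical `set` function), a function implementation only receives the telemetry that already matched the filter and only has to handle its own arguments; it never sees the `where` clause itself:

```yaml
processors:
  transform:
    traces:
      queries:
        # The framework evaluates `where attributes["http.path"] == "/health"`
        # before invoking set(); the function only handles its two arguments.
        - set(status.code, 1) where attributes["http.path"] == "/health"
```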

## Embedded processors

The above describes a query language for configuring processing logic in the OpenTelemetry collector. There will be a
The above describes a transformation language for configuring processing logic in the OpenTelemetry collector. There will be a
single processor that exposes the processing logic into the collector config; however, the logic will be implemented
within core packages rather than directly inside a processor. This is to ensure that where appropriate, processing
can be embedded into other components, for example metric processing is often most appropriate to execute within a
@@ -368,4 +368,4 @@ There are some known issues and limitations that we hope to address while iterat
- Handling array-typed attributes
- Working on an array of points, rather than a single point
- Metric alignment - for example defining an expression on two metrics, that may not be at the same timestamp
- The collector has separate pipelines per signal - while the query language could apply cross-signal, we will need to remain single-signal for now
- The collector has separate pipelines per signal - while the transformation language could apply cross-signal, we will need to remain single-signal for now