diff --git a/docs/processing.md b/docs/processing.md
index 68f7dfb1d3e..2adde5230cc 100644
--- a/docs/processing.md
+++ b/docs/processing.md
@@ -88,7 +88,7 @@ environment specific data but the collector is commonly used to fill gaps in cov
 The processors implementing this use case are `k8sattributesprocessor`, `resourcedetectionprocessor`.
 
-## Telemetry query language
+## OpenTelemetry Transformation Language
 
 When looking at the use cases, there are certain common features for telemetry mutation and metric generation.
 
@@ -96,7 +96,7 @@ When looking at the use cases, there are certain common features for telemetry m
 - Navigate to a path within the telemetry to operate on it
 - Define an operation, and possibly operation arguments
 
-We can try to model these into a query language, in particular allowing the first two points to be shared among all
+We can try to model these into a transformation language, in particular allowing the first two points to be shared among all
 processing operations, and only have implementation of individual types of processing need to implement operators that
 the user can use within an expression.
 
@@ -106,7 +106,7 @@ This data can be navigated using field expressions, which are fields within the
 the status message of a span is `status.message`. A map lookup can include the key as a string, for example
 `attributes["http.status_code"]`.
 
 Operations are scoped to the type of a signal (`span`, `metric`, `log`), with all of the flattened points of that
-signal being part of a query space. Virtual fields are added to access data from a higher level before flattening, for
+signal being part of a transformation space. Virtual fields are added to access data from a higher level before flattening, for
 `resource`, `library_info`. For metrics, the structure presented for processing is actual data points, e.g. `NumberDataPoint`,
 `HistogramDataPoint`, with the information from higher levels like `Metric` or the data type available as virtual fields.
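The field-expression navigation described in the hunk above can be sketched outside the diff. This is an illustration only: the collector itself operates on pdata structs in Go, and `resolve` and the dict-shaped span below are hypothetical stand-ins, not collector APIs.

```python
import re

# Illustrative sketch: resolve a field expression such as `status.message` or
# `attributes["http.status_code"]` against a span modeled as a plain dict.
# Virtual fields like `resource` expose data from above the flattened point.
def resolve(telemetry, expression):
    # Tokenize into quoted map keys like ["k"] and dotted path segments.
    parts = re.findall(r'\["[^"]+"\]|[^.\[\]]+', expression)
    value = telemetry
    for part in parts:
        key = part[2:-2] if part.startswith('["') else part
        value = value[key]
    return value

span = {
    "status": {"message": "OK", "code": 1},
    "attributes": {"http.status_code": 200},
    "resource": {"attributes": {"service.name": "checkout"}},  # virtual field
}

assert resolve(span, "status.message") == "OK"
assert resolve(span, 'attributes["http.status_code"]') == 200
assert resolve(span, 'resource.attributes["service.name"]') == "checkout"
```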
@@ -140,7 +140,7 @@ contrib components, and in the future can even allow user plugins possibly throu
 [HTTP proxies](https://github.com/proxy-wasm/spec). The arguments to operations will primarily be field expressions,
 allowing the operation to mutate telemetry as needed.
 
-There are times when the query language input and the underlying telemetry model do not translate cleanly. For example, a span ID is represented in pdata as a SpanID struct, but in the query language it is more natural to represent the span ID as a string or a byte array. The solution to this problem is Factories. Factories are functions that help translate between the query language input into the underlying pdata structure. These types of functions do not change the telemetry in any way. Instead, they manipulate the query language input into a form that will make working with the telemetry easier or more efficient.
+There are times when the transformation language input and the underlying telemetry model do not translate cleanly. For example, a span ID is represented in pdata as a SpanID struct, but in the transformation language it is more natural to represent the span ID as a string or a byte array. The solution to this problem is Factories. Factories are functions that help translate the transformation language input into the underlying pdata structure. These types of functions do not change the telemetry in any way. Instead, they manipulate the transformation language input into a form that will make working with the telemetry easier or more efficient.
 
 ### Examples
 
@@ -240,7 +240,7 @@ metrics:
 ```
 
 A lot of processing. Queries are executed in order.
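The Factories described above can be illustrated with a small sketch. This is not the collector's actual code: `span_id` is a hypothetical factory that turns transformation-language input (a hex string) into the raw bytes mirroring pdata's SpanID struct, without touching the telemetry itself.

```python
# Hypothetical Factory sketch: translate language input into the form the
# underlying telemetry model uses. The telemetry is never modified here; only
# the input is reshaped into something cheaper to compare against pdata.
def span_id(hex_id: str) -> bytes:
    if len(hex_id) != 16:
        raise ValueError("a span ID is 8 bytes, i.e. 16 hex characters")
    return bytes.fromhex(hex_id)

# The string a user writes in an expression becomes comparable raw bytes.
assert span_id("0102030405060708") == b"\x01\x02\x03\x04\x05\x06\x07\x08"
```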
 While initially performance may degrade compared to more specialized
-processors, the expectation is that over time, the query processor's engine would improve to be able to apply optimizations
+processors, the expectation is that over time, the transform processor's engine would improve to be able to apply optimizations
 across queries, compile into machine code, etc.
 
 ```yaml
@@ -252,7 +252,7 @@ exporters:
 processors:
   transform:
-    # Assuming group_by is defined in a contrib extension module, not baked into the "query" processor
+    # Assuming group_by is defined in a contrib extension module, not baked into the "transform" processor
     extensions: [group_by]
     traces:
       queries:
@@ -275,18 +275,18 @@ processors:
 pipelines:
   - receivers: [otlp]
     exporters: [otlp]
-    processors: [query]
+    processors: [transform]
 ```
 
 The expressions would be executed in order, with each expression either mutating an input telemetry, dropping input
 telemetry, or adding additional telemetry (usually for stateful processors like batch processor which will drop
 telemetry for a window and then add them all at the same time). One caveat to note is that we would like to implement optimizations
-in the query engine, for example to only apply filtering once for multiple operations with a shared filter. Functions
+in the transform engine, for example to only apply filtering once for multiple operations with a shared filter. Functions
 with unknown side effects may cause issues with optimization we will need to explore.
 
 ## Declarative configuration
 
-The telemetry query language presents an SQL-like experience for defining telemetry transformations - it is made up of
+The telemetry transformation language presents an SQL-like experience for defining telemetry transformations - it is made up of
 the three primary components described above, however, and can be presented declaratively instead depending on what makes
 sense as a user experience.
 
@@ -305,7 +305,7 @@
 - attributes["http.request.header.authorization"]
 ```
 
-An implementation of the query language would likely parse expressions into this sort of structure so given an SQL-like
+An implementation of the transformation language would likely parse expressions into this sort of structure so given an SQL-like
 implementation, it would likely be little overhead to support a YAML approach in addition.
 
 ## Function syntax
 
@@ -355,7 +355,7 @@ the `where ...` clause, as that will be handled by the framework before passing
 
 ## Embedded processors
 
-The above describes a query language for configuring processing logic in the OpenTelemetry collector. There will be a
+The above describes a transformation language for configuring processing logic in the OpenTelemetry collector. There will be a
 single processor that exposes the processing logic into the collector config; however, the logic will be implemented
 within core packages rather than directly inside a processor. This is to ensure that where appropriate, processing can
 be embedded into other components, for example metric processing is often most appropriate to execute within a
@@ -368,4 +368,4 @@ There are some known issues and limitations that we hope to address while iterat
 - Handling array-typed attributes
 - Working on a array of points, rather than a single point
 - Metric alignment - for example defining an expression on two metrics, that may not be at the same timestamp
-- The collector has separate pipelines per signal - while the query language could apply cross-signal, we will need to remain single-signal for now
+- The collector has separate pipelines per signal - while the transformation language could apply cross-signal, we will need to remain single-signal for now
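The `@@ -355` hunk context notes that functions need not handle the `where ...` clause because the framework applies it before passing telemetry in. That split can be sketched as follows; all names here (`set_attribute`, `run_query`) are hypothetical illustrations, not collector APIs.

```python
# Hypothetical sketch: the framework evaluates the `where` predicate, so a
# registered function like `set` only ever sees matching telemetry and
# contains no filtering logic of its own.
def set_attribute(span, key, value):
    span["attributes"][key] = value

def run_query(spans, func, args, where=lambda span: True):
    for span in spans:
        if where(span):  # filtering happens here, outside the function
            func(span, *args)

spans = [
    {"name": "GET /health", "attributes": {}},
    {"name": "GET /cart", "attributes": {}},
]
run_query(spans, set_attribute, ("sampled", True),
          where=lambda s: s["name"] != "GET /health")

assert spans[0]["attributes"] == {}                  # filtered out by `where`
assert spans[1]["attributes"] == {"sampled": True}   # mutated by `set`
```

Keeping the predicate in the framework is also what makes the shared-filter optimization mentioned earlier feasible: a common `where` can be evaluated once for several queries.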