From e8135d3db0a2d36b6bc919df7cfb68664b103dae Mon Sep 17 00:00:00 2001 From: James Newton-King Date: Fri, 15 Nov 2024 12:39:48 +0800 Subject: [PATCH 1/2] Add known Kestrel connection error types (#1548) Co-authored-by: Chris Ross Co-authored-by: Liudmila Molkova --- .chloggen/1548.yaml | 4 ++ docs/dotnet/dotnet-kestrel-metrics.md | 52 ++++++++++++++++++++++++- model/kestrel/metrics.yaml | 55 +++++++++++++++++++++++++++ 3 files changed, 110 insertions(+), 1 deletion(-) create mode 100644 .chloggen/1548.yaml diff --git a/.chloggen/1548.yaml b/.chloggen/1548.yaml new file mode 100644 index 0000000000..9651a2a115 --- /dev/null +++ b/.chloggen/1548.yaml @@ -0,0 +1,4 @@ +change_type: enhancement +component: kestrel +note: Add .NET 9 error reasons to Kestrel connection metric. +issues: [1582] diff --git a/docs/dotnet/dotnet-kestrel-metrics.md b/docs/dotnet/dotnet-kestrel-metrics.md index 73e454c421..f63ace4021 100644 --- a/docs/dotnet/dotnet-kestrel-metrics.md +++ b/docs/dotnet/dotnet-kestrel-metrics.md @@ -116,7 +116,57 @@ of `[ 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1, 2, 5, 10, 30, 60, 120, 300 ]`. | [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [7] | `80`; `8080`; `443` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) | | [`tls.protocol.version`](/docs/attributes-registry/tls.md) | string | Numeric part of the version parsed from the original string of the negotiated [SSL/TLS protocol version](https://www.openssl.org/docs/man1.1.1/man3/SSL_get_version.html#RETURN-VALUES) | `1.2`; `3` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -**[1]:** Captures the exception type when a connection fails. 
+**[1]:** Starting from .NET 9, the Kestrel `kestrel.connection.duration` metric reports +the following error types when a corresponding error occurs: + +| Value | Description | Stability | +|---|---|---| +| `aborted_by_app` | The HTTP/1.1 connection was aborted when app code aborted an HTTP request with `HttpContext.Abort()`. | +| `app_shutdown_timeout` | The connection was aborted during app shutdown. During shutdown, the server stops accepting new connections and HTTP requests, and it is given time for active requests to complete. If the app shutdown timeout is exceeded, all remaining connections are aborted. | +| `closed_critical_stream` | A critical control stream for an HTTP/3 connection was closed. | +| `connection_reset` | The connection was reset while there were active HTTP/2 or HTTP/3 streams on the connection. | +| `error_after_starting_response` | An error such as an unhandled application exception or invalid request body occurred after the response was started, causing an abort of the HTTP/1.1 connection. | +| `error_reading_headers` | An error occurred when decoding HPACK headers in an HTTP/2 `HEADERS` frame. | +| `error_writing_headers` | An error occurred when encoding HPACK headers in an HTTP/2 `HEADERS` frame. | +| `flow_control_queue_size_exceeded` | The connection exceeded the outgoing flow control maximum queue size and was closed with `INTERNAL_ERROR`. This can be caused by an excessive number of HTTP/2 stream resets. For more information, see [Microsoft Security Advisory CVE-2023-44487](https://github.com/dotnet/runtime/issues/93303). | +| `flow_control_window_exceeded` | The client sent more data than allowed by the current flow-control window. | +| `frame_after_stream_close` | An HTTP/2 frame was received on a closed stream. | +| `insufficient_tls_version` | The connection doesn't have TLS 1.2 or greater, as required by HTTP/2. 
| +| `invalid_body_reader_state` | An error occurred when draining the request body, aborting the HTTP/1.1 connection. This could be caused by app code reading the request body and missing a call to `PipeReader.AdvanceTo` in a finally block. | +| `invalid_data_padding` | An HTTP/2 `HEADERS` or `DATA` frame has an invalid amount of padding. | +| `invalid_frame_length` | An HTTP/2 frame was received with an invalid frame payload length. The frame could contain a payload that is not valid for the type, or a `DATA` frame payload does not match the length specified in the frame header. | +| `invalid_handshake` | An invalid HTTP/2 handshake was received. | +| `invalid_http_version` | The connection received an HTTP request with the wrong version. For example, a browser sends an HTTP/1.1 request to a plain-text HTTP/2 connection. | +| `invalid_request_headers` | The HTTP request contains invalid headers. This error can occur in a number of scenarios: a header might not be allowed by the HTTP protocol, such as a pseudo-header in the `HEADERS` frame of an HTTP/2 request. A header could also have an invalid value, such as a non-integer `content-length`, or a header name or value might contain invalid characters. | +| `invalid_request_line` | The first line of an HTTP/1.1 request was invalid, potentially due to invalid content or exceeding the allowed limit. Configured by `KestrelServerLimits.MaxRequestLineSize`. | +| `invalid_settings` | The connection received an HTTP/2 or HTTP/3 `SETTINGS` frame with invalid settings. | +| `invalid_stream_id` | An HTTP/2 stream with an invalid stream ID was received. | +| `invalid_window_update_size` | The server received an HTTP/2 `WINDOW_UPDATE` frame with a zero increment, or an increment that caused a flow-control window to exceed the maximum size. | +| `io_error` | An `IOException` occurred while reading or writing HTTP/2 or HTTP/3 connection data. 
| +| `keep_alive_timeout` | There was no activity on the connection, and the keep-alive timeout configured by `KestrelServerLimits.KeepAliveTimeout` was exceeded. | +| `max_concurrent_connections_exceeded` | The connection exceeded the maximum concurrent connection limit. Configured by `KestrelServerLimits.MaxConcurrentConnections`. | +| `max_frame_length_exceeded` | The connection received an HTTP/2 frame that exceeded the size limit specified by `Http2Limits.MaxFrameSize`. | +| `max_request_body_size_exceeded` | The HTTP request body exceeded the maximum request body size limit. Configured by `KestrelServerLimits.MaxRequestBodySize`. | +| `max_request_header_count_exceeded` | The HTTP request headers exceeded the maximum count limit. Configured by `KestrelServerLimits.MaxRequestHeaderCount`. | +| `max_request_headers_total_size_exceeded` | The HTTP request headers exceeded the maximum total size limit. Configured by `KestrelServerLimits.MaxRequestHeadersTotalSize`. | +| `min_request_body_data_rate` | Reading the request body timed out due to data arriving too slowly. Configured by `KestrelServerLimits.MinRequestBodyDataRate`. | +| `min_response_data_rate` | Writing the response timed out because the client did not read it at the specified minimum data rate. Configured by `KestrelServerLimits.MinResponseDataRate`. | +| `missing_stream_end` | The connection received an HTTP/2 `HEADERS` frame for trailers without a stream end flag. | +| `output_queue_size_exceeded` | The connection exceeded the output queue size and was closed with `INTERNAL_ERROR`. This can be caused by an excessive number of HTTP/2 stream resets. For more information, see [Microsoft Security Advisory CVE-2023-44487](https://github.com/dotnet/runtime/issues/93303). | +| `request_headers_timeout` | Request headers timed out while waiting for headers to be received after the request started. Configured by `KestrelServerLimits.RequestHeadersTimeout`. 
| +| `response_content_length_mismatch` | The HTTP response body sent data that didn't match the response's `content-length` header. | +| `server_timeout` | The connection timed out with the `IConnectionTimeoutFeature`. | +| `stream_creation_error` | The HTTP/3 connection received a stream that it wouldn't accept. For example, the client created duplicate control streams. | +| `stream_reset_limit_exceeded` | The connection received an excessive number of HTTP/2 stream resets and was closed with `ENHANCE_YOUR_CALM`. For more information, see [Microsoft Security Advisory CVE-2023-44487](https://github.com/dotnet/runtime/issues/93303). | +| `stream_self_dependency` | The connection received an HTTP/2 frame that caused a frame to depend on itself. | +| `tls_handshake_failed` | An error occurred during the TLS handshake for a connection. Only reported for HTTP/1.1 and HTTP/2 connections. The TLS handshake for HTTP/3 is internal to QUIC transport. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `tls_not_supported` | A TLS handshake was received by an endpoint that isn't configured to support TLS. | +| `unexpected_end_of_request_content` | The HTTP/1.1 request body ended before the data specified by the `content-length` header or chunked transfer encoding mechanism was received. | +| `unexpected_frame` | An unexpected HTTP/2 or HTTP/3 frame type was received. The frame type is either unknown, unsupported, or invalid for the current stream state. | +| `unknown_stream` | An HTTP/2 frame was received on an unknown stream. | +| `write_canceled` | The cancellation of a response body write aborted the HTTP/1.1 connection. | + +In other cases, `error.type` contains the fully qualified type name of the exception. **[2]:** The value SHOULD be normalized to lowercase. 
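
The `error.type` note above implies that a consumer of `kestrel.connection.duration` sees two kinds of values: one of the known snake_case reasons, or a fully qualified exception type name. A minimal Python sketch of telling them apart follows. The helper name `classify_error_type`, the returned labels, and the dot-based heuristic are assumptions made for illustration, and the set holds only a small subset of the reasons from the table.

```python
# Illustrative subset of the known Kestrel reasons from the table above.
# A real consumer would list every value it wants to recognize.
KNOWN_KESTREL_REASONS = frozenset({
    "aborted_by_app",
    "app_shutdown_timeout",
    "connection_reset",
    "keep_alive_timeout",
    "tls_handshake_failed",
})


def classify_error_type(value: str) -> str:
    """Classify an `error.type` attribute value (hypothetical helper).

    Returns "known_reason" for a value in the set above,
    "exception_type" for something that looks like a namespaced
    type name (heuristic assumption: it contains a dot), and
    "unrecognized" otherwise, for example a reason that is not
    in the local set.
    """
    if value in KNOWN_KESTREL_REASONS:
        return "known_reason"
    if "." in value:
        return "exception_type"
    return "unrecognized"


if __name__ == "__main__":
    for sample in ("keep_alive_timeout", "System.OperationCanceledException"):
        print(sample, "->", classify_error_type(sample))
```

Keeping an explicit "unrecognized" bucket is a design choice for this sketch: it avoids mislabeling reasons that are missing from the local set as exception types.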
diff --git a/model/kestrel/metrics.yaml b/model/kestrel/metrics.yaml index 0d1499a34c..ba25995767 100644 --- a/model/kestrel/metrics.yaml +++ b/model/kestrel/metrics.yaml @@ -44,7 +44,62 @@ groups: conditionally_required: if and only if an error has occurred. note: "Captures the exception type when a connection fails." examples: ['System.OperationCanceledException', 'Contoso.MyException'] + # yamllint disable rule:line-length + - ref: error.type + # TODO: move note to yaml once https://github.com/open-telemetry/build-tools/issues/192 is supported + note: | + Starting from .NET 9, the Kestrel `kestrel.connection.duration` metric reports + the following error types when a corresponding error occurs: + + | Value | Description | Stability | + |---|---|---| + | `aborted_by_app` | The HTTP/1.1 connection was aborted when app code aborted an HTTP request with `HttpContext.Abort()`. | + | `app_shutdown_timeout` | The connection was aborted during app shutdown. During shutdown, the server stops accepting new connections and HTTP requests, and it is given time for active requests to complete. If the app shutdown timeout is exceeded, all remaining connections are aborted. | + | `closed_critical_stream` | A critical control stream for an HTTP/3 connection was closed. | + | `connection_reset` | The connection was reset while there were active HTTP/2 or HTTP/3 streams on the connection. | + | `error_after_starting_response` | An error such as an unhandled application exception or invalid request body occurred after the response was started, causing an abort of the HTTP/1.1 connection. | + | `error_reading_headers` | An error occurred when decoding HPACK headers in an HTTP/2 `HEADERS` frame. | + | `error_writing_headers` | An error occurred when encoding HPACK headers in an HTTP/2 `HEADERS` frame. | + | `flow_control_queue_size_exceeded` | The connection exceeded the outgoing flow control maximum queue size and was closed with `INTERNAL_ERROR`. 
This can be caused by an excessive number of HTTP/2 stream resets. For more information, see [Microsoft Security Advisory CVE-2023-44487](https://github.com/dotnet/runtime/issues/93303). | + | `flow_control_window_exceeded` | The client sent more data than allowed by the current flow-control window. | + | `frame_after_stream_close` | An HTTP/2 frame was received on a closed stream. | + | `insufficient_tls_version` | The connection doesn't have TLS 1.2 or greater, as required by HTTP/2. | + | `invalid_body_reader_state` | An error occurred when draining the request body, aborting the HTTP/1.1 connection. This could be caused by app code reading the request body and missing a call to `PipeReader.AdvanceTo` in a finally block. | + | `invalid_data_padding` | An HTTP/2 `HEADERS` or `DATA` frame has an invalid amount of padding. | + | `invalid_frame_length` | An HTTP/2 frame was received with an invalid frame payload length. The frame could contain a payload that is not valid for the type, or a `DATA` frame payload does not match the length specified in the frame header. | + | `invalid_handshake` | An invalid HTTP/2 handshake was received. | + | `invalid_http_version` | The connection received an HTTP request with the wrong version. For example, a browser sends an HTTP/1.1 request to a plain-text HTTP/2 connection. | + | `invalid_request_headers` | The HTTP request contains invalid headers. This error can occur in a number of scenarios: a header might not be allowed by the HTTP protocol, such as a pseudo-header in the `HEADERS` frame of an HTTP/2 request. A header could also have an invalid value, such as a non-integer `content-length`, or a header name or value might contain invalid characters. | + | `invalid_request_line` | The first line of an HTTP/1.1 request was invalid, potentially due to invalid content or exceeding the allowed limit. Configured by `KestrelServerLimits.MaxRequestLineSize`. 
| + | `invalid_settings` | The connection received an HTTP/2 or HTTP/3 `SETTINGS` frame with invalid settings. | + | `invalid_stream_id` | An HTTP/2 stream with an invalid stream ID was received. | + | `invalid_window_update_size` | The server received an HTTP/2 `WINDOW_UPDATE` frame with a zero increment, or an increment that caused a flow-control window to exceed the maximum size. | + | `io_error` | An `IOException` occurred while reading or writing HTTP/2 or HTTP/3 connection data. | + | `keep_alive_timeout` | There was no activity on the connection, and the keep-alive timeout configured by `KestrelServerLimits.KeepAliveTimeout` was exceeded. | + | `max_concurrent_connections_exceeded` | The connection exceeded the maximum concurrent connection limit. Configured by `KestrelServerLimits.MaxConcurrentConnections`. | + | `max_frame_length_exceeded` | The connection received an HTTP/2 frame that exceeded the size limit specified by `Http2Limits.MaxFrameSize`. | + | `max_request_body_size_exceeded` | The HTTP request body exceeded the maximum request body size limit. Configured by `KestrelServerLimits.MaxRequestBodySize`. | + | `max_request_header_count_exceeded` | The HTTP request headers exceeded the maximum count limit. Configured by `KestrelServerLimits.MaxRequestHeaderCount`. | + | `max_request_headers_total_size_exceeded` | The HTTP request headers exceeded the maximum total size limit. Configured by `KestrelServerLimits.MaxRequestHeadersTotalSize`. | + | `min_request_body_data_rate` | Reading the request body timed out due to data arriving too slowly. Configured by `KestrelServerLimits.MinRequestBodyDataRate`. | + | `min_response_data_rate` | Writing the response timed out because the client did not read it at the specified minimum data rate. Configured by `KestrelServerLimits.MinResponseDataRate`. | + | `missing_stream_end` | The connection received an HTTP/2 `HEADERS` frame for trailers without a stream end flag. 
| + | `output_queue_size_exceeded` | The connection exceeded the output queue size and was closed with `INTERNAL_ERROR`. This can be caused by an excessive number of HTTP/2 stream resets. For more information, see [Microsoft Security Advisory CVE-2023-44487](https://github.com/dotnet/runtime/issues/93303). | + | `request_headers_timeout` | Request headers timed out while waiting for headers to be received after the request started. Configured by `KestrelServerLimits.RequestHeadersTimeout`. | + | `response_content_length_mismatch` | The HTTP response body sent data that didn't match the response's `content-length` header. | + | `server_timeout` | The connection timed out with the `IConnectionTimeoutFeature`. | + | `stream_creation_error` | The HTTP/3 connection received a stream that it wouldn't accept. For example, the client created duplicate control streams. | + | `stream_reset_limit_exceeded` | The connection received an excessive number of HTTP/2 stream resets and was closed with `ENHANCE_YOUR_CALM`. For more information, see [Microsoft Security Advisory CVE-2023-44487](https://github.com/dotnet/runtime/issues/93303). | + | `stream_self_dependency` | The connection received an HTTP/2 frame that caused a frame to depend on itself. | + | `tls_handshake_failed` | An error occurred during the TLS handshake for a connection. Only reported for HTTP/1.1 and HTTP/2 connections. The TLS handshake for HTTP/3 is internal to QUIC transport. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | + | `tls_not_supported` | A TLS handshake was received by an endpoint that isn't configured to support TLS. | + | `unexpected_end_of_request_content` | The HTTP/1.1 request body ended before the data specified by the `content-length` header or chunked transfer encoding mechanism was received. | + | `unexpected_frame` | An unexpected HTTP/2 or HTTP/3 frame type was received. The frame type is either unknown, unsupported, or invalid for the current stream state. 
| + | `unknown_stream` | An HTTP/2 frame was received on an unknown stream. | + | `write_canceled` | The cancellation of a response body write aborted the HTTP/1.1 connection. | + In other cases, `error.type` contains the fully qualified type name of the exception. + # yamllint enable rule:line-length - id: metric.kestrel.rejected_connections type: metric metric_name: kestrel.rejected_connections From fa564c9f6755a73699398ea44d1e22b0e75e48dd Mon Sep 17 00:00:00 2001 From: Daniel Dyla Date: Fri, 15 Nov 2024 11:42:26 -0700 Subject: [PATCH 2/2] Feature Flag evaluation event (#1440) Co-authored-by: Liudmila Molkova Co-authored-by: Josh Suereth --- .chloggen/1440.yaml | 6 ++ docs/attributes-registry/feature-flag.md | 38 ++++++-- docs/feature-flags/README.md | 1 - docs/feature-flags/feature-flags-logs.md | 87 +++++++++++++++---- docs/feature-flags/feature-flags-spans.md | 80 ----------------- docs/general/trace.md | 1 - .../deprecated/registry-deprecated.yaml | 12 +++ model/feature-flag/events.yaml | 14 --- model/feature-flag/logs.yaml | 64 ++++++++++++++ model/feature-flag/registry.yaml | 79 +++++++++++++++-- schema-next.yaml | 5 ++ 11 files changed, 260 insertions(+), 127 deletions(-) create mode 100644 .chloggen/1440.yaml delete mode 100644 docs/feature-flags/feature-flags-spans.md create mode 100644 model/feature-flag/deprecated/registry-deprecated.yaml delete mode 100644 model/feature-flag/events.yaml create mode 100644 model/feature-flag/logs.yaml diff --git a/.chloggen/1440.yaml b/.chloggen/1440.yaml new file mode 100644 index 0000000000..f130197287 --- /dev/null +++ b/.chloggen/1440.yaml @@ -0,0 +1,6 @@ +change_type: breaking +component: feature_flag +note: > + Rename `feature_flag` event to `feature_flag.evaluation` event, define new feature flag attributes and provide body definition. + Remove `feature_flag` span event definition in favor of log-based event. 
+issues: [1440] diff --git a/docs/attributes-registry/feature-flag.md b/docs/attributes-registry/feature-flag.md index 91e0be7a53..de4cd52d50 100644 --- a/docs/attributes-registry/feature-flag.md +++ b/docs/attributes-registry/feature-flag.md @@ -6,21 +6,47 @@ # Feature Flag +- [Feature Flag Attributes](#feature-flag-attributes) +- [Deprecated Feature Flag Attributes](#deprecated-feature-flag-attributes) + ## Feature Flag Attributes This document defines attributes for Feature Flags. | Attribute | Type | Description | Examples | Stability | |---|---|---|---|---| -| `feature_flag.key` | string | The unique identifier of the feature flag. | `logo-color` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `feature_flag.provider_name` | string | The name of the service provider that performs the flag evaluation. | `Flag Manager` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| `feature_flag.variant` | string | SHOULD be a semantic identifier for a value. If one is unavailable, a stringified version of the value can be used. [1] | `red`; `true`; `on` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `feature_flag.context.id` | string | The unique identifier for the flag evaluation context. For example, the targeting key. | `5157782b-2203-4c80-a857-dbbd5e7761db` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `feature_flag.evaluation.error.message` | string | A message explaining the nature of an error occurring during flag evaluation. | `Flag `header-color` expected type `string` but found type `number`` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `feature_flag.evaluation.reason` | string | The reason code which shows how a feature flag value was determined. | `static`; `targeting_match`; `error`; `default` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `feature_flag.key` | string | The lookup key of the feature flag. 
| `logo-color` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `feature_flag.set.id` | string | The identifier of the [flag set](https://openfeature.dev/specification/glossary/#flag-set) to which the feature flag belongs. | `proj-1`; `ab98sgs`; `service1/dev` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `feature_flag.system` | string | Identifies the feature flag provider. | `Flag Manager` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `feature_flag.variant` | string | A semantic identifier for an evaluated flag value. [1] | `red`; `true`; `on` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `feature_flag.version` | string | The version of the ruleset used during the evaluation. This may be any stable value which uniquely identifies the ruleset. | `1`; `01ABCDEF` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | **[1]:** A semantic identifier, commonly referred to as a variant, provides a means for referring to a value without including the value itself. This can provide additional context for understanding the meaning behind a value. For example, the variant `red` maybe be used for the value `#c05543`. -A stringified version of the value can be used in situations where a -semantic identifier is unavailable. String representation of the value -should be determined by the implementer. +`feature_flag.evaluation.reason` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used. + +| Value | Description | Stability | +|---|---|---| +| `cached` | The resolved value was retrieved from cache. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `default` | The resolved value fell back to a pre-configured value (no dynamic evaluation occurred or dynamic evaluation yielded no result). 
| ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `disabled` | The resolved value was the result of the flag being disabled in the management system. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `error` | The resolved value was the result of an error. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `split` | The resolved value was the result of pseudorandom assignment. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `stale` | The resolved value is non-authoritative or possibly out of date. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `static` | The resolved value is static (no dynamic evaluation). | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `targeting_match` | The resolved value was the result of a dynamic evaluation, such as a rule or specific user-targeting. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | +| `unknown` | The reason for the resolved value could not be determined. | ![Experimental](https://img.shields.io/badge/-experimental-blue) | + +## Deprecated Feature Flag Attributes + +Describes deprecated Feature Flag attributes. + +| Attribute | Type | Description | Examples | Stability | +|---|---|---|---|---| +| `feature_flag.provider_name` | string | Deprecated, use `feature_flag.system` instead. | `Flag Manager` | ![Deprecated](https://img.shields.io/badge/-deprecated-red)
Replaced by `feature_flag.system`. | diff --git a/docs/feature-flags/README.md b/docs/feature-flags/README.md index 2262031198..1d4a454078 100644 --- a/docs/feature-flags/README.md +++ b/docs/feature-flags/README.md @@ -14,7 +14,6 @@ evaluations in spans and logs. Semantic conventions for feature flags are defined for the following signals: -* [Feature Flags in Spans](feature-flags-spans.md): Semantic Conventions for recording feature flags in *spans*. * [Feature Flags in Logs](feature-flags-logs.md): Semantic Conventions for recording feature flags in *logs*. [DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status diff --git a/docs/feature-flags/feature-flags-logs.md b/docs/feature-flags/feature-flags-logs.md index 32b2519220..85964d8181 100644 --- a/docs/feature-flags/feature-flags-logs.md +++ b/docs/feature-flags/feature-flags-logs.md @@ -11,12 +11,16 @@ a [log record](https://github.com/open-telemetry/opentelemetry-specification/tre [Logger API](https://github.com/open-telemetry/opentelemetry-specification/tree/v1.37.0/specification/logs/bridge-api.md#emit-a-logrecord). This is useful when a flag is evaluated outside of a transaction context such as when the application loads or on a timer. -To record a flag evaluation as a part of a transaction context, -consider [recording it as a span event](feature-flags-spans.md). -For more information about why it is useful to capture feature flag evaluations, -refer to the [motivation](feature-flags-spans.md#motivation) -section of the trace semantic convention for feature flag evaluations. +## Motivation + +Feature flags are commonly used in modern applications to decouple feature releases from deployments. +Many feature flagging tools support the ability to update flag configurations in near real-time from a remote feature flag management service. +They also commonly allow rulesets to be defined that return values based on contextual information. 
+For example, a feature could be enabled only for a specific subset of users based on context (e.g. user's email domain, membership tier, country). + +Since feature flags are dynamic and affect runtime behavior, it's important to collect relevant feature flag telemetry signals. +This can be used to determine the impact a feature has on a request, enabling enhanced observability use cases, such as A/B testing or progressive feature releases. @@ -37,7 +41,7 @@ context. The table below indicates which attributes should be added to the [LogRecord](https://github.com/open-telemetry/opentelemetry-specification/tree/v1.37.0/specification/logs/data-model.md#log-and-event-record-definition) and their types. - + @@ -46,24 +50,75 @@ The table below indicates which attributes should be added to the **Status:** ![Experimental](https://img.shields.io/badge/-experimental-blue) -The event name MUST be `feature_flag`. +The event name MUST be `feature_flag.evaluation`. -This event describes feature flag evaluation. +Defines feature flag evaluation as an event. + +A `feature_flag.evaluation` event SHOULD be emitted whenever a feature flag value is evaluated, which may happen many times over the course of an application lifecycle. For example, a website A/B testing different animations may evaluate a flag each time a button is clicked. A `feature_flag.evaluation` event is emitted on each evaluation even if the result is the same. | Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability | |---|---|---|---|---|---| -| [`feature_flag.key`](/docs/attributes-registry/feature-flag.md) | string | The unique identifier of the feature flag. | `logo-color` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) | -| [`feature_flag.provider_name`](/docs/attributes-registry/feature-flag.md) | string | The name of the service provider that performs the flag evaluation. 
| `Flag Manager` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
-| [`feature_flag.variant`](/docs/attributes-registry/feature-flag.md) | string | SHOULD be a semantic identifier for a value. If one is unavailable, a stringified version of the value can be used. [1] | `red`; `true`; `on` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
-
-**[1]:** A semantic identifier, commonly referred to as a variant, provides a means
+| [`feature_flag.key`](/docs/attributes-registry/feature-flag.md) | string | The lookup key of the feature flag. | `logo-color` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+| [`error.type`](/docs/attributes-registry/error.md) | string | Describes a class of error the operation ended with. [1] | `provider_not_ready`; `targeting_key_missing`; `provider_fatal`; `general` | `Conditionally Required` [2] | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
+| [`feature_flag.variant`](/docs/attributes-registry/feature-flag.md) | string | A semantic identifier for an evaluated flag value. [3] | `red`; `true`; `on` | `Conditionally Required` [4] | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+| [`feature_flag.context.id`](/docs/attributes-registry/feature-flag.md) | string | The unique identifier for the flag evaluation context. For example, the targeting key. | `5157782b-2203-4c80-a857-dbbd5e7761db` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+| [`feature_flag.evaluation.error.message`](/docs/attributes-registry/feature-flag.md) | string | A message explaining the nature of an error occurring during flag evaluation. | `Flag `header-color` expected type `string` but found type `number`` | `Recommended` [5] | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+| [`feature_flag.evaluation.reason`](/docs/attributes-registry/feature-flag.md) | string | The reason code which shows how a feature flag value was determined. | `static`; `targeting_match`; `error`; `default` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+| [`feature_flag.set.id`](/docs/attributes-registry/feature-flag.md) | string | The identifier of the [flag set](https://openfeature.dev/specification/glossary/#flag-set) to which the feature flag belongs. | `proj-1`; `ab98sgs`; `service1/dev` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+| [`feature_flag.system`](/docs/attributes-registry/feature-flag.md) | string | Identifies the feature flag provider. | `Flag Manager` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+| [`feature_flag.version`](/docs/attributes-registry/feature-flag.md) | string | The version of the ruleset used during the evaluation. This may be any stable value which uniquely identifies the ruleset. | `1`; `01ABCDEF` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+
+**[1]:** If one of these values applies, then it MUST be used; otherwise, a custom value MAY be used.
+
+| Value | Description | Stability |
+|---|---|---|
+| `flag_not_found` | The flag could not be found. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+| `invalid_context` | The evaluation context does not meet provider requirements. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+| `parse_error` | An error was encountered parsing data, such as a flag configuration. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+| `provider_fatal` | The provider has entered an irrecoverable error state. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+| `provider_not_ready` | The value was resolved before the provider was initialized. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+| `targeting_key_missing` | The provider requires a targeting key and one was not provided in the evaluation context. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+| `type_mismatch` | The type of the flag value does not match the expected type. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+| `general` | The error was for a reason not enumerated above. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+
+**[2]:** If and only if an error occurred during flag evaluation.
+
+**[3]:** A semantic identifier, commonly referred to as a variant, provides a means
 for referring to a value without including the value itself. This can
 provide additional context for understanding the meaning behind a value.
 For example, the variant `red` maybe be used for the value `#c05543`.
-A stringified version of the value can be used in situations where a
-semantic identifier is unavailable. String representation of the value
-should be determined by the implementer.
+**[4]:** If feature flag provider supplies a variant or equivalent concept.
+
+**[5]:** If and only if an error occurred. It's NOT RECOMMENDED to duplicate the value of `error.type` in `feature_flag.evaluation.error.message`.
+
+`error.type` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.
+
+| Value | Description | Stability |
+|---|---|---|
+| `_OTHER` | A fallback error value to be used when the instrumentation doesn't define a custom value. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
+
+`feature_flag.evaluation.reason` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.
+
+| Value | Description | Stability |
+|---|---|---|
+| `cached` | The resolved value was retrieved from cache. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+| `default` | The resolved value fell back to a pre-configured value (no dynamic evaluation occurred or dynamic evaluation yielded no result). | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+| `disabled` | The resolved value was the result of the flag being disabled in the management system. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+| `error` | The resolved value was the result of an error. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+| `split` | The resolved value was the result of pseudorandom assignment. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+| `stale` | The resolved value is non-authoritative or possibly out of date | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+| `static` | The resolved value is static (no dynamic evaluation). | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+| `targeting_match` | The resolved value was the result of a dynamic evaluation, such as a rule or specific user-targeting. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+| `unknown` | The reason for the resolved value could not be determined. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+
+**Body fields:**
+
+| Body Field | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability |
+|---|---|---|---|---|---|
+| `value` | undefined | The evaluated value of the feature flag. | `#ff0000`; `1`; `true` | `Conditionally Required` [1] | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+
+**[1]:** If and only if feature flag provider does not supply variant or equivalent concept. Otherwise, `value` should be treated as opt-in.
diff --git a/docs/feature-flags/feature-flags-spans.md b/docs/feature-flags/feature-flags-spans.md
deleted file mode 100644
index de0ded12fb..0000000000
--- a/docs/feature-flags/feature-flags-spans.md
+++ /dev/null
@@ -1,80 +0,0 @@
-
-
-# Semantic Conventions for Feature Flags in Spans
-
-**Status**: [Experimental][DocumentStatus]
-
-This document defines semantic conventions for recording dynamic feature flag
-evaluations within a transaction as span events.
-To record an evaluation outside of a transaction context, consider
-[recording it as a log record](feature-flags-logs.md).
-
-
-
-
-
-- [Motivation](#motivation)
-- [Overview](#overview)
-- [Convention](#convention)
-  - [Evaluation event](#evaluation-event)
-
-
-
-## Motivation
-
-Features flags are commonly used in modern applications to decouple feature releases from deployments.
-Many feature flagging tools support the ability to update flag configurations in near real-time from a remote feature flag management service.
-They also commonly allow rulesets to be defined that return values based on contextual information.
-For example, a feature could be enabled only for a specific subset of users based on context (e.g. users email domain, membership tier, country).
-
-Since feature flags are dynamic and affect runtime behavior, it's important to collect relevant feature flag telemetry signals.
-This can be used to determine the impact a feature has on a request, enabling enhanced observability use cases, such as A/B testing or progressive feature releases.
-
-## Overview
-
-The following semantic convention defines how feature flags can be represented as an `Event` in OpenTelemetry.
-The terminology was defined in the [OpenFeature specification](https://docs.openfeature.dev/docs/specification/), which represents an industry consensus.
-It's intended to be vendor neutral and provide flexibility for current and future use cases.
-
-## Convention
-
-A flag evaluation SHOULD be recorded as an Event on the span during which it occurred.
-
-### Evaluation event
-
-
-
-
-
-
-
-
-**Status:** ![Experimental](https://img.shields.io/badge/-experimental-blue)
-
-The event name MUST be `feature_flag`.
-
-This event describes feature flag evaluation.
-
-| Attribute | Type | Description | Examples | [Requirement Level](https://opentelemetry.io/docs/specs/semconv/general/attribute-requirement-level/) | Stability |
-|---|---|---|---|---|---|
-| [`feature_flag.key`](/docs/attributes-registry/feature-flag.md) | string | The unique identifier of the feature flag. | `logo-color` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
-| [`feature_flag.provider_name`](/docs/attributes-registry/feature-flag.md) | string | The name of the service provider that performs the flag evaluation. | `Flag Manager` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
-| [`feature_flag.variant`](/docs/attributes-registry/feature-flag.md) | string | SHOULD be a semantic identifier for a value. If one is unavailable, a stringified version of the value can be used. [1] | `red`; `true`; `on` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
-
-**[1]:** A semantic identifier, commonly referred to as a variant, provides a means
-for referring to a value without including the value itself. This can
-provide additional context for understanding the meaning behind a value.
-For example, the variant `red` maybe be used for the value `#c05543`.
-
-A stringified version of the value can be used in situations where a
-semantic identifier is unavailable. String representation of the value
-should be determined by the implementer.
-
-
-
-
-
-
-[DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status
diff --git a/docs/general/trace.md b/docs/general/trace.md
index 09b164af16..565913a31b 100644
--- a/docs/general/trace.md
+++ b/docs/general/trace.md
@@ -27,7 +27,6 @@ The following semantic conventions for spans are defined:
 * [Database](/docs/database/database-spans.md): For SQL and NoSQL client call spans.
 * [Exceptions](/docs/exceptions/exceptions-spans.md): For recording exceptions associated with a span.
 * [FaaS](/docs/faas/faas-spans.md): For [Function as a Service](https://wikipedia.org/wiki/Function_as_a_service) (e.g., AWS Lambda) spans.
-* [Feature Flags](/docs/feature-flags/feature-flags-spans.md): For recording feature flag evaluations associated with a span.
 * [HTTP](/docs/http/http-spans.md): For HTTP client and server spans.
 * [Messaging](/docs/messaging/messaging-spans.md): For messaging systems (queues, publish/subscribe, etc.) spans.
 * [Object Stores](/docs/object-stores/README.md): Semantic Conventions for object stores spans.
diff --git a/model/feature-flag/deprecated/registry-deprecated.yaml b/model/feature-flag/deprecated/registry-deprecated.yaml
new file mode 100644
index 0000000000..2f3a681454
--- /dev/null
+++ b/model/feature-flag/deprecated/registry-deprecated.yaml
@@ -0,0 +1,12 @@
+groups:
+  - id: registry.feature_flag.deprecated
+    type: attribute_group
+    display_name: Deprecated Feature Flag Attributes
+    brief: "Describes deprecated Feature Flag attributes."
+    attributes:
+      - id: feature_flag.provider_name
+        type: string
+        brief: 'Deprecated, use `feature_flag.system` instead.'
+        stability: experimental
+        deprecated: "Replaced by `feature_flag.system`."
+        examples: ["Flag Manager"]
diff --git a/model/feature-flag/events.yaml b/model/feature-flag/events.yaml
deleted file mode 100644
index bd2cff8829..0000000000
--- a/model/feature-flag/events.yaml
+++ /dev/null
@@ -1,14 +0,0 @@
-groups:
-  - id: event.feature_flag
-    type: event
-    stability: experimental
-    name: feature_flag
-    brief: >
-      This event describes feature flag evaluation.
-    attributes:
-      - ref: feature_flag.key
-        requirement_level: required
-      - ref: feature_flag.provider_name
-        requirement_level: recommended
-      - ref: feature_flag.variant
-        requirement_level: recommended
diff --git a/model/feature-flag/logs.yaml b/model/feature-flag/logs.yaml
new file mode 100644
index 0000000000..0de7c95577
--- /dev/null
+++ b/model/feature-flag/logs.yaml
@@ -0,0 +1,64 @@
+groups:
+  - id: event.feature_flag.evaluation
+    type: event
+    name: feature_flag.evaluation
+    brief: >
+      Defines feature flag evaluation as an event.
+    note: >
+      A `feature_flag.evaluation` event SHOULD be emitted whenever a feature flag
+      value is evaluated, which may happen many times over the course of an
+      application lifecycle.
+      For example, a website A/B testing different animations may evaluate a
+      flag each time a button is clicked.
+      A `feature_flag.evaluation` event is emitted on each evaluation even if the result is the same.
+    attributes:
+      - ref: feature_flag.key
+        requirement_level: required
+      - ref: feature_flag.variant
+        requirement_level:
+          conditionally_required: If feature flag provider supplies a variant or equivalent concept.
+      - ref: feature_flag.system
+        requirement_level: recommended
+      - ref: feature_flag.context.id
+        requirement_level: recommended
+      - ref: feature_flag.version
+        requirement_level: recommended
+      - ref: feature_flag.set.id
+        requirement_level: recommended
+      - ref: feature_flag.evaluation.reason
+        requirement_level: recommended
+      - ref: error.type
+        examples: ["provider_not_ready", "targeting_key_missing", "provider_fatal", "general"]
+        requirement_level:
+          conditionally_required: If and only if an error occurred during flag evaluation.
+        # TODO: move note to yaml once https://github.com/open-telemetry/build-tools/issues/192 is supported
+        note: |
+          If one of these values applies, then it MUST be used; otherwise, a custom value MAY be used.
+
+          | Value | Description | Stability |
+          |---|---|---|
+          | `flag_not_found` | The flag could not be found. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+          | `invalid_context` | The evaluation context does not meet provider requirements. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+          | `parse_error` | An error was encountered parsing data, such as a flag configuration. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+          | `provider_fatal` | The provider has entered an irrecoverable error state. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+          | `provider_not_ready` | The value was resolved before the provider was initialized. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+          | `targeting_key_missing` | The provider requires a targeting key and one was not provided in the evaluation context. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+          | `type_mismatch` | The type of the flag value does not match the expected type. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+          | `general` | The error was for a reason not enumerated above. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
+      - ref: feature_flag.evaluation.error.message
+        requirement_level:
+          recommended: If and only if an error occurred. It's NOT RECOMMENDED to duplicate the value of `error.type` in `feature_flag.evaluation.error.message`.
+    body:
+      id: feature_flag.evaluation
+      type: map
+      requirement_level: recommended
+      fields:
+        - id: value
+          type: undefined
+          stability: experimental
+          brief: The evaluated value of the feature flag.
+          requirement_level:
+            conditionally_required: >
+              If and only if feature flag provider does not supply variant or equivalent concept.
+              Otherwise, `value` should be treated as opt-in.
+          examples: ["#ff0000", "1", "true"]
diff --git a/model/feature-flag/registry.yaml b/model/feature-flag/registry.yaml
index 380ef79d5f..76b1d5fe3c 100644
--- a/model/feature-flag/registry.yaml
+++ b/model/feature-flag/registry.yaml
@@ -9,26 +9,87 @@ groups:
       - id: feature_flag.key
         type: string
         stability: experimental
-        brief: The unique identifier of the feature flag.
+        brief: The lookup key of the feature flag.
         examples: ["logo-color"]
-      - id: feature_flag.provider_name
+      - id: feature_flag.system
         type: string
         stability: experimental
-        brief: The name of the service provider that performs the flag evaluation.
+        brief: Identifies the feature flag provider.
         examples: ["Flag Manager"]
       - id: feature_flag.variant
         type: string
         stability: experimental
         examples: ["red", "true", "on"]
         brief: >
-          SHOULD be a semantic identifier for a value. If one is unavailable, a
-          stringified version of the value can be used.
+          A semantic identifier for an evaluated flag value.
         note: |-
           A semantic identifier, commonly referred to as a variant, provides a means
           for referring to a value without including the value itself. This can
           provide additional context for understanding the meaning behind a value.
           For example, the variant `red` maybe be used for the value `#c05543`.
-
-          A stringified version of the value can be used in situations where a
-          semantic identifier is unavailable. String representation of the value
-          should be determined by the implementer.
+      - id: feature_flag.context.id
+        type: string
+        stability: experimental
+        examples: ["5157782b-2203-4c80-a857-dbbd5e7761db"]
+        brief: >
+          The unique identifier for the flag evaluation context. For example, the targeting key.
+      - id: feature_flag.version
+        type: string
+        stability: experimental
+        examples: ["1", "01ABCDEF"]
+        brief: >
+          The version of the ruleset used during the evaluation. This may be any stable value which uniquely identifies the ruleset.
+      - id: feature_flag.set.id
+        type: string
+        stability: experimental
+        examples: ["proj-1", "ab98sgs", "service1/dev"]
+        brief: >
+          The identifier of the [flag set](https://openfeature.dev/specification/glossary/#flag-set) to which the feature flag belongs.
+      - id: feature_flag.evaluation.reason
+        type:
+          members:
+            - id: static
+              value: "static"
+              brief: The resolved value is static (no dynamic evaluation).
+              stability: experimental
+            - id: default
+              value: "default"
+              brief: The resolved value fell back to a pre-configured value (no dynamic evaluation occurred or dynamic evaluation yielded no result).
+              stability: experimental
+            - id: targeting_match
+              value: "targeting_match"
+              brief: The resolved value was the result of a dynamic evaluation, such as a rule or specific user-targeting.
+              stability: experimental
+            - id: split
+              value: "split"
+              brief: The resolved value was the result of pseudorandom assignment.
+              stability: experimental
+            - id: cached
+              value: "cached"
+              brief: The resolved value was retrieved from cache.
+              stability: experimental
+            - id: disabled
+              value: "disabled"
+              brief: The resolved value was the result of the flag being disabled in the management system.
+              stability: experimental
+            - id: unknown
+              value: "unknown"
+              brief: The reason for the resolved value could not be determined.
+              stability: experimental
+            - id: stale
+              value: "stale"
+              brief: The resolved value is non-authoritative or possibly out of date
+              stability: experimental
+            - id: error
+              value: "error"
+              brief: The resolved value was the result of an error.
+              stability: experimental
+        stability: experimental
+        examples: ["static", "targeting_match", "error", "default"]
+        brief: >
+          The reason code which shows how a feature flag value was determined.
+      - id: feature_flag.evaluation.error.message
+        type: string
+        stability: experimental
+        examples: ["Flag `header-color` expected type `string` but found type `number`"]
+        brief: A message explaining the nature of an error occurring during flag evaluation.
diff --git a/schema-next.yaml b/schema-next.yaml
index 5439843483..50afab2b96 100644
--- a/schema-next.yaml
+++ b/schema-next.yaml
@@ -16,6 +16,11 @@ versions:
               vcs.repository.ref.name: vcs.ref.head.name
               vcs.repository.ref.revision: vcs.ref.head.revision
               vcs.repository.ref.type: vcs.ref.head.type
+        # https://github.com/open-telemetry/semantic-conventions/pull/1440
+        - rename_attributes:
+            attribute_map:
+              feature_flag.provider_name: feature_flag.system
+
     metrics:
       changes:
         # https://github.com/open-telemetry/semantic-conventions/pull/1492