- Start Date: 2023-06-12
- RFC Type: decision
- RFC PR: #101
- RFC Status: approved
- RFC Driver: Abhijeet Prasad
This RFC proposes to revamp the performance API in the SDKs. The new API aims to accomplish the following:
- De-emphasize the concept of transactions for users of a Sentry performance monitoring SDK.
- Optimize for making sure performance data is always sent to Sentry.
- Open the SDK up for future work where we support batch span ingestion.
Note: A previous version of this RFC also focused on updating the span schema. That work is no longer in scope for this RFC and will be addressed by future work; the relevant section has been moved to an Appendix at the end of the RFC. The goal it covered was:
- Align both the internal schemas and top-level public API with OpenTelemetry and their SDKs. (Moved to Appendix)
The primary planned work in this RFC is to introduce three new top-level methods: Sentry.startSpan and Sentry.startInactiveSpan and their language-specific variants, as well as Sentry.setMeasurement.
This allows us to de-emphasize the concept of hubs, scopes, and transactions from users, and instead have them think about just spans. Under the hood, Sentry.startSpan and Sentry.startInactiveSpan should create transactions/spans as appropriate. Sentry.setMeasurement is used to abstract away transaction.setMeasurement and similar.
In a previous version of the RFC, there was a follow-up step to update the span schema. This will be focused on later and for now, is not in scope for performance API improvements.
Right now every SDK has both the concept of transactions and spans - and to a user they both exist as vehicles of performance data. In addition, the transaction exists as the carrier of distributed tracing information (dynamic sampling context and sentry-trace info), although this is going to change with the advent of tracing without performance support in the SDKs.
Below is an example of how performance instrumentation currently works in the JavaScript SDKs (browser/node):
// op is defined by https://develop.sentry.dev/sdk/performance/span-operations/
// name has no specs but is expected to be low cardinality
const transaction = Sentry.startTransaction({
  op: "http.server",
  name: "GET /",
});
// Need to set transaction on span so that integrations
// can attach spans (I/O operations or framework-specific spans)
// to the correct parent.
Sentry.getCurrentHub().getScope().setSpan(transaction);
// spans have a description, while transactions have names
// op is an optional attribute, but a lot of the product relies on it
// existing.
// description technically should be low cardinality, but we have
// no explicit documentation to say that it should (since spans
// were not indexed at all for a while).
const span = transaction.startChild({ description: "Long Task" });
expensiveAction();
span.finish();
anotherAction();
const secondSpan = transaction.startChild({ description: "Another Task" });
// transaction + all child spans sent to Sentry only when `finish()` is called.
transaction.finish();
Sentry.getCurrentHub().getScope().setSpan(undefined);
// second span info is not sent to Sentry because the transaction is already finished.
secondSpan.finish();
In our integrations that add automatic instrumentation, things look something like so:
const parentSpan = Sentry.getCurrentHub().getScope().getSpan();
// parentSpan can be undefined if no span is on the scope, this leads to
// child span just being lost
const span = parentSpan?.startChild({ description: "something" });
Sentry.getCurrentHub().getScope().setSpan(span);
work();
// span is undefined when there was no parent, so guard the call
span?.finish();
// span is finished, so the parent is put back onto the scope
Sentry.getCurrentHub().getScope().setSpan(parentSpan);
Most users follow the same pattern when instrumenting code nested within their application, since they often assume that a transaction already exists that they can attach spans to (very common in a web server context).
To add instrumentation to their applications, users have to know concepts about hubs/scopes/transactions/spans and understand all the different nuances and use cases. This can be difficult and presents a big barrier to entry for new users.
Also, since transactions/spans are different classes (span is a parent class of transaction), users have to understand the impacts that the same method will have on both transactions/spans. For example, currently calling setTag on a transaction will add a tag to the transaction event (which is searchable in Sentry), while setTag on a span just adds it to the span, and the field is not searchable. setData on a span adds it to span.data, while setData on a transaction is undefined behaviour (some SDKs throw away the data, others add it to event.extras).
To summarize, here are the core issues in the SDK performance API:
- Users have to understand the difference between spans/transactions and their schema differences.
- Users have to set/get transactions/spans from a scope (meaning they also have to understand what a scope/hub means).
- Nesting transactions within each other is undefined behaviour - no obvious way to make a transaction a child of another.
- If a transaction finishes before its child span finishes, that child span gets orphaned and the data is never sent to Sentry. This is most apparent if you have transactions that automatically finish (like on browser/mobile).
- Transactions have a max child span count of 1000 which means that eventually data is lost if you keep adding child spans to a transaction.
The new SDK API has the following requirements:
- Newly created spans must have the correct trace and parent/child relationship
- Users shouldn’t be burdened with knowing if something is a span/transaction
- Spans only need a name to identify themselves, everything else is optional.
- The new top-level APIs should be as similar to the OpenTelemetry SDK public API as possible.
There are two top-level methods we'll be introducing to achieve this: Sentry.startSpan and Sentry.startInactiveSpan. Sentry.startSpan will take a callback and start/stop a span automatically. In addition, it'll also set the span on the current scope. Under the hood, the SDK will create a transaction or span based on whether there is already an existing span on the scope.
The reason for electing to have startSpan and startInactiveSpan as separate top-level methods is to not burden the user with having to know about scope propagation, since the concept of a scope might change in future versions of the unified API. If this is not viable at all for a language or certain frameworks, SDK authors can opt to include the onScope parameter to the startInactiveSpan call that will automatically set the span on the current scope, but this is not recommended.
namespace Sentry {
  declare function startSpan<T>(
    spanContext: SpanContext,
    callback: (span: Span) => T
  ): T;
}
// span that is created is provided to callback in case additional
// attributes have to be added.
// ideally callback can be async/sync
const returnVal = Sentry.startSpan({ name: "expensiveCalc" }, (span: Span) =>
  expensiveCalc()
);
// If the SDK needs async/sync typed differently it can be exposed as `startActiveSpanAsync`
// declare function startActiveSpanAsync<T>(
//   spanContext: SpanContext,
//   callback: (span: Span) => Promise<T>,
// ): Promise<T>;
In the ideal case, startSpan should generally follow this code path:
1. Get the active span from the current scope.
2. If the active span is defined, create a child span of that active span based on the provided spanContext; otherwise create a transaction based on the provided spanContext.
3. Run the provided callback.
4. Finish the child span/transaction created in Step 2.
5. Remove the child span/transaction from the current scope and, if it exists, set the previous active span as the active span in the current scope.
If the provided callback throws an exception, the span/transaction created in Step 2 should be marked as errored. This error should not be swallowed by startSpan.
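The code path above can be sketched roughly as follows. This is a minimal, synchronous illustration and not the SDK's implementation: the `Span` class, the `scope` object, and the `status` field are simplified stand-ins invented for this example, and a real SDK would also need to handle async callbacks and scope forking.

```typescript
// Hypothetical stand-ins for SDK internals, for illustration only.
interface SpanContext {
  name: string;
}

class Span {
  readonly name: string;
  readonly parent: Span | undefined;
  status: "ok" | "error" | "unset" = "unset";
  finished = false;

  constructor(name: string, parent?: Span) {
    this.name = name;
    this.parent = parent;
  }

  startChild(ctx: SpanContext): Span {
    return new Span(ctx.name, this);
  }

  finish(): void {
    this.finished = true;
  }
}

// A single "current scope" that only tracks the active span.
const scope: { activeSpan?: Span } = {};

function startSpan<T>(ctx: SpanContext, callback: (span: Span) => T): T {
  // Step 1: get the active span from the current scope.
  const parent = scope.activeSpan;
  // Step 2: create a child span if there is a parent, otherwise a root
  // span (standing in for a transaction here).
  const span = parent ? parent.startChild(ctx) : new Span(ctx.name);
  scope.activeSpan = span;
  try {
    // Step 3: run the provided callback.
    return callback(span);
  } catch (err) {
    // Mark the span as errored, then rethrow so the error is not swallowed.
    span.status = "error";
    throw err;
  } finally {
    // Step 4: finish the span. Step 5: restore the previous active span.
    span.finish();
    scope.activeSpan = parent;
  }
}
```

Using `try`/`finally` keeps the scope consistent even when the callback throws, which is what lets nested `startSpan` calls always see the correct parent.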
startSpan only provides the correct parent-child relationship if your platform has proper support for forking scopes. For platforms that have a single hub/scope (like the mobile SDKs), this method will not lead to the correct parent-child relationship. The SDK will have to provide a different method for these platforms. The recommended option here is for startSpan to always attach the span it creates to the root span (the transaction), which means users don't get exact parent-child relationships, but they do get relative relationships between spans using relative durations.
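As a rough sketch of that fallback for single-scope platforms, the variant below parents every new span to the root span instead of the most recently started one. The `Span` class and `globalScope` object are hypothetical simplifications, and error handling is omitted for brevity.

```typescript
// Hypothetical stand-ins for SDK internals, for illustration only.
interface SpanContext {
  name: string;
}

class Span {
  readonly name: string;
  readonly parent: Span | undefined;
  readonly startTimestamp: number;
  endTimestamp: number | undefined;

  constructor(name: string, parent?: Span) {
    this.name = name;
    this.parent = parent;
    this.startTimestamp = Date.now();
  }

  finish(): void {
    this.endTimestamp = Date.now();
  }
}

// One global scope that only remembers the root span (the transaction).
const globalScope: { rootSpan?: Span } = {};

function startSpan<T>(ctx: SpanContext, callback: (span: Span) => T): T {
  const root = globalScope.rootSpan;
  // Every span is attached to the root, regardless of nesting depth.
  const span = new Span(ctx.name, root);
  if (root === undefined) {
    globalScope.rootSpan = span;
  }
  try {
    return callback(span);
  } finally {
    span.finish();
    if (root === undefined) {
      globalScope.rootSpan = undefined;
    }
  }
}
```

The resulting span tree is flat, so the nesting has to be inferred from each span's start and end timestamps.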
Sentry.startInactiveSpan will create a span, but not set it as the active span in the current scope.
namespace Sentry {
  declare function startInactiveSpan(spanContext: SpanContext): Span;
}
// does not get put on the scope
const span = Sentry.startInactiveSpan({ name: "expensiveCalc" });
expensiveCalc();
span.finish();
The goal here is to make span creation consistent across all SDKs, but we also want to make sure that the SDKs are idiomatic for their language/runtime. SDK authors are free to add additional methods to start spans if they feel that startSpan and startInactiveSpan are not appropriate for their language/framework/runtime.
For example, with Go we could have a method that starts a span from a Go context:
sentry.StartSpanFromContext(ctx, spanCtx)
SDK authors can also change the behaviour of startSpan and startInactiveSpan if they feel that the outlined behaviour is not idiomatic for their language/framework/runtime, but this is not recommended and should be discussed with the greater SDK team before being implemented.
To accommodate the inclusion of Sentry.startSpan and Sentry.startInactiveSpan, the span.name field should be used and is an alias for span.description. Span name should become required for the span context argument that is accepted by Sentry.startSpan and Sentry.startInactiveSpan.
This means methods for setting a span name should be added to the span interface.
span.setName(name: string): void;
Under the hood, calling span.setName should set the span.description field for backward compatibility reasons.
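One way to picture the alias is a span whose `name` is a view over `description`. The class below is a hypothetical sketch, not the SDK's `Span` class; only `setName`, `name`, and `description` come from the text above.

```typescript
// Hypothetical minimal span; only the naming behaviour is modelled.
class Span {
  // Existing field that is serialized and sent to Sentry today.
  description: string | undefined;

  // `name` is exposed as an alias that reads from `description`.
  get name(): string | undefined {
    return this.description;
  }

  // Setting the name writes through to `description` for backward compatibility.
  setName(name: string): void {
    this.description = name;
  }
}

const span = new Span();
span.setName("expensiveCalc");
```

Because there is only one backing field, code that still reads `span.description` keeps working after callers switch to `setName`.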
Since we want to discourage accessing the transaction object directly, the Sentry.setMeasurement top-level method will also be introduced. This will set a custom performance metric if a transaction exists. If a transaction doesn't exist, this method will do nothing. In the future, this method will attach the measurement to the span on the scope, but for now, it'll only attach it to the transaction.
namespace Sentry {
  declare function setMeasurement(key: string, value: number): void;
}
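The "do nothing without a transaction" behaviour could look roughly like this. The `Transaction` class, the `scope` object, and the shape of the stored measurement are assumptions made for this sketch; only the `setMeasurement(key, value)` signature comes from the declaration above.

```typescript
// Hypothetical stand-ins for SDK internals, for illustration only.
class Transaction {
  readonly measurements: Record<string, { value: number }> = {};

  setMeasurement(key: string, value: number): void {
    this.measurements[key] = { value };
  }
}

const scope: { transaction?: Transaction } = {};

// Top-level helper: forwards to the active transaction and is a no-op
// when no transaction is on the scope.
function setMeasurement(key: string, value: number): void {
  scope.transaction?.setMeasurement(key, value);
}

// No transaction yet, so this call is silently ignored.
setMeasurement("cache.hits", 3);

scope.transaction = new Transaction();
// Now the measurement is recorded on the transaction.
setMeasurement("cache.hits", 5);
```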
Since we only have beforeSend hooks for a transaction, we should look toward building similar hooks for span start and finish as well. This can be done after the span schema has been changed in the SDKs.
This RFC previously contained information for changing the span schema. This is no longer in the scope of the RFC but is documented here for posterity's sake.
Step 2 of the RFC was: Introduce a new span schema that is aligned with OpenTelemetry where we would change the data model that is referenced and used internally inside the SDK to better reflect OpenTelemetry. This involves adding shims for backward compatibility and removing redundant fields.
To remove the overhead of understanding transactions/spans and their differences, we propose to simplify the span schema to have a minimal set of required fields.
The current transaction schema inherits from the error event schema, with a few fields that are specific to transactions.
A full version of this protocol can be seen in Relay, but here are some of the important fields:
Transaction Field | Description | Type |
---|---|---|
name | Name of the transaction. In ingest and storage this field is called transaction | String |
end_timestamp | Timestamp when transaction was finished. Some SDKs alias this to end_timestamp but convert it to timestamp when serializing to send to Sentry. | String \| Float |
start_timestamp | Timestamp when transaction was created. | String \| Float |
tags | Custom tags for this event. Identical in behaviour to tags on error events. | Object |
spans | A list of child spans to this transaction. | Span[] |
measurements | Measurements which holds observed values such as web vitals. | Object |
contexts | Contexts which holds additional information about the transaction. In particular, contexts.trace has additional information about the transaction "span" | Object |
The transaction also has a trace context, which contains additional fields about the transaction.
Transaction Field | Description | Type |
---|---|---|
trace_id | Trace ID of the transaction. Format is identical between Sentry and OpenTelemetry | String |
span_id | Span ID of the transaction. Format is identical between Sentry and OpenTelemetry | String |
parent_span_id | Parent span ID of the transaction. Format is identical between Sentry and OpenTelemetry | String |
op | Operation type of the transaction. Standardized by Sentry spec | String |
status | Status of the transaction. Sentry maps status to HTTP status codes, while OpenTelemetry has a fixed set of statuses | String |
The current span schema is as follows:
Span Field | Description | Type |
---|---|---|
description | Description of the span. Same purpose as transaction name | String |
trace_id | Trace ID of the span. Format is identical between Sentry and OpenTelemetry | String |
span_id | Span ID of the span. Format is identical between Sentry and OpenTelemetry | String |
parent_span_id | Parent span ID of the span. Format is identical between Sentry and OpenTelemetry | String |
end_timestamp | Timestamp when span was finished. Some SDKs alias this to end_timestamp but convert it to timestamp when serializing to send to Sentry. | String \| Float |
start_timestamp | Timestamp when span was created. | String \| Float |
op | Operation type of the span. Standardized by Sentry spec | String |
status | Status of the span. Sentry maps status to HTTP status codes, while OpenTelemetry has a fixed set of statuses | String |
tags | Custom tags for this span. | Object |
data | Arbitrary additional data on a span, like extra on the top-level event. We maintain conventions for span data keys and values. | Object |
As you can see, the fields on the transaction/span differ in a few ways, the most notable of which is that transactions have name while spans have description. This means that spans and transactions are not interchangeable, and users have to know the difference between the two.
In addition, users have the burden to understand the differences between name/description/operation. operation in particular can be confusing, as it overlaps with transaction name and span description. In addition, operation is not a required field, which means that it is not clear what the default value should be.
Transactions also have no mechanism for arbitrary additional data like spans do with data. Users can choose to add arbitrary data to transactions by adding it to the contexts field (as transactions extend the event schema), but this is not obvious and not exposed in every SDK. Since contexts are already well defined in their own way, there is no way of using Sentry's semantic conventions for span data for transactions.
To simplify how performance data is consumed and understood, we are proposing a new span schema that the SDKs send to Sentry. The new span schema aims to be a superset of the OpenTelemetry span schema and have a minimal top-level API surface. This also means that spans can be easily converted to OpenTelemetry spans and vice versa.
The new span schema is as follows:
Span Field | Description | Type | Notes |
---|---|---|---|
name | The name of the span | String | Should be low cardinality. Replaces span description |
trace_id | Trace ID of the span | String | Format is identical between Sentry and OpenTelemetry |
span_id | Span ID of the span | String | Format is identical between Sentry and OpenTelemetry |
parent_span_id | Parent span ID of the span | String | Format is identical between Sentry and OpenTelemetry. If empty this is a root span (transaction). |
end_timestamp | Timestamp when span was finished | String \| Float | |
start_timestamp | Timestamp when span was created | String \| Float | |
op | Operation type of the span | String | Use is discouraged but kept for backwards compatibility for product features |
status | Status of the span | String | An optional final status for this span. Can have three possible values: 'ok', 'error', 'unset'. Same as OpenTelemetry's Span Status |
attributes | A set of attributes on the span. | Object | This maps to span.data in the current schema for spans. There is no existing mapping for this in the current transaction schema. The keys of attributes are well known values, and defined by a combination of OpenTelemetry's and Sentry's semantic conventions. |
measurements | Measurements which holds observed values such as web vitals. | Object | |
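For readers who prefer types to tables, the schema above might be written as the following interface. This is only a sketch: the `SpanV2` name, the choice of which fields are optional, and the `isRootSpan` helper are assumptions for illustration, and the example IDs are made-up values in the 32-character and 16-character hex formats used by OpenTelemetry.

```typescript
// Hypothetical type for the proposed span schema; optionality is assumed.
interface SpanV2 {
  name: string;
  trace_id: string;
  span_id: string;
  // Empty or missing means this is a root span (a transaction).
  parent_span_id?: string;
  start_timestamp: string | number;
  end_timestamp: string | number;
  // Discouraged, kept for backwards compatibility with product features.
  op?: string;
  status?: "ok" | "error" | "unset";
  attributes?: Record<string, unknown>;
  measurements?: Record<string, unknown>;
}

// Hypothetical helper applying the rule from the parent_span_id row.
function isRootSpan(span: SpanV2): boolean {
  return !span.parent_span_id;
}

const rootSpan: SpanV2 = {
  name: "GET /",
  trace_id: "4bf92f3577b34da6a3ce929d0e0e4736",
  span_id: "00f067aa0ba902b7",
  start_timestamp: 1686564000.0,
  end_timestamp: 1686564000.25,
  op: "http.server",
  status: "ok",
};

const childSpan: SpanV2 = {
  name: "Long Task",
  trace_id: rootSpan.trace_id,
  span_id: "b7ad6b7169203331",
  parent_span_id: rootSpan.span_id,
  start_timestamp: 1686564000.05,
  end_timestamp: 1686564000.2,
};
```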
For this RFC, the version on the span schema will be set to 2. This will indicate to all consumers that the new span schema is being used.
Just like both the old Sentry schema and the OpenTelemetry schema, we keep the same fields for span_id, trace_id, parent_span_id, start_timestamp, and end_timestamp. We also choose to rename description to name to match the OpenTelemetry schema.
Having both the name and op fields is redundant, but we choose to keep both for backward compatibility. There are many features in the product that are powered by having a meaningful operation; more details about this can be found in the documentation around Span operations. In the future, we can choose to deprecate op and remove it from the schema.
The most notable change here is to formally introduce the attributes field, and remove the span.data field. This is a breaking change, but worth it in the long term. If we start accepting attributes on transactions as well, we more closely align with the OpenTelemetry schema and can use the same conventions for both spans and transactions.
To ensure that we have backward compatibility, we will shim the old schema to the new schema. This has to be done for both transactions and spans.
For transactions, we need to start adding the attributes field to the trace context, the same way we do for spans. This will allow us to use the same conventions for both transactions and spans. For spans, we can keep and deprecate the span.data field, and forward it to span.attributes internally. For example, span.setData() would just call span.setAttribute() internally.
Since status is changing to be an enum of 3 values from something that was previously mapped to HTTP status codes, we need to migrate away from it in two steps. First, we'll record the HTTP status of spans in span.attributes, using our span data semantic conventions. For example, http.request.status_code can record the request status code. Next, we'll introduce a mapping where 2xx status codes map to ok, 4xx and 5xx status codes map to error, and all other status codes map to unset.
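The second step amounts to a small mapping function. The sketch below assumes the function name `statusFromHttpCode`, which is invented for this example; the three status values and the ranges come from the paragraph above.

```typescript
type SpanStatus = "ok" | "error" | "unset";

// Hypothetical helper: maps an HTTP status code to the three-value span status.
function statusFromHttpCode(code: number): SpanStatus {
  if (code >= 200 && code < 300) {
    return "ok"; // 2xx
  }
  if (code >= 400 && code < 600) {
    return "error"; // 4xx and 5xx
  }
  return "unset"; // everything else, e.g. 1xx and 3xx
}
```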
Similar to span.data, we can keep and deprecate the span.description field and forward it to span.name internally. For example, span.setDescription() would just call span.setName() internally.
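Taken together, the data and description shims might look something like the class below. It is a hypothetical sketch of the forwarding idea rather than the SDK's actual `Span` class; the `AttributeValue` type and the read-only `description` and `data` getters are assumptions added for illustration.

```typescript
type AttributeValue = string | number | boolean;

// Hypothetical span using the new schema, with deprecated shims on top.
class Span {
  name: string;
  readonly attributes: Record<string, AttributeValue> = {};

  constructor(name: string) {
    this.name = name;
  }

  setName(name: string): void {
    this.name = name;
  }

  setAttribute(key: string, value: AttributeValue): void {
    this.attributes[key] = value;
  }

  /** @deprecated Forwards to setName(). */
  setDescription(description: string): void {
    this.setName(description);
  }

  /** @deprecated Forwards to setAttribute(). */
  setData(key: string, value: AttributeValue): void {
    this.setAttribute(key, value);
  }

  /** @deprecated Reads through to name. */
  get description(): string {
    return this.name;
  }

  /** @deprecated Reads through to attributes. */
  get data(): Record<string, AttributeValue> {
    return this.attributes;
  }
}

const span = new Span("initial");
span.setDescription("GET /users"); // legacy call, updates span.name
span.setData("http.request.status_code", 200); // legacy call, updates span.attributes
```

Keeping a single backing field for each pair means old and new callers always observe the same value.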