Skip to content

Latest commit

 

History

History
295 lines (202 loc) · 26.2 KB

0101-revamping-the-sdk-performance-api.md

File metadata and controls

295 lines (202 loc) · 26.2 KB
  • Start Date: 2023-06-12
  • RFC Type: decision
  • RFC PR: #101
  • RFC Status: approved
  • RFC Driver: Abhijeet Prasad

Summary

This RFC proposes to revamp the performance API in the SDKs. The new API aims to accomplish the following:

  • De-emphasize the concept of transactions from users using a Sentry performance monitoring SDK.
  • Optimize for making sure performance data is always sent to Sentry.
  • Open the SDK up for future work where we support batch span ingestion.

Note: In a previous version of this RFC there was a focus on updating the span schema. This is no longer in scope for this RFC and will be addressed by future work. This section has been moved to an Appendix at the end of the RFC. This is what the following item refers to.

  • Align both the internal schemas and top-level public API with OpenTelemetry and their SDKs. Moved to Appendix

Planned Work

The primary planned work in this RFC is to introduce three new top-level methods, Sentry.startSpan and Sentry.startInactiveSpan and their language-specific variants as well as Sentry.setMeasurement.

This allows us to de-emphasize the concept of hubs, scopes, and transactions from users, and instead have them think about just spans. Under the hood, Sentry.startSpan and Sentry.startInactiveSpan should create transactions/spans as appropriate. Sentry.setMeasurement is used to abstract away transaction.setMeasurement and similar.

In a previous version of the RFC, there was a follow-up step to update the span schema. This will be focused on later and for now, is not in scope for performance API improvements.

Background

Right now every SDK has both the concept of transactions and spans - and to a user they both exist as vehicles of performance data. In addition, the transaction exists as the carrier of distributed tracing information (dynamic sampling context and sentry-trace info), although this is going to change with the advent of tracing without performance support in the SDKs.

Below is a JavaScript example of how to think about performance instrumentation in the JavaScript SDKs (browser/node)

// op is defined by https://develop.sentry.dev/sdk/performance/span-operations/
// name has no specs but is expected to be low cardinality
const transaction = Sentry.startTransaction({
  op: "http.server",
  name: "GET /",
});

// Need to set transaction on span so that integrations
// can attach spans (I/O operations or framework-specific spans)
// to the correct parent.
Sentry.getCurrentHub().getScope().setSpan(transaction);

// spans have a description, while transactions have names
// op is an optional attribute, but a lot of the product relies on it
// existing.
// description technically should be low cardinality, but we have
// no explicit documentation to say that it should (since spans
// were not indexed at all for a while).
const span = transaction.startChild({ description: "Long Task" });
expensiveAction();
span.finish();

anotherAction();

const secondSpan = transaction.startChild({ description: "Another Task" });

// transaction + all child spans sent to Sentry only when `finish()` is called.
transaction.finish();
Sentry.getCurrentHub().getScope().setSpan(undefined);

// second span info is not sent to Sentry because the transaction is already finished.
secondSpan.finish();

In our integrations that add automatic instrumentation, things look something like so:

const parentSpan = Sentry.getCurrentHub().getSpan();

// parentSpan can be undefined if no span is on the scope, this leads to
// child span just being lost
const span = parentSpan?.startChild({ description: "something" });

Sentry.getCurrentHub().getScope().setSpan(span);

work();

span.finish();

// span is finished, so the parent is put back onto the scope
Sentry.getCurrentHub().getScope().setSpan(parentSpan);

Most users do the above also when nested within their application, as often you assume that a transaction is defined that you can attach (very common in a web server context).

To add instrumentation to their applications, users have to know concepts about hubs/scopes/transactions/spans and understand all the different nuances and use cases. This can be difficult and presents a big barrier to entry for new users.

Also, since transactions/spans are different classes (span is a parent class of transaction), users have to understand the impacts that the same method will have on both transactions/spans. For example, currently calling setTag on a transaction will add a tag to the transaction event (which is searchable in Sentry), while setTag on a span just adds it to the span, and the field is not searchable. setData on a span adds it to span.data, while setData on a transaction is undefined behaviour (some SDKs throw away the data, others add it to event.extras).

Summarizing, here are the core Issues in SDK Performance API:

  1. Users have to understand the difference between spans/transactions and their schema differences.
  2. Users have to set/get transactions/spans from a scope (meaning they also have to understand what a scope/hub means).
  3. Nesting transactions within each other is undefined behaviour - no obvious way to make a transaction a child of another.
  4. If a transaction finishes before its child span finishes, that child span gets orphaned and the data is never sent to Sentry. This is most apparent if you have transactions that automatically finish (like on browser/mobile).
  5. Transactions have a max child span count of 1000 which means that eventually data is lost if you keep adding child spans to a transaction.

Improvements

New SDK API

The new SDK API has the following requirements:

  1. Newly created spans must have the correct trace and parent/child relationship
  2. Users shouldn’t be burdened with knowing if something is a span/transaction
  3. Spans only need a name to identify themselves, everything else is optional.
  4. The new top-level APIs should be as similar to the OpenTelemetry SDK public API as possible.

Span Creators

There are two top-level methods we'll be introducing to achieve this: Sentry.startSpan and Sentry.startInactiveSpan. Sentry.startSpan will take a callback and start/stop a span automatically. In addition, it'll also set the span on the current scope. Under the hood, the SDK will create a transaction or span based on if there is already an existing span on the scope.

The reason for electing to have startSpan and startInactiveSpan as separate top-level methods is to not burden the user with having to know about scope propagation, since the concept of a scope might change in future versions of the unified API. If this is not viable at all for a language or certain frameworks, SDK authors can opt to include the onScope parameter to the startInactiveSpan call that will automatically set the span on the current scope, but this is not recommended.

namespace Sentry {
  declare function startSpan<T>(
    spanContext: SpanContext,
    callback: (span: Span) => T
  ): T;
}

// span that is created is provided to callback in case additional
// attributes have to be added.
// ideally callback can be async/sync
const returnVal = Sentry.startSpan({ name: "expensiveCalc" }, (span: Span) =>
  expensiveCalc()
);

// If the SDK needs async/sync typed differently it can be exposed as `startActiveSpanAsync`
// declare function startActiveSpanAsync<T>(
//   spanContext: SpanContext,
//   callback: (span: Span) => Promise<T>,
// ): Promise<T>;

In the ideal case, startSpan should generally follow this code path.

  1. Get the active span from the current scope
  2. If the active span is defined, create a child span of that active span based on the provided spanContext, otherwise create a transaction based on the provided spanContext.
  3. Run the provided callback
  4. Finish the child span/transaction created in Step 2
  5. Remove the child span/transaction from the current scope and if it exists, set the previous active span as the active span in the current scope.

If the provided callback throws an exception, the span/transaction created in Step 2 should be marked as errored. This error should not be swallowed by startSpan.

startSpan only provides the correct parent-child relationship if your platform has proper support for forking scopes. For platforms that have a single hub/scope (like the mobile SDKs), this method will not lead to the correct parent-child relationship. The SDK will have to provide a different method for these platforms. The recommended option here is for startSpan to always attach the span it creates to the root span (the transaction), which means users don't get exact parent-child relationships, but they do get relative relationships between spans using relative durations.

Sentry.startInactiveSpan will create a span, but not set it as the active span in the current scope.

namespace Sentry {
  declare function startInactiveSpan(spanContext: SpanContext): Span;
}

// does not get put on the scope
const span = Sentry.startInactiveSpan({ name: "expensiveCalc" });

expensiveCalc();

span.finish();

The goal here is to make span creation consistent across all SDKs, but we also want to make sure that the SDKs are idiomatic for their language/runtime. SDK authors are free to add additional methods to start spans if they feel that startSpan and startInactiveSpan are not appropriate for their language/framework/runtime.

For example, with go we could have a method that starts a span from a go context:

sentry.StartSpanFromContext(ctx, spanCtx)

SDK authors can also change the behaviour of startSpan and startInactiveSpan if they feel that the outlined behaviour is not idiomatic for their language/framework/runtime, but this is not recommended and should be discussed with the greater SDK team before being implemented.

Span Name

To accommodate the inclusion of Sentry.startSpan and Sentry.startInactiveSpan, the span.name field should be used and is an alias for span.description. Span name should become required for the span context argument that is accepted by Sentry.startSpan and Sentry.startInactiveSpan.

This means methods for setting a span name should be added to the span interface.

span.setName(name: string): void;

Under the hood, calling span.setName should set the span.description field for backward compatibility reasons.

Setting measurements

Since we want to discourage accessing the transaction object directly, the Sentry.setMeasurement top-level method will also be introduced. This will set a custom performance metric if a transaction exists. If a transaction doesn't exist, this method will do nothing. In the future, this method will attach the measurement to the span on the scope, but for now, it'll only attach it to the transaction.

namespace Sentry {
  declare function setMeasurement(key: string, value: number): void;
}

Next Steps

Since we only have beforeSend hooks for a transaction, we should look toward building similar hooks for span start and finish as well. This can be done after the span schema has been changed in the SDKs.

Appendix

This RFC previously contained information for changing the span schema. This is no longer in the scope of the RFC but is documented here for posterity's sake.

Step 2 of the RFC was: Introduce a new span schema that is aligned with OpenTelemetry where we would change the data model that is referenced and used internally inside the SDK to better reflect OpenTelemetry. This involves adding shims for backward compatibility and removing redundant fields.

Span Schema

To remove the overhead of understanding transactions/spans and their differences, we propose to simplify the span schema to have a minimal set of required fields.

Existing Span Schema

The current transaction schema inherits from the error event schema, with a few fields that are specific to transactions.

A full version of this protocol can be seen in Relay, but here are some of the important fields:

Transaction Field Description Type
name Name of the transaction. In ingest and storage this field is called transaction String
end_timestamp Timestamp when transaction was finished. Some SDKs alias this to end_timestamp but convert it to timestamp when serializing to send to Sentry. String | Float
start_timestamp Timestamp when transaction was created. String | Float
tags Custom tags for this event. Identical in behaviour to tags on error events. Object
spans A list of child spans to this transaction. Span[]
measurements Measurements which holds observed values such as web vitals. Object
contexts Contexts which holds additional information about the transaction. In particular, contexts.trace has additional information about the transaction "span" Object

The transaction also has a trace context, which contains additional fields about the transaction.

Transaction Field Description Type
trace_id Trace ID of the transaction. Format is identical between Sentry and OpenTelemetry String
span_id Span ID of the transaction. Format is identical between Sentry and OpenTelemetry String
parent_span_id Parent span ID of the transaction. Format is identical between Sentry and OpenTelemetry String
op Operation type of the transaction. Standardized by Sentry spec String
status Status of the transaction. Sentry maps status to HTTP status codes, while OpenTelemetry has a fixed set of status' String

The current span schema is as follows:

Span Field Description Type
description Description of the span. Same purpose as transaction name
trace_id Trace ID of the span. Format is identical between Sentry and OpenTelemetry String
span_id Span ID of the span. Format is identical between Sentry and OpenTelemetry String
parent_span_id Parent span ID of the span. Format is identical between Sentry and OpenTelemetry String
end_timestamp Timestamp when span was finished. Some SDKs alias this to end_timestamp but convert it to timestamp when serializing to send to Sentry. String | Float
start_timestamp Timestamp when span was created. String | Float
op Operation type of the span. Standardized by Sentry spec String
status Status of the span. Sentry maps status to HTTP status codes, while OpenTelemetry has a fixed set of status' String
tags Custom tags for this span. Object
data Arbitrary additional data on a span, like extra on the top-level event. We maintain conventions for span data keys and values. Object

As you can see, the fields on the transaction/span differ in a few ways, the most notable of which is that transactions have name while spans have description. This means that spans and transactions are not interchangeable, and users have to know the difference between the two.

In addition, users have the burden to understand the differences between name/description/operation. operation in particular can be confusing, as it overlaps with transaction name and span description. In addition, operation is not a required field, which means that it is not clear what the default value should be.

Transactions also have no mechanism for arbitrary additional data like spans do with data. Users can choose to add arbitrary data to transactions by adding it to the contexts field (as transactions extend the event schema), but this is not obvious and not exposed in every SDK. Since contexts are already well defined in their own way, there is no way of using Sentry's semantic conventions for span data for transactions.

New Span Schema

To simplify how performance data is consumed and understood, we are proposing a new span schema that the SDKs send to Sentry. The new span schema aims to be a superset of the OpenTelemetry span schema and have a minimal top-level API surface. This also means that spans can be easily converted to OpenTelemetry spans and vice versa.

The new span schema is as follows:

Span Field Description Type Notes
name The name of the span String Should be low cardinality. Replacing span description
trace_id Trace ID of the span String Format is identical between Sentry and OpenTelemetry
span_id Span ID of the span String Format is identical between Sentry and OpenTelemetry
parent_span_id Parent span ID of the span String Format is identical between Sentry and OpenTelemetry. If empty this is a root span (transaction).
end_timestamp Timestamp when span was finished String | Float
start_timestamp Timestamp when span was finished String | Float
op Operation type of the span String Use is discouraged but kept for backwards compatibility for product features
status Status of the span String An optional final status for this span. Can have three possible values: 'ok', 'error', 'unset'. Same as OpenTelemetry's Span Status
attributes A set of attributes on the span. Object This maps to span.data in the current schema for spans. There is no existing mapping for this in the current transaction schema. The keys of attributes are well known values, and defined by a combination of OpenTelemtry's and Sentry's semantic conventions.
measurements Measurements which holds observed values such as web vitals. Object

For this RFC, the version on the span schema will be set to 2. This will indicate to all consumers that the new span schema is being used.

Just like both the old Sentry schema and the OpenTelemetry schema, we keep the same fields for span_id, trace_id, parent_span_id, start_timestamp, and end_timestamp. We also choose to rename description to name to match the OpenTelemetry schema.

Having both the name and op fields is redundant, but we choose to keep both for backward compatibility. There are many features in the product that are powered by having a meaningful operation, more details about this can be found in the documentation around Span operations. In the future, we can choose to deprecate op and remove it from the schema.

The most notable change here is to formally introduce the attributes field, and remove the span.data field. This is a breaking change, but worth it in the long term. If we start accepting attributes on transactions as well, we more closely align with the OpenTelemetry schema and can use the same conventions for both spans and transactions.

Shimming the old schema

To ensure that we have backward compatibility, we will shim the old schema to the new schema. This has to be done for both transactions and spans.

For transactions, we need to start adding the attributes field to the trace context, the same way we do for spans. This will allow us to use the same conventions for both transactions and spans. For spans, we can keep and deprecate the span.data field, and forward it to span.attributes internally. For example, span.setData() would just call span.setAttribute() internally.

Since status is changing to be an enum of 3 values from something that was previously mapped to http status code, we need to migrate away from it in two steps. First, we'll be marking http status of spans in span.attributes, using our span data semantic conventions. For example, http.request.status_code can record the request status code. Next, we'll introduce a mapping where 2xx status codes map to ok, 4xx and 5xx status codes map to error, and all other status codes map to unset.

Similar to span.data, we can keep and deprecate the span.description field and forward it to span.name internally. For example, span.setDescription() would just call span.setName() internally.