Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Span processor clarifications #1135

36 changes: 22 additions & 14 deletions specification/trace/sdk.md
Original file line number Diff line number Diff line change
Expand Up @@ -287,16 +287,17 @@ invocations. The span processors are invoked only when
[`IsRecording`](api.md#isrecording) is true.

Built-in span processors are responsible for batching and conversion of spans to
exportable representation and passing batches to exporters.
exportable representation and passing batches to exporters or alternatives to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if a GA spec should point to an experimental spec, thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think zPages are implemented as a span processor, not something that receives data through a span processor.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be true then to say "and passing batches to exporters or other span processors," or are there still other alternatives?

Copy link
Author

@rachelleahklein rachelleahklein Oct 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, to @reyang's point, I am not even sure referring to zPages specifically makes sense here. My only goal is to have a clear and universal definition of span processors in this sentence. Reading this section, I was confused by the fact that span processors were defined as "responsible for... passing batches to exporters" but then a few lines down, exporters were described as optional.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a problem referring to experimental feature as it makes the point that this is an extensibility point in SDK that can be useful for purposes including experimentation with the new features.

Comment on lines 289 to +290
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are mixing 2 different things in this sentence:

  1. What is a SpanProcessor, and how can this be used to interact with the SDK.
  • The SpanProcessor in general does not need to batch or talk to the exporter, it is a simple "hook" that allows custom processors to interact with the Span lifetime events (on start and on end). For example somebody can implement a processor that tracks all the active requests (one of the functionality that we want to offer in the zPages). Others can implement different other functionality.
  1. What are the "built-in" SpanProcessors.
  • We will provide 2 flavors of SpanProcessors (Simple/Batch) that are able to batch (or not in case of simple), transform the Spans in an "exportable" format, and call the "exporter".

exporters such as [z-pages](../../experimental/trace/zpages.md).

Span processors can be registered directly on SDK `TracerProvider` and they are
invoked in the same order as they were registered.

Each processor registered on `TracerProvider` is a start of pipeline that consist
of span processor and optional exporter. SDK MUST allow to end each pipeline with
individual exporter.
Each processor registered on `TracerProvider` is a start of a pipeline that consists
of one or more span processors and, optionally, one or more exporters. The SDK MUST
allow ending each pipeline with an individual exporter.

Comment on lines +296 to 299
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should remove this paragraph. Or do you know what is meant by it? AFAIK the TracerProvider must allow the registration of multiple SpanProcessors, but each built-in span processor only supports a single exporter.

Suggested change
Each processor registered on `TracerProvider` is a start of a pipeline that consists
of one or more span processors and, optionally, one or more exporters. The SDK MUST
allow ending each pipeline with an individual exporter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we definitely need some better description of the possible compositions of SpanProcessors and SpanExporters. E.g. we MUST indicate somewhere that there should be a SpanProcessor before each SpanExporter (purely from methods signatures point of view). We also MUST stress and clearly describe that multiple registered SpanProcessor are "siblings" and do NOT call each other. All this has already created some confusion in the past.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I support removing this paragraph. We don't need to specify that pipelines end in exporters, just that the two builtin span processors end in exporters. The metric SDK hasn't thus far mentioned support for multiple processors, and why should we? It's possible to define a multi-processor that calls a series of individual processors. Should the default trace SDK include a builtin multi-processor? I suppose they should if it's common to implement multiple trace exporters with different processor settings, but that does not sound common. Wouldn't you prefer to have a single processor call multiple exporters? Why do we need to specify these things?

multiple registered SpanProcessor are "siblings" and do NOT call each other

If you only specify SDK support for a single processor, this problem vanishes. Potentially you can also specify a multi-processor and/or a multi-exporter, but are you sure these are commonly needed?

Copy link
Author

@rachelleahklein rachelleahklein Oct 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth, as someone relatively new to OpenTelemetry concepts, it was helpful to me to have a textual description of the span processor pipeline in addition to the diagram that also exists in this section. There may be some additional nuance that needs to be added, but having some form of this paragraph seems like a good idea to me.

SDK MUST allow users to implement and configure custom processors and decorate
The SDK MUST allow users to implement and configure custom processors and decorate
built-in processors for advanced scenarios such as tagging or filtering.

rachelleahklein marked this conversation as resolved.
Show resolved Hide resolved
The following diagram shows `SpanProcessor`'s relationship to other components
Expand Down Expand Up @@ -391,35 +392,42 @@ make the flush timeout configurable.

The standard OpenTelemetry SDK MUST implement both simple and batch processors,
as described below. Other common processing scenarios should be first considered
for implementation out-of-process in [OpenTelemetry Collector](../overview.md#collector)
for implementation out-of-process in an [OpenTelemetry Collector](../overview.md#collector).

#### Simple processor

This is an implementation of `SpanProcessor` which passes finished spans
and passes the export-friendly span data representation to the configured
`SpanExporter`, as soon as they are finished.
and the export-friendly span data representation to the configured
`SpanExporter` as soon as they are finished.

Typically, the simple processor will be most suitable for use in testing and/or
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Java this has become the most common span processor as exporters use async functions to send data instead of relying on a background thread. Not how the spec was designed probably, but if it works.
CC @anuraaga

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your valuable feedback on this section as well as the batch processor one, @Oberon00. Do you think more should be added to clarify that while these statements may "typically" be true, it also depends on language implementation? I'm not sure how much detail to get into here.

in scenarios where creating multiple threads is not desirable, since batching
span processors will typically be threaded in order to pass spans on to exporters
in the background.

**Configurable parameters:**

* `exporter` - the exporter where the spans are pushed.

#### Batching processor

This is an implementation of the `SpanProcessor` which create batches of finished
spans and passes the export-friendly span data representations to the
configured `SpanExporter`.
This is an implementation of `SpanProcessor` which collects finished spans into batches
and passes the export-friendly span data representations to the associated `SpanExporter`.

Typically, the batching processor will be more suitable for production environments
than the simple processor.
Comment on lines +417 to +418
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depends on the language, e.g. in Java exporters do batching themselves sometimes. But typically, I think that BatchSpanProcessor should be the expected scenario, yes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how it's written. Wdyt @Oberon00?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the PR's wording is fine here 👍


**Configurable parameters:**

* `exporter` - the exporter where the spans are pushed.
* `maxQueueSize` - the maximum queue size. After the size is reached spans are
dropped. The default value is `2048`.
* `maxQueueSize` - the maximum queue size. After this limit is reached, spans are
skipped by this BatchSpanProcessor. The default value is `2048`.
* `scheduledDelayMillis` - the delay interval in milliseconds between two
consecutive exports. The default value is `5000`.
* `exportTimeoutMillis` - how long the export can run before it is cancelled.
The default value is `30000`.
* `maxExportBatchSize` - the maximum batch size of every export. It must be
smaller or equal to `maxQueueSize`. The default value is `512`.
smaller than or equal to `maxQueueSize`. The default value is `512`.

## Span Exporter

Expand Down