Skip to content
This repository has been archived by the owner on Dec 6, 2024. It is now read-only.

A proposal for SDK configurations for metric aggregation (Basic Views) #126

Merged
98 changes: 98 additions & 0 deletions text/0126-Configurable-Metric-Aggregations.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,98 @@
# A Proposal For SDK Support for Configurable Batching and Aggregations

Add support to the default SDK for the ability to configure Metric Aggregations.

## Motivation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see also another config, simpler:

  • Change default temporality for instrument type - separated than view or selector.
  • Enable/Disable an instrument default aggregation.

Maybe this are out of scope, but kind of think that they may be configured separately than the views.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that the "disable" use-case could just be done by setting the aggregation to a no-op aggregator.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worried that having this "only one" view requirement and relying on that to enable/disable default aggregations, or even doing some merging of "Views" may be too tricky and too restrictive for the future.

I would like you to think how would you do this if full views API is supported.

Bogdan self brainstorming:

Maybe having a way to retrieve all the configured views (InstrumentDescriptor + View) and allow:

  • "Delete" - the user will use this to disable default aggregation for example
  • You can clone/change the retrieved "View" after deleting it and set it back to update the "temporality"

It seems a lot of work to just disable "default aggregations". Maybe treat "default views" a bit special, and offer specific APIs to manipulate "all of them" like:

  • "default views temporality" - these APIs may include selectors to facilitate.
  • "disable all default views"
    But this way we make it clear that these "DefaultViewsConfiguration" API refers to only the default views and you no longer need the "one view only" requirement.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way is to have a "strategy" param to the API: replace all (only supported option now).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of considering this OTEP the "default views configuration". Maybe we can just simplify it down to just selecting by instrument type for this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can be convinced that labels can be also an operation available for default views, but probably we can discuss that later. Let's start with something then incremently add more stuff

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also if for selector only type is important now, we can start only with that. But these decisions I think can be made in the specs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with leaving details to the specs. I'd be happy to drop label support for now, to keep things ultra-simple and build from there. Do you want this OTEP updated to reflect that, or is that something we can wrangle out in the specs?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a sentence that say details still to be polished in specs is more than enough

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a few more open questions to capture the discussion, and a note that we will resolve them in the spec. Sound ok?


OpenTelemetry's architecture separates the concerns of instrumentation and operation. The Metric Instruments
provided by the Metric API are all defined to have a default aggregation. And, by default, aggregations are
performed with all Labels being used to define a unit of aggregation. Although this is a good default
configuration for the SDK to provide, more configurability is needed.

There are 3 main use-cases that this proposal is intended to address:

1) The application developer/operator wishes to use an aggregation other than the default provided by the SDK
for a given instrument or set of instruments.
2) An exporter author wishes to inform the SDK what "Temporality" (delta vs. cumulative) the resulting metric
jkwatson marked this conversation as resolved.
Show resolved Hide resolved
data points represent. "Delta" means only metric recordings since the last reporting interval are considered
in the aggregation, and "Cumulative" means that all metric recordings over the lifetime of the Instrument are
considered in the aggregation.
3) The application developer/operator wishes to constrain the cardinality of labels for metrics being reported
to the metric vendor/backend of choice.

## Explanation

I propose a new feature for the default SDK, available on the interface of the SDK's MeterProvider implementation, to configure
the batching strategies and aggregations that will be used by the SDK when metric recordings are made.

The basic API has two parts.

* InstrumentSelector - Enables specifying the selection of one or more instruments for the configuration to apply to.
- Selection options include: the instrument type (Counter, ValueRecorder, etc), and a regex for instrument name.
jkwatson marked this conversation as resolved.
Show resolved Hide resolved
- If more than one option is provided, they are considered additive.
- Example: select all ValueRecorders whose name ends with ".duration".
* Configuration - configures how the batching and aggregation should be done.
- 3 things can be specified: The aggregation (Sum, MinMaxSumCount, Histogram, etc), the "temporality" of the batching,
and a set of pre-defined labels to consider as the subset to be used for aggregations.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have in my mind this optimization that I want you to not make me hard to achieve when the full views is approved. There needs to be 2 lists of labels (labels from the instruments record calls, labels from correlation context). So probably I would just suggest to make sure we clearly specify that these are labels from the record calls.

Another suggestion that I have is that later (most likely) we want to do label rename (or do we?) maybe not.

So maybe a design like:

Labels: List. Where then in the LabelKey we can have a source property, as well as maybe an "exported name" or other things, may not be a bad idea.

Right now we can start with only one string being in the LabelKey but later we can build on top of that.

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the labels part of this is definitely the most fuzzy in my head. I'm not 100% sure of what is needed at this level. I was thinking initially of just having the configuration contain the set of labels to reduce down to, and not consider renaming or re-mapping or anything fancy like that.

So, assuming the instrumentation author has provided labels [a,b,c,d,e] the view might be configured just with [b,e].

- Note: "temporality" can be one of "DELTA" and "CUMULATIVE" and specifies whether the values of the aggregation
are reset after a collection is done or not, respectively.
- If not all are specified, then the others should be considered to be requesting the default.
- Examples:
- Use a MinMaxSumCount aggregation, and provide delta-style batching.
- Use a Histogram aggregation, and only use two labels "route" and "error" for aggregations.
- Use a quantile aggregation, and drop all labels when aggregating.

In this proposal, there is only one configuration associated with each selector.
jkwatson marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean if two selectors match the same instrument only one View is active?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, with some sort of precedence to be determined later. (see my java prototype for an example).


As a concrete example, in Java, this might look something like this:

```java
// get a handle to the MeterSdkProvider (note, this is concrete name of the default SDK class in java, not a general SDK)
MeterSdkProvider meterProvider = OpenTelemetrySdk.getMeterProvider();

// create a selector to select which instruments to customize:
InstrumentSelector instrumentSelector = InstrumentSelector.newBuilder()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably at least the "meter name" must be provided. This may get into too many details that we can cover in the specs, if that is the intention maybe just remind this :))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that you could provide just the instrument type, and get all of them (as a way to set the default).

.instrumentType(InstrumentType.COUNTER)
.build();

// create a configuration of how you want the metrics aggregated:
Configuration configuration =
Configuration.create(Aggregations.minMaxSumCount(), Temporality.DELTA);
jkwatson marked this conversation as resolved.
Show resolved Hide resolved

//register the configuration with the MeterSdkProvider
meterProvider.registerAggregation(instrumentSelector, configuration);
```

## Internal details

This OTEP does not specify how this should be implemented in a particular language, only the functionality that is desired.

A prototype with a partial implementation of this proposal in Java is available in PR form [here](https://github.com/open-telemetry/opentelemetry-java/pull/1412)

## Trade-offs and mitigations

This does not intend to deliver a full "Views" API, although it may eventually form the basis for one. The goal here is
simply to allow configuration of the batching and aggregation by operators and exporter authors.

This does not intend to specify the exact interface for providing these configurations, nor does it
consider a non-programmatic configuration option.

## Prior art and alternatives

* Prior Art is probably mostly in the [OpenCensus Views](https://opencensus.io/stats/view/) system.
* Another [OTEP](https://github.com/open-telemetry/oteps/pull/89) attempted to address building a Views API.

## Open questions

1. Should custom aggregations should be allowable for all instruments? How should an SDK respond to a request for a non-supported aggregation?
jkwatson marked this conversation as resolved.
Show resolved Hide resolved
2. Should the requesting of DELTA vs. CUMULATIVE be only available via an exporter-only API, rather than generally available to all operators?
3. Is regex-based name matching too broad and dangerous? Would the alternative (having to know the exact name of all instruments to configure) be too onerous?
4. Is there anything in this proposal that would make implementing a full Views API (i.e. having multiple, named aggregations per instrument) difficult?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this question needs to be answered before this is merged. If the answer is "yes" this could be moving us away from our goal instead of towards it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, if anyone thinks that this proposal will stop views from happening, please speak up now, so we can resolve it ASAP.

5. How should an exporter interact with the SDK for which it is configured, in order to change aggregation settings?

## Future possibilities

What are some future changes that this proposal would enable?

- A full-blown views API, which would allow multiple "views" per instrument. It's unclear how an exporter would specify which one it wanted, or if it would all the generated metrics.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a rough prototype of the Views API that just got merged for Python. Right now the ability for multiple views per instrument is possible. Each view can specify the aggregation type, default set of label keys to aggregate by and aggregator configuration (histogram buckets and such). Something that hasn't been discussed yet is the temporality of the instruments (which is configured on the batcher and an SDK property, so currently all instruments will have the same behaviour). I'm not sure if there is need of new mechanisms exposed to the user to construct these configurations when Views achieve something similar (maybe they will turn into the Views API eventually anyways?).

Also, to maybe open up the floor for addressing open questions 3 and 4, could it possible to have the views handle all of the instrument related configuration that the exporter doesn't need to specify, and just have the temporality be controlled by the SDK (since exporters need to know about it). So for 3, the granularity of control for those configuration fields would be per instrument, and the granularity for temporality would be via regex. Although it might be too complex/too much configuration for a user to have to do. Maybe there is some way we can have the Views API handle all of the configuration (although then the exporter would have to know about it).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not discuss a full-blown Views API in this OTEP. This OTEP is intentionally much more limited in scope, as I believe it is more achievable cross-language to be delivered in the GA timeframe.

Details for the open questions should be resolved in the specification PR. I'd like general consensus on an approach, via OTEP before we dive into the details, which should be done in the specification.

- Additional non-programmatic configuration options.