-
Notifications
You must be signed in to change notification settings - Fork 164
A proposal for SDK configurations for metric aggregation (Basic Views) #126
Changes from 7 commits
a6a5ead
7d07fee
16f0104
421170c
efde65e
d064c96
80675a1
55160e8
b1fe6f4
66d0d33
c09e172
7dc3948
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,98 @@ | ||
# A Proposal For SDK Support for Configurable Batching and Aggregations | ||
|
||
Add support to the default SDK for the ability to configure Metric Aggregations. | ||
|
||
## Motivation | ||
|
||
OpenTelemetry's architecture separates the concerns of instrumentation and operation. The Metric Instruments | ||
provided by the Metric API are all defined to have a default aggregation. And, by default, aggregations are | ||
performed with all Labels being used to define a unit of aggregation. Although this is a good default | ||
configuration for the SDK to provide, more configurability is needed. | ||
|
||
There are 3 main use-cases that this proposal is intended to address: | ||
|
||
1) The application developer/operator wishes to use an aggregation other than the default provided by the SDK | ||
for a given instrument or set of instruments. | ||
2) An exporter author wishes to inform the SDK what "Temporality" (delta vs. cumulative) the resulting metric | ||
jkwatson marked this conversation as resolved.
Show resolved
Hide resolved
|
||
data points represent. "Delta" means only metric recordings since the last reporting interval are considered | ||
in the aggregation, and "Cumulative" means that all metric recordings over the lifetime of the Instrument are | ||
considered in the aggregation. | ||
3) The application developer/operator wishes to constrain the cardinality of labels for metrics being reported | ||
to the metric vendor/backend of choice. | ||
|
||
## Explanation | ||
|
||
I propose a new feature for the default SDK, available on the interface of the SDK's MeterProvider implementation, to configure | ||
the batching strategies and aggregations that will be used by the SDK when metric recordings are made. | ||
|
||
The basic API has two parts. | ||
|
||
* InstrumentSelector - Enables specifying the selection of one or more instruments for the configuration to apply to. | ||
- Selection options include: the instrument type (Counter, ValueRecorder, etc), and a regex for instrument name. | ||
jkwatson marked this conversation as resolved.
Show resolved
Hide resolved
|
||
- If more than one option is provided, they are considered additive. | ||
- Example: select all ValueRecorders whose name ends with ".duration". | ||
* Configuration - configures how the batching and aggregation should be done. | ||
- 3 things can be specified: The aggregation (Sum, MinMaxSumCount, Histogram, etc), the "temporality" of the batching, | ||
and a set of pre-defined labels to consider as the subset to be used for aggregations. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have in my mind this optimization that I want you to not make me hard to achieve when the full views is approved. There needs to be 2 lists of labels (labels from the instruments record calls, labels from correlation context). So probably I would just suggest to make sure we clearly specify that these are labels from the record calls. Another suggestion that I have is that later (most likely) we want to do label rename (or do we?) maybe not. So maybe a design like: Labels: List. Where then in the LabelKey we can have a source property, as well as maybe an "exported name" or other things, may not be a bad idea. Right now we can start with only one string being in the LabelKey but later we can build on top of that. What do you think? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. the labels part of this is definitely the most fuzzy in my head. I'm not 100% sure of what is needed at this level. I was thinking initially of just having the configuration contain the set of labels to reduce down to, and not consider renaming or re-mapping or anything fancy like that. So, assuming the instrumentation author has provided labels [a,b,c,d,e] the view might be configured just with [b,e]. |
||
- Note: "temporality" can be one of "DELTA" and "CUMULATIVE" and specifies whether the values of the aggregation | ||
are reset after a collection is done or not, respectively. | ||
- If not all are specified, then the others should be considered to be requesting the default. | ||
- Examples: | ||
- Use a MinMaxSumCount aggregation, and provide delta-style batching. | ||
- Use a Histogram aggregation, and only use two labels "route" and "error" for aggregations. | ||
- Use a quantile aggregation, and drop all labels when aggregating. | ||
|
||
In this proposal, there is only one configuration associated with each selector. | ||
jkwatson marked this conversation as resolved.
Show resolved
Hide resolved
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does that mean if two selectors match the same instrument only one View is active? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. right, with some sort of precedence to be determined later. (see my java prototype for an example). |
||
|
||
As a concrete example, in Java, this might look something like this: | ||
|
||
```java | ||
// get a handle to the MeterSdkProvider (note, this is concrete name of the default SDK class in java, not a general SDK) | ||
MeterSdkProvider meterProvider = OpenTelemetrySdk.getMeterProvider(); | ||
|
||
// create a selector to select which instruments to customize: | ||
InstrumentSelector instrumentSelector = InstrumentSelector.newBuilder() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Probably at least the "meter name" must be provided. This may get into too many details that we can cover in the specs, if that is the intention maybe just remind this :)) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was thinking that you could provide just the instrument type, and get all of them (as a way to set the default). |
||
.instrumentType(InstrumentType.COUNTER) | ||
.build(); | ||
|
||
// create a configuration of how you want the metrics aggregated: | ||
Configuration configuration = | ||
Configuration.create(Aggregations.minMaxSumCount(), Temporality.DELTA); | ||
jkwatson marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
//register the configuration with the MeterSdkProvider | ||
meterProvider.registerAggregation(instrumentSelector, configuration); | ||
``` | ||
|
||
## Internal details | ||
|
||
This OTEP does not specify how this should be implemented in a particular language, only the functionality that is desired. | ||
|
||
A prototype with a partial implementation of this proposal in Java is available in PR form [here](https://github.com/open-telemetry/opentelemetry-java/pull/1412) | ||
|
||
## Trade-offs and mitigations | ||
|
||
This does not intend to deliver a full "Views" API, although it may eventually form the basis for one. The goal here is | ||
simply to allow configuration of the batching and aggregation by operators and exporter authors. | ||
|
||
This does not intend to specify the exact interface for providing these configurations, nor does it | ||
consider a non-programmatic configuration option. | ||
|
||
## Prior art and alternatives | ||
|
||
* Prior Art is probably mostly in the [OpenCensus Views](https://opencensus.io/stats/view/) system. | ||
* Another [OTEP](https://github.com/open-telemetry/oteps/pull/89) attempted to address building a Views API. | ||
|
||
## Open questions | ||
|
||
1. Should custom aggregations should be allowable for all instruments? How should an SDK respond to a request for a non-supported aggregation? | ||
jkwatson marked this conversation as resolved.
Show resolved
Hide resolved
|
||
2. Should the requesting of DELTA vs. CUMULATIVE be only available via an exporter-only API, rather than generally available to all operators? | ||
3. Is regex-based name matching too broad and dangerous? Would the alternative (having to know the exact name of all instruments to configure) be too onerous? | ||
4. Is there anything in this proposal that would make implementing a full Views API (i.e. having multiple, named aggregations per instrument) difficult? | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I feel like this question needs to be answered before this is merged. If the answer is "yes" this could be moving us away from our goal instead of towards it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So, if anyone thinks that this proposal will stop views from happening, please speak up now, so we can resolve it ASAP. |
||
5. How should an exporter interact with the SDK for which it is configured, in order to change aggregation settings? | ||
|
||
## Future possibilities | ||
|
||
What are some future changes that this proposal would enable? | ||
|
||
- A full-blown views API, which would allow multiple "views" per instrument. It's unclear how an exporter would specify which one it wanted, or if it would all the generated metrics. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There is a rough prototype of the Views API that just got merged for Python. Right now the ability for multiple views per instrument is possible. Each view can specify the aggregation type, default set of label keys to aggregate by and aggregator configuration (histogram buckets and such). Something that hasn't been discussed yet is the temporality of the instruments (which is configured on the batcher and an SDK property, so currently all instruments will have the same behaviour). I'm not sure if there is need of new mechanisms exposed to the user to construct these configurations when Views achieve something similar (maybe they will turn into the Views API eventually anyways?). Also, to maybe open up the floor for addressing open questions 3 and 4, could it possible to have the views handle all of the instrument related configuration that the exporter doesn't need to specify, and just have the temporality be controlled by the SDK (since exporters need to know about it). So for 3, the granularity of control for those configuration fields would be per instrument, and the granularity for temporality would be via regex. Although it might be too complex/too much configuration for a user to have to do. Maybe there is some way we can have the Views API handle all of the configuration (although then the exporter would have to know about it). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd rather not discuss a full-blown Views API in this OTEP. This OTEP is intentionally much more limited in scope, as I believe it is more achievable cross-language to be delivered in the GA timeframe. Details for the open questions should be resolved in the specification PR. I'd like general consensus on an approach, via OTEP before we dive into the details, which should be done in the specification. |
||
- Additional non-programmatic configuration options. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see also another config, simpler:
Maybe this are out of scope, but kind of think that they may be configured separately than the views.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that the "disable" use-case could just be done by setting the aggregation to a no-op aggregator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worried that having this "only one" view requirement and relying on that to enable/disable default aggregations, or even doing some merging of "Views" may be too tricky and too restrictive for the future.
I would like you to think how would you do this if full views API is supported.
Bogdan self brainstorming:
Maybe having a way to retrieve all the configured views (InstrumentDescriptor + View) and allow:
It seems a lot of work to just disable "default aggregations". Maybe treat "default views" a bit special, and offer specific APIs to manipulate "all of them" like:
But this way we make it clear that these "DefaultViewsConfiguration" API refers to only the default views and you no longer need the "one view only" requirement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another way is to have a "strategy" param to the API: replace all (only supported option now).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of considering this OTEP the "default views configuration". Maybe we can just simplify it down to just selecting by instrument type for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can be convinced that labels can be also an operation available for default views, but probably we can discuss that later. Let's start with something then incremently add more stuff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also if for selector only type is important now, we can start only with that. But these decisions I think can be made in the specs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with leaving details to the specs. I'd be happy to drop label support for now, to keep things ultra-simple and build from there. Do you want this OTEP updated to reflect that, or is that something we can wrangle out in the specs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a sentence that say details still to be polished in specs is more than enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a few more open questions to capture the discussion, and a note that we will resolve them in the spec. Sound ok?