[WIP] Views API Prototype #596
Great start, thanks for doing this @lzchen. I left some comments on the design, more to come.
- Remove bound instrument release: Based on the user-specs, bound instruments can use resources indefinitely. Also, releasing bound instruments complicates the recording process.
The fact that we MAY waste resources doesn't necessarily mean that we SHOULD. We may want to revisit this once views are working.
- Remove label_keys (recommended keys) from metric constructor: The functionality of this is essentially moved to View configuration.
I think this is the right change in general, but AFAICT we never used recommended keys to set default label values anyway (cf. Go's Record), so label keys only affected bound instruments even before they were removed from metrics.
- Remove meter from metric def: This was in place before when LabelSet existed. We might need this in the future if we adopt the "naming of metrics based off meters" convention.
Where is this change? I see metrics still have a reference to meter.
- Remove batcher types (UngroupedBatcher): The functionality of this is replaced by View configuration.
Could you remove Batcher completely then?
@@ -0,0 +1,107 @@
# Copyright The OpenTelemetry Authors
We'll need views in the API package too if we want to let other libraries define views (as in e.g. https://www.javadoc.io/doc/io.opencensus/opencensus-contrib-grpc-metrics/0.19.2/io/opencensus/contrib/grpc/metrics/RpcViews.html) without taking a dependency on the SDK package.
How would users define the aggregator types to be used in Views? They exist currently in the SDK. Since we expect users to create custom aggregators of their own, should we move aggregators into the API package?
Also, I believe the ViewManager should stay in the SDK. It seems like an implementation-specific way of handling Views, and we also don't want to introduce the whole "loading your own ViewManager" mechanism, correct?
From discussions in the metrics SIG, I believe we are creating an "SDK-API" for now, so we don't expect users to define their own views from the API.
Writing my understanding so you can correct me if wrong:
OT API is for library authors to take dependency on. They use Metric API to emit metrics using instruments. They don't have any say in how the metrics are going to be aggregated (other than knowing the default aggregation). Views are not required to emit metrics, and hence not required to be part of API.
Views are supposed to be defined by the application owner - and application must have SDK installed to have metrics work. And they can define Views, as Views are part of SDK.
> They use Metric API to emit metrics using instruments.
The Metric API is used to record and create metrics.
> They don't have any say in how the metrics are going to be aggregated (other than knowing the default aggregation).
Since aggregation is an SDK concept, users that take a dependency only on the API have no idea how to do this, regardless of whether or not Views are part of the API.
> Views are not required to emit metrics, and hence not required to be part of API.
Although the first part is true, it's not really the reason why the second part is true. We don't require Views to be part of the API because we don't expect users to define their own Views (like how Meter and Tracer are used).
> Views are supposed to be defined by the application owner - and application must have SDK installed to have metrics work. And they can define Views, as Views are part of SDK.
Yes, this is true, although the whole API/SDK separation doesn't really make much sense for metrics anyway. What use is it to depend only on the API?
My view of this separation is that there are different concerns. The "operator" is the person who knows which aggregations are useful for monitoring a system, who is different from the "developer" who writes the code. There are two separate APIs here, one for operators (Views) and one for developers (Metrics).
Sorry for the thread necromancy here, and sorry if this has been discussed elsewhere or is otherwise out of date.
> How would users define the aggregator types to be used in Views? They exist currently in the SDK. Since we expect users to create custom aggregators of their own, should we move aggregators into the API package?
One option is to let views specify aggregators by class or interface name, similar to Java SPI. This works nicely with static config, but is also brittle and would make for more complicated initialization logic.
I think it would be fine to include this in the "SDK API" instead, similar to what we're doing for exporters now. Exporters could have been written to depend only on the API package, but it's not clear why a user would ever want a custom exporter but not the SDK package (or a separate SDK package with the same exporter interfaces). It sounds like the same logic applies to views.
> the whole API SDK separation doesn't really make too much sense in terms of metrics anyways. What use is just have a dependency on the API?
The same as for tracing: so a client library (e.g. grpc) can participate in metrics generation (e.g. latency, bytes sent/received, etc.) if the SDK exists, but doesn't drag the SDK dependency into the project otherwise.
opentelemetry-sdk/src/opentelemetry/sdk/metrics/export/batcher.py
@c24t To your question about removing
If they're all static methods I'd prefer to lose the batcher class and put them in their own module, but up to you.
They aren't all static methods. The batcher needs to maintain state with the
@c24t
# Register the views to the view manager to use the views
meter.register_view(counter_view)
meter.register_view(clicks_view)
Question on requests_counter. This counter instrument is created well before a view involving it is created and registered. What would happen to the requests_counter.add() calls made before the register_view call occurred, and how would they be aggregated?
Should we require that all views must be registered beforehand?
> Should we require that all views must be registered beforehand?
Good question. Yes, I believe we should enforce this as a requirement.
Dynamic view management would certainly be more challenging. Let's leave it out for now.
)

labels = {"environment": "staging"}

# Views are used to define an aggregation type to use for a specific metric
counter_view = View(requests_counter, CountAggregation())
clicks_view = View(clicks_counter, CountAggregation())
It looks like creating a view requires a reference to the metric instrument itself. This means we can create a view only after creating the instrument.
Can we instead create Views with the name/description of the instrument? This would also allow us to load View settings from JSON/YAML, etc.
{
  "metername": "MeterForHttpLib",
  "instrumentname": "RequestCounts",
  "keys": ["httpurl", "httpstatus"],
  "aggregation": {
    "Type": "Histogram",
    "Options": {
      histogramboundaries....
    }
  }
}
When a meter/instrument matching the above is actually created in the program, it'll get its View config automatically from the config.
Thoughts?
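To make the suggestion concrete, here is a minimal sketch of name-based view lookup, assuming views are keyed by meter name plus instrument name and loaded from a JSON array. ViewSpec, ViewRegistry, views_for and the "boundaries" option are hypothetical names for illustration only; none of them exist in the prototype.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class ViewSpec:
    """Hypothetical name-based view description (not the prototype's View)."""

    meter_name: str
    instrument_name: str
    label_keys: List[str]
    aggregation_type: str
    aggregation_options: Dict[str, Any] = field(default_factory=dict)


class ViewRegistry:
    """Hypothetical registry keyed by (meter name, instrument name)."""

    def __init__(self) -> None:
        self._specs: Dict[Tuple[str, str], List[ViewSpec]] = {}

    def load(self, text: str) -> None:
        # Expects a JSON array of objects shaped like the config in this comment.
        for raw in json.loads(text):
            aggregation = raw["aggregation"]
            spec = ViewSpec(
                meter_name=raw["metername"],
                instrument_name=raw["instrumentname"],
                label_keys=list(raw.get("keys", [])),
                aggregation_type=aggregation["Type"],
                aggregation_options=dict(aggregation.get("Options", {})),
            )
            key = (spec.meter_name, spec.instrument_name)
            self._specs.setdefault(key, []).append(spec)

    def views_for(self, meter_name: str, instrument_name: str) -> List[ViewSpec]:
        # Called when an instrument is created; an empty list means "use defaults".
        return list(self._specs.get((meter_name, instrument_name), []))


# The "boundaries" option is an assumed placeholder for the elided histogram settings.
CONFIG = """
[
  {
    "metername": "MeterForHttpLib",
    "instrumentname": "RequestCounts",
    "keys": ["httpurl", "httpstatus"],
    "aggregation": {"Type": "Histogram", "Options": {"boundaries": [10, 100]}}
  }
]
"""

registry = ViewRegistry()
registry.load(CONFIG)
views = registry.views_for("MeterForHttpLib", "RequestCounts")
```

With this shape the application owner never needs a reference to the instrument object, which also covers instruments created inside third-party libraries.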
I also think that the application owner may not have access to the requests_counter object if it was an instrument created inside another library. Only that library will have access to it.
The application owner will have to use "metername + instrumentname" or similar to specify an instrument.
Here's what I did for the .NET prototype.
MetricViewRegistry metricViewRegistry = new MetricViewRegistry();
metricViewRegistry.AddMetricView(new MetricView() { MeterName = "library1", InstrumentName = "testCounter", Aggregation = Aggregation.SUM, Keys = new List<string>() { "k1", "k2" } });
metricViewRegistry.AddMetricView(new MetricView() { MeterName = "library1", InstrumentName = "testCounter", Aggregation = Aggregation.SUM, Keys = new List<string>() { "k1" } });
metricViewRegistry.AddMetricView(new MetricView() { MeterName = "library1", InstrumentName = "testCounter", Aggregation = Aggregation.SUM, Keys = new List<string>() { "k2" } });
This might be a possible improvement in the future. For the prototype, I believe it is okay to have just the basic functionality and iterate after.
@cijothomas I like what you posted here. We should be imagining that views will be configured from a .yaml file, which many but not all users will want.
)

# The view manager handles all updates to aggregators
self._metric.meter.view_manager.record(self._metric, self._labels, value)
BoundInstruments are meant to offer the highest performance to users, as they avoid the lookup of which timeseries a value should go to. The original code met this requirement, as the bound instrument had a reference to the aggregator and just needed to call aggregator.update.
With the new code, the bound instrument calls view_manager.record. The view_manager has to do a lookup first with the labels, to find out which timeseries this update should go into. This defeats the purpose of bound instruments.
I don't have a solution to this, but I believe this is a P0 scenario to solve. As discussed in our offline conversation, I'd add a comment in the views OTEP as well.
Good point, this does look like something we have to solve to make views viable.
@lzchen and I talked about this offline yesterday and we may be able to avoid the lookup cost here (and a lot of complexity elsewhere) if we don't support adding and removing views on the fly, e.g. by replacing View.__init__ with ViewManager.configure(views: list[View]). In that case we could create each bound instrument with a set of aggregators without having to worry about keeping that set up to date.
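A rough sketch of that idea, assuming views are configured exactly once and that a bound instrument resolves its aggregators when it is created. The classes below are simplified stand-ins written for illustration (in particular View takes a metric name and an aggregator type, and aggregators_for and BoundCounter are hypothetical); this is not the prototype's code.

```python
from typing import Dict, List, Sequence, Tuple

Labels = Tuple[Tuple[str, str], ...]


class SumAggregator:
    """Toy aggregator that keeps a running sum."""

    def __init__(self) -> None:
        self.current = 0

    def update(self, value: int) -> None:
        self.current += value


class View:
    """Simplified stand-in: pairs a metric name with an aggregator type."""

    def __init__(self, metric_name: str, aggregator_type: type) -> None:
        self.metric_name = metric_name
        self.aggregator_type = aggregator_type


class ViewManager:
    def __init__(self) -> None:
        self._views: Dict[str, List[View]] = {}
        self._aggregators: Dict[Tuple[str, int, Labels], SumAggregator] = {}
        self._configured = False

    def configure(self, views: Sequence[View]) -> None:
        # Views are fixed after this call; there is no add/remove on the fly.
        if self._configured:
            raise RuntimeError("views are already configured")
        for view in views:
            self._views.setdefault(view.metric_name, []).append(view)
        self._configured = True

    def aggregators_for(self, metric_name: str, labels: Labels) -> List[SumAggregator]:
        # The lookup happens once, when a bound instrument is created.
        result = []
        for index, view in enumerate(self._views.get(metric_name, [])):
            key = (metric_name, index, labels)
            if key not in self._aggregators:
                self._aggregators[key] = view.aggregator_type()
            result.append(self._aggregators[key])
        return result


class BoundCounter:
    def __init__(self, aggregators: Sequence[SumAggregator]) -> None:
        self._aggregators = tuple(aggregators)

    def add(self, value: int) -> None:
        # Hot path: no dictionary lookup, just fan out to each aggregator.
        for aggregator in self._aggregators:
            aggregator.update(value)


manager = ViewManager()
manager.configure([View("requests", SumAggregator)])
labels = (("environment", "staging"),)
bound = BoundCounter(manager.aggregators_for("requests", labels))
bound.add(2)
bound.add(3)
```

The sketch leaves out the default aggregation for metrics that have no view configured, which a real implementation would still need.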
@c24t Yes! I also did some prototyping in .NET.
If the Views config is fixed, then it simplifies a lot of things.
@cijothomas
Updated the implementation to preserve the high performance of bound instruments (aggregators are initialized upon bound instrument creation, based on the views configured at that point).
@@ -396,6 +341,12 @@ def unregister_observer(self, observer: "Observer") -> None:
        with self.observers_lock:
            self.observers.remove(observer)

    def register_view(self, view):
This registers a view for a particular Meter, right? I'd say Views are more of a global thingie, and hence must be part of MeterProvider.
The application may not have access to the Meter itself to add views to it.
Are Views global? A use case might be defining different views depending on what the meter (or source) is.
A few blocking comments, a few superficial.
This PR is overdue to be merged, and I don't want to stand in its way. Most fixes can happen in other PRs, but I would like to see that this works for multiple differently-configured aggregators for the same metric, and confirm that we're only creating a single ViewManager.
opentelemetry-sdk/src/opentelemetry/sdk/metrics/export/aggregate.py
self.current[">"] = 0
self.checkpoint[">"] = 0
I think you're better off having N+2 buckets instead of an ordered dict with special keys here.
E.g. boundaries [1, 2] would produce buckets [0, 1), [1, 2), [2, inf). AFAIK we also don't have a decision on negative bucket boundaries in OT yet. In OC we dropped measurements < 0 and didn't include them in any bucket count. The alternative is for the first bucket to stretch all the way to negative infinity.
Is that not N+1 buckets? Currently [1, 2] would produce (-inf, 1), [1, 2), [2, inf), so the only difference is the 0 cutoff. But I think that decision can wait for the spec to specify it.
N+1 buckets. Not content just to have them in my code, I've got off-by-one bugs in my comments too.
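For reference, a minimal sketch of the N+1 bucket layout discussed above, assuming half-open buckets and a first bucket that extends to negative infinity (the behavior described as current in this thread, not a spec decision). HistogramBuckets is a hypothetical name, not the aggregator in this PR.

```python
from bisect import bisect_right
from typing import List, Sequence


class HistogramBuckets:
    """N boundaries give N+1 buckets: (-inf, b0), [b0, b1), ..., [bN-1, inf)."""

    def __init__(self, boundaries: Sequence[float]) -> None:
        self.boundaries: List[float] = sorted(boundaries)
        self.counts: List[int] = [0] * (len(self.boundaries) + 1)

    def update(self, value: float) -> None:
        # bisect_right returns how many boundaries are <= value, which is
        # exactly the index of the half-open bucket that contains value.
        self.counts[bisect_right(self.boundaries, value)] += 1


histogram = HistogramBuckets([1, 2])
for measurement in (-3, 0.5, 1, 1.5, 2, 10):
    histogram.update(measurement)
```

A plain list indexed by position avoids the special ">" key in the ordered dict, and dropping negative measurements instead would be a one-line check in update.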
@@ -366,11 +352,13 @@ def __init__(
         instrumentation_info: "InstrumentationInfo",
     ):
         self.instrumentation_info = instrumentation_info
-        self.batcher = UngroupedBatcher(source.stateful)
+        self.batcher = Batcher(source.stateful)
Can we lose this whole class yet?
I am not sure. SDK specs are still not out.
We should feel free to move ahead of the spec, or this may never get written.
def generate_view_datas(self, metric, labels):
    view_datas = set()
    views = self.views.get(metric)
    # No views configured, use default aggregations
    if views is None:
        aggregator = get_default_aggregator(metric)
        # Default config aggregates on all label keys
        view_datas.add(ViewData(tuple(labels), aggregator))
    else:
        for view in views:
            updated_labels = []
            if view.config == ViewConfig.LABEL_KEYS:
                label_key_set = set(view.label_keys)
                for label in labels:
                    # Only keep labels that are in configured label_keys
                    if label[0] in label_key_set:
                        updated_labels.append(label)
                updated_labels = tuple(updated_labels)
            elif view.config == ViewConfig.UNGROUPED:
                updated_labels = labels
            # ViewData that is duplicate (same labels and aggregator) will be
            # aggregated together as one
            view_datas.add(
                ViewData(tuple(updated_labels), view.aggregator)
            )
    return view_datas
As I understand it, the goal here is fast aggregator updates. When the user creates a bound instrument, we grab any aggregators currently associated with that bound instrument's metric via a view. That way when the user calls bound_instrument.record we can update the aggregator immediately without another lookup.
I think I understand how you got here, but it's very surprising to see BoundInstruments hold references to a bunch of ViewDatas. I don't have a suggestion to improve this, but it looks very suspicious and I'm interested to hear what others think.
I think we would have come up with a very different design if we had written a single record path first, then added views, then added optimizations for bound instruments on top.
I moved ownership of ViewDatas to View, so bound instruments now only have references to the unique ViewDatas. Hopefully that addresses some of your concerns here.
opentelemetry-sdk/src/opentelemetry/sdk/metrics/export/batcher.py
There are two more blocking issues that I know of:
- We currently create the aggregator, for example SumAggregator(), and then pass it in to the View constructor. However, we need a separate aggregator instance for every different set of labels, while currently it just uses the exact same aggregator. We need to pass the aggregator type and its args separately to View.
- When you use dropped labels (i.e. ViewConfig.LABEL_KEYS), a new instrument is created for every combination including the dropped labels, whereas the dropped labels should be discarded first, before picking a matching instrument.
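Both issues can be illustrated with one small sketch, assuming the View receives an aggregator type plus an optional config and filters labels before choosing an aggregator. The constructor signature and the aggregator_for method here are assumptions made for illustration, not the code that was eventually merged.

```python
from typing import Any, Dict, Iterable, Optional, Tuple

Labels = Tuple[Tuple[str, str], ...]


class SumAggregator:
    """Toy aggregator; the config argument is accepted but unused here."""

    def __init__(self, config: Optional[Dict[str, Any]] = None) -> None:
        self.config = config or {}
        self.current = 0

    def update(self, value: int) -> None:
        self.current += value


class View:
    def __init__(
        self,
        aggregator_type: type,
        label_keys: Iterable[str],
        aggregator_config: Optional[Dict[str, Any]] = None,
    ) -> None:
        # Store the type and config, not an instance, so that each
        # distinct label set can get its own aggregator.
        self.aggregator_type = aggregator_type
        self.label_keys = frozenset(label_keys)
        self.aggregator_config = aggregator_config
        self._aggregators: Dict[Labels, Any] = {}

    def aggregator_for(self, labels: Labels) -> Any:
        # Discard dropped labels first, then pick the matching aggregator.
        kept = tuple(sorted(pair for pair in labels if pair[0] in self.label_keys))
        if kept not in self._aggregators:
            self._aggregators[kept] = self.aggregator_type(self.aggregator_config)
        return self._aggregators[kept]


view = View(SumAggregator, ["environment"])
staging_a = view.aggregator_for((("environment", "staging"), ("host", "a")))
staging_b = view.aggregator_for((("environment", "staging"), ("host", "b")))
prod = view.aggregator_for((("environment", "prod"), ("host", "a")))
staging_a.update(1)
staging_b.update(2)
```

Here the two staging recordings land in the same aggregator because "host" is dropped before the lookup, while prod gets its own instance.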
Changes look good @cnnradams, and good call adding aggregator_config.
def __eq__(self, other):
    return (
        self.metric == other.metric
        and self.aggregator.__class__ == other.aggregator.__class__
Unless all aggregator instances of the same class are equal this is still a problem.
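One possible way to address this, sketched under the assumption that a view stores the aggregator type and its config rather than an instance: compare (and hash) on metric, aggregator type and config together. The View below is a simplified stand-in with assumed field names, and HistogramAggregator is a placeholder class.

```python
from typing import Any, Dict, Optional


class HistogramAggregator:
    """Placeholder aggregator type; only its identity matters here."""


class View:
    def __init__(
        self,
        metric_name: str,
        aggregator_type: type,
        aggregator_config: Optional[Dict[str, Any]] = None,
    ) -> None:
        self.metric_name = metric_name
        self.aggregator_type = aggregator_type
        self.aggregator_config = dict(aggregator_config or {})

    def _key(self):
        # Config values must be hashable (e.g. tuples rather than lists).
        return (
            self.metric_name,
            self.aggregator_type,
            tuple(sorted(self.aggregator_config.items())),
        )

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, View):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self) -> int:
        return hash(self._key())


fast = View("latency", HistogramAggregator, {"bounds": (1, 2)})
same = View("latency", HistogramAggregator, {"bounds": (1, 2)})
slow = View("latency", HistogramAggregator, {"bounds": (10, 20)})
unique_views = {fast, same, slow}
```

With this, two histograms of the same class but different boundaries stay distinct views, while exact duplicates still collapse into one.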
LGTM to go ahead, @lzchen please merge this when you're ready.
Views API Prototype. Solves [#578]. Views are pretty much the only large remaining component that metrics need in order to go GA. It was discussed with @c24t that even though the OTEP is not merged yet, having a prototype will entice people to talk about this and push it forward. I STRONGLY encourage everyone to read the OTEP PR before reviewing this code. This is also WIP, so I have not fixed the tests yet.
Workflow of metric recording -> collection -> export logic.
Notice that the original API surface for how to RECORD metrics has not changed (there are still metric instruments, bound instruments and batch recording).
Previously, bound instruments were 1:1 with the aggregators that belonged to them (for a specific metric, labels pair). Now, a bound instrument has a sequence of aggregator, labels pairs (view_data), which are initialized upon creation of the bound instrument. On each record, every view_data is updated. This maintains the instant lookup behavior from bound instruments to the labels that are associated with them.
The ViewManager contains unique views (uniqueness is based on metric, aggregator type and label keys). Views are simply containers that define a relationship between a metric and a specific aggregator (i.e. for this metric, I want to aggregate the data by this aggregator type). ViewDatas are generated upon bound instrument creation and are containers representing an aggregator/labels pair. The labels are generated based upon the view configuration (drop keys, ungrouped, etc). This means that for each record to a bound instrument, multiple ViewDatas could be updated as a result (like a fan-out pattern).
Summary of changes below and their reasonings:
- [...] ViewData) to broadcast updates to every time there is a record
- [...] metric to bound instrument constructor, the metric that created the bound instrument: This is because the bound instrument is the path that all methods of updating a metric instrument take underneath, so the view manager needs a reference to the metric that called the update.
- [...] ViewDatas in bound instruments for metrics that are created
- aggregator_for taken out of batcher, moved to views file
- meter reference removed for Observer types (previously used for aggregator_for)
NOTE: View API only affects METRIC types. Observers functionality does not change.