Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Views API Prototype #596

Merged
merged 47 commits into from
Aug 3, 2020
Merged

[WIP] Views API Prototype #596

merged 47 commits into from
Aug 3, 2020

Conversation

lzchen
Copy link
Contributor

@lzchen lzchen commented Apr 17, 2020

Views API Prototype. Solves [#578]. Views is pretty much the only large component that is remaining for metrics that is needed to go GA. It was discussed with @c24t that even though the otep is not merged yet, having a prototype will entice people to talk about this and push this forward. I STRONGLY encourage everyone to read the otep PR before reviewing this code. This is also WIP so I have not fixed the tests yet.

Workflow of metric recording -> collection -> export logic.
Notice the original api surface of how to RECORD metrics are not changed (there are still metric instrument/bound instrument/batch recording).

Previously, bound instruments were 1:1 with the aggregators that belonged to them (for specific metric, labels pair). Now, a bound instrument has a sequence of aggregator, label pairs (view_data), which are initialized upon creation of the bound instrument. Per record, each view_data is then recorded. This is to maintain the instant lookup behavior of the bound instruments to labels that are associated with them.

The ViewManager contains unique views (uniqueness is based on metric, aggregator type and label keys). View s are simply a container that defines a relationship between metric and a specific aggregator exists (i.e. for this metric, I want to aggregate the data by this aggregator type).

ViewData are generated upon boundinstrument creation, which are container representing aggregator/labels. The labels are generated based upon the view configuration (drop keys, ungrouped, etc). This means that per record to boundinstrument, multiple ViewData s could be updated as a result (like a fan-out pattern).

Summary of changes below and their reasonings:

  • Add Views API -> ViewManager, View
  • View configuration (drop keys, ungrouped, etc)
  • If no views are defined for a metric, a default aggregation is used
  • Remove label_keys (recommended keys) from metric constructor: The functionality of this is essentially moved to View configuration.
  • Remove batcher types (UngroupedBatcher): The functionality of this is replaced by View configuration.
  • Verify aggregation types when merging: Since various aggregations can be defined for a single metric now, verify the aggregation types when merging (should never be different but is a good sanity check).
  • BoundInstruments hold the sequence of Aggregators with generated labels (ViewData) to broadcast updates to everytime there is a record
  • Added metric to bound instrument constructor, the metric that created the bound instrument: This is because bound instrument is the path that all methods of updating a metric instrument takes underlying. So the view manager needs the metric reference that called the update.
  • On meter collect, iterate through all ViewData s in bound instruments for metrics that are created
  • aggregator_for taken out of batcher, moved to views file
  • meter reference removed for Observer types (previously used for aggregator_for)

NOTE: View API only affects METRIC types. Observers functionality does not change.

@lzchen lzchen requested a review from a team April 17, 2020 18:29
@lzchen lzchen added metrics sdk Affects the SDK package. labels Apr 17, 2020
Copy link
Member

@c24t c24t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great start, thanks for doing this @lzchen. I left some comments on the design, more to come.

  • Remove bound instrument release: Based on the user-specs, bound instruments can use resources indefinitely. Also release bound instruments complicates the recording process.

The fact that we MAY waste resources doesn't necessarily mean that we SHOULD. We may want to revisit this once views are working.

  • Remove label_keys (recommended keys) from metric constructor: The functionality of this is essentially moved to View configuration.

I think this is the right change in general, but AFAICT we never used recommended keys to set default label values anyway (cf. go's Record), so label keys only affected bound instruments even before removing from metrics.

  • Remove meter from metric def: This was in place before when LabelSet existed. We might need this in the future if we adopt the "naming of metrics based off meters" convention.

Where is this change? I see metrics still have a reference to meter.

  • Remove batcher types (UngroupedBatcher): The functionality of this is replaced by View configuration.

Could you remove Batcher completely then?

@@ -0,0 +1,107 @@
# Copyright The OpenTelemetry Authors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need views in the API package too if we want to let other libraries define views (as in e.g. https://www.javadoc.io/doc/io.opencensus/opencensus-contrib-grpc-metrics/0.19.2/io/opencensus/contrib/grpc/metrics/RpcViews.html) without taking a dependency on the SDK package.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would users define the aggregator types to be used in Views? They exist currently in the SDK. Since we expect users to create custom aggregators of their own, should we move aggregators into the API package?

Also, I believe the ViewManager should stay in the SDK. It seems like implementation specific way of handling the Views, and we also don't want to introduce the whole "loading your own ViewManager" correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From discussions in the metrics SIG, I believe we are creating a "SDK-API" for now, so we don't expect users to define their own views from the API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing my understanding so you can correct me if wrong:

OT API is for library authors to take dependency on. They use Metric API to emit metrics using instruments. They don't have any say in how the metrics are going to be aggregated (other than knowing the default aggregation). Views are not required to emit metrics, and hence not required to be part of API.

Views are supposed to be defined by the application owner - and application must have SDK installed to have metrics work. And they can define Views, as Views are part of SDK.

Copy link
Contributor Author

@lzchen lzchen Apr 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They use Metric API to emit metrics using instruments.

The Metric API is used to record and create metrics.

They don't have any say in how the metrics are going to be aggregated (other than knowing the default aggregation).

Since aggregation is an SDK concept, users that take dependency only on the API have no idea how to do this regardless of whether or not Views is part of the API.

Views are not required to emit metrics, and hence not required to be part of API.

Although the first part is true, it's not really the reason why the second part is true.We don't require Views to be part of the API because we don't expect users to define their own Views (like how Meter and Tracer are used).

Views are supposed to be defined by the application owner - and application must have SDK installed to have metrics work. And they can define Views, as Views are part of SDK.

Yes this is true., although the whole API SDK seperation doesn't really make too much sense in terms of metrics anyways. What use is just have a dependency on the API?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My view of this separation is that there are different concerns. The "operator" is the person who knows which aggregations are useful for monitoring a system, who is different from the "developer" who writes the code. There are two separate APIs here, one for operators (Views) and one for developers (Metrics).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the thread necromancy here, and sorry if this has been discussed elsewhere or is otherwise out of date.

How would users define the aggregator types to be used in Views? They exist currently in the SDK. Since we expect users to create custom aggregators of their own, should we move aggregators into the API package?

One option is to let views specify aggregators by class or interface name, similar to java SPI. This works nicely with static config, but is also brittle and would make for more complicated initialization logic.

I think it would be fine to include this in the "SDK API" instead, and similar to what we're doing for exporters now. Exporters could have been written to depend only on the API package, but it's not clear why a user would ever want a custom exporter, but not the SDK package (or a separate SDK package with the same exporter interfaces). It sounds like the same logic applies to views.

the whole API SDK seperation doesn't really make too much sense in terms of metrics anyways. What use is just have a dependency on the API?

The same as for tracing: so a client library (e.g. grpc) can participate in metrics generation (e.g. latency, bytes sent/received, etc.) if the SDK exists, but doesn't drag the SDK dependency into the project otherwise.

opentelemetry-sdk/src/opentelemetry/sdk/metrics/view.py Outdated Show resolved Hide resolved
opentelemetry-sdk/src/opentelemetry/sdk/metrics/view.py Outdated Show resolved Hide resolved
opentelemetry-sdk/src/opentelemetry/sdk/metrics/view.py Outdated Show resolved Hide resolved
@lzchen
Copy link
Contributor Author

lzchen commented Apr 21, 2020

@c24t To your question about removing batcher: yes we can delete this because there is no different behaviour now since there is only one implementation of batcher. However, having a seperate class to encapsulate all the "process" logic during the collection, as well as handling different logic paths based on stateless/stateful seems to be cleaner than putting them all into the meter. Thoughts?

@c24t
Copy link
Member

c24t commented Apr 21, 2020

However, having a seperate class to encapsulate all the "process" logic during the collection, as well as handling different logic paths based on stateless/stateful seems to be cleaner than putting them all into the meter. Thoughts?

If they're all static methods I'd prefer to lose the batcher class and put them in their own module, but up to you.

@lzchen
Copy link
Contributor Author

lzchen commented Apr 23, 2020

However, having a seperate class to encapsulate all the "process" logic during the collection, as well as handling different logic paths based on stateless/stateful seems to be cleaner than putting them all into the meter. Thoughts?

If they're all static methods I'd prefer to lose the batcher class and put them in their own module, but up to you.

They aren't all static methods. The batcher needs to maintain state with the batch_map to keep track of checkpointed aggregators. Since multiple meters can be created, and the batcher belongs to the meter, it is possible to have multiple batchers as well, so these must be instance methods.

@lzchen
Copy link
Contributor Author

lzchen commented Apr 24, 2020

@c24t
I changed the architecture to seperate aggregations and the aggregation data.
aggregation_data is held in Aggregator and Aggregations are new classes.
So View and aggregation data is decoupled now.

@codeboten codeboten added the needs reviewers PRs with this label are ready for review and needs people to review to move forward. label Apr 28, 2020

# Register the views to the view manager to use the views
meter.register_view(counter_view)
meter.register_view(clicks_view)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qn on requests_counter. This counter instrument is created much before a view involving it is created and registered. What would happen to the requests_counter.add() calls - how would they be aggregated, before the register_view call occurred?

Should we require that all views must be registered beforehand?

Copy link
Contributor Author

@lzchen lzchen Apr 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we require that all views must be registered beforehand?

Good question. Yes I believe we should enforce this as a requirement.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dynamic view management would certainly be more challenging. Let's leave it out for now.

)

labels = {"environment": "staging"}

# Views are used to define an aggregation type to use for a specific metric
counter_view = View(requests_counter, CountAggregation())
clicks_view = View(clicks_counter, CountAggregation())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the creating a view requires a reference to the metric instrument itself. This means we can create a view only after creating the instrument itself.

Can we instead create Views with the name/description of the instrument? This can also allow us to load Views setting from json/yml etc.
{
"metername" : "MeterForHttpLib"
, "instrumentname", "RequestCounts"
, keys : ["httpurl","httpstatus"]
"aggregation":
{
Type:, "Histogram",
Options:
{
histogramboundaries....
}
}
}
When a meter/instrument matching the above is actually created in the program, it'll get its View config automatically from the config.

Thougts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think that the application owner may not have access to requests_counter object, if it was an instrument created inside another library. Only that library will have access to it.
The application owner will have to use "metername + instrumentname" or similar to specify an instrument.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's what I did for .net prototype.

MetricViewRegistry metricViewRegistry = new MetricViewRegistry();
metricViewRegistry.AddMetricView(new MetricView() { MeterName = "library1", InstrumentName = "testCounter", Aggregation = Aggregation.SUM, Keys = new List<string>() { "k1", "k2" } });
metricViewRegistry.AddMetricView(new MetricView() { MeterName = "library1", InstrumentName = "testCounter", Aggregation = Aggregation.SUM, Keys = new List<string>() { "k1" } });
metricViewRegistry.AddMetricView(new MetricView() { MeterName = "library1", InstrumentName = "testCounter", Aggregation = Aggregation.SUM, Keys = new List<string>() { "k2" } });

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be a possible improvement in the future. For the prototype, I believe it is okay to have just the basic functionality and iterate after.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cijothomas I like what you posted here. We should be imagining that views will be configured from a .yaml file, which many but not all users will want.

)

# The view manager handles all updates to aggregators
self._metric.meter.view_manager.record(self._metric, self._labels, value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BoundInstruments are meant to offer highest performance to users, as it avoids look up to which timeseries the value should go. The original code met this requirement, as it had a reference to the aggregator and just needed to call aggregator.update.

With the new code, the boundinstrument is calling view_manager.record. The view_manager has to do a lookup first with the labels, to find out which timeseries this update should go into. This would be defeating the purpose of boundinstruments.

I don't have solution to this - but i believe this is a p0 scenario to solve. As discussed in our offline conversation, I'd add a comment in view otep as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, this does look like something we have to solve to make views viable.

@lzchen and I talked about this offline yesterday and we may be able to avoid the lookup cost here (and a lot of complexity elsewhere) if we don't support adding and removing views on the fly, e.g. by replacing View.__init__ with ViewManager.configure(views: list[View]). In that case we could create each bound instrument with a set of aggregators without having to worry about keeping that set up to date.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@c24t Yes! I also did some prototyping in .NET.
If Views config is fixed , then it simplifies a lot of things.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cijothomas
Updated the implementation to include high performance of bound instruments (aggregators are initialized upon boundinstrument creation based off of configured views at point of creation).

@@ -396,6 +341,12 @@ def unregister_observer(self, observer: "Observer") -> None:
with self.observers_lock:
self.observers.remove(observer)

def register_view(self, view):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this registers view for a particular Meter right? I'd say Views are more global thingie, and hence must be part of MeterProvider.
The application may not have access to the Meter itself to add views to it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are Views global? A use case might be defining different views depending on what the meter (or source) is.

Copy link
Member

@c24t c24t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few blocking comments, a few superficial.

This PR is overdue to be merged, and I don't want to stand in its way. Most fixes can happen in other PRs, but I would like to see that this works for multiple differently-configured aggregators for the same metric, and confirm that we're only creating a single ViewManager.

@@ -0,0 +1,107 @@
# Copyright The OpenTelemetry Authors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the thread necromancy here, and sorry if this has been discussed elsewhere or is otherwise out of date.

How would users define the aggregator types to be used in Views? They exist currently in the SDK. Since we expect users to create custom aggregators of their own, should we move aggregators into the API package?

One option is to let views specify aggregators by class or interface name, similar to java SPI. This works nicely with static config, but is also brittle and would make for more complicated initialization logic.

I think it would be fine to include this in the "SDK API" instead, and similar to what we're doing for exporters now. Exporters could have been written to depend only on the API package, but it's not clear why a user would ever want a custom exporter, but not the SDK package (or a separate SDK package with the same exporter interfaces). It sounds like the same logic applies to views.

the whole API SDK seperation doesn't really make too much sense in terms of metrics anyways. What use is just have a dependency on the API?

The same as for tracing: so a client library (e.g. grpc) can participate in metrics generation (e.g. latency, bytes sent/received, etc.) if the SDK exists, but doesn't drag the SDK dependency into the project otherwise.

Comment on lines +148 to +149
self.current[">"] = 0
self.checkpoint[">"] = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're better off having N+2 buckets instead of an ordered dict with special keys here.

E.g. boundaries [1, 2] would produce buckets [0, 1), [1, 2), [2, inf). AFAIK we also don't have a decision on negative bucket boundaries in OT yet. In OC we dropped measurements < 0 and didn't include them in any bucket count. The alternative is for the first bucket to stretch all the way to negative infinity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that not N+1 buckets? Currently [1, 2] would produce (-inf, 1), [1, 2), [2, inf) so the only difference is the 0 cut off. But I think that decision can wait for the spec to specify it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

N+1 buckets. Not content just to have them in my code, I've got off-by-one bugs in my comments too.

opentelemetry-sdk/src/opentelemetry/sdk/metrics/view.py Outdated Show resolved Hide resolved
@@ -366,11 +352,13 @@ def __init__(
instrumentation_info: "InstrumentationInfo",
):
self.instrumentation_info = instrumentation_info
self.batcher = UngroupedBatcher(source.stateful)
self.batcher = Batcher(source.stateful)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we lose this whole class yet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure. SDK specs are still not out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should feel free to move ahead of the spec, or this may never get written.

opentelemetry-sdk/src/opentelemetry/sdk/metrics/view.py Outdated Show resolved Hide resolved
Comment on lines 117 to 142
def generate_view_datas(self, metric, labels):
view_datas = set()
views = self.views.get(metric)
# No views configured, use default aggregations
if views is None:
aggregator = get_default_aggregator(metric)
# Default config aggregates on all label keys
view_datas.add(ViewData(tuple(labels), aggregator))
else:
for view in views:
updated_labels = []
if view.config == ViewConfig.LABEL_KEYS:
label_key_set = set(view.label_keys)
for label in labels:
# Only keep labels that are in configured label_keys
if label[0] in label_key_set:
updated_labels.append(label)
updated_labels = tuple(updated_labels)
elif view.config == ViewConfig.UNGROUPED:
updated_labels = labels
# ViewData that is duplicate (same labels and aggregator) will be
# aggregated together as one
view_datas.add(
ViewData(tuple(updated_labels), view.aggregator)
)
return view_datas
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand it, the goal here is fast aggregator updates. When the user creates a bound instrument, we grab any aggregators currently associated with that bound instrument's metric via a view. That way when the user calls bound_instrument.record we can update the aggregator immediately without another lookup.

I think I understand how you got here, but it's very surprising to see BoundInstruments hold references to a bunch of ViewDatas. I don't have a suggestion to improve this, but it looks very suspicious and I'm interested to hear what others think.

I think we would have come up with a very different design if we had written a single record path first, then added views, then added optimizations for bound instruments on top.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reworked the owner of ViewDatas to View, so bound instruments now only have references to the unique viewdatas. Hopefully that addresses some of your concerns here

@cnnradams cnnradams mentioned this pull request Jul 23, 2020
Copy link
Member

@cnnradams cnnradams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's two more blocking issues that I know of:

  1. We currently create the aggregator, for example SumAggregator() and then pass it in to the View constructor. However, we need a seperate aggregator instantiation for every different set of labels, while currently it just uses the exact same aggregator. Need to pass in the type of aggregator and args seperately to View.

  2. When you use dropped labels (ie ViewConfig.LABEL_KEYS), a new instrument is created for every combination including the dropped labels, where instead the dropped labels should be discarded first before picking a matching instrument

Copy link
Member

@c24t c24t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good @cnnradams, and good call adding aggregator_config.

opentelemetry-sdk/src/opentelemetry/sdk/metrics/view.py Outdated Show resolved Hide resolved
def __eq__(self, other):
return (
self.metric == other.metric
and self.aggregator.__class__ == other.aggregator.__class__
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless all aggregator instances of the same class are equal this is still a problem.

Copy link
Member

@c24t c24t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM to go ahead, @lzchen please merge this when you're ready.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
metrics needs reviewers PRs with this label are ready for review and needs people to review to move forward. sdk Affects the SDK package.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants