Proposal: Add filter predicate to MetricReader (push-down predicate) #3324
Comments
Does this lead to the same output as basically turning off all the …
Turning off --> delete? Apache Pulsar is a messaging system like Kafka. It has topics, but as opposed to Kafka, it supports a huge number of topics, resulting in 10k to 100k topics per broker. Topic groups are the key to reducing cardinality so that Pulsar can still be monitored. You will have instruments at topic-level granularity, each having 100k attribute sets (1 per topic), and instruments at topic-group-level granularity, each having 1k attribute sets (1 per group). By default, the topic level will be filtered out, and you will monitor the group-level instruments. Once you see a group misbehave, you turn the topic level on, to allow exporting only that group's topics. This will be controlled dynamically by the user via the predicate provided to the metric reader. Once you have understood and solved the issue, you can turn off the metrics for that topic group, and you can even decide to monitor a specific topic.
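The on/off mechanism described above can be sketched as a small, dynamically updatable predicate object. This is purely illustrative: the class name, the attribute keys (`pulsar.topic`, `pulsar.group`), and the predicate signature are assumptions modeled on this proposal, not a real Pulsar or OpenTelemetry API.

```python
# Illustrative sketch: group-level metrics are always exported; topic-level
# data points pass only for groups an operator has explicitly enabled.
class TopicFilterPredicate:
    def __init__(self):
        self.enabled_groups = set()  # groups with topic-level export turned on

    def enable_group(self, group):
        self.enabled_groups.add(group)

    def disable_group(self, group):
        self.enabled_groups.discard(group)

    def __call__(self, scope, name, description, kind, attributes, unit):
        # Group-level instruments (no per-topic attribute) are always exported.
        if "pulsar.topic" not in attributes:
            return True
        # Topic-level data points pass only if their group was enabled.
        return attributes.get("pulsar.group") in self.enabled_groups

predicate = TopicFilterPredicate()
# Group-level point: kept by default.
assert predicate(None, "pulsar.messages", "", "counter",
                 {"pulsar.group": "g1"}, "1") is True
# Topic-level point: filtered out until its group is enabled.
assert predicate(None, "pulsar.messages", "", "counter",
                 {"pulsar.topic": "t1", "pulsar.group": "g1"}, "1") is False
predicate.enable_group("g1")
assert predicate(None, "pulsar.messages", "", "counter",
                 {"pulsar.topic": "t1", "pulsar.group": "g1"}, "1") is True
```

The operator flow in the comment maps to `enable_group` when a group misbehaves and `disable_group` once the issue is resolved.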
I see. What is the difference when compared to Views?
I think this could make sense. A couple of thoughts:
Totally agree. I was planning to do such a comparison once I reach that phase.
Awesome suggestion. I'm revising my proposal to include it.
Got it! Nicer to use, and probably a whole lot harder to implement 😄
Two more questions:
@pirgeo
Even if you filter out most topics, you still have the rest. Of course, you need to be aware that if you explicitly configured Pulsar to filter out metrics, some will not appear there :)
@asafm Would you like to discuss this topic on the spec call tomorrow to move this proposal forward?
@arminru I would, but unfortunately I can't today. I had a lengthy conversation with @jack-berg and @jmacd - I think they managed to understand the use case. If either of them is present, they may be able to present the idea. If not, once the holiday season ends here, I will join the next meeting for sure.
@arminru I will try to join this Tuesday
@asafm I understand that you are working out a way to offer debug-level metrics and have arrived at an approach where the reader supports a configurable predicate allowing you to skip through the data. This led me to think through other possible ways to accomplish this, and I like your approach. As long as we're imagining, though, I think it's worth comparing alternatives.

One alternative is to let the SDK output high-cardinality data and send it to a collector, where (somehow) the level of aggregation could be dynamically controlled. Although we can find ways to describe varying-dimension data in OTLP, we are likely to run into problems at most consumers.

A similar alternative could output high-cardinality data to an intermediate representation, then re-process that data to reduce it to the intended level of detail at runtime. While this might be more efficient than the collector-based re-aggregation, it is still more expensive than what you're considering.

Another alternative might be to try to register dynamic views, but especially for UpDownCounter instruments (like counting items in a queue), the finest intended granularity (i.e., most dimensions) has to be aggregated from the start.

Your solution to the problems associated with varying dimensions is to establish what I may call "dormant views": view configurations that are generally being calculated but not exported most of the time. When a user is diagnosing a specific problem, they could enable a specific dormant view and/or enter a predicate to report specific sets of attributes to a sporadically reported timeseries used for debugging. You can turn the debug-level data on and off without disturbing the ordinary group-level timeseries.

All the side effects associated with the SDK's Collect() would still take place when a predicate skips the output, and all the callbacks are still evaluated. If the reader is configured with delta temporality, there will be gaps with missing data; if cumulative, there will be gaps with average data.

I support this as an experimental option for SDKs to implement. I am wary of using "filter" to describe this action of skipping data; I've used "skip" and "predicate" to avoid that word, because it is associated with reducing dimensionality in a View, and this is a separate mechanism.
I thought about it at first, but there were a couple of hurdles I saw:
I hope I understood your intention correctly. I had that idea initially, but it failed. The idea was that I would have only topic-level instruments - i.e.
Dynamic views is an option I explored (I spent a lot of time on this...). The issue there was like this: Suppose we have
Basically yes. I can define a view for each instrument: one similar to the default one, and another which changes the name from topic to group level, thereby creating a new instrument, and changes the attributes to the group level. The one thing I need to use this is something like
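As an aside, the re-aggregation such a group-level view performs can be sketched without any SDK dependency. This is a conceptual illustration only; the function name `group_level_view` and the `topic`/`group` attribute keys are made up for the example.

```python
# Sketch: what a group-level "view" of topic-level data conceptually does -
# drop (collapse) the topic dimension and sum the values per group,
# effectively producing a new, lower-cardinality instrument.
from collections import defaultdict

def group_level_view(points):
    """points: list of ({attributes}, value) recorded at topic granularity."""
    grouped = defaultdict(int)
    for attrs, value in points:
        # Drop the topic dimension; keep only the group.
        grouped[attrs["group"]] += value
    return [({"group": g}, v) for g, v in sorted(grouped.items())]

topic_points = [
    ({"topic": "t1", "group": "g1"}, 5),
    ({"topic": "t2", "group": "g1"}, 7),
    ({"topic": "t3", "group": "g2"}, 1),
]
print(group_level_view(topic_points))
# [({'group': 'g1'}, 12), ({'group': 'g2'}, 1)]
```

The point of the comparison in this thread is that a View performs this collapse unconditionally, whereas the proposed predicate decides per collection cycle which granularity is exported.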
Not all side effects. Yes, callbacks will be evaluated, but the most important cost is saved here: no need to allocate memory for an object to return the result. So if, most of the time, those 100k topics remain filtered out, your memory requirements due to collect are quite low.
I agree it should be experimental and I will edit the issue to reflect that.
What is the next step for this proposal? Write a PR?
What are you trying to achieve?
As a user, I would like to define a predicate (programmatically at first) on `MetricReader` which will allow me to read only a portion of the metrics available in the SDK metrics producer or any other `MetricProducer`. This will be marked as experimental in the spec.
What did you expect to see?
A new operation on `MetricReader`: `setPredicate(predicate)`, where `predicate` is a function matching the following interface:

`(instrumentationScope, instrumentName, instrumentDescription, instrumentKind, attributes, unit) -> boolean`
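A hypothetical Python sketch of the proposed surface may help make it concrete. None of this is a real OpenTelemetry SDK API: the class names, `set_predicate`, and the toy producer are all illustrative stand-ins for the spec-level `MetricReader`/`MetricProducer` described in this proposal.

```python
# Sketch of the proposed (experimental) push-down predicate, using toy types.
from typing import Callable, Optional

# (instrumentationScope, instrumentName, instrumentDescription,
#  instrumentKind, attributes, unit) -> bool
Predicate = Callable[[object, str, str, str, dict, str], bool]

class ToyMetricProducer:
    """Stands in for the SDK's MetricProducer."""
    def __init__(self, points):
        # Each point: (scope, name, description, kind, attributes, unit, value)
        self._points = points

    def produce(self, predicate: Optional[Predicate] = None):
        # Push-down: the predicate is consulted *before* a data point is
        # materialized, so skipped points never have to be allocated.
        return [pt for pt in self._points
                if predicate is None or predicate(*pt[:6])]

class MetricReader:
    def __init__(self, producers):
        self._producers = producers
        self._predicate: Optional[Predicate] = None

    def set_predicate(self, predicate: Optional[Predicate]):
        self._predicate = predicate

    def collect(self):
        out = []
        for producer in self._producers:
            # The reader forwards its predicate to every registered producer.
            out.extend(producer.produce(self._predicate))
        return out

producer = ToyMetricProducer([
    ("pulsar", "msg.count", "", "counter", {"topic": "t1"}, "1", 10),
    ("pulsar", "msg.count", "", "counter", {"topic": "t2"}, "1", 20),
])
reader = MetricReader([producer])
reader.set_predicate(lambda s, n, d, k, attrs, u: attrs["topic"] == "t2")
print(reader.collect())
# [('pulsar', 'msg.count', '', 'counter', {'topic': 't2'}, '1', 20)]
```

A `True` return keeps the data point; returning `False` means the producer skips it entirely, which is what distinguishes this from filtering the result list afterwards.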
The metric reader will provide that function to any registered `MetricProducer` upon calling `collect()`. Each producer will use that function to determine whether to produce a metric for the given attributes in an instrument. A true value means the data point will be included in the result of the metrics producer, and thereby respected by whatever the reader does with the data point.

A new optional parameter to the `Produce()` operation of `MetricProducer`: `Produce(predicate)`, where `predicate` is defined as above.

Additional context
I'm working on designing the integration of OpenTelemetry Metrics into Apache Pulsar.
Today, Pulsar supports up to 100k topics (partitions, in Kafka terms) in a single broker, and in the near future 1M topics per broker (possible since Pulsar persists messages into virtual append-only files in Apache BookKeeper, as opposed to actual file descriptors on disk).
In a single broker having 100k topics, with 50 instruments per topic, this means 5,000k attribute sets overall. As you can imagine, this is too much, both response-body-size-wise and cost-wise. Hence, there are two ways I planned to solve it, where the 2nd way is the reason for this proposal:
This can be achieved by decorating the registered `MetricProducer` and filtering the result list, but this means we pay the memory and CPU cost associated with collecting that number of attribute sets (which can be quite a lot, as shown above). In latency-sensitive applications such as Apache Pulsar, every millisecond of latency counts.

Hence, I wanted to be able to provide a filter function which the SDK metric producer will use to determine which (instrument, attributes) pairs to collect or skip, whether synchronous or asynchronous (the callback's `record()` method will do nothing).

I believe this is useful for other purposes as well. Imagine an application wishes to expose some instruments and attributes via a REST API in a certain format. It can create a metric reader and set the filter to obtain those values. The alternative is to record the values manually in memory and expose them via asynchronous instruments (which is not possible for histograms).
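The decorating alternative rejected above can be sketched to show where the cost lies. The class names and the tuple-based data points here are illustrative assumptions, not a real SDK API.

```python
# Sketch of the decorating alternative: wrap the registered producer and
# filter the result list after the fact. Correct, but the inner produce()
# has already materialized every data point (e.g. all 100k attribute sets)
# before anything is discarded - the cost the push-down predicate avoids.
class FilteringProducer:
    def __init__(self, inner, predicate):
        self._inner = inner
        # predicate: (scope, name, desc, kind, attrs, unit) -> bool
        self._predicate = predicate

    def produce(self):
        # The full result list was allocated by the inner producer; here we
        # only pay extra CPU to throw most of it away.
        return [pt for pt in self._inner.produce()
                if self._predicate(*pt[:6])]

class InnerProducer:
    def produce(self):
        # Simulates a producer emitting one data point per topic.
        return [("scope", "latency", "", "histogram",
                 {"topic": f"t{i}"}, "ms", i)
                for i in range(5)]

filtered = FilteringProducer(
    InnerProducer(),
    lambda s, n, d, k, attrs, u: attrs["topic"] == "t3",
)
print(len(filtered.produce()))  # 1
```

This is functionally equivalent to the push-down predicate, which is why the proposal is framed as an optimization: the output is the same, only the allocation and CPU cost of the discarded points differ.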
This REST API can also be a nice UI showing current metrics of the system.
Also, the Prometheus `/metrics` endpoint today supports this kind of dynamic filtering via query parameters (`name[]`). This proposal will make it more efficient and more accurate (with delta temporality, counts are reset upon collect even though the data is discarded further down the pipeline).