-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change Metric Processor to merge multiple observations #1024
Conversation
@bogdandrutu please review the approach in #1023 (part of which has been factored into this PR). My goals:
|
In the Metrics SIG we discussed two ways that dimensionality reduction can take place, one in the Accumulator and one in a Processor. These two approaches have different performance characteristics and ultimately the best choice will depend on workload. If there is high cardinality in the input, reducing dimensionality will lower the size and cost of maintaining Accumulator state, but the key filter will be applied to every synchronous measurement, which means more filtering cost. If there is not high cardinality in the input, lowering dimensionality after the Accumulator makes more sense, because the filters are applied once per collection interval to every distinct label set instead of once per synchronous event. When it comes to sampling, these approaches support different sampling schemes, and there are meaningful sampling schemes in both cases, but if we reduce dimensionality in the Accumulator then we are able to sample individual events and if we reduce dimensionality in the Processor then we are able to sample over aggregated measurements. I want to continue to press forward with this PR because it addresses a problem in both cases and resolves #862. If reducing dimensionality in the Accumulator: this PR ensures that multiple observations will be combined, not overwrite each other. If reducing dimensionality in a Processor, this PR assumes that a separate Processor implementation would wrap around this basic Processor implementation. The "reducer" Processor would filter the label set just as the Accumulator would, and again the basic Processor would see multiple observations w/ identical label sets. Again, the correct output is to combine these. @open-telemetry/specs-metrics-approvers please consider. |
Co-authored-by: Anthony Mirabella <[email protected]>
Co-authored-by: Tyler Yahn <[email protected]>
…y#1024) * Add regexp filter in api/label, test * Add regexp option to sdk.Config * Return indistinct values only when keyRe != nil * Filter in sdk * Add an accumulator filter test * SDK tests pass * Precommit * Undo set filters * Backout related filter changes * Add a new test * Fix build * Apply suggestions from code review Co-authored-by: Anthony Mirabella <[email protected]> * Update comments * Apply suggestions from code review Co-authored-by: Tyler Yahn <[email protected]> Co-authored-by: Anthony Mirabella <[email protected]> Co-authored-by: Tyler Yahn <[email protected]>
This is a prerequisite to any change that introduces dimensionality reduction. It fixes a bug in handling multiple observations (i.e., measurements from callbacks), that will occur when multiple label sets are reduced (i.e., have key-values removed) into identical label sets with fewer keys than the originals.
Any duplicate metric (i.e., instrument+labelset) from a single Accumulator during a single collection round must be the result of dimensionality reduction; it could also be explained by multiple Accumulators submitting data to a single Processor. In either case, the former behavior is incorrect. The former behavior implements an Accumulator responsibility inside the Processor--it was incorrect behavior.
Resolves #862 by simplifying the relationship between several variables in the processor's state value.