Revert Timer#max to use a windowed max rather than last publishing interval. #1993

jkschneider · 2020-04-13T20:59:54Z

In #1856, the behavior of StepTimer was changed to publish max only for the last publishing interval. For the purpose of availability monitoring, it is better to allow maximum to decay over a longer period.

cc / @crankydillo

Further explanation as to why:

Micrometer decays the maximum rather than aligning it to the publishing interval like it does for sum and count. If we perfectly aligned the view of maximum time to the push-interval then a dropped metrics payload means we potentially miss out on seeing a particularly high maximum value (because in the next interval we'd only consider samples that occurred in that interval).

Practically, there are many reasons why a high maximum latency and a dropped metrics payload would be correlated. For example, if the application is under heavy resource pressure (like a saturated network interface), a response to the user for an API endpoint that is being timed (and for which a maximum value is being tracked) may be exceedingly high at the same time that a metrics post request to the monitoring system fails with a read timeout. But such conditions can be (and many times are) temporary.

Perhaps you have a client-side load balancing strategy that recognizes that (from the client's perspective) API latency has gone up sharply for this instance that is under resource pressure, and begins preferring other instances. By relieving pressure on this instance it recovers.

In some subsequent interval, after the instance has recovered, it's nice to be able to push a maximum latency seen during this time of trouble that would otherwise have been skipped. In fact, it's precisely these times of duress that we care about the most, not the maximum latency under fair-weather conditions!
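To make the dropped-payload scenario above concrete, here is a minimal sketch of a windowed max next to a step-aligned max. It is an illustration only: `WindowedMaxSketch`, its ring-buffer layout, and the buffer length of 3 are assumptions made for this example, not Micrometer's actual implementation or defaults. Each sample is written into every bucket, and each publishing interval clears one bucket, so a spike stays visible for several intervals before it ages out.

```java
/** Hypothetical illustration of a decaying (windowed) max; not Micrometer code. */
public class WindowedMaxSketch {
    // One running max per bucket; every bucket sees every sample.
    private final double[] buckets;
    private int current = 0;

    public WindowedMaxSketch(int bufferLength) {
        this.buckets = new double[bufferLength];
    }

    /** Record a sample into all buckets so it survives later rotations. */
    public void record(double sample) {
        for (int i = 0; i < buckets.length; i++) {
            buckets[i] = Math.max(buckets[i], sample);
        }
    }

    /** The max to publish right now. */
    public double poll() {
        return buckets[current];
    }

    /** Called once per publishing interval: clear the current bucket, move on. */
    public void rotate() {
        buckets[current] = 0;
        current = (current + 1) % buckets.length;
    }

    public static void main(String[] args) {
        WindowedMaxSketch windowed = new WindowedMaxSketch(3);
        double stepMax = 0; // step-aligned max: reset at every interval boundary

        // Interval 1: the instance is under pressure and one request takes 900 ms.
        windowed.record(900);
        stepMax = Math.max(stepMax, 900);

        // The publish at the end of interval 1 times out, so this payload is lost.
        windowed.rotate();
        stepMax = 0;

        // Interval 2: the instance has recovered.
        windowed.record(20);
        stepMax = Math.max(stepMax, 20);

        // The publish at the end of interval 2 succeeds.
        System.out.println("step-aligned max: " + stepMax);      // 20.0, spike never reported
        System.out.println("windowed max: " + windowed.poll());  // 900.0, spike still visible
    }
}
```

With a buffer length of 3, the 900 ms sample remains the published max for the interval it occurred in and the two that follow, then drops out once its last bucket is cleared.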


shakuzen · 2020-04-14T07:59:56Z

We'll also need to revert the changes made to the docs: micrometer-metrics/micrometer-docs#120

crankydillo · 2020-04-14T13:05:15Z

Thanks for the heads up. I can understand how the importance of max values makes its behavior different from other 'step values'. It appears you left in the refactoring that isolates the rolling logic, which will allow us to continue aligning max with the last step (and live with the ramifications of missing values) without copying too much of your code. If this comes up in the future, you might consider providing 2 versions of StepTimer.

Please consider adding something to the javadoc, hosted docs, etc. that makes it clear that max deviates from other 'step values' and why.

Lastly, thanks for the project. It's working great for us so far!

izeye · 2020-04-14T13:47:46Z

I created micrometer-metrics/micrometer-docs#125 to revert micrometer-metrics/micrometer-docs#120. I also added the reason for time window max to the doc based on the comment from @jkschneider.

jkschneider modified the milestones: 1.4.x, 1.4.2 Apr 13, 2020

jkschneider pushed a commit that referenced this issue Apr 14, 2020

Revert to using windows max for Timer/DistributionSummary (issue #1993)

f8fc920

jkschneider closed this as completed Apr 14, 2020

shakuzen added the release notes and type: task labels Apr 14, 2020

izeye mentioned this issue Apr 14, 2020

Revert "Update doc on max for StepDistributionSummary and StepTimer" micrometer-metrics/micrometer-docs#125

Merged

shakuzen mentioned this issue Apr 14, 2022

OTLP registry #3129

Merged

shakuzen mentioned this issue Jun 24, 2022

Support min/max in OTLP histograms #3144

Open

shakuzen mentioned this issue Aug 15, 2022

Value discordance in summaries with concurrent record execution #3346

Closed

shakuzen mentioned this issue Apr 7, 2023

Add capability to have configurable aggregation temporality for OTLP Registry #3625

Merged

3 tasks
