-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix reset handling by checking against previous value instead of reset value. #263
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beautiful.
LGTM modulo one minor formatting comment.
retrieval/series_cache.go
Outdated
// Value of the most recent point seen for the time series. If a new value is | ||
// less than the previous, then the series has reset. | ||
previousValue float64 | ||
|
||
hasReset bool | ||
resetValue float64 | ||
resetTimestamp int64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Want to take the opportunity to add a blank line after this one? Or is this how the auto-formatter chose to format it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea -- added some additional comments too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM modulo one grammar nit.
retrieval/series_cache.go
Outdated
hasReset bool | ||
|
||
// The value and timestamp of the latest reset. The timestamp is when it | ||
// occurred and the value is what it was reset to. resetValue will initially |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Grammar nit: need a comma before the "and".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done!
There's been a bug in reset handling (since inception) that detects a reset by checking if the current value is less than the initial/reset value, when it should be detecting drops by comparing against the previous value instead. For example, if a series receives values
4->10->6
, the existing code will not detect a reset when the value drops from10
to6
(even though it should), because6
</4
. It would only detect a reset if the data were e.g.4->10->3
, because3
<4
.Further, this meant that the sidecar could allow at most one reset per time series (so long as it stayed in the cache), because after one reset, the baseline
resetValue
was set to0
, and that was what all subsequent values were compared against to identify resets. Since Prometheus counters can't be negative, no additional resets could occur (0 </ 0).Related bugs (internal access only):