Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

query: PromQL resets function does not detect reset. #3644

Open
bwplotka opened this issue Dec 18, 2020 · 10 comments
Open

query: PromQL resets function does not detect reset. #3644

bwplotka opened this issue Dec 18, 2020 · 10 comments

Comments

@bwplotka
Copy link
Member

Just for awareness. I noticed PromQL resets https://prometheus.io/docs/prometheus/latest/querying/functions/#resets function has no right to work. (:

This bug was introduced ~18.05.2020 by http://github.com/thanos-io/thanos/pull/2548

This is because for deduplication and downsampling precision we adjust resets on storage level:

Since it's used in downsampling it irreversibly blocked the reset function against such downsampled data.

First of all, I wonder... Why someone would use such low level detail like when counter resets and how many were happening? Maybe to detect restarts? 🤔

Still, I think we should fix it, help wanted 🤗 I noticed this when adding those tests #3631

@bwplotka bwplotka changed the title query: PromQL resets never detects reset. query: PromQL resets function does not detect reset. Dec 18, 2020
@bwplotka
Copy link
Member Author

Some useful disscussion on IRC:

bbrazil
resets is intended largely for debugging broken counters, however some people have found uses for it with gauges too
bwplotka
per: https://github.com/thanos-io/thanos/issues/3644
bbrazil
as resets will catch any time a series decreased, so can catch the down of a flap
bwplotka
yea ok
thanks for info (:
bbrazil
I'm having difficultly imaginging a good use case for using it on downsampled data, but that doesn't mean one doesn't exist
probably use the average or last rather than the counter. and it should be okay
bwplotka
what do you mean? For "resets" results? Interesting
bbrazil
basically resets is a gauge function, not a counter function

@bwplotka
Copy link
Member Author

# Testdata for resets() and changes().
load 5m
	http_requests{path="/foo"}	1 2 3 0 1 0 0 1 2 0
	http_requests{path="/bar"}	1 2 3 4 5 1 2 3 4 5
	http_requests{path="/biz"}	0 0 0 0 0 1 1 1 1 1

# Tests for resets().
eval instant at 50m resets(http_requests[5m])
	{path="/foo"} 0
	{path="/bar"} 0
	{path="/biz"} 0

eval instant at 50m resets(http_requests[20m])
	{path="/foo"} 1
	{path="/bar"} 0
	{path="/biz"} 0

eval instant at 50m resets(http_requests[30m])
	{path="/foo"} 2
	{path="/bar"} 1
	{path="/biz"} 0

eval instant at 50m resets(http_requests[50m])
	{path="/foo"} 3
	{path="/bar"} 1
	{path="/biz"} 0

eval instant at 50m resets(nonexistent_metric[50m])

Tests that detect those.

bwplotka added a commit that referenced this issue Dec 18, 2020
* Improved logging.
* Improved debug matching info.
* Fixed important matching bug.
* Found reset bug, added issue #3644 and commented code.

Signed-off-by: Bartlomiej Plotka <[email protected]>
bwplotka added a commit that referenced this issue Dec 18, 2020
* Improved logging.
* Improved debug matching info.
* Fixed important matching bug.
* Found reset bug, added issue #3644 and commented code.

Signed-off-by: Bartlomiej Plotka <[email protected]>
bwplotka added a commit that referenced this issue Dec 18, 2020
* Improved logging.
* Improved debug matching info.
* Fixed important matching bug.
* Found reset bug, added issue #3644 and commented code.

Signed-off-by: Bartlomiej Plotka <[email protected]>
bwplotka added a commit that referenced this issue Dec 18, 2020
* Improved logging.
* Improved debug matching info.
* Fixed important matching bug.
* Found reset bug, added issue #3644 and commented code.

Signed-off-by: Bartlomiej Plotka <[email protected]>
bwplotka added a commit that referenced this issue Dec 18, 2020
* Improved logging.
* Improved debug matching info.
* Fixed important matching bug.
* Found reset bug, added issue #3644 and commented code.

Signed-off-by: Bartlomiej Plotka <[email protected]>
bwplotka added a commit that referenced this issue Dec 18, 2020
* Improved logging.
* Improved debug matching info.
* Fixed important matching bug.
* Found reset bug, added issue #3644 and commented code.

Signed-off-by: Bartlomiej Plotka <[email protected]>
kakkoyun pushed a commit that referenced this issue Dec 19, 2020
…step towars Query pushdown! (#3631)

* store: Added inprocess server to client.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Added initial PromQL acceptance tests.

* Improved logging.
* Improved debug matching info.
* Fixed important matching bug.
* Found reset bug, added issue #3644 and commented code.

Signed-off-by: Bartlomiej Plotka <[email protected]>
@stale
Copy link

stale bot commented Feb 18, 2021

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Feb 18, 2021
@GiedriusS GiedriusS removed the stale label Mar 3, 2021
@stale
Copy link

stale bot commented Jun 2, 2021

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Jun 2, 2021
@kakkoyun kakkoyun removed the stale label Jun 3, 2021
@stale
Copy link

stale bot commented Aug 2, 2021

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Aug 2, 2021
@stale
Copy link

stale bot commented Aug 17, 2021

Closing for now as promised, let us know if you need this to be reopened! 🤗

@stale stale bot closed this as completed Aug 17, 2021
@GiedriusS GiedriusS reopened this Sep 9, 2021
@stale stale bot removed the stale label Sep 9, 2021
@stale
Copy link

stale bot commented Jan 9, 2022

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Jan 9, 2022
@GiedriusS GiedriusS removed the stale label Jan 10, 2022
@stale
Copy link

stale bot commented Apr 17, 2022

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Apr 17, 2022
@GiedriusS GiedriusS removed the stale label Apr 17, 2022
@stale
Copy link

stale bot commented Oct 1, 2022

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Oct 1, 2022
@ee-github
Copy link

ee-github commented Mar 27, 2023

I would like to make a case here for the usefulness of fixing this bug, further and similar to the use-cases above mentioned by Brian Brazil.

When doing blackbox style monitoring of monotonically increasing functions, resets() is very valuable in detecting issues.

For example, the uptime counter of a network routing protocol like BGP. It ticks up until the session resets, at which point it restarts from zero. The resets() function catches that very well.

Just monitoring the binary/bool "up" status of said BGP peer is not sufficient because it can bounce quickly in-between scrape intervals and go undetected. However, such quick events cannot escape resets() of an uptime counter. Very valuable.

Please do fix! Beer 🍺 on me for the contributor!

@stale stale bot removed the stale label Mar 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants