-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
compact: add metric thanos_compactor_iterations_total #1733
compact: add metric thanos_compactor_iterations_total #1733
Conversation
Add a metric called thanos_compactor_iterations_total that is a counter and will get increased by 1 every time an iteration gets executed successfully. This is needed in case --wait is specified and then our Compactor could die. We need to alert on such a case. One thing would be to alert on a restart of the container however that is not the most flexible thing - it might still be OK as long as it successfully finishes its job in time. However, it is impossible to know that exact part ATM. Add this metric so that users could add alerts like: ``` rate(thanos_compactor_iterations_total[1d]) == 0 FOR 3d ``` Signed-off-by: Giedrius Statkevičius <[email protected]>
Signed-off-by: Giedrius Statkevičius <[email protected]>
Signed-off-by: Giedrius Statkevičius <[email protected]>
a18e0b3
to
8a9a90d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, with small nit.
Let's register the metric no matter what since if it is run as a batch job then this metric does not matter either way. Signed-off-by: Giedrius Statkevičius <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice 👍
we should update the alerts at some point again maybe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
CHANGELOG.md
Outdated
@@ -18,6 +18,7 @@ We use *breaking* word for marking changes that are not backward compatible (rel | |||
- [#1573](https://github.com/thanos-io/thanos/pull/1573) `AliYun OSS` object storage, see [documents](docs/storage.md#aliyun-oss) for further information. | |||
- [#1680](https://github.com/thanos-io/thanos/pull/1680) Add a new `--http-grace-period` CLI option to components which serve HTTP to set how long to wait until HTTP Server shuts down. | |||
- [#1712](https://github.com/thanos-io/thanos/pull/1712) Rename flag on bucket web component from `--listen` to `--http-address` to match other components. | |||
- [#1733](https://github.com/thanos-io/thanos/pull/1733) New metric `thanos_compactor_iterations_total` on Thanos Compactor which shows the number of successful iterations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tiny nit: add a period at the end here to keep them all full sentences.
Add a period at the end of an item in the CHANGELOG to keep it uniform. Signed-off-by: Giedrius Statkevičius <[email protected]>
* compact: add metric thanos_compactor_iterations_total Add a metric called thanos_compactor_iterations_total that is a counter and will get increased by 1 every time an iteration gets executed successfully. This is needed in case --wait is specified and then our Compactor could die. We need to alert on such a case. One thing would be to alert on a restart of the container however that is not the most flexible thing - it might still be OK as long as it successfully finishes its job in time. However, it is impossible to know that exact part ATM. Add this metric so that users could add alerts like: ``` rate(thanos_compactor_iterations_total[1d]) == 0 FOR 3d ``` Signed-off-by: Giedrius Statkevičius <[email protected]> * CHANGELOG: add entry Signed-off-by: Giedrius Statkevičius <[email protected]> * compact: simplify wait check Signed-off-by: Giedrius Statkevičius <[email protected]> * cmd: thanos: compact: remove wait check Let's register the metric no matter what since if it is run as a batch job then this metric does not matter either way. Signed-off-by: Giedrius Statkevičius <[email protected]> * CHANGELOG: add period Add a period at the end of an item in the CHANGELOG to keep it uniform. Signed-off-by: Giedrius Statkevičius <[email protected]> Signed-off-by: Aleksey Sin <[email protected]>
Add a metric called thanos_compactor_iterations_total that is a counter
and will get increased by 1 every time an iteration gets executed
successfully. This is needed in case --wait is specified and then our
Compactor could die. We need to alert on such a case.
One thing would be to alert on a restart of the container however that
is not the most flexible thing - it might still be OK as long as it
successfully finishes its job in time. However, it is impossible to know
that exact part ATM.
Add this metric so that users could add alerts like:
Signed-off-by: Giedrius Statkevičius [email protected]
There is
thanos_objstore_bucket_last_successful_upload_time
but stillthat doesn't show if we were able to get through all of the cycle successfully.