Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compact: add metric thanos_compactor_iterations_total #1733

Merged

Conversation

GiedriusS
Copy link
Member

@GiedriusS GiedriusS commented Nov 8, 2019

Add a metric called thanos_compactor_iterations_total that is a counter
and will get increased by 1 every time an iteration gets executed
successfully. This is needed in case --wait is specified and then our
Compactor could die. We need to alert on such a case.

One thing would be to alert on a restart of the container however that
is not the most flexible thing - it might still be OK as long as it
successfully finishes its job in time. However, it is impossible to know
that exact part ATM.

Add this metric so that users could add alerts like:

rate(thanos_compactor_iterations_total[1d]) == 0
FOR 3d

Signed-off-by: Giedrius Statkevičius [email protected]

There is thanos_objstore_bucket_last_successful_upload_time but still
that doesn't show if we were able to get through all of the cycle successfully.

Add a metric called thanos_compactor_iterations_total that is a counter
and will get increased by 1 every time an iteration gets executed
successfully. This is needed in case --wait is specified and then our
Compactor could die. We need to alert on such a case.

One thing would be to alert on a restart of the container however that
is not the most flexible thing - it might still be OK as long as it
successfully finishes its job in time. However, it is impossible to know
that exact part ATM.

Add this metric so that users could add alerts like:

```
rate(thanos_compactor_iterations_total[1d]) == 0
FOR 3d
```

Signed-off-by: Giedrius Statkevičius <[email protected]>
Signed-off-by: Giedrius Statkevičius <[email protected]>
Signed-off-by: Giedrius Statkevičius <[email protected]>
@GiedriusS GiedriusS force-pushed the add_iterations_complete_metric_compactor branch from a18e0b3 to 8a9a90d Compare November 8, 2019 09:43
@GiedriusS GiedriusS closed this Nov 8, 2019
@GiedriusS GiedriusS reopened this Nov 8, 2019
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, with small nit.

cmd/thanos/compact.go Outdated Show resolved Hide resolved
Let's register the metric no matter what since if it is run as a batch
job then this metric does not matter either way.

Signed-off-by: Giedrius Statkevičius <[email protected]>
Copy link
Member

@FUSAKLA FUSAKLA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 👍
we should update the alerts at some point again maybe

Copy link
Member

@squat squat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

CHANGELOG.md Outdated
@@ -18,6 +18,7 @@ We use *breaking* word for marking changes that are not backward compatible (rel
- [#1573](https://github.com/thanos-io/thanos/pull/1573) `AliYun OSS` object storage, see [documents](docs/storage.md#aliyun-oss) for further information.
- [#1680](https://github.com/thanos-io/thanos/pull/1680) Add a new `--http-grace-period` CLI option to components which serve HTTP to set how long to wait until HTTP Server shuts down.
- [#1712](https://github.com/thanos-io/thanos/pull/1712) Rename flag on bucket web component from `--listen` to `--http-address` to match other components.
- [#1733](https://github.com/thanos-io/thanos/pull/1733) New metric `thanos_compactor_iterations_total` on Thanos Compactor which shows the number of successful iterations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny nit: add a period at the end here to keep them all full sentences.

Add a period at the end of an item in the CHANGELOG to keep it uniform.

Signed-off-by: Giedrius Statkevičius <[email protected]>
@bwplotka bwplotka merged commit a0747ef into thanos-io:master Nov 13, 2019
IKSIN pushed a commit to monitoring-tools/thanos that referenced this pull request Nov 26, 2019
* compact: add metric thanos_compactor_iterations_total

Add a metric called thanos_compactor_iterations_total that is a counter
and will get increased by 1 every time an iteration gets executed
successfully. This is needed in case --wait is specified and then our
Compactor could die. We need to alert on such a case.

One thing would be to alert on a restart of the container however that
is not the most flexible thing - it might still be OK as long as it
successfully finishes its job in time. However, it is impossible to know
that exact part ATM.

Add this metric so that users could add alerts like:

```
rate(thanos_compactor_iterations_total[1d]) == 0
FOR 3d
```

Signed-off-by: Giedrius Statkevičius <[email protected]>

* CHANGELOG: add entry

Signed-off-by: Giedrius Statkevičius <[email protected]>

* compact: simplify wait check

Signed-off-by: Giedrius Statkevičius <[email protected]>

* cmd: thanos: compact: remove wait check

Let's register the metric no matter what since if it is run as a batch
job then this metric does not matter either way.

Signed-off-by: Giedrius Statkevičius <[email protected]>

* CHANGELOG: add period

Add a period at the end of an item in the CHANGELOG to keep it uniform.

Signed-off-by: Giedrius Statkevičius <[email protected]>
Signed-off-by: Aleksey Sin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants