Compact resolution/retention docs update. (#1548)
* Some updates to compact docs

Signed-off-by: Ivan Kiselev <[email protected]>

* some formatting

Signed-off-by: Ivan Kiselev <[email protected]>

* Update docs/components/compact.md

accept PR suggestions

Co-Authored-By: Bartlomiej Plotka <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* Add metalmatze to list of maintainers (#1547)

Signed-off-by: Matthias Loibl <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* resolve comments

Signed-off-by: Ivan Kiselev <[email protected]>

* resolve last comment

Signed-off-by: Ivan Kiselev <[email protected]>

* receive: Add liveness and readiness probe (#1537)

* Add prober to receive

Signed-off-by: Kemal Akkoyun <[email protected]>

* Add changelog entries

Signed-off-by: Kemal Akkoyun <[email protected]>

* Update README

Signed-off-by: Kemal Akkoyun <[email protected]>

* Remove default

Signed-off-by: Kemal Akkoyun <[email protected]>

* Wait hashring to be ready

Signed-off-by: Kemal Akkoyun <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* downsample: Add liveness and readiness probe (#1540)

* Add readiness and liveness probes for downsampler

Signed-off-by: Kemal Akkoyun <[email protected]>

* Add changelog entry

Signed-off-by: Kemal Akkoyun <[email protected]>

* Remove default

Signed-off-by: Kemal Akkoyun <[email protected]>

* Set ready

Signed-off-by: Kemal Akkoyun <[email protected]>

* Update CHANGELOG

Signed-off-by: Kemal Akkoyun <[email protected]>

* Clean CHANGELOG

Signed-off-by: Kemal Akkoyun <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* Document the dnssrvnoa option (#1551)

Signed-off-by: Antonio Santos <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* feat store: added readiness and liveness prober (#1460)

Signed-off-by: Martin Chodur <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* Add Hotstar to adopters. (#1553)

It's the largest streaming service in India that does cricket and GoT
for India. They have insane scale and are using Thanos to scale their
Prometheus.

Spoke to them offline about adding the logo and will get a signoff here
too.

Signed-off-by: Goutham Veeramachaneni <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* Fix hotstar logo in the adopters' list (#1558)

Signed-off-by: Karthik Vijayaraju <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* Fix typos, including 'fomrat' -> 'format' in tracing.config-file help text. (#1552)

Signed-off-by: Callum Styan <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* Compactor: Fix for #844 - Ignore object if it is the current directory (#1544)

* Ignore object if it is the current directory

Signed-off-by: Jamie Poole <[email protected]>

* Add full-stop

Signed-off-by: Jamie Poole <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* Adding doc explaining the importance of groups for compactor (#1555)

Signed-off-by: Leo Meira Vital <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* Add blank line for list (#1566)

The format of these files is wrong in the web.

Signed-off-by: dongwenjuan <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* Refactor compactor constants, fix bucket column (#1561)

* compact: unify different time constants

Use downsample.* constants where possible. Move the downsampling time
ranges into constants and use them as well.

Signed-off-by: Giedrius Statkevičius <[email protected]>

* bucket: refactor column calculation into compact

Fix the column's name and name it UNTIL-DOWN because that is what it
actually shows - time until the next downsampling.

Move out the calculation into a separate function into the compact
package. Ideally we could use the retention policies in this calculation
as well but the `bucket` subcommand knows nothing about them :-(

Signed-off-by: Giedrius Statkevičius <[email protected]>

* compact: fix issues with naming

Reorder the constants and fix mistakes.

Signed-off-by: Giedrius Statkevičius <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* remove duplicate

Signed-off-by: Ivan Kiselev <[email protected]>
Ivan Kiselev authored and brancz committed Sep 26, 2019
1 parent 99bc7d2 commit ae45e27
Showing 1 changed file with 24 additions and 4 deletions: docs/components/compact.md
@@ -28,13 +28,33 @@ config:
The compactor needs local disk space to store intermediate data for its processing. Generally, about 100GB are recommended for it to keep working as the compacted time ranges grow over time.
On-disk data is safe to delete between restarts and should be the first attempt to get crash-looping compactors unstuck.
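
As a rough sketch of how this looks in practice (the paths and bucket config file name below are placeholders, not part of this commit), the data directory simply points at a local volume with enough free space:

```bash
# Illustrative invocation only: the paths and bucket config file are placeholders.
# --data-dir is the local scratch space; roughly 100GB is recommended above.
thanos compact \
    --data-dir="/var/thanos/compact" \
    --objstore.config-file="bucket.yml" \
    --wait
```
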
## Downsampling, Resolution and Retention

Resolution is the distance between data points on your graphs, e.g.:

* raw - the same as the scrape interval at the moment of data ingestion
* 5m - one data point every 5 minutes
* 1h - one data point every hour

Keep in mind that the initial goal of downsampling is not saving disk space (read further for details on storage space consumption). The goal of downsampling is to provide fast results for range queries over large time intervals like months or years. In other words, if you set `--retention.resolution-raw` lower than `--retention.resolution-5m` and `--retention.resolution-1h`, you might run into the problem of not being able to "zoom in" to your historical data.

To avoid confusion, think of `raw` data as your "zoom in" capability. When choosing values for the options above, always ask yourself "Will I need to zoom in to a single day from a year ago?". If the answer is "yes", you most likely want to keep raw data for as long as the 1h and 5m resolutions; otherwise you will only be able to see a downsampled representation of what your raw data looked like.

You might also want to disable downsampling entirely with `debug.disable-downsampling`. This can make sense when you know for sure that you are not going to request long ranges of data, since without downsampling such requests are much more expensive. A valid example is when you only care about the last couple of weeks of your data or use it only for alerting; but if that is your case, you should also ask yourself whether you want to introduce Thanos at all instead of vanilla Prometheus.

Ideally, you will set equal retention (or no retention at all) for all resolutions, which allows both "zoom in" capability and performant queries over long ranges. Since object storage is usually quite cheap, storage size might not matter that much, unless your goal with Thanos is very specific and you know exactly what you are doing.
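
For example, equal retention across all resolutions could look like the following sketch (the one-year value and the bucket config file name are made-up placeholders, not a recommendation):

```bash
# Illustrative only: "365d" and "bucket.yml" are placeholders.
# Equal retention for raw, 5m and 1h keeps the "zoom in" ability
# for the entire retention window.
thanos compact \
    --retention.resolution-raw=365d \
    --retention.resolution-5m=365d \
    --retention.resolution-1h=365d \
    --objstore.config-file="bucket.yml" \
    --wait
```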

## Storage space consumption

In fact, downsampling doesn't save you any space. Instead, it adds two more blocks for each raw block, each of which is only slightly smaller than or of a similar size to the raw block. This is required by the internal downsampling implementation, which, to be mathematically correct, holds various aggregations. This means that downsampling can increase your storage size (up to roughly 3x, so for example 100GB of raw blocks may grow to around 300GB in total), but it gives a massive advantage when querying long ranges.

## Groups

The compactor groups blocks using the [external_labels](https://thanos.io/getting-started.md/#external-labels) added by the
Prometheus instance that produced the block. The labels must be both _unique_ and _persistent_ across different Prometheus instances.

By _unique_, we mean that the set of labels of a Prometheus instance must be different from all other sets of labels of
your Prometheus instances, so that the compactor will be able to group blocks by Prometheus instance.

By _persistent_, we mean that one Prometheus instance must keep the same labels if it restarts, so that the compactor keeps
compacting blocks from that instance even when it goes down for some time.
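
For example, a pair of Prometheus instances might carry external labels along these lines (label names and values here are purely illustrative):

```yaml
# Illustrative Prometheus configuration snippet: label names and values are examples.
# The combination of external labels must be unique per Prometheus instance
# and must stay the same across restarts.
global:
  external_labels:
    cluster: eu-west-1        # which environment/cluster this Prometheus runs in
    replica: prometheus-a     # distinguishes HA replicas of the same cluster
```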
