Update design.md
Update store.md

Update design.md

Update storage.md

Update troubleshooting.md

Signed-off-by: Biswajit Ghosh <[email protected]>
Biswajitghosh98 committed Dec 16, 2020
1 parent 7c3c43c commit 4574490
Showing 4 changed files with 21 additions and 19 deletions.
12 changes: 6 additions & 6 deletions docs/components/store.md
@@ -213,7 +213,7 @@ We recommend having overlapping time ranges with Thanos Sidecar and other Thanos

Thanos Querier deals with overlapping time series by merging them together.

Filtering is done on a Chunk level, so Thanos Store might still return Samples which are outside of `--min-time` & `--max-time`.
Filtering is done on a [Chunk](../design.md/#Note) level, so Thanos Store might still return Samples which are outside of `--min-time` & `--max-time`.
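
To illustrate why chunk-level filtering can return samples outside the requested range, here is a minimal sketch. It is not Thanos code: the `Chunk` class and `select_chunks` function are hypothetical names, and the only assumption is that a chunk is kept whole whenever its time range overlaps the requested window.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    min_time: int  # timestamp of the first sample in the chunk
    max_time: int  # timestamp of the last sample in the chunk
    samples: list  # (timestamp, value) pairs


def select_chunks(chunks, min_time, max_time):
    """Keep every chunk whose time range overlaps [min_time, max_time].

    Chunks are never split, so a kept chunk may carry samples
    that lie outside the requested window.
    """
    return [c for c in chunks if c.max_time >= min_time and c.min_time <= max_time]


chunks = [
    Chunk(0, 90, [(0, 1.0), (45, 2.0), (90, 3.0)]),
    Chunk(100, 190, [(100, 4.0), (145, 5.0), (190, 6.0)]),
    Chunk(200, 290, [(200, 7.0), (245, 8.0), (290, 9.0)]),
]

selected = select_chunks(chunks, min_time=120, max_time=210)
returned = [t for c in selected for t, _ in c.samples]
outside = [t for t in returned if t < 120 or t > 210]
```

Here the first chunk is dropped entirely, while the second and third are returned whole, so `outside` contains timestamps 100, 245 and 290 even though the window was 120 to 210.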

### External Label Partitioning (Sharding)

@@ -289,7 +289,7 @@ While the remaining settings are **optional**:

## Caching Bucket

Thanos Store Gateway supports a "caching bucket" with chunks and metadata caching to speed up loading of chunks from TSDB blocks. To configure caching, one needs to use `--store.caching-bucket.config=<yaml content>` or `--store.caching-bucket.config-file=<file.yaml>`.
Thanos Store Gateway supports a "caching bucket" with [chunks](../design.md/#Note) and metadata caching to speed up loading of [chunks](../design.md/#Note) from TSDB blocks. To configure caching, one needs to use `--store.caching-bucket.config=<yaml content>` or `--store.caching-bucket.config-file=<file.yaml>`.

Currently only memcached "backend" is supported:

@@ -312,11 +312,11 @@ metafile_max_size: 1MiB

`config` field for memcached supports all the same configuration as memcached for [index cache](#memcached-index-cache).

Additional options to configure various aspects of chunks cache are available:
Additional options to configure various aspects of [chunks](../design.md/#Note) cache are available:

- `chunk_subrange_size`: size of segment of chunks object that is stored to the cache. This is the smallest unit that chunks cache is working with.
- `chunk_subrange_size`: size of segment of [chunks](../design.md/#Note) object that is stored to the cache. This is the smallest unit that chunks cache is working with.
- `max_chunks_get_range_requests`: how many "get range" sub-requests may cache perform to fetch missing subranges.
- `chunk_object_attrs_ttl`: how long to keep information about chunk file attributes (e.g. size) in the cache.
- `chunk_object_attrs_ttl`: how long to keep information about [chunk](../design.md/#Note) file attributes (e.g. size) in the cache.
- `chunk_subrange_ttl`: how long to keep individual subranges in the cache.
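
The role of `chunk_subrange_size` as the smallest cached unit can be sketched as follows. This is an illustration only, not the Thanos implementation: the function name `aligned_subranges` is hypothetical, the size of 16000 bytes is an example value, and it is assumed that subranges are aligned to multiples of the subrange size. Clipping the last subrange to the object size is left out.

```python
def aligned_subranges(offset, length, subrange_size):
    """Return the aligned (start, end) byte subranges covering
    the requested range [offset, offset + length)."""
    if length <= 0:
        return []
    end = offset + length
    # Round the start down to the nearest subrange boundary.
    start = (offset // subrange_size) * subrange_size
    result = []
    while start < end:
        result.append((start, start + subrange_size))
        start += subrange_size
    return result


# One subrange is already cached; the rest would have to be fetched
# from object storage with "get range" sub-requests.
cached = {(16000, 32000)}
wanted = aligned_subranges(offset=20000, length=30000, subrange_size=16000)
missing = [r for r in wanted if r not in cached]
```

A request for 30000 bytes at offset 20000 touches three subranges, of which two are missing from the cache in this example.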

Following options are used for metadata caching (meta.json files, deletion mark files, iteration result):
@@ -327,7 +327,7 @@ Following options are used for metadata caching (meta.json files, deletion mark
- `metafile_content_ttl`: how long to cache content of meta.json and deletion mark files.
- `metafile_max_size`: maximum size of cached meta.json and deletion mark file. Larger files are not cached.

Note that chunks and metadata cache is an experimental feature, and these fields may be renamed or removed completely in the future.
Note that [chunks](../design.md/#Note) and metadata cache is an experimental feature, and these fields may be renamed or removed completely in the future.

## Index Header

14 changes: 8 additions & 6 deletions docs/design.md
@@ -42,8 +42,8 @@ Data sources that persist their data for long-term storage do so via the Prometh

A blocks top-level directory is a ULID (like UUID but lexicographically sortable and encoding the creation time).

* Chunk files hold a few hundred MB worth of chunks each. Chunks for the same series are sequentially aligned. Series in return are aligned by their metric name. This becomes relevant further down.
* The index file holds all information needed to lookup specific series by their labels and the positions of their chunks.
* [Chunk](design.md/#Note) files hold a few hundred MB worth of chunks each. Chunks for the same series are sequentially aligned. Series in return are aligned by their metric name. This becomes relevant further down.
* The index file holds all information needed to lookup specific series by their labels and the positions of their [chunks](design.md/#Note).
* `meta.json` holds meta information about a block like stats, time range, and compaction level.
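
The remark above that a block's ULID encodes its creation time can be made concrete with a short sketch. It relies on the ULID layout (the first 10 of 26 characters are a 48-bit millisecond timestamp in Crockford base32); the function name `ulid_timestamp_ms` and the sample IDs are made up for illustration.

```python
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def ulid_timestamp_ms(ulid):
    """Decode the millisecond timestamp from the first 10 characters of a ULID."""
    if len(ulid) != 26:
        raise ValueError("a ULID has exactly 26 characters")
    ms = 0
    for ch in ulid[:10].upper():
        ms = ms * 32 + CROCKFORD.index(ch)
    return ms


# Hypothetical block directory names: only the timestamp prefix differs.
block_dirs = [
    "0000000010" + "A" * 16,
    "0000000001" + "Z" * 16,
    "000000000Z" + "0" * 16,
]

# Sorting the names as strings also sorts them by creation time.
by_name = sorted(block_dirs)
by_time = sorted(block_dirs, key=ulid_timestamp_ms)
```

Because the timestamp comes first and the alphabet is in ascending character order, lexicographic order of the directory names matches creation order.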

Those block files can be backed up to an object storage and later be queried by another component (see below).
@@ -63,24 +63,26 @@ The meta.json is updated during upload time on sidecars.
│ Object Storage │
└──────────────────────────────────────────────────┘
```
#### Note
A chunk is part of the data structure of Prometheus TSDB, holding up to 120 samples for a single timeseries, and a chunk file is the file in TSDB block that contains up to 0.5 GB worth of chunk entries in binary format.

### Stores

A store node acts as a gateway to block data that is stored in an object storage bucket. It implements the same gRPC API as data sources to provide access to all metric data found in the bucket.

It continuously synchronizes which blocks exist in the bucket and translates requests for metric data into object storage requests. It implements various strategies to minimize the number of requests to the object storage such as filtering relevant blocks by their metadata (e.g. time range and labels) and caching frequent index lookups.

The Prometheus 2.0 storage layout is optimized for minimal read amplification. For example, sample data for the same time series is sequentially aligned in a chunk file. Similarly, series for the same metric name are sequentially aligned as well.
The store node is aware of the files' layout and translates data requests into a plan of a minimum amount of object storage request. Each request may fetch up to hundreds of thousands of chunks at once. This is essential to satisfy even big queries with a limited amount of requests to the object storage.
The Prometheus 2.0 storage layout is optimized for minimal read amplification. For example, sample data for the same time series is sequentially aligned in a [chunk](design.md/#Note) file. Similarly, series for the same metric name are sequentially aligned as well.
The store node is aware of the files' layout and translates data requests into a plan of a minimum amount of object storage request. Each request may fetch up to hundreds of thousands of [chunks](design.md/#Note) at once. This is essential to satisfy even big queries with a limited amount of requests to the object storage.

Currently only index data is cached. Chunk data could be cached but is orders of magnitude larger in size. In the current state, fetching chunk data from the object storage already only accounts for a small fraction of end-to-end latency. Thus, there's currently no incentive to increase the store nodes resource requirements/limit its scalability by adding chunk caching.
Currently only index data is cached. [Chunk](design.md/#Note) data could be cached but is orders of magnitude larger in size. In the current state, fetching chunk data from the object storage already only accounts for a small fraction of end-to-end latency. Thus, there's currently no incentive to increase the store nodes resource requirements/limit its scalability by adding chunk caching.

### Stores & Data Sources - It's all the same

Since store nodes and data sources expose the same gRPC Store API, clients can largely treat them as equivalent and don't have to be concerned with which specific component they are querying.
Each implementer of the Store API advertise meta information about the data they provide. This allows clients to minimize the set of nodes they have to fan out to, to satisfy a particular data query.

In its essence, the Store API allows to look up data by a set of label matchers (as known from PromQL), and a time range. It returns compressed chunks of samples as they are found in the block data. It is purely a data retrieval API and does _not_ provide complex query execution.
In its essence, the Store API allows to look up data by a set of label matchers (as known from PromQL), and a time range. It returns compressed [chunks](design.md/#Note) of samples as they are found in the block data. It is purely a data retrieval API and does _not_ provide complex query execution.

```
┌──────────────────────┐ ┌────────────┬─────────┐ ┌────────────┐
2 changes: 1 addition & 1 deletion docs/operating/troubleshooting.md
@@ -23,7 +23,7 @@ In this halted example, we can read that compactor detected 2 overlapped blocks.
* Duplicated upload with different ULID (non-persistent storage for Prometheus can cause this)
* 2 Prometheus instances are misconfigured and they are uploading the data with exactly the same external labels. This is wrong, they should be unique.

Checking producers log for such ULID, and checking meta.json (e.g if sample stats are the same or not) helps. Checksum the index and chunks files as well to reveal if data is exactly the same, thus ok to be removed manually. You may find `scripts/thanos-block.jq` script useful when inspecting `meta.json` files, as it translates timestamps to human-readable form.
Checking producers log for such ULID, and checking meta.json (e.g if sample stats are the same or not) helps. Checksum the index and [chunks](../design.md/#Note) files as well to reveal if data is exactly the same, thus ok to be removed manually. You may find `scripts/thanos-block.jq` script useful when inspecting `meta.json` files, as it translates timestamps to human-readable form.

### Reasons

12 changes: 6 additions & 6 deletions docs/storage.md
@@ -445,7 +445,7 @@ At that point, anyone can use your provider by spec.
## Data in Object Storage

Thanos supports writing and reading data in native Prometheus `TSDB blocks` in [TSDB format](https://github.com/prometheus/prometheus/tree/master/tsdb/docs/format).
This is the format used by [Prometheus](https://prometheus.io) TSDB database for persisting data on the local disk. With the efficient index and chunk binary formats,
This is the format used by [Prometheus](https://prometheus.io) TSDB database for persisting data on the local disk. With the efficient index and [chunk](design.md/#Note) binary formats,
it also fits well to be used directly from object storage using range GET API.

Following sections explain this format in details with the additional files and entries that Thanos system supports.
@@ -653,7 +653,7 @@ From high level it allows to find:
* Label names
* Label values for label name
* All series labels
* Given (or all) series' chunk reference. This can be used to find chunk with samples in the [chunk files](#chunks-file-format)
* Given (or all) series' chunk reference. This can be used to find [chunk](design.md/#Note) with samples in the [chunk files](#chunks-file-format)

The following describes the format of the `index` file found in each block directory.
It is terminated by a table of contents which serves as an entry point into the index.
@@ -719,7 +719,7 @@ Strings are referenced by sequential indexing. The strings are sorted in lexicog
##### Series
The section contains a sequence of series that hold the label set of the series as well as its chunks within the block. The series are sorted lexicographically by their label sets.
The section contains a sequence of series that hold the label set of the series as well as its [chunks](design.md/#Note) within the block. The series are sorted lexicographically by their label sets.
Each series section is aligned to 16 bytes. The ID for a series is the `offset/16`. This serves as the series' ID in all subsequent references. Thereby, a sorted list of series IDs implies a lexicographically sorted list of series label sets.
```
@@ -735,9 +735,9 @@ Each series section is aligned to 16 bytes. The ID for a series is the `offset/1
```
Every series entry first holds its number of labels, followed by tuples of symbol table references that contain the label name and value. The label pairs are lexicographically sorted.
After the labels, the number of indexed chunks is encoded, followed by a sequence of metadata entries containing the chunks minimum (`mint`) and maximum (`maxt`) timestamp and a reference to its position in the chunk file. The `mint` is the time of the first sample and `maxt` is the time of the last sample in the chunk. Holding the time range data in the index allows dropping chunks irrelevant to queried time ranges without accessing them directly.
After the labels, the number of indexed [chunks](design.md/#Note) is encoded, followed by a sequence of metadata entries containing the chunks minimum (`mint`) and maximum (`maxt`) timestamp and a reference to its position in the chunk file. The `mint` is the time of the first sample and `maxt` is the time of the last sample in the chunk. Holding the time range data in the index allows dropping chunks irrelevant to queried time ranges without accessing them directly.
`mint` of the first chunk is stored, it's `maxt` is stored as a delta and the `mint` and `maxt` are encoded as deltas to the previous time for subsequent chunks. Similarly, the reference of the first chunk is stored and the next ref is stored as a delta to the previous one.
`mint` of the first [chunk](design.md/#Note) is stored, it's `maxt` is stored as a delta and the `mint` and `maxt` are encoded as deltas to the previous time for subsequent chunks. Similarly, the reference of the first chunk is stored and the next ref is stored as a delta to the previous one.
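
The delta scheme described above can be sketched roughly as below. This is a simplified illustration, not the actual index writer: it uses plain Python integers instead of the variable-length integer encoding, and the function names and sample values are hypothetical.

```python
def encode_chunk_metas(metas):
    """Encode (mint, maxt, ref) tuples as a flat list of integers.

    For every chunk: mint is stored as a delta to the previous chunk's
    maxt, maxt as a delta to its own mint, and ref as a delta to the
    previous ref. For the first chunk the "previous" values are 0, so
    its mint and ref are stored as-is.
    """
    out = []
    prev_t, prev_ref = 0, 0
    for mint, maxt, ref in metas:
        out += [mint - prev_t, maxt - mint, ref - prev_ref]
        prev_t, prev_ref = maxt, ref
    return out


def decode_chunk_metas(values):
    """Inverse of encode_chunk_metas."""
    metas = []
    prev_t, prev_ref = 0, 0
    for i in range(0, len(values), 3):
        d_mint, d_maxt, d_ref = values[i:i + 3]
        mint = prev_t + d_mint
        maxt = mint + d_maxt
        ref = prev_ref + d_ref
        metas.append((mint, maxt, ref))
        prev_t, prev_ref = maxt, ref
    return metas


metas = [(1000, 1990, 8), (2000, 2990, 520), (3000, 3990, 1032)]
encoded = encode_chunk_metas(metas)
decoded = decode_chunk_metas(encoded)
```

After the first chunk, the stored numbers are small deltas rather than full timestamps and references, which is what keeps the series entries compact.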
```
┌──────────────────────────────────────────────────────────────────────────┐
Expand Down Expand Up @@ -911,7 +911,7 @@ The following describes the format of a chunks file,
which is created in the `chunks/` directory of a block.
The maximum size per segment file is 512MiB.
Chunks in the files are referenced from the index by uint64 composed of
[Chunks](design.md/#Note) in the files are referenced from the index by uint64 composed of
in-file offset (lower 4 bytes) and segment sequence number (upper 4 bytes).
```
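
The chunk reference layout described above (in-file offset in the lower 4 bytes, segment sequence number in the upper 4 bytes) can be sketched with plain bit operations. The function names `pack_chunk_ref` and `unpack_chunk_ref` are hypothetical and only illustrate the packing.

```python
MAX_U32 = 0xFFFFFFFF


def pack_chunk_ref(segment, offset):
    """Combine a segment sequence number and an in-file offset into one uint64."""
    if not (0 <= segment <= MAX_U32 and 0 <= offset <= MAX_U32):
        raise ValueError("segment and offset must each fit in 4 bytes")
    return (segment << 32) | offset


def unpack_chunk_ref(ref):
    """Split a uint64 chunk reference into (segment, offset)."""
    return ref >> 32, ref & MAX_U32


ref = pack_chunk_ref(segment=2, offset=1024)
segment, offset = unpack_chunk_ref(ref)
```

A 4-byte offset can address up to 4 GiB, which comfortably covers the 512MiB maximum segment file size.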