Merge upstream f6871f7 (thanos-io#149)
* mixins: Add code/grpc-code dimension to error widgets

Signed-off-by: Douglas Camata <[email protected]>

* Update changelog

Signed-off-by: Douglas Camata <[email protected]>

* Fix messed up merge conflict resolution

Signed-off-by: Douglas Camata <[email protected]>

* Readd empty line at the end of changelog

Signed-off-by: Douglas Camata <[email protected]>

* Rerun CI

Signed-off-by: Douglas Camata <[email protected]>

* mixin(Rule): Add rule evaluation failures to the Rule dashboard (thanos-io#6244)

* Improve Thanos Rule dashboard legends

Signed-off-by: Douglas Camata <[email protected]>

* Add evaluations failed to Rule dashboard

Signed-off-by: Douglas Camata <[email protected]>

* Refactor rule dashboard

Signed-off-by: Douglas Camata <[email protected]>

* Add changelog entry

Signed-off-by: Douglas Camata <[email protected]>

* Rerun CI

Signed-off-by: Douglas Camata <[email protected]>

---------

Signed-off-by: Douglas Camata <[email protected]>

* added thanos logo in react app (thanos-io#6264)

Signed-off-by: hackeramitkumar <[email protected]>

* Add an experimental flag to block samples with timestamp too far in the future (thanos-io#6195)

* Add an experimental flag to block samples with timestamp too far in the future

Signed-off-by: Yi Jin <[email protected]>

* fix bug

Signed-off-by: Yi Jin <[email protected]>

* address comments

Signed-off-by: Yi Jin <[email protected]>

* fix docs CI errors

Signed-off-by: Yi Jin <[email protected]>

* resolve merge conflicts

Signed-off-by: Yi Jin <[email protected]>

* resolve merge conflicts

Signed-off-by: Yi Jin <[email protected]>

* retrigger checks

Signed-off-by: Yi Jin <[email protected]>

---------

Signed-off-by: Yi Jin <[email protected]>

* store/bucket: snappy-encoded postings reading improvements (thanos-io#6245)

* store: pool input to snappy.Decode

Pool input to snappy.Decode to avoid allocations.

Signed-off-by: Giedrius Statkevičius <[email protected]>
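
The general pattern behind pooling the input to `snappy.Decode` is reusing scratch buffers across calls instead of allocating a fresh `[]byte` each time. A stdlib-only sketch with `sync.Pool` (the real code pools sized buffers around the s2/snappy calls; the helper name is illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool hands out reusable scratch buffers. Buffers returned to the
// pool keep their capacity, so steady-state decode calls allocate nothing.
var bufPool = sync.Pool{
	New: func() interface{} { return make([]byte, 0, 1024) },
}

// withBuffer runs work with a pooled buffer and returns it afterwards,
// truncated to zero length so the next user starts clean.
func withBuffer(work func(buf []byte)) {
	buf := bufPool.Get().([]byte)
	defer bufPool.Put(buf[:0])
	work(buf)
}

func main() {
	withBuffer(func(buf []byte) {
		buf = append(buf, "decoded postings"...)
		fmt.Println(string(buf))
	})
}
```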

* store: use s2 for decoding snappy

It's faster hence use it.

Signed-off-by: Giedrius Statkevičius <[email protected]>

* store: small code style adjustment

Signed-off-by: Giedrius Statkevičius <[email protected]>

* store: call closefns before returning err

Signed-off-by: Giedrius Statkevičius <[email protected]>

* store/postings_codec: return both if possible

Signed-off-by: Giedrius Statkevičius <[email protected]>

* store/bucket: always call close fns

Signed-off-by: Giedrius Statkevičius <[email protected]>

---------

Signed-off-by: Giedrius Statkevičius <[email protected]>

* truncateExtLabels support Unicode cut (thanos-io#6267)

* truncateExtLabels support Unicode cut

Signed-off-by: mickeyzzc <[email protected]>

* update TestTruncateExtLabels and pass test

Signed-off-by: mickeyzzc <[email protected]>

---------

Signed-off-by: mickeyzzc <[email protected]>

* Update mentorship links

Signed-off-by: Saswata Mukherjee <[email protected]>

* Fix segfault in LabelValues during head compaction (thanos-io#6271)

* Fix segfault in LabelValues during head compaction

Head compaction causes blocks outside the retention period to get deleted.
If there is an in-flight LabelValues request at the same time, deleting
the block can cause the store proxy to panic since it loses access to
the data.

This commit fixes the issue by copying label values from TSDB stores
before returning them to the store proxy. I thought about exposing
a Close method on the TSDB store which the Proxy can call, but this will
not eliminate cases where gRPC defers sending data over a channel using its
queueing mechanism.

Signed-off-by: Filip Petkovski <[email protected]>
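
The core of the fix described above — copy label values out of block-backed memory before handing them to the proxy — can be sketched with stdlib Go (the helper is illustrative; the real change lives in Thanos' TSDB store):

```go
package main

import (
	"fmt"
	"strings"
)

// copyLabelValues deep-copies vals so the result stays valid even after
// the memory backing the originals (e.g. a deleted TSDB block) goes away.
func copyLabelValues(vals []string) []string {
	out := make([]string, len(vals))
	for i, v := range vals {
		out[i] = strings.Clone(v) // force a private copy of the bytes
	}
	return out
}

func main() {
	orig := []string{"prod", "staging"}
	safe := copyLabelValues(orig)
	orig[0] = "gone" // mutating (or freeing) the source no longer matters
	fmt.Println(safe[0])
}
```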

* Add changelog entry

Signed-off-by: Filip Petkovski <[email protected]>

* Assert no error when querying labels

Signed-off-by: Filip Petkovski <[email protected]>

---------

Signed-off-by: Filip Petkovski <[email protected]>

* Mixin: Allow specifying an instance name filter (thanos-io#6273)

This commit allows specifying an instance name filter, in order to
filter the datasources shown on the dashboards.

For example, when generating the dashboards one can do the following
(e.g. in config.libsonnet)

```
  dashboard+:: {
    prefix: 'Thanos / ',
    ...
    instance_name_filter: '/EU.*/'
```

Signed-off-by: Jacob Baungard Hansen <[email protected]>

* Adds Deno to adopters.yml (thanos-io#6275)

Signed-off-by: Will (Newby) Atlas <[email protected]>

* Bump `make test` timeout (thanos-io#6276)

Signed-off-by: Douglas Camata <[email protected]>

* fix 0.31 changelog (thanos-io#6278)

Signed-off-by: junot <[email protected]>

* Query: Switch Multiple Engines (thanos-io#6234)

* Query: Switch engines using `engine` param

Thanos Query has two engines, prometheus (default) and thanos.
Only a single engine runs per thanos query command at a time, and
one has to re-run the command to switch between them.

This commit adds the ability to run multiple engines at once
and switch between them using the `engine` query param in the query API.

To avoid duplicate metrics registration, the thanos engine is
provided with a different registerer having the prefix `tpe_` (not
finalized yet).

The promql-engine command line flag that specified the query engine
to run has been removed.

Currently this functionality is not implemented on GRPCAPI.

Signed-off-by: Pradyumna Krishna <[email protected]>
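
The dispatch this commit describes — pick an engine by the `engine` query param, fall back to a default — reduces to a small lookup. A sketch under assumed names (the types and function below are illustrative, not Thanos' API code):

```go
package main

import "fmt"

// Engine is a stand-in for a PromQL engine implementation.
type Engine interface{ Name() string }

type promEngine struct{}

func (promEngine) Name() string { return "prometheus" }

type thanosEngine struct{}

func (thanosEngine) Name() string { return "thanos" }

// selectEngine resolves the `engine` query parameter, using def when the
// parameter is absent and rejecting unknown engine names.
func selectEngine(engines map[string]Engine, param, def string) (Engine, error) {
	if param == "" {
		param = def
	}
	e, ok := engines[param]
	if !ok {
		return nil, fmt.Errorf("unknown engine %q", param)
	}
	return e, nil
}

func main() {
	engines := map[string]Engine{"prometheus": promEngine{}, "thanos": thanosEngine{}}
	e, _ := selectEngine(engines, "", "prometheus")
	fmt.Println(e.Name())
}
```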

* Add multiple engine support to GRPCAPI

Fixes the thanos build failure and adds support for multiple engines
in GRPCAPI.

Signed-off-by: Pradyumna Krishna <[email protected]>

* Create QueryEngineFactory to create engines

QueryEngineFactory holds the collection of all PromQL engines used
by thanos. Any engine can be created and returned using the
`GetXEngine` methods.

It is currently limited to two engines, prometheus and thanos,
which get created on the first call.

Signed-off-by: Pradyumna Krishna <[email protected]>
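
"Created on the first call" is classic lazy initialization, which `sync.Once` expresses directly. A stripped-down sketch of the factory shape (strings stand in for the real engine types; everything here is illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

// engineFactory constructs each engine at most once, on first request,
// and hands out the shared instances afterwards.
type engineFactory struct {
	once   sync.Once
	prom   string // stand-in for a *promql.Engine
	thanos string // stand-in for a thanos promql-engine instance
}

func (f *engineFactory) init() {
	f.once.Do(func() {
		f.prom = "prometheus-engine"
		f.thanos = "thanos-engine"
	})
}

func (f *engineFactory) GetPrometheusEngine() string { f.init(); return f.prom }
func (f *engineFactory) GetThanosEngine() string     { f.init(); return f.thanos }

func main() {
	var f engineFactory
	fmt.Println(f.GetPrometheusEngine())
	fmt.Println(f.GetThanosEngine())
}
```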

* Use QueryEngineFactory in query API

thanos query commands pass `QueryEngineFactory` to query apis
that will use engine based on query params. It will provide more
flexibility to create multiple engines in thanos.

Adds `defaultEngine` CLI flag, A default engine to use if not
specified with query params.

Signed-off-by: Pradyumna Krishna <[email protected]>

* Update Query API tests

Fixes broken tests.

Signed-off-by: Pradyumna Krishna <[email protected]>

* Minor changes and Docs fixes

* Move defaultEngine argument to reduce diff.
* Generated Docs.

Signed-off-by: Pradyumna Krishna <[email protected]>

* Add Engine Selector/ Dropdown to Query UI

Engine Selector is a dropdown that sets the engine used to run the
query. Currently there are two engines, `thanos` and `prometheus`.

The dropdown sends an `engine` query param to the query API, which
serves the request using the provided engine. This makes it possible
to run queries with multiple query engines from the Query UI.

Signed-off-by: Pradyumna Krishna <[email protected]>

* Move Engine Selector to Panel

Removes Dropdown component, and renders Engine Selector directly.
Receives defaultEngine from `flags` API.
Updates parseOptions to parse engine query param and updates test
for Panel and utils.

Signed-off-by: Pradyumna Krishna <[email protected]>

* Upgrade promql-engine dependency

Updates promql-engine, which brings the ability to provide a
fallback engine via engine Opts.

Signed-off-by: Pradyumna Krishna <[email protected]>

* Add MinT to remote client

The MinT method was missing from Client due to the updated
promql-engine. This commit adds MinT to the remote client.

Signed-off-by: Pradyumna Krishna <[email protected]>

* Use prometheus fallback engine in thanos engine

The thanos engine creates a fallback prometheus engine that, while
registering metrics, conflicts with the other prometheus engine
created by thanos. To fix this, the already-created prometheus engine
is passed as the fallback engine to the thanos engine via engine Opts.

Signed-off-by: Pradyumna Krishna <[email protected]>

* Use enum for EngineType in GRPC

GRPC is used for communication between thanos components, and
defaultEngine was previously a string. An enum makes more sense
here, so the request.Engine type has been changed to
querypb.EngineType.
The default case is handled with another default value provided in
the enum.

Signed-off-by: Pradyumna Krishna <[email protected]>

* Update query UI bindata.go

Compile react app using `make assets`.

Signed-off-by: Pradyumna Krishna <[email protected]>

---------

Signed-off-by: Pradyumna Krishna <[email protected]>

* docs: mismatch in changelog

Signed-off-by: Etienne Martel <[email protected]>

* Updates busybox SHA (thanos-io#6283)

Signed-off-by: GitHub <[email protected]>
Co-authored-by: fpetkovski <[email protected]>

* Upgrade prometheus to 7309ac272195cb856b879306d6a27af7641d3346 (thanos-io#6287)

* Upgrade prometheus to 7309ac272195cb856b879306d6a27af7641d3346

Signed-off-by: Alex Le <[email protected]>

* Reverted test code

Signed-off-by: Alex Le <[email protected]>

* Updated comment

Signed-off-by: Alex Le <[email protected]>

* docs: mismatch in changelog

Signed-off-by: Etienne Martel <[email protected]>
Signed-off-by: Alex Le <[email protected]>

* Updates busybox SHA (thanos-io#6283)

Signed-off-by: GitHub <[email protected]>
Co-authored-by: fpetkovski <[email protected]>
Signed-off-by: Alex Le <[email protected]>

* trigger workflow

Signed-off-by: Alex Le <[email protected]>

* trigger workflow

Signed-off-by: Alex Le <[email protected]>

---------

Signed-off-by: Alex Le <[email protected]>
Signed-off-by: Etienne Martel <[email protected]>
Signed-off-by: GitHub <[email protected]>
Co-authored-by: Etienne Martel <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: fpetkovski <[email protected]>

* Add CarTrade Tech as new adopter

Signed-off-by: naveadkazi <[email protected]>

* tests: Remove custom Between test matcher (thanos-io#6310)

* Remove custom Between test matcher

The upstream PR to efficientgo/e2e has been merged, so we can use it from there.

Signed-off-by: Douglas Camata <[email protected]>

* Run go mod tidy

Signed-off-by: Douglas Camata <[email protected]>

---------

Signed-off-by: Douglas Camata <[email protected]>

* query frontend, query UI: Native histogram support (thanos-io#6071)

* Implemented native histogram support for qfe and query UI

Signed-off-by: Sebastian Rabenhorst <[email protected]>

Fixed marshalling for histograms in qfe

Started working on native histogram query ui

Copied histogram implementation for graph

Added query range support for native histograms in qfe

Use prom model (un-)marshal for native histograms in qfe

Use prom model (un-)marshal for native histograms in qfe

Fixed sample and sample stream marshal fn

Extended qfe native histogram e2e tests

Added copyright to qfe queryrange compat

Added query range test for histograms and tried to fix UI tests

Fixed DataTable test

Review feedback

Fixed native histogram e2e test

Signed-off-by: Sebastian Rabenhorst <[email protected]>

Add histogram support for ApplyCounterResetsSeriesIterator

Signed-off-by: Sebastian Rabenhorst <[email protected]>

Made assets

Signed-off-by: Sebastian Rabenhorst <[email protected]>

Add changelog

Signed-off-by: Sebastian Rabenhorst <[email protected]>

Fixed changelog

Signed-off-by: Sebastian Rabenhorst <[email protected]>

Fixed qfe

Signed-off-by: Sebastian Rabenhorst <[email protected]>

Fixed PrometheusResponse minTime for histograms in qfe

Signed-off-by: Sebastian Rabenhorst <[email protected]>

Updated prometheus common to v0.40.0 and queryrange.Sample fixes

Signed-off-by: Sebastian Rabenhorst <[email protected]>

Updated Readme

Signed-off-by: Sebastian Rabenhorst <[email protected]>

Addressed PR comments

Signed-off-by: Sebastian Rabenhorst <[email protected]>

trigger tests

Signed-off-by: Sebastian Rabenhorst <[email protected]>

Made assets

Signed-off-by: Sebastian Rabenhorst <[email protected]>

* Made assets

Signed-off-by: Sebastian Rabenhorst <[email protected]>

* fixed tsdbutil references

Signed-off-by: Sebastian Rabenhorst <[email protected]>

* fixed imports

Signed-off-by: Sebastian Rabenhorst <[email protected]>

* Enabled pushdown for query native hist test and removed ToDo

Signed-off-by: Sebastian Rabenhorst <[email protected]>

* Refactored native histogram query UI

Signed-off-by: Sebastian Rabenhorst <[email protected]>

---------

Signed-off-by: Sebastian Rabenhorst <[email protected]>

* store: add streamed snappy encoding for postings list (thanos-io#6303)

* store: add streamed snappy encoding for postings list

We've noticed that decoding Snappy compressed postings list
takes a lot of RAM:

```
(pprof) top
Showing nodes accounting for 1427.30GB, 67.55% of 2112.82GB total
Dropped 1069 nodes (cum <= 10.56GB)
Showing top 10 nodes out of 82
      flat  flat%   sum%        cum   cum%
         0     0%     0%  1905.67GB 90.20%  golang.org/x/sync/errgroup.(*Group).Go.func1
    2.08GB 0.098% 0.098%  1456.94GB 68.96%  github.com/thanos-io/thanos/pkg/store.(*blockSeriesClient).ExpandPostings
    1.64GB 0.078%  0.18%  1454.87GB 68.86%  github.com/thanos-io/thanos/pkg/store.(*bucketIndexReader).ExpandedPostings
    2.31GB  0.11%  0.29%  1258.15GB 59.55%  github.com/thanos-io/thanos/pkg/store.(*bucketIndexReader).fetchPostings
    1.48GB  0.07%  0.36%  1219.67GB 57.73%  github.com/thanos-io/thanos/pkg/store.diffVarintSnappyDecode
 1215.21GB 57.52% 57.87%  1215.21GB 57.52%  github.com/klauspost/compress/s2.Decode
```

This is because we are creating a new []byte slice for the decoded data
each time. To avoid this RAM usage problem, let's stream the decoding
from a given buffer. Since Snappy block format doesn't support streamed
decoding, let's switch to Snappy stream format which is made for exactly
that.

Notice that our current `index.Postings` list does not
support going back through Seek() even if theoretically one could want
something like that. Fortunately, to search for posting intersection, we
need to only go forward.

Benchmark data:

```
name                                                          time/op
PostingsEncodingDecoding/10000/raw/encode-16                  71.6µs ± 3%
PostingsEncodingDecoding/10000/raw/decode-16                  76.3ns ± 4%
PostingsEncodingDecoding/10000#01/snappy/encode-16            73.3µs ± 1%
PostingsEncodingDecoding/10000#01/snappy/decode-16            1.63µs ± 6%
PostingsEncodingDecoding/10000#02/snappyStreamed/encode-16     111µs ± 2%
PostingsEncodingDecoding/10000#02/snappyStreamed/decode-16    14.5µs ± 7%
PostingsEncodingDecoding/100000/snappyStreamed/encode-16      1.09ms ± 2%
PostingsEncodingDecoding/100000/snappyStreamed/decode-16      14.4µs ± 4%
PostingsEncodingDecoding/100000#01/raw/encode-16               710µs ± 1%
PostingsEncodingDecoding/100000#01/raw/decode-16              79.3ns ±13%
PostingsEncodingDecoding/100000#02/snappy/encode-16            719µs ± 1%
PostingsEncodingDecoding/100000#02/snappy/decode-16           13.5µs ± 4%
PostingsEncodingDecoding/1000000/raw/encode-16                7.14ms ± 1%
PostingsEncodingDecoding/1000000/raw/decode-16                81.7ns ± 9%
PostingsEncodingDecoding/1000000#01/snappy/encode-16          7.52ms ± 3%
PostingsEncodingDecoding/1000000#01/snappy/decode-16           139µs ± 4%
PostingsEncodingDecoding/1000000#02/snappyStreamed/encode-16  11.4ms ± 4%
PostingsEncodingDecoding/1000000#02/snappyStreamed/decode-16  15.5µs ± 4%

name                                                          alloc/op
PostingsEncodingDecoding/10000/raw/encode-16                  13.6kB ± 0%
PostingsEncodingDecoding/10000/raw/decode-16                   96.0B ± 0%
PostingsEncodingDecoding/10000#01/snappy/encode-16            25.9kB ± 0%
PostingsEncodingDecoding/10000#01/snappy/decode-16            11.0kB ± 0%
PostingsEncodingDecoding/10000#02/snappyStreamed/encode-16    16.6kB ± 0%
PostingsEncodingDecoding/10000#02/snappyStreamed/decode-16     148kB ± 0%
PostingsEncodingDecoding/100000/snappyStreamed/encode-16       148kB ± 0%
PostingsEncodingDecoding/100000/snappyStreamed/decode-16       148kB ± 0%
PostingsEncodingDecoding/100000#01/raw/encode-16               131kB ± 0%
PostingsEncodingDecoding/100000#01/raw/decode-16               96.0B ± 0%
PostingsEncodingDecoding/100000#02/snappy/encode-16            254kB ± 0%
PostingsEncodingDecoding/100000#02/snappy/decode-16            107kB ± 0%
PostingsEncodingDecoding/1000000/raw/encode-16                1.25MB ± 0%
PostingsEncodingDecoding/1000000/raw/decode-16                 96.0B ± 0%
PostingsEncodingDecoding/1000000#01/snappy/encode-16          2.48MB ± 0%
PostingsEncodingDecoding/1000000#01/snappy/decode-16          1.05MB ± 0%
PostingsEncodingDecoding/1000000#02/snappyStreamed/encode-16  1.47MB ± 0%
PostingsEncodingDecoding/1000000#02/snappyStreamed/decode-16   148kB ± 0%

name                                                          allocs/op
PostingsEncodingDecoding/10000/raw/encode-16                    2.00 ± 0%
PostingsEncodingDecoding/10000/raw/decode-16                    2.00 ± 0%
PostingsEncodingDecoding/10000#01/snappy/encode-16              3.00 ± 0%
PostingsEncodingDecoding/10000#01/snappy/decode-16              4.00 ± 0%
PostingsEncodingDecoding/10000#02/snappyStreamed/encode-16      4.00 ± 0%
PostingsEncodingDecoding/10000#02/snappyStreamed/decode-16      5.00 ± 0%
PostingsEncodingDecoding/100000/snappyStreamed/encode-16        4.00 ± 0%
PostingsEncodingDecoding/100000/snappyStreamed/decode-16        5.00 ± 0%
PostingsEncodingDecoding/100000#01/raw/encode-16                2.00 ± 0%
PostingsEncodingDecoding/100000#01/raw/decode-16                2.00 ± 0%
PostingsEncodingDecoding/100000#02/snappy/encode-16             3.00 ± 0%
PostingsEncodingDecoding/100000#02/snappy/decode-16             4.00 ± 0%
PostingsEncodingDecoding/1000000/raw/encode-16                  2.00 ± 0%
PostingsEncodingDecoding/1000000/raw/decode-16                  2.00 ± 0%
PostingsEncodingDecoding/1000000#01/snappy/encode-16            3.00 ± 0%
PostingsEncodingDecoding/1000000#01/snappy/decode-16            4.00 ± 0%
PostingsEncodingDecoding/1000000#02/snappyStreamed/encode-16    4.00 ± 0%
PostingsEncodingDecoding/1000000#02/snappyStreamed/decode-16    5.00 ± 0%
```

Compression ratios are still the same as previously:

```
$ /bin/go test -v -timeout 10m -run ^TestDiffVarintCodec$ github.com/thanos-io/thanos/pkg/store
[snip]
=== RUN   TestDiffVarintCodec/snappy/i!~"2.*"
    postings_codec_test.go:73: postings entries: 944450
    postings_codec_test.go:74: original size (4*entries): 3777800 bytes
    postings_codec_test.go:80: encoded size 44498 bytes
    postings_codec_test.go:81: ratio: 0.012
=== RUN   TestDiffVarintCodec/snappyStreamed/i!~"2.*"
    postings_codec_test.go:73: postings entries: 944450
    postings_codec_test.go:74: original size (4*entries): 3777800 bytes
    postings_codec_test.go:80: encoded size 44670 bytes
    postings_codec_test.go:81: ratio: 0.012
```

Signed-off-by: Giedrius Statkevičius <[email protected]>

* store: clean up postings code

Signed-off-by: Giedrius Statkevičius <[email protected]>

* store: fix estimation

Signed-off-by: Giedrius Statkevičius <[email protected]>

* store: use buffer.Bytes()

Signed-off-by: Giedrius Statkevičius <[email protected]>

* store/postings_codec: reuse extgrpc compressors/decompressors

Signed-off-by: Giedrius Statkevičius <[email protected]>

* CHANGELOG: add item

Signed-off-by: Giedrius Statkevičius <[email protected]>

* CHANGELOG: clean up whitespace

Signed-off-by: Giedrius Statkevičius <[email protected]>

---------

Signed-off-by: Giedrius Statkevičius <[email protected]>

* compact: atomically replace no compact marked map (thanos-io#6319)

With lots of blocks it can take some time to fill this no-compact-marked
map, hence replace it atomically. I believe the old behavior leads to
problems in the compaction planner, where it picks up no-compact-marked
blocks because the meta syncer does synchronization concurrently.

Signed-off-by: Giedrius Statkevičius <[email protected]>
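
The atomic-replacement pattern: build the new map privately, then publish it with a single atomic store, so readers only ever observe a complete old or new map, never a half-filled one. A stdlib sketch with `atomic.Value` (names are illustrative, not the compactor's actual code):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// noCompactMarked holds a map[string]bool of marked block IDs. It is
// only ever replaced wholesale, never mutated in place.
var noCompactMarked atomic.Value

// publish swaps in a fully-built replacement map in one atomic store.
func publish(newMap map[string]bool) {
	noCompactMarked.Store(newMap)
}

// isMarked reads through whichever complete map is currently published.
func isMarked(id string) bool {
	m, _ := noCompactMarked.Load().(map[string]bool)
	return m[id]
}

func main() {
	publish(map[string]bool{"block-a": true})
	fmt.Println(isMarked("block-a"), isMarked("block-b"))
}
```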

* Fixed modules, logicalplan flag and more

* Made assets

* Removed unused test function

* Sorted labels

---------

Signed-off-by: Douglas Camata <[email protected]>
Signed-off-by: hackeramitkumar <[email protected]>
Signed-off-by: Yi Jin <[email protected]>
Signed-off-by: Giedrius Statkevičius <[email protected]>
Signed-off-by: mickeyzzc <[email protected]>
Signed-off-by: Saswata Mukherjee <[email protected]>
Signed-off-by: Filip Petkovski <[email protected]>
Signed-off-by: Jacob Baungard Hansen <[email protected]>
Signed-off-by: Will (Newby) Atlas <[email protected]>
Signed-off-by: junot <[email protected]>
Signed-off-by: Pradyumna Krishna <[email protected]>
Signed-off-by: Etienne Martel <[email protected]>
Signed-off-by: GitHub <[email protected]>
Signed-off-by: Alex Le <[email protected]>
Signed-off-by: naveadkazi <[email protected]>
Signed-off-by: Sebastian Rabenhorst <[email protected]>
Co-authored-by: Douglas Camata <[email protected]>
Co-authored-by: Filip Petkovski <[email protected]>
Co-authored-by: Amit kumar <[email protected]>
Co-authored-by: Yi Jin <[email protected]>
Co-authored-by: Giedrius Statkevičius <[email protected]>
Co-authored-by: MickeyZZC <[email protected]>
Co-authored-by: Saswata Mukherjee <[email protected]>
Co-authored-by: Jacob Baungård Hansen <[email protected]>
Co-authored-by: Will (Newby) Atlas <[email protected]>
Co-authored-by: junot <[email protected]>
Co-authored-by: Pradyumna Krishna <[email protected]>
Co-authored-by: Etienne Martel <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: fpetkovski <[email protected]>
Co-authored-by: Alex Le <[email protected]>
Co-authored-by: naveadkazi <[email protected]>
17 people authored May 2, 2023
1 parent a45abcb commit 8f1855a
Showing 72 changed files with 1,538 additions and 573 deletions.
11 changes: 9 additions & 2 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,8 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re

- [#6185](https://github.com/thanos-io/thanos/pull/6185) Tracing: tracing in OTLP support configuring service_name.
- [#6192](https://github.com/thanos-io/thanos/pull/6192) Store: add flag `bucket-web-label` to select the label to use as timeline title in web UI
- [#6167](https://github.com/thanos-io/thanos/pull/6195) Receive: add flag `tsdb.too-far-in-future.time-window` to prevent clock skewed samples to pollute TSDB head and block all valid incoming samples.
- [#6273](https://github.com/thanos-io/thanos/pull/6273) Mixin: Allow specifying an instance name filter in dashboards

### Fixed

Expand All @@ -25,13 +27,18 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re
- [#6216](https://github.com/thanos-io/thanos/pull/6216) Receiver: removed hard-coded value of EnableExemplarStorage flag and set it according to max-exemplar value.
- [#6222](https://github.com/thanos-io/thanos/pull/6222) mixin(Receive): Fix tenant series received charts.
- [#6218](https://github.com/thanos-io/thanos/pull/6218) mixin(Store): handle ResourceExhausted as a non-server error. As a consequence, this error won't contribute to Store's grpc errors alerts.
- [#6271](https://github.com/thanos-io/thanos/pull/6271) Receive: Fix segfault in `LabelValues` during head compaction.

### Changed
- [#6168](https://github.com/thanos-io/thanos/pull/6168) Receiver: Make ketama hashring fail early when configured with number of nodes lower than the replication factor.
- [#6201](https://github.com/thanos-io/thanos/pull/6201) Query-Frontend: Disable absent and absent_over_time for vertical sharding.
- [#6212](https://github.com/thanos-io/thanos/pull/6212) Query-Frontend: Disable scalar for vertical sharding.
- [#6107](https://github.com/thanos-io/thanos/pull/6082) Change default user id in container image from 0(root) to 1001
- [#6107](https://github.com/thanos-io/thanos/pull/6107) Change default user id in container image from 0(root) to 1001
- [#6228](https://github.com/thanos-io/thanos/pull/6228) Conditionally generate debug messages in ProxyStore to avoid memory bloat.
- [#6231](https://github.com/thanos-io/thanos/pull/6231) mixins: Add code/grpc-code dimension to error widgets.
- [#6244](https://github.com/thanos-io/thanos/pull/6244) mixin(Rule): Add rule evaluation failures to the Rule dashboard.
- [#6303](https://github.com/thanos-io/thanos/pull/6303) Store: added and start using streamed snappy encoding for postings list instead of block based one. This leads to constant memory usage during decompression. This approximately halves memory usage when decompressing a postings list in index cache.
- [#6071](https://github.com/thanos-io/thanos/pull/6071) Query Frontend: *breaking :warning:* Add experimental native histogram support for which we updated and aligned with the [Prometheus common](https://github.com/prometheus/common) model, which is used for caching so a cache reset required.

### Removed

Expand All @@ -42,7 +49,7 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re
- [#5990](https://github.com/thanos-io/thanos/pull/5990) Cache/Redis: add support for Redis Sentinel via new option `master_name`.
- [#6008](https://github.com/thanos-io/thanos/pull/6008) *: Add counter metric `gate_queries_total` to gate.
- [#5926](https://github.com/thanos-io/thanos/pull/5926) Receiver: Add experimental string interning in writer. Can be enabled with a hidden flag `--writer.intern`.
- [#5773](https://github.com/thanos-io/thanos/pull/5773) Store: Support disabling cache index header file by setting `--disable-caching-index-header-file`. When toggled, Stores can run without needing persistent disks.
- [#5773](https://github.com/thanos-io/thanos/pull/5773) Store: Support disabling cache index header file by setting `--no-cache-index-header`. When toggled, Stores can run without needing persistent disks.
- [#5653](https://github.com/thanos-io/thanos/pull/5653) Receive: Allow setting hashing algorithm per tenant in hashrings config.
- [#6074](https://github.com/thanos-io/thanos/pull/6074) *: Add histogram metrics `thanos_store_server_series_requested` and `thanos_store_server_chunks_requested` to all Stores.
- [#6074](https://github.com/thanos-io/thanos/pull/6074) *: Allow configuring series and sample limits per `Series` request for all Stores.
Expand Down
2 changes: 1 addition & 1 deletion Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -308,7 +308,7 @@ test: export THANOS_TEST_ALERTMANAGER_PATH= $(ALERTMANAGER)
test: check-git install-tool-deps
@echo ">> install thanos GOOPTS=${GOOPTS}"
@echo ">> running unit tests (without /test/e2e). Do export THANOS_TEST_OBJSTORE_SKIP=GCS,S3,AZURE,SWIFT,COS,ALIYUNOSS,BOS,OCI if you want to skip e2e tests against all real store buckets. Current value: ${THANOS_TEST_OBJSTORE_SKIP}"
@go test $(shell go list ./... | grep -v /vendor/ | grep -v /test/e2e | grep -v /internal/mimir-prometheus);
@go test -timeout 15m $(shell go list ./... | grep -v /vendor/ | grep -v /test/e2e | grep -v /internal/mimir-prometheus);

.PHONY: test-local
test-local: ## Runs test excluding tests for ALL object storage integrations.
Expand Down
57 changes: 24 additions & 33 deletions cmd/thanos/query.go
Original file line number Diff line number Diff line change
Expand Up @@ -28,11 +28,11 @@ import (
"github.com/prometheus/prometheus/discovery/targetgroup"
"github.com/prometheus/prometheus/model/labels"
"github.com/prometheus/prometheus/promql"
v1 "github.com/prometheus/prometheus/web/api/v1"
"github.com/thanos-community/promql-engine/engine"
"github.com/thanos-community/promql-engine/api"
"github.com/thanos-community/promql-engine/logicalplan"

apiv1 "github.com/thanos-io/thanos/pkg/api/query"
"github.com/thanos-io/thanos/pkg/api/query/querypb"
"github.com/thanos-io/thanos/pkg/block"
"github.com/thanos-io/thanos/pkg/compact/downsample"
"github.com/thanos-io/thanos/pkg/component"
Expand Down Expand Up @@ -68,13 +68,6 @@ const (
queryPushdown = "query-pushdown"
)

type promqlEngineType string

const (
promqlEnginePrometheus promqlEngineType = "prometheus"
promqlEngineThanos promqlEngineType = "thanos"
)

type queryMode string

const (
Expand Down Expand Up @@ -111,8 +104,8 @@ func registerQuery(app *extkingpin.App) {
queryTimeout := extkingpin.ModelDuration(cmd.Flag("query.timeout", "Maximum time to process query by query node.").
Default("2m"))

promqlEngine := cmd.Flag("query.promql-engine", "PromQL engine to use.").Default(string(promqlEnginePrometheus)).
Enum(string(promqlEnginePrometheus), string(promqlEngineThanos))
defaultEngine := cmd.Flag("query.promql-engine", "Default PromQL engine to use.").Default(string(apiv1.PromqlEnginePrometheus)).
Enum(string(apiv1.PromqlEnginePrometheus), string(apiv1.PromqlEngineThanos))
enableThanosPromQLEngOptimizer := cmd.Flag("query.enable-thanos-promql-engine-optimizer", "Enable query optimizer for Thanos PromQL engine ").Default("true").Bool()

promqlQueryMode := cmd.Flag("query.mode", "PromQL query mode. One of: local, distributed.").
Expand Down Expand Up @@ -347,7 +340,7 @@ func registerQuery(app *extkingpin.App) {
*queryTelemetryDurationQuantiles,
*queryTelemetrySamplesQuantiles,
*queryTelemetrySeriesQuantiles,
promqlEngineType(*promqlEngine),
*defaultEngine,
*enableThanosPromQLEngOptimizer,
storeRateLimits,
storeSelectorRelabelConf,
Expand Down Expand Up @@ -425,7 +418,7 @@ func runQuery(
queryTelemetryDurationQuantiles []float64,
queryTelemetrySamplesQuantiles []int64,
queryTelemetrySeriesQuantiles []int64,
promqlEngine promqlEngineType,
defaultEngine string,
enableThanosPromQLEngOptimizer bool,
storeRateLimits store.SeriesSelectLimits,
storeSelectorRelabelConf extflag.PathOrContent,
Expand Down Expand Up @@ -699,26 +692,22 @@ func runQuery(
engineOpts.ActiveQueryTracker = promql.NewActiveQueryTracker(activeQueryDir, maxConcurrentQueries, logger)
}

var queryEngine v1.QueryEngine
switch promqlEngine {
case promqlEnginePrometheus:
queryEngine = promql.NewEngine(engineOpts)
case promqlEngineThanos:
if queryMode == queryModeLocal {
queryEngine = engine.New(engine.Opts{EngineOpts: engineOpts, EnableXFunctions: true, LogicalOptimizers: logicalOptimizers})
} else {
remoteEngineEndpoints := query.NewRemoteEndpoints(logger, endpoints.GetQueryAPIClients, query.Opts{
AutoDownsample: enableAutodownsampling,
ReplicaLabels: queryReplicaLabels,
Timeout: queryTimeout,
EnablePartialResponse: enableQueryPartialResponse,
})
queryEngine = engine.NewDistributedEngine(engine.Opts{EngineOpts: engineOpts, EnableXFunctions: true}, remoteEngineEndpoints)
}
default:
return errors.Errorf("unknown query.promql-engine type %v", promqlEngine)
var remoteEngineEndpoints api.RemoteEndpoints
if queryMode != queryModeLocal {
remoteEngineEndpoints = query.NewRemoteEndpoints(logger, endpoints.GetQueryAPIClients, query.Opts{
AutoDownsample: enableAutodownsampling,
ReplicaLabels: queryReplicaLabels,
Timeout: queryTimeout,
EnablePartialResponse: enableQueryPartialResponse,
})
}

engineFactory := apiv1.NewQueryEngineFactory(
engineOpts,
remoteEngineEndpoints,
logicalOptimizers,
)

lookbackDeltaCreator := LookbackDeltaFactory(engineOpts, dynamicLookbackDelta)

// Start query API + UI HTTP server.
Expand Down Expand Up @@ -749,7 +738,8 @@ func runQuery(
api := apiv1.NewQueryAPI(
logger,
endpoints.GetEndpointStatus,
queryEngine,
*engineFactory,
apiv1.PromqlEngineType(defaultEngine),
lookbackDeltaCreator,
queryableCreator,
// NOTE: Will share the same replica label as the query for now.
@@ -834,7 +824,8 @@ func runQuery(
info.WithQueryAPIInfoFunc(),
)

grpcAPI := apiv1.NewGRPCAPI(time.Now, queryReplicaLabels, queryableCreator, queryEngine, lookbackDeltaCreator, instantDefaultMaxSourceResolution)
defaultEngineType := querypb.EngineType(querypb.EngineType_value[defaultEngine])
grpcAPI := apiv1.NewGRPCAPI(time.Now, queryReplicaLabels, queryableCreator, *engineFactory, defaultEngineType, lookbackDeltaCreator, instantDefaultMaxSourceResolution)
storeServer := store.NewLimitedStoreServer(store.NewInstrumentedStoreServer(reg, proxy), reg, storeRateLimits)
s := grpcserver.New(logger, reg, tracer, grpcLogOpts, tagOpts, comp, grpcProbe,
grpcserver.WithServer(apiv1.RegisterQueryServer(grpcAPI)),
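The gRPC API derives its default engine from the proto enum by name: generated protobuf code exposes a name-to-number map, and `querypb.EngineType(querypb.EngineType_value[defaultEngine])` looks the string up in it. A stand-in showing the lookup and its quiet failure mode (the map contents here are illustrative, not taken from querypb):

```go
package main

import "fmt"

// Generated protobuf enums expose a name→value map like this one;
// querypb.EngineType_value is the real counterpart.
var engineTypeValue = map[string]int32{
	"default":    0,
	"prometheus": 1,
	"thanos":     2,
}

// engineTypeFromString mimics querypb.EngineType(querypb.EngineType_value[s]).
// A missing key yields the map's zero value, so unknown names silently
// fall back to the enum's zero variant rather than erroring.
func engineTypeFromString(s string) int32 {
	return engineTypeValue[s]
}

func main() {
	fmt.Println(engineTypeFromString("thanos"))
	fmt.Println(engineTypeFromString("unknown")) // zero-value fallback, no error
}
```

That silent fallback is worth knowing when validating the `--query.promql-engine` value upstream of this conversion.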
11 changes: 10 additions & 1 deletion cmd/thanos/receive.go
@@ -209,7 +209,10 @@ func runReceive(
conf.allowOutOfOrderUpload,
hashFunc,
)
writer := receive.NewWriter(log.With(logger, "component", "receive-writer"), dbs, conf.writerInterning)
writer := receive.NewWriter(log.With(logger, "component", "receive-writer"), dbs, &receive.WriterOptions{
Intern: conf.writerInterning,
TooFarInFutureTimeWindow: int64(time.Duration(*conf.tsdbTooFarInFutureTimeWindow)),
})

var limitsConfig *receive.RootLimitsConfig
if conf.writeLimitsConfig != nil {
@@ -783,6 +786,7 @@ type receiveConfig struct {

tsdbMinBlockDuration *model.Duration
tsdbMaxBlockDuration *model.Duration
tsdbTooFarInFutureTimeWindow *model.Duration
tsdbOutOfOrderTimeWindow *model.Duration
tsdbOutOfOrderCapMax int64
tsdbAllowOverlappingBlocks bool
@@ -875,6 +879,11 @@ func (rc *receiveConfig) registerFlag(cmd extkingpin.FlagClause) {

rc.tsdbMaxBlockDuration = extkingpin.ModelDuration(cmd.Flag("tsdb.max-block-duration", "Max duration for local TSDB blocks").Default("2h").Hidden())

rc.tsdbTooFarInFutureTimeWindow = extkingpin.ModelDuration(cmd.Flag("tsdb.too-far-in-future.time-window",
"[EXPERIMENTAL] Configures the allowed time window for ingesting samples with timestamps too far in the future. Disabled (0s) by default. "+
"Please note that enabling this flag will reject samples with timestamps later than the receiver's local (NTP-synced) time plus the configured duration, guarding against clock skew in remote write clients.",
).Default("0s"))

rc.tsdbOutOfOrderTimeWindow = extkingpin.ModelDuration(cmd.Flag("tsdb.out-of-order.time-window",
"[EXPERIMENTAL] Configures the allowed time window for ingestion of out-of-order samples. Disabled (0s) by default. "+
"Please note that if you enable this option and also use the compactor, make sure the --enable-vertical-compaction flag is enabled, otherwise you risk a compactor halt.",
2 changes: 1 addition & 1 deletion docs/components/query.md
@@ -399,7 +399,7 @@ Flags:
no partial_response param is specified.
--no-query.partial-response for disabling.
--query.promql-engine=prometheus
PromQL engine to use.
Default PromQL engine to use.
--query.replica-label=QUERY.REPLICA-LABEL ...
Labels to treat as a replica indicator along
which data is deduplicated. Still you will
8 changes: 8 additions & 0 deletions docs/components/receive.md
@@ -396,6 +396,14 @@ Flags:
refer to the Tenant lifecycle management
section in the Receive documentation:
https://thanos.io/tip/components/receive.md/#tenant-lifecycle-management
--tsdb.too-far-in-future.time-window=0s
[EXPERIMENTAL] Configures the allowed time
window for ingesting samples with timestamps
too far in the future. Disabled (0s) by
default. Please note that enabling this flag
will reject samples with timestamps later
than the receiver's local (NTP-synced) time
plus the configured duration, guarding
against clock skew in remote write clients.
--tsdb.wal-compression Compress the tsdb WAL.
--version Show application version.
15 changes: 5 additions & 10 deletions examples/dashboards/overview.json
@@ -161,10 +161,9 @@
"steppedLine": false,
"targets": [
{
"expr": "sum by (job) (rate(http_requests_total{handler=\"query\",code=~\"5..\"}[$interval])) / sum by (job) (rate(http_requests_total{handler=\"query\"}[$interval]))",
"expr": "sum by (job, code) (rate(http_requests_total{handler=\"query\",code=~\"5..\"}[$interval])) / ignoring (code) group_left() sum by (job) (rate(http_requests_total{handler=\"query\"}[$interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "error",
"step": 10
}
],
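The dashboard change above splits the error-rate numerator by status code while keeping the denominator aggregated per job. Because the two sides then have different label sets, the query needs one-to-many vector matching: `ignoring (code) group_left()` matches each per-code series on the left against the single per-job total on the right. Reformatted for readability:

```
  sum by (job, code) (rate(http_requests_total{handler="query",code=~"5.."}[$interval]))
/ ignoring (code) group_left()
  sum by (job) (rate(http_requests_total{handler="query"}[$interval]))
```

Dropping the fixed `legendFormat: "error"` lets Grafana label each series by its `code`, which is the point of the new dimension.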
@@ -466,10 +465,9 @@
"steppedLine": false,
"targets": [
{
"expr": "sum by (job) (rate(http_requests_total{handler=\"query_range\",code=~\"5..\"}[$interval])) / sum by (job) (rate(http_requests_total{handler=\"query_range\"}[$interval]))",
"expr": "sum by (job, code) (rate(http_requests_total{handler=\"query_range\",code=~\"5..\"}[$interval])) / ignoring (code) group_left() sum by (job) (rate(http_requests_total{handler=\"query_range\"}[$interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "error",
"step": 10
}
],
@@ -823,10 +821,9 @@
"steppedLine": false,
"targets": [
{
"expr": "sum by (job) (rate(grpc_server_handled_total{grpc_code=~\"Unknown|ResourceExhausted|Internal|Unavailable|DataLoss\",grpc_type=\"unary\"}[$interval])) / sum by (job) (rate(grpc_server_handled_total{grpc_type=\"unary\"}[$interval]))",
"expr": "sum by (job, grpc_code) (rate(grpc_server_handled_total{grpc_code=~\"Unknown|ResourceExhausted|Internal|Unavailable|DataLoss\",grpc_type=\"unary\"}[$interval])) / ignoring (grpc_code) group_left() sum by (job) (rate(grpc_server_handled_total{grpc_type=\"unary\"}[$interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "error",
"step": 10
}
],
@@ -1180,10 +1177,9 @@
"steppedLine": false,
"targets": [
{
"expr": "sum by (job) (rate(grpc_server_handled_total{grpc_code=~\"Unknown|ResourceExhausted|Internal|Unavailable|DataLoss\",grpc_type=\"unary\"}[$interval])) / sum by (job) (rate(grpc_server_handled_total{grpc_type=\"unary\"}[$interval]))",
"expr": "sum by (job, grpc_code) (rate(grpc_server_handled_total{grpc_code=~\"Unknown|ResourceExhausted|Internal|Unavailable|DataLoss\",grpc_type=\"unary\"}[$interval])) / ignoring (grpc_code) group_left() sum by (job) (rate(grpc_server_handled_total{grpc_type=\"unary\"}[$interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "error",
"step": 10
}
],
@@ -1485,10 +1481,9 @@
"steppedLine": false,
"targets": [
{
"expr": "sum by (job) (rate(http_requests_total{handler=\"receive\",code=~\"5..\"}[$interval])) / sum by (job) (rate(http_requests_total{handler=\"receive\"}[$interval]))",
"expr": "sum by (job, code) (rate(http_requests_total{handler=\"receive\",code=~\"5..\"}[$interval])) / ignoring (code) group_left() sum by (job) (rate(http_requests_total{handler=\"receive\"}[$interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "error",
"step": 10
}
],
3 changes: 1 addition & 2 deletions examples/dashboards/query-frontend.json
@@ -242,10 +242,9 @@
"steppedLine": false,
"targets": [
{
"expr": "sum by (job) (rate(http_requests_total{job=~\"$job\", handler=\"query-frontend\",code=~\"5..\"}[$interval])) / sum by (job) (rate(http_requests_total{job=~\"$job\", handler=\"query-frontend\"}[$interval]))",
"expr": "sum by (job, code) (rate(http_requests_total{job=~\"$job\", handler=\"query-frontend\",code=~\"5..\"}[$interval])) / ignoring (code) group_left() sum by (job) (rate(http_requests_total{job=~\"$job\", handler=\"query-frontend\"}[$interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "error",
"step": 10
}
],

0 comments on commit 8f1855a