info: Return store info only when the service is ready #5255

Merged
yeya24 merged 2 commits into thanos-io:main from infopb-return-ready
Mar 25, 2022

Conversation

yeya24
Contributor

@yeya24 yeya24 commented Mar 25, 2022

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

Store only returns store info when its status is ready.
This is helpful in our scenario:
When Prometheus itself is down (due to a storage error), the pod readiness check fails and the endpoint is removed from the service. However, because of our CNI solution, the IPVS service backend remains for 10m (the same as the pod's terminationGracePeriodSeconds), so Thanos Query can still connect to the sidecar for 10m, producing a lot of Query partial errors.

Once the sidecar detects that Prometheus is down, it should let the Query know that it is not available for serving any requests because it is not ready. Same for other stores.
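
The behaviour described above can be sketched as follows. This is only an illustration of the idea, not the code changed in this PR: `StoreInfo`, `InfoServer`, and `SetReady` are hypothetical names, and returning `nil` to signal "no store API to advertise" is an assumption made for the sketch.

```go
package main

import "sync/atomic"

// StoreInfo is a hypothetical stand-in for the store metadata that a
// component advertises to the querier.
type StoreInfo struct {
	MinTime, MaxTime int64
}

// InfoServer hands out store info only while its readiness flag is set.
type InfoServer struct {
	ready uint32 // 1 when ready, 0 otherwise; accessed atomically.
	info  StoreInfo
}

// SetReady would be driven by a readiness probe, for example the sidecar's
// check that Prometheus is reachable.
func (s *InfoServer) SetReady(ok bool) {
	var v uint32
	if ok {
		v = 1
	}
	atomic.StoreUint32(&s.ready, v)
}

// Info returns a copy of the store info when the service is ready and nil
// otherwise, so a caller can skip this endpoint for store requests.
func (s *InfoServer) Info() *StoreInfo {
	if atomic.LoadUint32(&s.ready) == 0 {
		return nil
	}
	info := s.info
	return &info
}
```

A caller would treat a `nil` result as "this endpoint currently serves no store data" and would not fan queries out to it.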

Verification

@yeya24 yeya24 force-pushed the infopb-return-ready branch from 5009c64 to 81b2058 Compare March 25, 2022 03:39
Signed-off-by: Ben Ye <[email protected]>
Collaborator

@matej-g matej-g left a comment

Makes sense, looks good @yeya24 👍

Member

@squat squat left a comment

I think this is fine, however, I'm a bit worried about hypothetical meltdown-type issues where we have two layers of statefulness on top of one another.

The typical example is tunneling a TCP connection over SSH, which itself is built on TCP. Both connections are stateful, resulting in an unnecessary number of retries that can overload and break the underlying connection or saturate the remote server with too many retried packets.

I can't think of an exact issue, but I think there could be some similar subtleties here with two layers of readiness awareness.

Contributor Author

yeya24 commented Mar 25, 2022

Yeah, at the beginning I was wondering if I could solve it on the k8s side, and I opened prometheus-operator/prometheus-operator#4681. However, there are two cases of Prometheus pod shutdown: one is a normal rollout, and the other is the pod being killed after a failed readiness/liveness check. I think the first case should still go through graceful termination, but the second case should cut off connections ASAP.

Contributor Author

yeya24 commented Mar 25, 2022

Thanks for the quick review!

@yeya24 yeya24 merged commit f0e673a into thanos-io:main Mar 25, 2022
@yeya24 yeya24 deleted the infopb-return-ready branch March 25, 2022 17:25
Member

hanjm commented Mar 25, 2022

Useful feature, I am facing the same issue.

openshift-merge-robot pushed a commit to stolostron/thanos that referenced this pull request Dec 8, 2022
* Remove debug line (#5245)

Signed-off-by: Matej Gera <[email protected]>

* e2e: fix compact test's flakiness (#5246)

Fix the compact test's flakiness by running this sub-test sequentially. The
further steps depend on this test's results, so it's wrong to run it as a
sub-test.

Signed-off-by: Giedrius Statkevičius <[email protected]>

* bump prometheus version to v2.33.5 (#5256)

Signed-off-by: Ben Ye <[email protected]>

* info: Return store info only when the service is ready (#5255)

* return store info only when the service is ready

Signed-off-by: Ben Ye <[email protected]>

* fix test

Signed-off-by: Ben Ye <[email protected]>

* Merge release 0.25 to main (#5210)

* Cut 0.25.0-rc.0 (#5184)

Signed-off-by: Matej Gera <[email protected]>

* Cut v0.25.0 (#5209)

Signed-off-by: Matej Gera <[email protected]>

* Create v0.25.1 built with Go 1.17.8 (#5226)

The binaries published with this release are built with Go1.17.8 to
avoid
[CVE-2022-24921](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-24921).

Signed-off-by: Matthias Loibl <[email protected]>

* *: Cut 0.25.2 rc.0 (#5247)

* fix: add null check to exemplar data (#5202)

Signed-off-by: Thomas Mota <[email protected]>

* Ruler: Fix WAL directory in stateless mode (#5242)

Signed-off-by: Matej Gera <[email protected]>

* Update CHANGELOG, VERSION

Signed-off-by: Matej Gera <[email protected]>

* Updates busybox SHA (#5234)

Signed-off-by: GitHub <[email protected]>

Co-authored-by: yeya24 <[email protected]>

Co-authored-by: Tomás Mota <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: yeya24 <[email protected]>

* Cut v0.25.2

Signed-off-by: Matej Gera <[email protected]>

Update tutorials

Signed-off-by: Matej Gera <[email protected]>

Co-authored-by: Matthias Loibl <[email protected]>
Co-authored-by: Tomás Mota <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: yeya24 <[email protected]>

* Implement GRPC query API (#5250)

With the current GRPC APIs, layering Thanos Queriers results in
the root querier getting all of the samples and executing the query
in memory. As a result, the intermediary Queriers do not do any
intensive work and merely transport samples from the Stores to the
root Querier.

When data is perfectly sharded, users can implement a pattern where
the root Querier instructs the intermediary ones to execute the queries
from their stores and return back results. The results can then be
concatenated by the root querier and returned to the user.

In order to support this use case, this commit implements a GRPC API
in the Querier which is analogous to the HTTP Query API exposed
by Prometheus.

Signed-off-by: fpetkovski <[email protected]>

* Change error cleanup in `objstore.DownloadDir` to delete files not destination dir (#5229)

* Change error cleanup in objstore.DownloadDir to delete files not directories

Dst is always a directory. If any file after the first fails to download,
the cleanup will fail because the destination already contains at least one file.
This commit changes the cleanup logic to clean up successfully downloaded files one by one
instead of attempting to clean up the whole dst directory.

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Add cleanup of root dst directory.

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Add unit test for cleanup of DownloadDir

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Fix linter

Signed-off-by: Dimitar Dimitrov <[email protected]>

* Update index.html (#5264)

* Add SumUp logo to adopters (#5267)

Signed-off-by: Guilherme Souza <[email protected]>

* receive: Added tenant ID  error handling of remote write requests. (#5269)

Plus better explanation.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Add TIXnGO logo to adopters (#5273)

Signed-off-by: Pierre Hanselmann <[email protected]>

* Fix miekgdns resolver to work with CNAME records too (#5271)

* Fix miekgdns resolver to work with CNAME records too

Signed-off-by: Marco Pracucci <[email protected]>

* Remove unused context

Signed-off-by: Marco Pracucci <[email protected]>

* Update pkg/discovery/dns/miekgdns/resolver.go

Signed-off-by: Marco Pracucci <[email protected]>
Co-authored-by: Lucas Servén Marín <[email protected]>

Co-authored-by: Lucas Servén Marín <[email protected]>

* UI: Remove old ui (#5145)

* remove old ui

Signed-off-by: Augustin Husson <[email protected]>

* add changelog

Signed-off-by: Augustin Husson <[email protected]>

* update assets

Signed-off-by: Augustin Husson <[email protected]>

* Updates busybox SHA (#5283)

Signed-off-by: GitHub <[email protected]>

Co-authored-by: yeya24 <[email protected]>

* build(deps): bump moment from 2.29.1 to 2.29.2 in /pkg/ui/react-app (#5274)

Bumps [moment](https://github.com/moment/moment) from 2.29.1 to 2.29.2.
- [Release notes](https://github.com/moment/moment/releases)
- [Changelog](https://github.com/moment/moment/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/moment/moment/compare/2.29.1...2.29.2)

---
updated-dependencies:
- dependency-name: moment
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* docs: fix URLs preventing generation and unblock CI (#5285)

* docs: fix Ian Billett's GitHub handle

I noticed that CI was failing [0] for PR
https://github.com/thanos-io/thanos/pull/5284 because Ian had changed
his GitHub handle from @ianbillett to @bill3tt. This commit fixes this.

[0] https://github.com/thanos-io/thanos/runs/6050355497?check_suite_focus=true#step:5:135

Signed-off-by: Lucas Servén Marín <[email protected]>

* docs: fix broken links to GitHub docs

Currently, documentation generation is failing because mdox can't fetch
some GitHub documentation pages since the URLs for the help content have
changed. This commit updates the links to use the correct URLs.

Signed-off-by: Lucas Servén Marín <[email protected]>

* MAINTAINERS.md: regenerate

Signed-off-by: Lucas Servén Marín <[email protected]>

* UI: Update vulnerable dependencies (#5233)

* refactor global window typings

Use declaration merging for better window types

Signed-off-by: Gabriel Bernal <[email protected]>

* bump vulnerable react-scripts version

Signed-off-by: Gabriel Bernal <[email protected]>

* Add Vestiaire Collective as adopter (#5289)

Signed-off-by: claude ebaneck <[email protected]>

Co-authored-by: claude ebaneck <[email protected]>

* Implement Query API discovery (#5291)

A recent commit (#5250) added a GRPC API to Thanos Query which allows
executing PromQL over GRPC. This API is currently not discoverable
through endpointsets which makes it hard for other Thanos components
to use it.

This commit extends endpointsets with a GetQueryAPIClients method
which returns Query API clients to all components which support
this API.

Signed-off-by: fpetkovski <[email protected]>

* Added support for ppc64le (#5290)

* Added support for ppc64le

Signed-off-by: Marvin Giessing <[email protected]>

* Updated Changelog

Signed-off-by: Marvin Giessing <[email protected]>

* Updated promu & protoc

Signed-off-by: Marvin Giessing <[email protected]>

* Updated Makefile comment

Signed-off-by: Marvin Giessing <[email protected]>

* Added target API tests (+goleak). (#5260)

Attempted to repro https://github.com/thanos-io/thanos/issues/5257, but no good luck.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Revert "Added target API tests (+goleak). (#5260)" (#5297)

This reverts commit 955ea6dcae2529ad5b5b97a6a11150a5906d775a.

Signed-off-by: Giedrius Statkevičius <[email protected]>

* Use correct filesystem/network path separators when uploading blocks (#5281)

Signed-off-by: Arve Knudsen <[email protected]>

* query-frontend: Don't cache request with dedup=false  (#5300)

* query-frontend: Added repro for dedup affecting precision of querying.

Signed-off-by: Bartlomiej Plotka <[email protected]>
Signed-off-by: Douglas Camata <[email protected]>

* QFE does not cache request with dedup=false.

Signed-off-by: Bartlomiej Plotka <[email protected]>
Signed-off-by: Douglas Camata <[email protected]>

* Move info about queries that skip cache logic to docs

Signed-off-by: Douglas Camata <[email protected]>

* Update CHANGELOG

Signed-off-by: Douglas Camata <[email protected]>

* Run docs formatter

Signed-off-by: Douglas Camata <[email protected]>

* Fix e2e tests where caching logic is desired

Signed-off-by: Douglas Camata <[email protected]>

Co-authored-by: Bartlomiej Plotka <[email protected]>

* mixin: Fix typo in ThanosCompactHalted alert (#5306)

Signed-off-by: Pedro Araujo <[email protected]>

* Avoid starting goroutines for memcached batch requests before gate (#5301)

Use the doWithBatch function to avoid starting goroutines to fetch batched
results from memcached before they are allowed to run via the concurrency
Gate. This avoids starting many goroutines which cannot make any progress
due to a concurrency limit.

Fixes #4967

Signed-off-by: Nick Pillitteri <[email protected]>

* Cut readme for 0.26 (#5311)

Co-authored-by: Wiard van Rij <[email protected]>

* Reviewed and updated Changelog for 0.26-rc0 (#5313)

Signed-off-by: Wiard van Rij <[email protected]>

Co-authored-by: Wiard van Rij <[email protected]>

* Cut 0.26.0-rc.0 set version correctly (#5317)

Signed-off-by: Wiard van Rij <[email protected]>

Co-authored-by: Wiard van Rij <[email protected]>

* docs: Fix broken link to introduction blog (#5319)

Signed-off-by: jmjf <[email protected]>

* Ensure memcached batched requests handle context cancelation (#5314)

* Ensure memcached batched requests handle context cancellation

Ensure that when the context used for Memcached GetMulti is cancelled,
getMultiBatched does not hang waiting for results that will never be
generated (since the batched requests will not run if the context has
been cancelled).

Fixes an issue introduced in #5301

Signed-off-by: Nick Pillitteri <[email protected]>

* Lint fixes

Signed-off-by: Nick Pillitteri <[email protected]>

* Code review changes: run batches unconditionally

Signed-off-by: Nick Pillitteri <[email protected]>

* stalebot: add generic label to avoid stalebot (#5322)

Add a generic label which tells stalebot not to close issues marked with
it.

Signed-off-by: Giedrius Statkevičius <[email protected]>

* Use proper replicalabels in GRPC Query API (#5308)

The GRPC Query API uses only the replica labels coming from the
RPC request and ignores the ones configured when starting the querier.

This commit ensures that the API falls back on the preconfigured
replica labels when they are not provided in the request.

Signed-off-by: Filip Petkovski <[email protected]>
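
The fallback described above amounts to a small selection rule. The function name below is hypothetical and the sketch assumes that "not provided" means an empty list in the request.

```go
package main

// effectiveReplicaLabels returns the replica labels from the request when
// any were sent, and otherwise falls back to the labels the querier was
// configured with at startup.
func effectiveReplicaLabels(requested, configured []string) []string {
	if len(requested) > 0 {
		return requested
	}
	return configured
}
```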

* groupcache: reduce log severity (#5323)

Sometimes certain operations can fail with some error(-s) being expected
e.g. a deletion marker might or might not exist. Thus, these log lines
could get triggered even though nothing bad is happening. Since the
expected errors are known only at the very end, near the call site, and
because `error`s are already logged in other places, and because these
Fetch()/Store() functions are working in best-effort scenario, I propose
reducing the severity of these log lines to `debug`.

Fixes https://github.com/thanos-io/thanos/issues/5265.

Signed-off-by: Giedrius Statkevičius <[email protected]>

* Update release process (#5325)

* update release process

Signed-off-by: Wiard van Rij <[email protected]>

* Add info about VERSION file

Signed-off-by: Wiard van Rij <[email protected]>

* query-frontend: improve docs on requestes excluded from cache (#5326)

Signed-off-by: Douglas Camata <[email protected]>

* cut release 0.26.0 (#5330)

Signed-off-by: Wiard van Rij <[email protected]>

* Updates busybox SHA (#5336)

Signed-off-by: GitHub <[email protected]>

Co-authored-by: yeya24 <[email protected]>

* receive: fix deadlock on interrupt in routerOnly mode (#5339)

* fix receive router deadlock on interrupt

Signed-off-by: François Gouteroux <[email protected]>

* Update changelog

Signed-off-by: François Gouteroux <[email protected]>

* docs: Updated information about our community call. (#5309)

Signed-off-by: Bartlomiej Plotka <[email protected]>

* reloader: Force trigger reload when config rollbacked (#5324)

* Add Cache metrics to groupcache (#5352)

Add metrics about the hot and main caches[0].
* Number of bytes in each cache.
* Number of items in each cache.
* Counter of evictions from each cache.

[0]: https://pkg.go.dev/github.com/vimeo/galaxycache#CacheStats

Signed-off-by: SuperQ <[email protected]>

* e2e: Refactored service helpers to be consistent with new API. (#5348)

* test: Added Alert compatibilty test.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Tmp.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Update.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* update.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* update.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* e2e: Refactored service helpers for newest e2e version.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Removed alert combatibiltiy test for now.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Fixed lint.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Fixed lint2.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Fixed nginx service.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Fixes.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Fix.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Fix.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* fix.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Refactored ruler.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Fixed test.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* fixes.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Fix.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Fixed compactor.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Fix.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* What about now?

Signed-off-by: Bartlomiej Plotka <[email protected]>

* groupcache: fix handling of slashes (#5357)

Use https://github.com/julienschmidt/httprouter#catch-all-parameters for
the groupcache route; otherwise slashes in the cache's key get
interpreted by the router, so groupcache's function never gets
invoked and all clients get 404.

Remove test regarding cache hit because now Thanos Store during test
constantly generates cache hits due to 1s delay between block
information refreshes.

Signed-off-by: Giedrius Statkevičius <[email protected]>

* Adds more info about the formatting part. (#5347)

* Adds more info about the formatting part. Closes #5282

Signed-off-by: Wiard van Rij <[email protected]>

* adds extra newline

Signed-off-by: Wiard van Rij <[email protected]>

* Update promdoc to solve #5344 (#5345)

Signed-off-by: Wiard van Rij <[email protected]>

* e2e: Refactored Receive Builder to be consistent with other helpers. (#5358)

* e2e: Refactored Receive Builder to be consistent with other helpers.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Addressed comments.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Updates busybox SHA (#5365)

Signed-off-by: GitHub <[email protected]>

Co-authored-by: yeya24 <[email protected]>

* e2e: Fixed exemplar support in receive helper. (#5372)

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Enforce memcached concurrency limit with unbatched requests (#5360)

* Enforce memcached concurrency limit with unbatched requests

This ensures that requests that are _not_ split into batches still count
towards the concurrency limit that the client enforces.

This fixes an issue introduced in #5301

Signed-off-by: Nick Pillitteri <[email protected]>

* Lint fix

Signed-off-by: Nick Pillitteri <[email protected]>

* docs: fix link (#5379)

I think I've found a replacement for the dead link.

Signed-off-by: Giedrius Statkevičius <[email protected]>

* cache: do not copy data in groupcache (#5378)

Add an unsafe codec which uses the given byte slices directly to avoid
copying - we are doing ioutil.ReadAll() either way so there is no need
to copy anything.

Signed-off-by: Giedrius Statkevičius <[email protected]>

* fix ruler send empty alerts (#5377)

Signed-off-by: Ben Ye <[email protected]>

* Add custom `errors` package with stack trace functionality (#5239)

* feat: a simple stacktrace utility

Signed-off-by: Bisakh Mondal <[email protected]>

* feat: custom errors package with new, errorf, wrapping, unwrapping and stacktrace

Signed-off-by: Bisakh Mondal <[email protected]>

* chore: update existing errors import (small subset)

Signed-off-by: Bisakh Mondal <[email protected]>

* chore: update comments

Signed-off-by: Bisakh Mondal <[email protected]>

* add errors into skip-files linter config

Signed-off-by: Bisakh Mondal <[email protected]>

* intoduce UnwrapTillCause to suffice the limitation of Unwrap

Signed-off-by: Bisakh Mondal <[email protected]>

* Revert "chore: update existing errors import (small subset)"

This reverts commit d27f0177fe6c8a357ba10e4ac8bfee87c8bf985c.

Signed-off-by: Bisakh Mondal <[email protected]>

* revert makefile && golangcilint file

Signed-off-by: Bisakh Mondal <[email protected]>

* apply PR feedbacks

Signed-off-by: Bisakh Mondal <[email protected]>

* stacktrace and errors test

Signed-off-by: Bisakh Mondal <[email protected]>

* fix typo

Signed-off-by: Bisakh Mondal <[email protected]>

* update stacktrace testing regex

Signed-off-by: Bisakh Mondal <[email protected]>

* add lint ignore for standard errors import inside errors pkg

Signed-off-by: Bisakh Mondal <[email protected]>

* [test files] add copyright headers

Signed-off-by: Bisakh Mondal <[email protected]>

* add no lint to avoid false misspell detection of keyword Tast

Signed-off-by: Bisakh Mondal <[email protected]>

* update stacktrace output test line number with regex pattern

Signed-off-by: Bisakh Mondal <[email protected]>

* return pc slice with reduced capacity

Signed-off-by: Bisakh Mondal <[email protected]>

* segregate formatted vs non formatted methods

Signed-off-by: Bisakh Mondal <[email protected]>

* update with only f functions

Signed-off-by: Bisakh Mondal <[email protected]>

* Group memcached keys based on server when performing batch gets (#5356)

* Group memcached keys based on server when performing batch gets

Order and group keys during batch get operations based on the memcached
server they will be sharded to. This reduces the number of connections
that must be made within each batch of get operations.

Fixes #5353

Signed-off-by: Nick Pillitteri <[email protected]>
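
One way to picture the grouping described above: sort the keys by the server each one maps to, so that any batch cut from the sorted list touches as few servers as possible. The sketch assumes a simple hash-modulo selector (`pickServer`, hypothetical); a real memcached client uses its own server selector, which this does not reproduce.

```go
package main

import (
	"hash/fnv"
	"sort"
)

// pickServer is a hypothetical selector that maps a key to one of the
// servers by hashing. servers must be non-empty.
func pickServer(key string, servers []string) string {
	h := fnv.New32a()
	h.Write([]byte(key))
	return servers[int(h.Sum32()%uint32(len(servers)))]
}

// sortKeysByServer returns a copy of keys ordered so that keys mapping to
// the same server are adjacent. Within one server the original relative
// order of keys is kept.
func sortKeysByServer(keys, servers []string) []string {
	out := append([]string(nil), keys...)
	if len(servers) == 0 {
		return out
	}
	sort.SliceStable(out, func(i, j int) bool {
		return pickServer(out[i], servers) < pickServer(out[j], servers)
	})
	return out
}
```

Batches taken as consecutive slices of the result then need connections to fewer distinct servers than batches taken from the unsorted list.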

* Code review changes

Signed-off-by: Nick Pillitteri <[email protected]>

* Fix error in testutil method added

Signed-off-by: Nick Pillitteri <[email protected]>

* Code review: comments for selector interface

Signed-off-by: Nick Pillitteri <[email protected]>

* QueryFrontend: pre-compile regexp (#5383)

* pre compile regexp

Signed-off-by: Jin Dong <[email protected]>

* rename oppattern to labelvaluespattern

Signed-off-by: Jin Dong <[email protected]>

* [FEAT] adding thanos consul blogpost (#5387)

Signed-off-by: Nicolas Takashi <[email protected]>

* Fix empty $externalLabels when templating labels in rule. (#5394)

Signed-off-by: Rostislav Benes <[email protected]>

Co-authored-by: Rostislav Benes <[email protected]>

* support series relabeling on Thanos receiver (#5391)

* support series relabeling on Thanos receiver

Signed-off-by: Ben Ye <[email protected]>

* add changelog

Signed-off-by: Ben Ye <[email protected]>

* fix lint

Signed-off-by: Ben Ye <[email protected]>

* update lint

Signed-off-by: Ben Ye <[email protected]>

* fix e2e test

Signed-off-by: Ben Ye <[email protected]>

* fix relabel config pass

Signed-off-by: Ben Ye <[email protected]>

* cleanup white space

Signed-off-by: Ben Ye <[email protected]>

* address review comments

Signed-off-by: Ben Ye <[email protected]>

* address comments

Signed-off-by: Ben Ye <[email protected]>

* update comment

Signed-off-by: Ben Ye <[email protected]>

* Expose GatherFileStats. (#5400)

Signed-off-by: Peter Štibraný <[email protected]>

* Rule: Error out earlier when building alertmanager config (#5405)

* Error out earlier when building alertmanager config

Signed-off-by: Jéssica Lins <[email protected]>

* Add test case for empty host

Signed-off-by: Jéssica Lins <[email protected]>

* [5130] [.*:] Upgrade Minio used for local development and e2e tests (#5392)

* add updated bingo .gitignore

Signed-off-by: B0go <[email protected]>

* update bingo minio version to commit 91130e884b5df59d66a45a0aad4f48db88f5ca63

Signed-off-by: B0go <[email protected]>

* trigger CI

Signed-off-by: B0go <[email protected]>

* Submit a proposal for vertical query sharding (#5350)

Signed-off-by: fpetkovski <[email protected]>

* query: Close() after using query (#5410)

* query: Close() after using query

This should reduce memory usage because Close() returns points back to a
sync.Pool.

Signed-off-by: Giedrius Statkevičius <[email protected]>
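
Why calling `Close()` can lower memory usage is easier to see with a small model. The types below (`Point`, `Query`, `pointPool`) are hypothetical and are not the Prometheus or Thanos types; the sketch only shows a `Close` that hands a buffer back to a `sync.Pool` so that later queries can reuse it instead of allocating.

```go
package main

import "sync"

// Point is a simplified, hypothetical sample.
type Point struct {
	T int64
	V float64
}

// pointPool holds reusable point buffers.
var pointPool = sync.Pool{
	New: func() interface{} {
		buf := make([]Point, 0, 1024)
		return &buf
	},
}

// Query owns a pooled buffer for the duration of its evaluation.
type Query struct {
	points *[]Point
}

// NewQuery takes a buffer from the pool (or allocates a new one).
func NewQuery() *Query {
	return &Query{points: pointPool.Get().(*[]Point)}
}

// Add appends a point. It must not be called after Close.
func (q *Query) Add(p Point) { *q.points = append(*q.points, p) }

// Len reports the number of points collected so far.
func (q *Query) Len() int {
	if q.points == nil {
		return 0
	}
	return len(*q.points)
}

// Close empties the buffer and returns it to the pool. Calling Close more
// than once is harmless. If Close is never called, the buffer is simply
// garbage collected and never reused.
func (q *Query) Close() {
	if q.points == nil {
		return
	}
	*q.points = (*q.points)[:0]
	pointPool.Put(q.points)
	q.points = nil
}
```

A typical call site would use `defer q.Close()` right after creating the query.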

* CHANGELOG: add item

Signed-off-by: Giedrius Statkevičius <[email protected]>

* query: call Close() in gRPC API too

Signed-off-by: Giedrius Statkevičius <[email protected]>

* avoided potential panic due to divide by 0 (#5412)

Signed-off-by: Aditi Ahuja <[email protected]>

* sidecar/compact/store/receiver - Add the prefix option to buckets (#5337)

* Create prefixed bucket

Signed-off-by: jademcosta <[email protected]>

* started PrefixedBucket tests

Signed-off-by: Maria Eduarda Duarte <[email protected]>

* finish objstore tests

Signed-off-by: Maria Eduarda Duarte <[email protected]>

* Simplify string removal logic

Signed-off-by: jademcosta <[email protected]>

* Test more prefix cases on PrefixedBucket

Signed-off-by: jademcosta <[email protected]>

* Only use a prefixedbucket if we have a valid prefix

Signed-off-by: jademcosta <[email protected]>

* Add single unit test for prefixedBucket prefix

Signed-off-by: jademcosta <[email protected]>

* test other prefixes on UsesPrefixTest

Signed-off-by: Maria Eduarda Duarte <[email protected]>

* add remaining methods to UsesPrefixTest

Signed-off-by: Maria Eduarda Duarte <[email protected]>

* add prefix to docs examples

Signed-off-by: Maria Eduarda Duarte <[email protected]>

* Simplify Iter method

Signed-off-by: jademcosta <[email protected]>

* add prefix explanation to S3 docs

Signed-off-by: Maria Eduarda Duarte <[email protected]>

* Conclusion of prefix sentence on docs

Signed-off-by: jademcosta <[email protected]>

* Use DirDelim instead of magic string

Signed-off-by: jademcosta <[email protected]>

* Add log when using prefixed bucket

Signed-off-by: jademcosta <[email protected]>

* Remove "@" from test string to make them simpler

Signed-off-by: jademcosta <[email protected]>

* fix BucketConfig Config type - back to interface

Signed-off-by: Maria Eduarda Duarte <[email protected]>

* add changelog

Signed-off-by: Maria Eduarda Duarte <[email protected]>

* add missing checks in UsesPrefixTest

Signed-off-by: Maria Eduarda Duarte <[email protected]>

* fix linter and test errors

Signed-off-by: Maria Eduarda Duarte <[email protected]>

* Add license to new files

Signed-off-by: jademcosta <[email protected]>

* Remove autogenerated docs

Signed-off-by: jademcosta <[email protected]>

* Remove duplicated transformation of string->[]byte

Signed-off-by: jademcosta <[email protected]>

* Add prefixed bucket on all e2e tests for S3

The idea is that if it works, we can add for all other providers.
Signed-off-by: jademcosta <[email protected]>

* Add e2e tests using prefixed bucket to all providers

Signed-off-by: jademcosta <[email protected]>

* refactor: move validPrefix to prefixed_bucket logic

Signed-off-by: Maria Eduarda Duarte <[email protected]>

* Enhance the documentation about prefix.

Signed-off-by: jademcosta <[email protected]>

* Fix format
Signed-off-by: jademcosta <[email protected]>

* Add prefix entry on bucket config example

Signed-off-by: jademcosta <[email protected]>

* Removing redundancies on prefix checks and tests

We already check if the prefix is not empty when creating the bucket.

Signed-off-by: jademcosta <[email protected]>

* Remove redundant YAML unmarshal
Signed-off-by: jademcosta <[email protected]>

* Remove unused parameter
Signed-off-by: jademcosta <[email protected]>

* Remove docs that should be auto-geneated
Signed-off-by: jademcosta <[email protected]>

* refactor: move prefix to config root level

Signed-off-by: Maria Eduarda Duarte <[email protected]>

* add auto generated docs

Signed-off-by: Maria Eduarda Duarte <[email protected]>

* fix changelog

Signed-off-by: Maria Eduarda Duarte <[email protected]>

Co-authored-by: Maria Eduarda Duarte <[email protected]>

* Ruler: Change default evaluation interval to 1m (#5417)

* Change default eval interval to 1m

Signed-off-by: Matej Gera <[email protected]>

* Update CHANGELOG

Signed-off-by: Matej Gera <[email protected]>

* Updates busybox SHA (#5423)

Signed-off-by: GitHub <[email protected]>

Co-authored-by: yeya24 <[email protected]>

* receive: Added Ketamo Consistent hashing (#5408)

* Add support for consistent hashing in receivers

This commit adds support for distributing series in Receivers using
consistent hashing based on the libketama algorithm.

Signed-off-by: Filip Petkovski <[email protected]>

* Use require package for test assertions

Signed-off-by: Filip Petkovski <[email protected]>

* Rename algorithm from consistent to ketama

Signed-off-by: Filip Petkovski <[email protected]>

* S3: Add config option to enforce the minio DNS lookup (#5409)

* Add config option to enforce the minio DNS lookup

Signed-off-by: Jakob Hahn <[email protected]>

* Useenums instead of boolean for bucket_lookup_type

Signed-off-by: Jakob Hahn <[email protected]>

* Expose tsdb status in receiver (#5402)

* Expose tsdb status in receiver

This commit implements the api/v1/status/tsdb API in the Receiver.

Signed-off-by: Filip Petkovski <[email protected]>

* Add docs and todo

Signed-off-by: Filip Petkovski <[email protected]>

* Fix tests

Signed-off-by: Filip Petkovski <[email protected]>

* Receive: option to extract tenant from client certificate (#5153)

* added option to extract tenant from client certificate

Signed-off-by: Magnus Kaiser <[email protected]>

* added suggestions from PR

Signed-off-by: Magnus Kaiser <[email protected]>

* removed else cases

Signed-off-by: Magnus Kaiser <[email protected]>

* corrected location of certificate field check

Signed-off-by: Magnus Kaiser <[email protected]>

* fixed issue with err definition

Signed-off-by: Magnus Kaiser <[email protected]>

* updated docs

Signed-off-by: Magnus Kaiser <[email protected]>

* corrected comment

Signed-off-by: Magnus Kaiser <[email protected]>

Co-authored-by: Magnus Kaiser <[email protected]>

* Improve ketama hashring replication (#5427)

With the Ketama hashring, replication is currently handled by choosing
subsequent nodes in the list of endpoints. This can lead to existing nodes
getting more series when the hashring is scaled.

This commit changes replication to choose subsequent nodes from the hashring
which should not create new series in old nodes when the hashring is scaled.

Signed-off-by: Filip Petkovski <[email protected]>
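
The replication strategy described above, picking replicas by walking the hashring rather than the endpoint list, can be sketched as follows. This is an illustrative ring only: `ring`, `section`, and `replicas` are hypothetical names, the number of sections per node is arbitrary, and FNV is used for brevity where libketama uses MD5.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// section is one virtual point on the ring, owned by a node.
type section struct {
	hash uint64
	node string
}

// ring is a sorted list of sections from all nodes.
type ring struct {
	sections []section
	numNodes int
}

func hash64(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	return h.Sum64()
}

// newRing places sectionsPerNode virtual points on the ring for each node.
func newRing(nodes []string, sectionsPerNode int) *ring {
	r := &ring{numNodes: len(nodes)}
	for _, n := range nodes {
		for i := 0; i < sectionsPerNode; i++ {
			r.sections = append(r.sections, section{hash64(fmt.Sprintf("%s-%d", n, i)), n})
		}
	}
	sort.Slice(r.sections, func(i, j int) bool { return r.sections[i].hash < r.sections[j].hash })
	return r
}

// replicas returns up to n distinct nodes for key: the owner of the first
// section at or after the key's hash, then the next distinct nodes found
// by continuing around the ring.
func (r *ring) replicas(key string, n int) []string {
	if n > r.numNodes {
		n = r.numNodes
	}
	if n <= 0 || len(r.sections) == 0 {
		return nil
	}
	h := hash64(key)
	start := sort.Search(len(r.sections), func(i int) bool { return r.sections[i].hash >= h })

	seen := map[string]bool{}
	var out []string
	for i := 0; len(out) < n && i < len(r.sections); i++ {
		s := r.sections[(start+i)%len(r.sections)]
		if !seen[s.node] {
			seen[s.node] = true
			out = append(out, s.node)
		}
	}
	return out
}
```

Because replicas follow ring order, adding a node only changes replica sets for keys whose walk now passes one of the new node's sections, which is the property the commit message is after.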

* Cut readme for 0.27 (#5429)

Signed-off-by: Wiard van Rij <[email protected]>

* Added alert compliance test for Thanos (#5315)

* test: Added Alert compatibilty test.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Tmp.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Update.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* update.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* update.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* e2e: Refactored service helpers for newest e2e version.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Removed alert combatibiltiy test for now.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* e2e: Added test for compatibility.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Added Querier /alerts API.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* e2e:Added replica labels.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Option to remove replica-label.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* skip.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Use stateful ruler and default resend delay

Signed-off-by: Matej Gera <[email protected]>

* Update docs

Signed-off-by: Matej Gera <[email protected]>

Co-authored-by: Matej Gera <[email protected]>

* 0.27-rc0 Update readme and version (#5430)

* Update readme and version

Signed-off-by: Wiard van Rij <[email protected]>

* Fix newlines

Signed-off-by: Wiard van Rij <[email protected]>

* Fixes typo

Signed-off-by: Wiard van Rij <[email protected]>

* fixes noise

Signed-off-by: Wiard van Rij <[email protected]>

* Alert Compliance: Fix wrong ruler configuration (#5433)

* [receive] Export metrics about remote write requests per tenant (#5424)

* Add write metrics to Thanos Receive

Signed-off-by: Douglas Camata <[email protected]>

* Let the middleware count inflight HTTP requests

Signed-off-by: Douglas Camata <[email protected]>

* Update Receive write metrics type & definition

Signed-off-by: Douglas Camata <[email protected]>

* Put option back in its place to avoid big diff

Signed-off-by: Douglas Camata <[email protected]>

* Fetch tenant from headers instead of context

It might not be in the context in some cases.

Signed-off-by: Douglas Camata <[email protected]>

* Delete unnecessary tenant parser middleware

Signed-off-by: Douglas Camata <[email protected]>

* Refactor & reuse code for HTTP instrumentation

Signed-off-by: Douglas Camata <[email protected]>

* Add missing copyright to some files

Signed-off-by: Douglas Camata <[email protected]>

* Add changelog entry for Receive & new HTTP metrics

Signed-off-by: Douglas Camata <[email protected]>

* Remove TODO added by accident

Signed-off-by: Douglas Camata <[email protected]>

* Make error handling code shorter

Co-authored-by: Bartlomiej Plotka <[email protected]>
Signed-off-by: Douglas Camata <[email protected]>

* Make switch statement simpler

Signed-off-by: Douglas Camata <[email protected]>

* Remove method label from timeseries' metrics

Signed-off-by: Douglas Camata <[email protected]>

* Count samples of all series instead of each

Signed-off-by: Douglas Camata <[email protected]>

* Remove in-flight requests metric

Will add this in a follow-up PR to keep this small.

Signed-off-by: Douglas Camata <[email protected]>

* Change timeseries/samples metrics to histograms

The buckets were picked based on the fact that Prometheus' default
remote write configuration
(see https://prometheus.io/docs/practices/remote_write/#memory-usage)
sets a maximum of 500 samples per send.

Signed-off-by: Douglas Camata <[email protected]>

* Fix Prometheus registry for histograms

Signed-off-by: Douglas Camata <[email protected]>

* Fix comment in NewHandler functions

There are now four metrics instead of five.

Signed-off-by: Douglas Camata <[email protected]>

Co-authored-by: Bartlomiej Plotka <[email protected]>

* remove unused block-sync-concurrency flag (#5426)

* remove unused block-sync-concurrency flag

Signed-off-by: Ben Ye <[email protected]>

* add changelog

Signed-off-by: Ben Ye <[email protected]>

* update

Signed-off-by: Ben Ye <[email protected]>

* fix e2e test

Signed-off-by: Ben Ye <[email protected]>

* fix tests

Signed-off-by: Ben Ye <[email protected]>

* fix docs typo in metric thanos_compact_halted (#5448)

Signed-off-by: Nikita Matveenko <[email protected]>

* Implement tenant expiration (#5420)

* Implement tenant expiration

This commit adds dynamic TSDB pruning for tenants which have not
received new samples within a certain period of time.

Signed-off-by: Filip Petkovski <[email protected]>

* Add link to receiver tenant-lifecycle-management

Signed-off-by: Filip Petkovski <[email protected]>
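
As a rough illustration of the pruning rule described above, the sketch below drops tenants whose most recent sample is older than a configured idle period. The `tenantDB` type, the `pruneIdle` function, and the field names are hypothetical and are not taken from the Thanos code base.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// tenantDB stands in for a per-tenant TSDB (hypothetical type).
type tenantDB struct {
	lastSample time.Time // time of the most recently received sample
}

// pruneIdle removes every tenant that has not received a sample within
// the idle period, measured against now. It returns the removed tenant
// names in sorted order.
func pruneIdle(tenants map[string]*tenantDB, now time.Time, idle time.Duration) []string {
	var pruned []string
	for name, db := range tenants {
		if now.Sub(db.lastSample) > idle {
			delete(tenants, name) // deleting during range is safe in Go
			pruned = append(pruned, name)
		}
	}
	sort.Strings(pruned)
	return pruned
}

func main() {
	now := time.Now()
	tenants := map[string]*tenantDB{
		"active": {lastSample: now.Add(-time.Minute)},
		"idle":   {lastSample: now.Add(-48 * time.Hour)},
	}
	fmt.Println(pruneIdle(tenants, now, 24*time.Hour))
}
```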

* Docs: Remove Katacoda links (#5454)

* Remove Katacoda links

Signed-off-by: Matej Gera <[email protected]>

* Remove one more reference

Signed-off-by: Matej Gera <[email protected]>

* Fixed lint on Go 1.18.3+ (#5459)

Signed-off-by: bwplotka <[email protected]>

* Add HTTP metrics for in-flight requests (#5440)

* Add HTTP metrics for in-flight requests

Signed-off-by: Douglas Camata <[email protected]>

* Fix changelog entry after PR creation

Signed-off-by: Douglas Camata <[email protected]>

* Fix link in old CHANGELOG entry

Signed-off-by: Douglas Camata <[email protected]>

* Fix style in the CHANGELOG

All the entries should end up with a period.

Signed-off-by: Douglas Camata <[email protected]>

* Improve help for in-flight HTTP requests metric

Signed-off-by: Douglas Camata <[email protected]>

* Move changelog entry pending PR

Signed-off-by: Douglas Camata <[email protected]>

* Add a method label to the in-flight HTTP requests

Signed-off-by: Douglas Camata <[email protected]>

* docs: Fix heading level of "Excluded from caching" (#5455)

* Refactor DefaultTransport() from objstore to package exthttp (#5447)

* Refactoring the DefaultTransport func in package exthttp

Signed-off-by: Srushti Sapkale <[email protected]>

* Refactoring the DefaultTransport func from s3 in package exthttp

Signed-off-by: Srushti Sapkale <[email protected]>

* Updated helpers.go

corrected argument for DefaultTransport() in helpers.go

Signed-off-by: Srushti (sroo-sh-tee) <[email protected]>

* Changed the argument type in getContainerURL

Signed-off-by: Srushti Sapkale <[email protected]>

* Update pkg/exthttp/transport.go

Co-authored-by: Bartlomiej Plotka <[email protected]>

Signed-off-by: Srushti (sroo-sh-tee) <[email protected]>

* Update pkg/exthttp/transport.go

Co-authored-by: Bartlomiej Plotka <[email protected]>

Signed-off-by: Srushti (sroo-sh-tee) <[email protected]>

* Removed the use of NewTransport() in cos.go

Signed-off-by: Srushti Sapkale <[email protected]>

* Moved TLSConfig struct and functions that need it from objstore to exthttp

Signed-off-by: Srushti Sapkale <[email protected]>

* Changed s3.go

Signed-off-by: Srushti Sapkale <[email protected]>

* Kept s3.go and helpers.go unchanged to not break the cortex deps

Signed-off-by: Srushti Sapkale <[email protected]>

* Consistency changes made while pair++ programming.

Signed-off-by: bwplotka <[email protected]>

* Created a new tlsconfig in exthttp and minor changes in cos.go

Signed-off-by: Srushti Sapkale <[email protected]>

* Commented in s3.go

Signed-off-by: Srushti Sapkale <[email protected]>

* Minor changes in transport.go

Signed-off-by: Srushti Sapkale <[email protected]>

* Changed transport.go

Signed-off-by: Srushti Sapkale <[email protected]>

* Changed transport.go and tlsconfig.go

Signed-off-by: Srushti Sapkale <[email protected]>

* Removed changes from prometheus.mod and prometheus.sum

Signed-off-by: Srushti Sapkale <[email protected]>

* Minor update in cos.go

Signed-off-by: Srushti Sapkale <[email protected]>

Co-authored-by: bwplotka <[email protected]>

* receive: Fix race condition when pruning tenants (#5460)

Pruning Receiver tenants has a race condition caused by concurrently
removing items from the tenants map.

This commit addresses the issue by using a mutex to guard the tenants map.

Signed-off-by: fpetkovski <[email protected]>
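
The fix described above follows a common Go pattern: every read and write of a shared map goes through a mutex. The sketch below shows that pattern in isolation; `tenantRegistry` and its methods are hypothetical names, and the real Receiver code is not reproduced here.

```go
package main

import (
	"fmt"
	"sync"
)

// tenantRegistry guards a map of tenant names with a read-write mutex so
// that concurrent pruning and lookups cannot race on the map.
type tenantRegistry struct {
	mtx     sync.RWMutex
	tenants map[string]struct{}
}

func newTenantRegistry() *tenantRegistry {
	return &tenantRegistry{tenants: make(map[string]struct{})}
}

func (r *tenantRegistry) add(name string) {
	r.mtx.Lock()
	defer r.mtx.Unlock()
	r.tenants[name] = struct{}{}
}

// remove deletes the tenant and reports whether it was present.
func (r *tenantRegistry) remove(name string) bool {
	r.mtx.Lock()
	defer r.mtx.Unlock()
	_, ok := r.tenants[name]
	delete(r.tenants, name)
	return ok
}

func (r *tenantRegistry) size() int {
	r.mtx.RLock()
	defer r.mtx.RUnlock()
	return len(r.tenants)
}

func main() {
	r := newTenantRegistry()
	for i := 0; i < 100; i++ {
		r.add(fmt.Sprintf("tenant-%d", i))
	}
	// Remove all tenants from concurrent goroutines; without the mutex
	// this would be a data race on the map.
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			r.remove(fmt.Sprintf("tenant-%d", i))
		}(i)
	}
	wg.Wait()
	fmt.Println(r.size())
}
```

Running such code with `go run -race` is a quick way to confirm that no unguarded access remains.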

* Adding SCMP as an adopter (#5466)

Signed-off-by: Chris Ng <[email protected]>

* Updated busybox version. (#5471)

Signed-off-by: bwplotka <[email protected]>

* Refactor endpoint ref clients

Signed-off-by: Matej Gera <[email protected]>

* Fix E2E test env name clash

Signed-off-by: Matej Gera <[email protected]>

* Build with Go 1.18 (#5258)

* Build with Go 1.18

Signed-off-by: Sylvain Rabot <[email protected]>

* Try something

Signed-off-by: Sylvain Rabot <[email protected]>

* Upgrade minio

Signed-off-by: Sylvain Rabot <[email protected]>

* Replace json-iterator/reflect2 in bingo

Signed-off-by: Sylvain Rabot <[email protected]>

* Ignore 405 errors for prometheus buildVersion API requests (#5477)

Older versions of prometheus (such as 2.7 which is shipped by Debian
buster) return a 405 error for non-existent API endpoints instead of the
404 returned by more recent versions.

Signed-off-by: Nicolas Dandrimont <[email protected]>
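
The status-code handling described above can be sketched as a small predicate. The function name `endpointMissing` is hypothetical; the point is only that both 404 and 405 are treated as "this Prometheus does not expose the endpoint" rather than as a hard failure.

```go
package main

import (
	"fmt"
	"net/http"
)

// endpointMissing reports whether a response status means the requested
// API endpoint does not exist on the target Prometheus. Recent versions
// answer 404, while older ones (per the commit message, e.g. 2.7) answer 405.
func endpointMissing(statusCode int) bool {
	return statusCode == http.StatusNotFound || statusCode == http.StatusMethodNotAllowed
}

func main() {
	for _, code := range []int{http.StatusOK, http.StatusNotFound, http.StatusMethodNotAllowed} {
		fmt.Println(code, endpointMissing(code))
	}
}
```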

* *: Cut 0.27.0 (#5473)

* Cut 0.27.0

Signed-off-by: Matej Gera <[email protected]>

* Updated busybox version. (#5471)

Signed-off-by: bwplotka <[email protected]>
Signed-off-by: Matej Gera <[email protected]>

* Docs: Remove Katacoda links (#5454)

* Remove Katacoda links

Signed-off-by: Matej Gera <[email protected]>

* Remove one more reference

Signed-off-by: Matej Gera <[email protected]>

Co-authored-by: Bartlomiej Plotka <[email protected]>
Signed-off-by: Matej Gera <[email protected]>

* Update compact.md (#5465)

* During 1h downsampling skip XOR chunks that may erroneously be present in 5m resolution blocks (#5453)

* Add fpetkovski to triage list

Signed-off-by: Filip Petkovski <[email protected]>

* Use Azure BlobURL.Download instead of in-memory buffer (#5451)

Modify the azure.Bucket get methods to use BlobURL.Download for fetching
blobs and blob ranges. This avoids the need to allocate a buffer for storing
the entire expected size of the object in memory. Instead, use a ReaderCloser
view of the body returned by the download method.

See grafana/mimir#2229

Signed-off-by: Nick Pillitteri <[email protected]>

* Update storage.md (#5486)

* [receive] Add per-tenant charts to Receive's example dashboard  (#5472)

* Start to add tenant charts to Receive

Signed-off-by: Douglas Camata <[email protected]>

* Properly filter HTTP status codes

Signed-off-by: Douglas Camata <[email protected]>

* Fix tenant error rate chart

Signed-off-by: Douglas Camata <[email protected]>

* Refactor to improve readability and consistency

Signed-off-by: Douglas Camata <[email protected]>

* Refactor one more usage of code and tenant labels

Signed-off-by: Douglas Camata <[email protected]>

* Filter tenant metrics to the Receive handler

Signed-off-by: Douglas Camata <[email protected]>

* Format math expression properly

Signed-off-by: Douglas Camata <[email protected]>

* Update CHANGELOG

Signed-off-by: Douglas Camata <[email protected]>

* Add samples charts to series & samples row

Signed-off-by: Douglas Camata <[email protected]>

* Bump Go version in all the GH Actions (#5487)

* Bump go version in go mod

This is a follow up to #5258, which made the project be built with Go 1.18.

Signed-off-by: Douglas Camata <[email protected]>

* Update Go version in all GH Actions

Signed-off-by: Douglas Camata <[email protected]>

* Run go mod tidy

Signed-off-by: Douglas Camata <[email protected]>

* Added changelog entry

Signed-off-by: Douglas Camata <[email protected]>

* Put back Go 1.17 in go.mod

Because we don't use any Go 1.18 features yet, it's not needed.

Signed-off-by: Douglas Camata <[email protected]>

* Update go.sum after changing go.mod to go 1.17

Signed-off-by: Douglas Camata <[email protected]>

* Remove non-user-impacting entry for changelog

Signed-off-by: Douglas Camata <[email protected]>

* objstore: Download and Upload block files in parallel (#5475)

* Parallel Chunks

Signed-off-by: Alan Protasio <[email protected]>
Signed-off-by: alanprot <[email protected]>
Signed-off-by: Alan Protasio <[email protected]>

* test

Signed-off-by: Alan Protasio <[email protected]>
Signed-off-by: alanprot <[email protected]>
Signed-off-by: Alan Protasio <[email protected]>

* Changelog

Signed-off-by: Alan Protasio <[email protected]>
Signed-off-by: alanprot <[email protected]>
Signed-off-by: Alan Protasio <[email protected]>

* making ApplyDownloadOptions private

Signed-off-by: Alan Protasio <[email protected]>
Signed-off-by: alanprot <[email protected]>
Signed-off-by: Alan Protasio <[email protected]>

* upload concurrency

Signed-off-by: alanprot <[email protected]>
Signed-off-by: Alan Protasio <[email protected]>
Signed-off-by: alanprot <[email protected]>
Signed-off-by: Alan Protasio <[email protected]>

* Upload Test

Signed-off-by: Alan Protasio <[email protected]>
Signed-off-by: alanprot <[email protected]>
Signed-off-by: Alan Protasio <[email protected]>

* update change log

Signed-off-by: Alan Protasio <[email protected]>
Signed-off-by: alanprot <[email protected]>
Signed-off-by: Alan Protasio <[email protected]>

* Change comments

Signed-off-by: Alan Protasio <[email protected]>
Signed-off-by: alanprot <[email protected]>
Signed-off-by: Alan Protasio <[email protected]>

* Address comments

Signed-off-by: Alan Protasio <[email protected]>
Signed-off-by: alanprot <[email protected]>
Signed-off-by: Alan Protasio <[email protected]>

* Remove duplicate entries on changelog

Signed-off-by: Alan Protasio <[email protected]>
Signed-off-by: alanprot <[email protected]>
Signed-off-by: Alan Protasio <[email protected]>

* Addressing Comments

Signed-off-by: alanprot <[email protected]>
Signed-off-by: Alan Protasio <[email protected]>

* update golang.org/x/sync

Signed-off-by: alanprot <[email protected]>
Signed-off-by: Alan Protasio <[email protected]>

* Adding Comments

Signed-off-by: Alan Protasio <[email protected]>
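
To illustrate transferring block files in parallel with a concurrency option, here is a minimal sketch using only the standard library. It is an assumption-laden illustration, not the objstore code: `forEachConcurrently`, `errSimulated`, and the file names are hypothetical, and a buffered channel is used as a semaphore in place of whatever the real implementation uses.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errSimulated is a placeholder error used to demonstrate error propagation.
var errSimulated = errors.New("simulated transfer failure")

// forEachConcurrently calls fn once per file, running at most concurrency
// calls at the same time. It waits for all calls to finish and returns the
// first error that was recorded, or nil if every call succeeded.
func forEachConcurrently(files []string, concurrency int, fn func(file string) error) error {
	if concurrency < 1 {
		concurrency = 1
	}
	var (
		wg       sync.WaitGroup
		once     sync.Once
		firstErr error
		sem      = make(chan struct{}, concurrency) // bounds in-flight calls
	)
	for _, f := range files {
		wg.Add(1)
		sem <- struct{}{} // blocks while the limit is reached
		go func(f string) {
			defer wg.Done()
			defer func() { <-sem }()
			if err := fn(f); err != nil {
				once.Do(func() { firstErr = err })
			}
		}(f)
	}
	wg.Wait()
	return firstErr
}

func main() {
	files := []string{"meta.json", "index", "chunks/000001", "chunks/000002"}
	err := forEachConcurrently(files, 2, func(f string) error {
		fmt.Println("transferring", f)
		return nil
	})
	fmt.Println("error:", err)
}
```

Unlike an error group with a cancellable context, this sketch does not stop the remaining transfers after the first failure.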

* Use default HTTP config for E2E S3 tests (#5483)

Signed-off-by: Matej Gera <[email protected]>

* chore: Included githubactions in the dependabot config (#5364)

This should help with keeping the GitHub actions updated on new releases. This will also help with keeping it secure.

Dependabot helps in keeping the supply chain secure https://docs.github.com/en/code-security/dependabot

GitHub actions up to date https://docs.github.com/en/code-security/dependabot/working-with-dependabot/keeping-your-actions-up-to-date-with-dependabot

https://github.com/ossf/scorecard/blob/main/docs/checks.md#dependency-update-tool
Signed-off-by: naveensrinivasan <[email protected]>

* bump codemirror and promql editor to the last version (#5491)

Signed-off-by: Augustin Husson <[email protected]>

* receiver: Expose stats for all tenants (#5470)

* receiver: Expose stats for all tenants

Thanos Receiver supports the Prometheus tsdb status API and can expose
TSDB stats for a single tenant.

This commit extends that functionality and allows users to request
TSDB stats for all tenants using the all_tenants=true query parameter.

Signed-off-by: Filip Petkovski <[email protected]>

* Add back chunk count

Signed-off-by: Filip Petkovski <[email protected]>

* Simplify TSDBStats interface

Signed-off-by: Filip Petkovski <[email protected]>

* Return empty result for no stats

Signed-off-by: Filip Petkovski <[email protected]>

* CHANGELOG.md: regenerate (#5495)

* receive: Fix stats nil pointer panic (#5494)

When fetching TSDB stats from receivers, certain TSDBs might not be
initialized yet. This can lead to a nil pointer access when the
status endpoint is accessed before all TSDBs are initialized.

This commit adds an explicit check for each tenant's TSDB when
exporting TSDB stats.

Signed-off-by: Filip Petkovski <[email protected]>

* Update query.md (#5496)

Fix typo of parameter --store.sd-files

Signed-off-by: Firxiao <[email protected]>

* Parallel download blocks - Follow up of #5475 (#5493)

* Download blocks in parallel

Signed-off-by: Alan Protasio <[email protected]>

* remove the go func

Signed-off-by: Alan Protasio <[email protected]>

* Doc

Signed-off-by: Alan Protasio <[email protected]>

* CHANGELOG

Signed-off-by: Alan Protasio <[email protected]>

* doc

Signed-off-by: alanprot <[email protected]>

* AddressComments

Signed-off-by: alanprot <[email protected]>

* fix typo

Signed-off-by: Alan Protasio <[email protected]>

* Upgrade mdox with cache and some http settings to reduce CI failures (#5500)

* Pin mdox to latest master commit

It now supports a cache for link validation and some HTTP
configuration that can be used to help avoid intermittent
CI failures.

Signed-off-by: Douglas Camata <[email protected]>

* Add mdox cache and HTTP configuration

The cache has a default TTL (5 days)

A timeout of 1m and 10 connections per host at transport
level should help us reduce the intermittent failures if
we have to invalidate the cache.

Signed-off-by: Douglas Camata <[email protected]>

* Add Github Action cache for the mdox cache

Using the hash of the md files as cache key.

Signed-off-by: Douglas Camata <[email protected]>

* Upgrade cache actions to v3 and add restore key

Signed-off-by: Douglas Camata <[email protected]>

* Empty commit to test CI build cache

Signed-off-by: GitHub <[email protected]>

* Use 2.5 days as jitter for mdox cache

Signed-off-by: Douglas Camata <[email protected]>

* Fix bad editor auto-formatting again

Signed-off-by: Douglas Camata <[email protected]>

* Updated minio-go to latest; removed fork. (#5474)

* Updated minio-go fork to latest.

NOTE: Optimization is proposed to upstream to avoid fork in future.

Relates to https://github.com/thanos-io/thanos/issues/5101 and https://github.com/thanos-io/thanos/issues/5130

Signed-off-by: bwplotka <[email protected]>


* Removed fork.

Signed-off-by: bwplotka <[email protected]>

* Added comment.

Signed-off-by: bwplotka <[email protected]>

* Receiver: Handle storage exemplar multi-error (#5502)

* Handle exemplar store errors as conflict

Signed-off-by: Matej Gera <[email protected]>

* Adjust tests

Signed-off-by: Matej Gera <[email protected]>

* Update CHANGELOG

Signed-off-by: Matej Gera <[email protected]>

* Fixing Race condition Introduced by #5493  (#5503)

* Update busybox image versions (#5506)

Signed-off-by: Kemal Akkoyun <[email protected]>

* Updates busybox SHA (#5507)

Signed-off-by: GitHub <[email protected]>

Co-authored-by: yeya24 <[email protected]>

* chore: Update Prometheus dependency (#5484)

* chore: Update Prometheus dependency

Update Prometheus from v2.33.5 to v2.36.2.

Signed-off-by: SuperQ <[email protected]>

* Update query tests for cortex changes.

Signed-off-by: SuperQ <[email protected]>

* Use the default rules.RuleGroupPostProcessFunc.

Signed-off-by: SuperQ <[email protected]>

* Update QueryStats use.

Signed-off-by: SuperQ <[email protected]>

* Update Cortex.

Signed-off-by: SuperQ <[email protected]>

* Update queryfrontend for Cortex changes.

Signed-off-by: SuperQ <[email protected]>

* Bump pprof.

Signed-off-by: SuperQ <[email protected]>

* Add changelog entry.

Signed-off-by: SuperQ <[email protected]>

* Adapt to changed query stats API

Signed-off-by: Kemal Akkoyun <[email protected]>

* Sync dependencies

Signed-off-by: Kemal Akkoyun <[email protected]>

* Reflect changed metric names

Signed-off-by: Kemal Akkoyun <[email protected]>

Co-authored-by: Kemal Akkoyun <[email protected]>
Co-authored-by: Kemal Akkoyun <[email protected]>

* chore: Vendor Cortex dependency as an internal package (#5504)

* Vendor Cortex dependency as an internal package

Signed-off-by: Kemal Akkoyun <[email protected]>

* Add gitattributes

Signed-off-by: Kemal Akkoyun <[email protected]>

* Skip checks for vendored directory

Signed-off-by: Kemal Akkoyun <[email protected]>

* Add copyright headers for Cortex

Signed-off-by: Kemal Akkoyun <[email protected]>

* *: Move objstore out of repo (#5510)

* *: Move objstore out of repo

Signed-off-by: Kemal Akkoyun <[email protected]>

* Fix doc checks

Signed-off-by: Kemal Akkoyun <[email protected]>

* chore: Update Prometheus to v2.37.0 (#5511)

* chore: Update Prometheus to v2.37.0

Update Prometheus to the latest release. Note that Prometheus
upstream now tags v0.x.y to map to the 2.x.y releases.

Signed-off-by: SuperQ <[email protected]>

* Cleanup direct/indirect go.mod requirements.

Signed-off-by: SuperQ <[email protected]>

* chore: Update Go modules (#5516)

* Update weaveworks/common to remove node_exporter indirect dep.
* Update simonpasquier/klog-gokit/v2.
* Update google.golang.org/grpc lock to v1.45.0.
* Cleanup replacements that are now handled by indirect requirements.
* Fixup grpc.WithInsecure() use.

Signed-off-by: SuperQ <[email protected]>

* chore: Update Go modules (#5518)

* Reuse upstream TSDB status structs (#5526)

This commit replaces the copied TSDB status structs with direct
references from prometheus/prometheus.

Signed-off-by: Filip Petkovski <[email protected]>

* Fix proposal on website (#5530)

Signed-off-by: Saswata Mukherjee <[email protected]>

* Update all bingo dependencies (#5525)

This commit updates all bingo dependencies to their latest versions.

It pins golang.org/x/sys to v0.0.0-20220715151400-c0bba94af5f8 for
the github.com/google/go-jsonnet dependency in order to prevent
failures when running make docs on Mac OS.

Signed-off-by: Filip Petkovski <[email protected]>

* delete_katacoda (#5529)

Signed-off-by: Akshit42-hue <[email protected]>

* Remove empty RuleGroups in api/v1/rules when using matchers (#5537)

* Remove empty RuleGroups in api/v1/rules

Signed-off-by: Saswata Mukherjee <[email protected]>

* Implement suggestion

Signed-off-by: Saswata Mukherjee <[email protected]>

* Rename variables

Signed-off-by: Saswata Mukherjee <[email protected]>

* fix(api): Return a JSON struct with alerts in lowercase when querying the alerts endpoint. (#5534)

This returns the same result as the Prometheus API.
Signed-off-by: Guillaume audic <[email protected]>

* Receiver: Add benchmark for receive writer (#5533)

* Add benchmark for receive writer

Signed-off-by: Matej Gera <[email protected]>

* Incorporate feedback

- Clearer parameter naming; use a separate temp dir for bench

Signed-off-by: Matej Gera <[email protected]>

* Submit a proposal for Active Series Limiting for Hashring Topology (#5415)

* Add proposal for Active Series Limiting for Hashring Topology

Signed-off-by: Saswata Mukherjee <[email protected]>

* Resize images

Signed-off-by: Saswata Mukherjee <[email protected]>

* Add Observatorium as an alternative

Signed-off-by: Saswata Mukherjee <[email protected]>

* Implement suggestions; add TODO

Signed-off-by: Saswata Mukherjee <[email protected]>

* Update proposal

Signed-off-by: Saswata Mukherjee <[email protected]>

* Implement suggestions: add sections numbers

Signed-off-by: Saswata Mukherjee <[email protected]>

* Refactor EndpointSet (#5538)

* Refactor EndpointSet

This commit refactors the EndpointSet struct in order to make it easier
to understand and work with.

Signed-off-by: Filip Petkovski <[email protected]>

* Handle context cancellation in endpoint mock

Signed-off-by: Filip Petkovski <[email protected]>

* Make additions and removals of refs atomic.

Signed-off-by: Filip Petkovski <[email protected]>

* Fix changed-docs grep regex (#5556)

Signed-off-by: Saswata Mukherjee <[email protected]>

* Added Vertical Query Sharding to Query-Frontend (#5342)

* Update faillint to v1.10.0

Signed-off-by: Filip Petkovski <[email protected]>

* Implement query sharding

This commit implements query sharding for grouping PromQL expressions.

Sharding is initiated by analyzing the PromQL and extracting
grouping labels. Extracted labels are propagated down to Stores which
partition the response by hashmoding all series on those labels.

If a query is shardable, the partitioning and merging process will be
initiated by the Query Frontend. The Query Frontend will make N distinct
queries across a set of Queriers and merge the results back before
presenting them to the user.

Signed-off-by: Filip Petkovski <[email protected]>

* First code review pass

Signed-off-by: Filip Petkovski <[email protected]>

* Use sync pool to reuse sharding buffers

Signed-off-by: Filip Petkovski <[email protected]>

* Add test for binary expression with constant

Signed-off-by: Filip Petkovski <[email protected]>

* Include external labels in series sharding

Signed-off-by: Filip Petkovski <[email protected]>
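
The partitioning step described above ("hashmoding all series on those labels") can be sketched as follows. This is an illustration under stated assumptions, not the Thanos code: the `series` type, the `shardOf` function, and the FNV hash are hypothetical, and only the grouping-label case is shown. The key property is that series sharing the same values for the grouping labels always land in the same shard, so each shard can be aggregated independently.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// series maps label names to label values (hypothetical representation).
type series map[string]string

// shardOf returns the shard index for s. Only the grouping labels take part
// in the hash, so all series of one group map to the same shard. Callers are
// assumed to pass groupingLabels in a consistent order.
func shardOf(s series, groupingLabels []string, totalShards uint64) uint64 {
	if totalShards == 0 {
		return 0
	}
	h := fnv.New64a()
	for _, name := range groupingLabels {
		h.Write([]byte(name))
		h.Write([]byte{0xff}) // separator to avoid ambiguous concatenations
		h.Write([]byte(s[name]))
		h.Write([]byte{0xff})
	}
	return h.Sum64() % totalShards
}

func main() {
	// For a query like sum by (pod) (...), "pod" is the grouping label.
	grouping := []string{"pod"}
	a := series{"pod": "api-0", "instance": "10.0.0.1"}
	b := series{"pod": "api-0", "instance": "10.0.0.2"}
	fmt.Println(shardOf(a, grouping, 4) == shardOf(b, grouping, 4))
}
```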

* Rule: Fix e2e test flake (#5558)

* Rule: Fix e2e test flake

Signed-off-by: Saswata Mukherjee <[email protected]>

* Fix lint

Signed-off-by: Saswata Mukherjee <[email protected]>

* Check errors

Signed-off-by: Saswata Mukherjee <[email protected]>

* Change to github.com/thanos-io/thanos/pkg/errors

Signed-off-by: Saswata Mukherjee <[email protected]>

* Implement suggestion

Signed-off-by: Saswata Mukherjee <[email protected]>

* Fix multi-tenant exemplar matchers (#5554)

* Fix multi-tenant exemplar matchers

The exemplar proxy synthesizes a query based on PromQL expression matchers
and individual store's label sets. When a store has multiple label sets
with same label names but different values (e.g. multitenant Receivers),
each exemplar matcher will be repeated once for each label set. Because of this,
a receiver hosting 200 tenants can get the same exemplar matcher 200 times. This leads
to the underlying stores slowing down and timing out when asked for exemplars.

This commit modifies the exemplar proxy to deduplicate matchers and only send
a matcher once to an underlying store.

Signed-off-by: Filip Petkovski <[email protected]>

* Address CR comments

Signed-off-by: Filip Petkovski <[email protected]>
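
The deduplication described above amounts to sending each distinct matcher string once. A minimal sketch, assuming matchers are already rendered as strings and using the hypothetical helper name `dedupe`:

```go
package main

import "fmt"

// dedupe returns matchers with duplicates removed, preserving the order in
// which each distinct matcher was first seen.
func dedupe(matchers []string) []string {
	seen := make(map[string]struct{}, len(matchers))
	out := make([]string, 0, len(matchers))
	for _, m := range matchers {
		if _, ok := seen[m]; ok {
			continue
		}
		seen[m] = struct{}{}
		out = append(out, m)
	}
	return out
}

func main() {
	// A store with many label sets could otherwise receive the same
	// matcher once per label set.
	in := []string{`{job="api"}`, `{job="api"}`, `{job="db"}`, `{job="api"}`}
	fmt.Println(dedupe(in))
}
```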

* Receive: add per request limits for remote write (#5527)

* Add per request limits for remote write

Signed-off-by: Douglas Camata <[email protected]>

* Remove useless TODO item

Signed-off-by: Douglas Camata <[email protected]>

* Refactor write request limits test

Signed-off-by: Douglas Camata <[email protected]>

* Add write concurrency limit to Receive

Signed-off-by: Douglas Camata <[email protected]>

* Change write limits config option name

Signed-off-by: Douglas Camata <[email protected]>

* Document remote write concurrency limit

Signed-off-by: Douglas Camata <[email protected]>

* Add changelog entry

Signed-off-by: Douglas Camata <[email protected]>

* Format docs

Signed-off-by: Douglas Camata <[email protected]>

* Extract request limiting logic from handler

Signed-off-by: Douglas Camata <[email protected]>

* Add copyright header

Signed-off-by: Douglas Camata <[email protected]>

* Add a TODO for per-tenant limits

Signed-off-by: Douglas Camata <[email protected]>

* Add default value and hide the request limit flags

Signed-off-by: Douglas Camata <[email protected]>

* Improve TODO comment in request limits

Signed-off-by: Douglas Camata <[email protected]>

* Update Receive docs after flags were made hidden

Signed-off-by: Douglas Camata <[email protected]>

* Add note about WIP in Receive request limits doc

Signed-off-by: Douglas Camata <[email protected]>

* Fix typo in Receive docs

Co-authored-by: Filip Petkovski <[email protected]>

Signed-off-by: Douglas Camata <[email protected]>

* Fix help text for concurrent request limit

Signed-off-by: Douglas Camata <[email protected]>

* Use byte unit helpers for improved readability

Signed-off-by: Douglas Camata <[email protected]>

* Removed check for nil writeGate

The constructor sets the writeGate to a noopGate.

Signed-off-by: Douglas Camata <[email protected]>

* Better organize linebreaks

Signed-off-by: Douglas Camata <[email protected]>

* Fix help text for limits hit metric

Signed-off-by: Douglas Camata <[email protected]>

* Apply some English feedback

Signed-off-by: Douglas Camata <[email protected]>

* Improve limits & gates documentation

Signed-off-by: Douglas Camata <[email protected]>

* Fix import clause

Signed-off-by: Douglas Camata <[email protected]>

* Use a 3 node hashring for write limits test

This should ensure the request fanout logic cannot somehow interfere
with the request limit logic.

Signed-off-by: Douglas Camata <[email protected]>

* Fix comment

Co-authored-by: Bartlomiej Plotka <[email protected]>

Signed-off-by: Douglas Camata <[email protected]>
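
The commits above mention a write concurrency limit and a constructor that falls back to a no-op gate, which removes the need for nil checks. The sketch below illustrates that idea only. The `gate` interface, `newGate`, and the method names are hypothetical, and the sketch rejects requests when the limit is reached, which is a simplification: the commit messages do not say whether the real gate waits or rejects.

```go
package main

import "fmt"

// gate limits how many requests may be in flight at once.
type gate interface {
	tryStart() bool // reports whether the request may proceed
	done()          // releases a slot; call only after a successful tryStart
}

// noopGate never limits anything.
type noopGate struct{}

func (noopGate) tryStart() bool { return true }
func (noopGate) done()          {}

// limitGate uses a buffered channel as a counting semaphore.
type limitGate struct {
	slots chan struct{}
}

func (g *limitGate) tryStart() bool {
	select {
	case g.slots <- struct{}{}:
		return true
	default:
		return false // limit reached
	}
}

func (g *limitGate) done() { <-g.slots }

// newGate returns a no-op gate when no limit is configured, so callers can
// use the result unconditionally.
func newGate(maxConcurrent int) gate {
	if maxConcurrent <= 0 {
		return noopGate{}
	}
	return &limitGate{slots: make(chan struct{}, maxConcurrent)}
}

func main() {
	g := newGate(1)
	fmt.Println(g.tryStart()) // first request is admitted
	fmt.Println(g.tryStart()) // second request exceeds the limit
	g.done()
	fmt.Println(g.tryStart()) // a slot is free again
}
```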

* Announce sharding in ruler and store proxy (#5560)

The ruler and store proxy currently support series sharding
through the components that they use. However, this capability is not
announced to the querier.

This commit modifies their Info calls to indicate to the querier
that it doesn't need to shard the response it receives from rulers
and other store proxies.

Signed-off-by: Filip Petkovski <[email protected]>

* Fix flaky e2e tests (#5563)

* Tools: Fix e2e test flake

Signed-off-by: Saswata Mukherjee <[email protected]>

* Metadata: Fix flaky e2e test

Signed-off-by: Saswata Mukherjee <[email protected]>

* Compact: Fix flaky e2e test

Signed-off-by: Saswata Mukherjee <[email protected]>

* Bumping actions/cache to v3 for e2e tests

Signed-off-by: Saswata Mukherjee <[email protected]>

* Add missing e2e.WaitMissingMetrics

Signed-off-by: Saswata Mukherjee <[email protected]>

* Meta-monitoring based active series limiting (#5520)

* Add initial PoC for meta-monitoring Receive active series limits

Signed-off-by: Saswata Mukherjee <[email protected]>

* Add e2e tests, rebase

Signed-off-by: Saswata Mukherjee <[email protected]>

* Add multitenant test + remake diagrams

Signed-off-by: Saswata Mukherjee <[email protected]>

* Implement suggestions; Make naming consistent; Rm/Add metrics

Signed-off-by: Saswata Mukherjee <[email protected]>

* Reuse meta-monitoring client

Signed-off-by: Saswata Mukherjee <[email protected]>

* Fix panic

Signed-off-by: Saswata Mukherjee <[email protected]>

* Cache meta-monitoring query result

Signed-off-by: Saswata Mukherjee <[email protected]>

* Fix lint

Signed-off-by: Saswata Mukherjee <[email protected]>

* Fail fast when limiting

Signed-off-by: Saswata Mukherjee <[email protected]>

* Implement suggestions: docs + mutex + struct

Signed-off-by: Saswata Mukherjee <[email protected]>

* Add interface and no-op

Signed-off-by: Saswata Mukherjee <[email protected]>

* Add changelog entry

Signed-off-by: Saswata Mukherjee <[email protected]>

* Add seriesLimitSupported to handler

Signed-off-by: Saswata Mukherjee <[email protected]>

* Remove tools fork

Signed-off-by: Saswata Mukherjee <[email protected]>

* Change docs header

Signed-off-by: Saswata Mukherjee <[email protected]>

* Remove usage of ioutil (#5564)

Signed-off-by: Saswata Mukherjee <[email protected]>

* docs/contribution.md: Update required Go version  (#5557)

* delete_katacoda

Signed-off-by: Akshit42-hue <[email protected]>

* updated go version

Signed-off-by: Akshit42-hue <[email protected]>

* update golang version

Signed-off-by: Akshit42-hue <[email protected]>

* updated

Signed-off-by: Akshit42-hue <[email protected]>

* Retrigger CI

Signed-off-by: Akshit42-hue <[email protected]>

* Retrigger CI

Signed-off-by: Akshit42-hue <[email protected]>

* fix an expression param in a link to an alert in the rules page (#5562)

Signed-off-by: Rostislav Benes <[email protected]>

Co-authored-by: Rostislav Benes <[email protected]>

* Receiver: Validate labels in write requests (#5508)

* Add label set validation method

Signed-off-by: Matej Gera <matejgera@g…