HA handling for store nodes #199
Comments
I think we might need that sooner than later... (: How can we do it easily? Basically we need to tell |
The most basic way would just be the option to add for example --bucketid="xxx" to the storage command. |
For active/passive this could be done using a leader latch protocol and sharing the data downloaded by the leader as it could announce any new downloaded bucket via gossip (for a faster failover) and share it via HTTP/gRPC. This would eliminate the need to fetch the data from an object store directly and allow for the query nodes to have only a single source of truth (the current leader) |
I'd like to volunteer to take this on. For our use case, downtime caused by the store instance fronting an S3 bucket being rescheduled to another machine is not really palatable. I'm thinking of an active-active solution, since it avoids some of the complexities around deciding which instance is 'active' and would be more efficient with resources. As store nodes are essentially just caches, I think it should be reasonably straightforward to achieve. While thinking about high availability, we should also consider allowing the store nodes to scale horizontally for very large deployments, effectively allowing horizontal scaling of the LRU cache of indices. I propose:
Just an idea: If we have multiple shards, we might simplify the store instances by avoiding persisting the cache to disk, since the amount of data to pull from object storage would be reduced by |
@mattbostock Thanks! It all works under one assumption: the Thanos setup has only one bucket to take data from. Are we OK with that? I have seen some use cases for multiple buckets connected to the same Thanos "cluster/network/setup", because "it is easier to manage", "my object storage is specific" etc. Maybe that's a separate issue, but it's worth being aware of this while implementing HA.
Makes sense, I would just love to hear/see more about the implementation details. As you suggested offline, https://godoc.org/github.com/golang/groupcache sounds nice, but it means that we are talking about sharding fully on the stores (you ask any store and it gives you the correct answer 100% of the time, even if it needs to ask its peers). Or maybe we want thanos-query to be aware of store sharding? Also, are we talking about sharding the index cache based on... what? On matchers 0.o?
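To make the "shard on what?" question concrete, here is a minimal sketch of one possible answer: shard by block ID rather than by matchers. This is an assumption for illustration, not something decided in this thread. `shardFor`, the shard count, and the example block IDs are all hypothetical and not part of Thanos.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a block ID to one of n store shards using an FNV-1a hash.
// Hypothetical helper for illustration only; n must be greater than zero.
func shardFor(blockID string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(blockID)) // Write on a hash.Hash never returns an error.
	return int(h.Sum32() % uint32(n))
}

func main() {
	// Made-up block IDs; real blocks are identified by ULIDs.
	blocks := []string{
		"01BX5ZZKBKACTAV9WEVGEMMVRZ",
		"01BX5ZZKBKACTAV9WEVGEMMVS0",
		"01ARZ3NDEKTSV4RRFFQ69G5FAV",
	}
	const shards = 3
	for _, b := range blocks {
		fmt.Printf("block %s -> shard %d\n", b, shardFor(b, shards))
	}
}
```

Every instance that knows the shard count can compute the owner of a block without coordination. The trade-off is that plain modulo hashing reassigns most blocks when the shard count changes; a consistent hashing scheme would reduce that churn.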
Totally agree and thanks for the example 👍 However, I would start with something simple first: just replicating (so true HA), because that is what you need (from what you say). This will enable horizontal scaling (it will offload a single store) and potentially improve performance as well. Sharding alone will ONLY improve availability (and there will still be some major disruption time); regarding performance, it is hard to say without #346 (which is in progress). |
Added a proposal for high-availability for store instances here: #404 |
This can be solved just by running multiple Store Gateways behind any load balancer (like a Kubernetes Service), without gossip. |
Store nodes are currently generally run as a single replica. It's not super critical to have HA in general since several hours or even days of recent data are HA via the Prometheus servers. But for some scenarios it might still be preferable.
Two could simply be deployed and the query node would take care of deduplication/merging just like for Prometheus HA pairs. But unlike Prometheus servers, the underlying data is truly the same in this case and fetching twice the amount is unnecessary overhead.
Some simple logic could be added to the query node to recognize real duplicates (Prometheus HA pairs are actually different through a `replica` label) and to only query one of them.