thanos store gateway: fatal error: concurrent map writes #2471

Closed
hawran opened this issue Apr 20, 2020 · 3 comments

@hawran

hawran commented Apr 20, 2020

Thanos, Prometheus and Golang version used:
thanos: 0.12.0
prometheus: 2.17.1
go: 1.13

Object Storage Provider:
ceph s3

What happened:
A couple of restarts during the last two days because of internal errors.

What you expected to happen:
No such restarts.

How to reproduce it (as minimally and precisely as possible):

Full logs to relevant components:
https://gist.github.com/hawran/f87f6b03aceac64553d9b2cd8e6c2532

Anything else we need to know:

@bwplotka
Member

bwplotka commented Apr 20, 2020

Ack, looks like a bug to be fixed in 0.12.1 @squat

Will look into this after breakfast

@bwplotka bwplotka added the bug label Apr 20, 2020
@squat
Member

squat commented Apr 20, 2020

+1 thanks for the report. will investigate

squat added a commit to squat/thanos that referenced this issue Apr 20, 2020
Fixes: thanos-io#2471

This commit fixes an issue where multiple goroutines in the block
fetcher filtering were concurrently accessing the same map. The
goroutines were concurrently writing AND reading to the shared metas
map. This commit guards this concurrent access by giving the
DeduplicateFilter struct a mutex.

Signed-off-by: Lucas Servén Marín <[email protected]>
bwplotka pushed a commit that referenced this issue Apr 20, 2020
Fixes: #2471

This commit fixes an issue where multiple goroutines in the block
fetcher filtering were concurrently accessing the same map. The
goroutines were concurrently writing AND reading to the shared metas
map. This commit guards this concurrent access by giving the
DeduplicateFilter struct a mutex.

Signed-off-by: Lucas Servén Marín <[email protected]>
@bwplotka
Member

bwplotka commented Apr 20, 2020

Fixed on release-0.12, should be ported to master soon 🤗 Will be part of 0.12.1

Thanks for quick report!

bwplotka added a commit that referenced this issue May 20, 2020
* Removed dependency on Cortex fork; Moved to official one. (#2199)

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Typo corrections quick-tutorial.md (#2196)

* Corrected all Prometheus possessives to read `Prometheus's`, this matches Prometheus's own documentation.
* Corrected `simple` to `simply` when describing compactor scanning behaviour

Signed-off-by: Peter Avdjian <[email protected]>

* tracing: Simplified creation of spans. (#2202)

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Fixed links to dashboards json files. (#2203)

Signed-off-by: Roman Grytskiv <[email protected]>

* Skip deleting files that we just deleted (#2185)

* Skip deleting files that we just deleted

We see this happening with Swift. Because the consistency of swift is eventual, swift sometimes didn't process the deletion of the meta file yet, and so it turns up in the bkt.Iter(). The second deletion then causes a 404 and compaction fails.

Signed-off-by: Wim Fournier <[email protected]>

* return, as this is a func. Add debug log and comment

Signed-off-by: Wim Fournier <[email protected]>

* fixing build: wrong parameter name

Signed-off-by: Wim Fournier <[email protected]>

* fix lint

Signed-off-by: Wim Fournier <[email protected]>

* Refactor deleteDir into deleteDirRec and add a parameter for a function that allows to keep certain files.

Signed-off-by: Wim Fournier <[email protected]>

* Fix lint

Signed-off-by: Wim Fournier <[email protected]>

* implementing suggested fixes

Signed-off-by: Wim Fournier <[email protected]>

* improve web.route-prefix handling (#2208)

This makes the handling of web.route-prefix more similar to the
behavior in Prometheus.  Correctly handles '/' and prefixes which
do not begin with a '/'.

Signed-off-by: Paul Gier <[email protected]>

* Merge release-0.11 back into master (#2212)

* Create release v0.11.0-rc.0 (#2156)

* Update version to v0.11.0-rc.0

* Update CHANGELOG with all PRs for v0.11

* Improve CHANGELOG by being more explicit

* Bumped minio-go library to v6.0.49, fixing an IAM bug in v6.0.45 (#2189)

Signed-off-by: Kraig Amador <[email protected]>

* Create release candidate  v0.11.0-rc.1 (#2192)

Signed-off-by: Matthias Loibl <[email protected]>

* Release v0.11.0 (#2205)

Signed-off-by: Matthias Loibl <[email protected]>

* Update VERSION to 0.12.0-dev

Signed-off-by: Matthias Loibl <[email protected]>

* Resolve go.sum merge conflict and run go mod tidy

Signed-off-by: Matthias Loibl <[email protected]>

Co-authored-by: Kraig Amador <[email protected]>

* returns error messages when trigger reload with http (#1848)

* returns error messages when trigger reload with http

Signed-off-by: arthur yang <[email protected]>

* use simple reloadRules function instead of magic chan error error

Signed-off-by: yapo.yang <[email protected]>

* add trailing period for comment

Signed-off-by: yapo.yang <[email protected]>

* fix comment

Signed-off-by: arthur yang <[email protected]>

* add white space for better code reading

Signed-off-by: arthur yang <[email protected]>

* collect thanos rule metrics into one struct

Signed-off-by: arthur yang <[email protected]>

* remove termination logic and keep log only

Signed-off-by: arthur yang <[email protected]>

* update changelog for #1848

Signed-off-by: arthur yang <[email protected]>

* add trailing period

Signed-off-by: arthur yang <[email protected]>

* check whether registry is nil

Signed-off-by: arthur yang <[email protected]>

* trailing period in metrics

Signed-off-by: arthur yang <[email protected]>

* cancel with context

Signed-off-by: arthur yang <[email protected]>

* return ctx.Err() instead of errors.New

Signed-off-by: arthur yang <[email protected]>

* register thanos rule metrics with promauto

Signed-off-by: arthur yang <[email protected]>

* return errs before set success related metrics

Signed-off-by: arthur yang <[email protected]>

* revert go.sum go.mod change

Signed-off-by: arthur yang <[email protected]>

* reload webhandler/sighup in one for loop

Signed-off-by: arthur yang <[email protected]>

* reload with chan chan error

Signed-off-by: yapo.yang <[email protected]>

* Fix error in component status help message (#2216)

Signed-off-by: mcsammac

Date:      Wed Mar 4 13:50:17 2020 -0500
On branch master
Changes to be committed:
	modified:   pkg/prober/intrumentation.go

Signed-off-by: s320009 <[email protected]>

* tutorials: fix typo in image version (#2223)

Signed-off-by: Paul Gier <[email protected]>

* Blocked classic prometheus constructors, moved all to promauto; Removed unnecessary printfs. (#2228)

Fixes: https://github.com/thanos-io/thanos/issues/2102

Also blocked them on CI side, thanks to https://github.com/fatih/faillint/pull/8

Signed-off-by: Bartlomiej Plotka <[email protected]>

* ruler: Fix #2204 bug where alert queue is unpoppable causing full queue and dropped alerts (#2238)

* Add test for alert queue Pop after multiple Push

Signed-off-by: Robin Clarke-Williams <[email protected]>

* Fix alert queue bug by resignal after Pop (#2204)

Signed-off-by: Robin Clarke-Williams <[email protected]>

* Fix alert queue test and simplify

Signed-off-by: Robin Clarke-Williams <[email protected]>

* Update CHANGELOG.md

Signed-off-by: Robin Clarke-Williams <[email protected]>

* Link to thanos-io/thanos PR in CHANGELOG.md

Signed-off-by: Robin Clarke-Williams <[email protected]>

* bucket: improve shard label handling (#2219)

Signed-off-by: Jacob Colvin <[email protected]>

* fixing querier deployment kube manifest example 404 error (#2229)

Signed-off-by: Rajesh Rajendran <[email protected]>

* *: Fix misuse of pkg/errors.Errorf and error directive (#2253)

* Fix pkg/errors error directive issues

Signed-off-by: Kemal Akkoyun <[email protected]>

* Fix misuse of Errorf

Signed-off-by: Kemal Akkoyun <[email protected]>

* Fix false metric name in Store GW e2e test (#2256)

Signed-off-by: Kemal Akkoyun <[email protected]>

* Add scheme to the alertmanagers.url in ruler example (#2255)

Signed-off-by: gitlawr <[email protected]>

* Sort chunks by thanos.downsample.resolution for better grouping (#2231)

Signed-off-by: Paul Traylor <[email protected]>

* Remove duplicate log.level arg in quickstart.sh (#2148)

Signed-off-by: Richard Poole <[email protected]>

* tutorials: fix incorrect query (#2239)

You would have to query `prometheus_tsdb_head_series` instead of `sum(prometheus_tsdb_head_series)` in order to get the 5 results when deduplicating.

Signed-off-by: John Chen <[email protected]>

* Use new go jsonnet formatter (#2258)

Signed-off-by: Kemal Akkoyun <[email protected]>

* docs: Document Thanos Sharding (#1922)

* docs: Document Thanos Sharding

Signed-off-by: Xiang Dai <[email protected]>

* Add time partitioning

Signed-off-by: Xiang Dai <[email protected]>

* feedback

Signed-off-by: Xiang Dai <[email protected]>

* Sharding: document supported relabel action and add store gateway background (#2272)

* Sharding: document supported relabel action and add store gateway background

Signed-off-by: Xiang Dai <[email protected]>

* add hashmod

Signed-off-by: Xiang Dai <[email protected]>

* Add wait-interval flag (#2265)

Signed-off-by: Kemal Akkoyun <[email protected]>

* store: Optimized labels conversion on store.Series; Added unsafe labels conversion. (#2230)

## Changes

* method TranslateLables CPU Optimized (streamed sorting).
* All store GW label conversions to []storepb.Label are now alloc-less.

```
go test -bench=BenchmarkUnsafeVSSafeLabelsConversion -run=^$ -benchmem -timeout 2h -benchtime 10s ./pkg/store/storepb/...
 goos: linux
 goarch: amd64
 pkg: github.com/thanos-io/thanos/pkg/store/storepb
 BenchmarkUnsafeVSSafeLabelsConversion/safe-12         	   34822	    339076 ns/op	  655368 B/op	       2 allocs/op
 BenchmarkUnsafeVSSafeLabelsConversion/unsafe-12       	1000000000	         2.32 ns/op	       0 B/op	       0 allocs/op
PASS
```

TODO: Do the same on Querier.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* fix: Ignore the OS-X Trash (#2274)

Signed-off-by: kushthedude <[email protected]>

* docs/sharding.md: fix a typo (#2273)

Signed-off-by: Xiang Dai <[email protected]>

* fix replicate duplicate metrics (#2254)

Signed-off-by: yeya24 <[email protected]>

* Document downsample component (#2090)

* scripts/genflagdocs.sh: Generate downsample flag

Signed-off-by: Xiang Dai <[email protected]>

* Document downsample component

Signed-off-by: Xiang Dai <[email protected]>

* Move downsample as bucket sub-command

Signed-off-by: Xiang Dai <[email protected]>

* update docs

Signed-off-by: Xiang Dai <[email protected]>

* feedback

Signed-off-by: Xiang Dai <[email protected]>

* Crashing error messages now will print stacktrace. (#2277)

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Downsample: update changelog (#2278)

* Downsample: update changelog

Signed-off-by: Xiang Dai <[email protected]>

* feedback

Signed-off-by: Xiang Dai <[email protected]>

* thanos-mixin: clear units/axis (#2279)

* thanos-mixin: clear units/axis

Signed-off-by: Xiang Dai <[email protected]>

* fix nits

Signed-off-by: Xiang Dai <[email protected]>

* store, compact, bucket: Delay deletes by scheduling block deletion with deletion-mark.json file (#2136)

Signed-off-by: khyatisoneji <[email protected]>

* Use maxInt instead of math.MaxInt64 (#2268)

math.MaxInt64 doesn't work on 32-bit systems (like linux/arm builds)

Signed-off-by: Peter Štibraný <[email protected]>
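
A portable way to get the largest `int` on any platform is sketched below. The constant name `maxInt` comes from the commit title; how Thanos actually defines it is an assumption here.

```go
package main

import (
	"fmt"
	"math/bits"
)

// maxInt is the largest value of the platform-dependent int type:
// 1<<31 - 1 on 32-bit targets and 1<<63 - 1 on 64-bit targets.
const maxInt = int(^uint(0) >> 1)

func main() {
	fmt.Printf("int is %d bits wide, maxInt = %d\n", bits.UintSize, maxInt)

	// By contrast, the following line does not compile on 32-bit targets
	// such as linux/arm, because the constant overflows int:
	//   var limit int = math.MaxInt64
}
```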

* Replace objstore.Exists function calls with bkt.Exists (#2284)

Signed-off-by: khyatisoneji <[email protected]>

* Added Xiang to Triage Role. (#2289)

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Enrich Memcached client logs (#2292)

* Enrich Memcached client logs

Signed-off-by: Marco Pracucci <[email protected]>

* Update pkg/cacheutil/memcached_client.go

Signed-off-by: Marco Pracucci <[email protected]>

Co-Authored-By: Bartlomiej Plotka <[email protected]>

* Update pkg/cacheutil/memcached_client.go

Signed-off-by: Marco Pracucci <[email protected]>

Co-Authored-By: Bartlomiej Plotka <[email protected]>

Co-authored-by: Bartlomiej Plotka <[email protected]>

* Added Kemal to Triage Role. (#2293)

Signed-off-by: Bartlomiej Plotka <[email protected]>

* bucket: handle instances where no blocks are loaded (#2271)

* bucket: handle instances where no blocks are loaded

Signed-off-by: Jacob Colvin <[email protected]>

* bucket: reject all falsy label values

Signed-off-by: Jacob Colvin <[email protected]>

* bucket: update changelog

Signed-off-by: Jacob Colvin <[email protected]>

* docs/sharding.md: Replace example floating link with permalink (#2296)

Signed-off-by: Frederic Branczyk <[email protected]>

* Added latest release badge. (#2300)

I think there are NOT enough badges, so added one more!

Signed-off-by: Bartlomiej Plotka <[email protected]>

* store: Postings fetching optimizations (#2294)

* Avoid fetching duplicate keys.
Simplified groups with add/remove keys.

Signed-off-by: Peter Štibraný <[email protected]>

* Added shortcuts

Signed-off-by: Peter Štibraný <[email protected]>

* Optimize away fetching of ALL postings, if possible.
Only remove postings for each key once.

Signed-off-by: Peter Štibraný <[email protected]>

* Don't do individual index.Without, but merge them first.

Signed-off-by: Peter Štibraný <[email protected]>

* Don't use map for fetching postings, but return slice instead.

This is in line with original code. Using a map was nicer,
but more expensive in terms of allocations and hashing
labels.

Signed-off-by: Peter Štibraný <[email protected]>

* Renamed 'all' to 'allRequested'.

Signed-off-by: Peter Štibraný <[email protected]>

* Typo

Signed-off-by: Peter Štibraný <[email protected]>

* Make linter happy.

Signed-off-by: Peter Štibraný <[email protected]>

* Added comment to fetchPostings.

Signed-off-by: Peter Štibraný <[email protected]>

* Group vars

Signed-off-by: Peter Štibraný <[email protected]>

* Comments

Signed-off-by: Peter Štibraný <[email protected]>

* Use allPostings and emptyPostings variables for special cases.

Signed-off-by: Peter Štibraný <[email protected]>

* Unify terminology to "special All postings"

Signed-off-by: Peter Štibraný <[email protected]>

* Address feedback.

Signed-off-by: Peter Štibraný <[email protected]>

* Added CHANGELOG.md entry.

Signed-off-by: Peter Štibraný <[email protected]>

* Fix check for empty group.

Signed-off-by: Peter Štibraný <[email protected]>

* Comment

Signed-off-by: Peter Štibraný <[email protected]>

* Special All postings is now added as a new group

No special handling required anymore.

Signed-off-by: Peter Štibraný <[email protected]>

* Updated comment

Signed-off-by: Peter Štibraný <[email protected]>

* cmd/thanos/receive: Remove unused TLSClientConfig from Options (#2299)

Signed-off-by: mrIncompetent <[email protected]>

* compactor: Add ReplicaLabelRemover as MetaFetcher filter to enable offline vertical compaction/deduplication for replicated data (#2250)

* Create ReplicaLabelsFilter to allow for offline deduplication

Signed-off-by: Matthias Loibl <[email protected]>

* Start adding a e2e test for offline-deduplication with Thanos compact

Signed-off-by: Matthias Loibl <[email protected]>

* Address issues that were discovered after review

Signed-off-by: Kemal Akkoyun <[email protected]>

* Fix e2e test service issue

Signed-off-by: Kemal Akkoyun <[email protected]>

* Improve fetcher unit tests

Signed-off-by: Kemal Akkoyun <[email protected]>

* Add simple compactor e2e tests with replica remover

Signed-off-by: Kemal Akkoyun <[email protected]>

* Remove unnecessary interface

Signed-off-by: Kemal Akkoyun <[email protected]>

* Address review issues

Signed-off-by: Kemal Akkoyun <[email protected]>

* Add more test cases

Signed-off-by: Kemal Akkoyun <[email protected]>

* Improve and stabilize e2e tests

Signed-off-by: Kemal Akkoyun <[email protected]>

* Address review issues

Signed-off-by: Kemal Akkoyun <[email protected]>

* Increase ruler sd refresh interval

Signed-off-by: Kemal Akkoyun <[email protected]>

* Address review issues

Signed-off-by: Kemal Akkoyun <[email protected]>

* Separate filters and modifiers

Signed-off-by: Kemal Akkoyun <[email protected]>

Co-authored-by: Matthias Loibl <[email protected]>

* docs/release: squat to release v0.12.0 (#2312)

Signed-off-by: Lucas Servén Marín <[email protected]>

* cmd/thanos/receive: Serve TLS when TLSConfig is given (#2311)

Signed-off-by: mrIncompetent <[email protected]>
Signed-off-by: Lucas Servén Marín <[email protected]>

Co-authored-by: mrIncompetent <[email protected]>

* cmd/thanos/compact: add bucket UI (#1714)

This commit enhances the compact component so that it runs the bucket UI
whenever the --wait flag is also passed. In order to reduce the overhead
of running the UI in addition to the compactor, this commit also
refactors the compactor and bucket commands a bit in order to re-use a
single meta fetcher.

Signed-off-by: Lucas Servén Marín <[email protected]>

* reloadRules initialization should fail (#2301)

Signed-off-by: arthur yang <[email protected]>

* Fixed inconsistent metrics and methods (#2319)

Signed-off-by: jojohappy <[email protected]>

* e2e: Refactored compactor test; Fixed flakiness. (#2313)

Also:

* Reduced number of services for e2e for latency
* Fixed halting
* Improved logging.
* Improved test cases (e.g added test for compaction and halting)


Signed-off-by: Bartlomiej Plotka <[email protected]>

* pkg/store: Report no data if no stores discovered (#2310)

* pkg/store: Report no data if no stores discovered

Signed-off-by: Frederic Branczyk <[email protected]>

* CHANGELOG.md: Add timespan reported on empty stores

Signed-off-by: Frederic Branczyk <[email protected]>

* Added max_item_size to Memcached client (#2304)

* Added max_item_size to Memcached client

Signed-off-by: Marco Pracucci <[email protected]>

* Changed imports order and split tests

Signed-off-by: Marco Pracucci <[email protected]>

* Fixed type casting

Signed-off-by: Marco Pracucci <[email protected]>

* Changed imports grouping

Signed-off-by: Marco Pracucci <[email protected]>

* Changed memcached max_item_size default from 0 to 1MB

Signed-off-by: Marco Pracucci <[email protected]>

* Increased e2e tests timeout

Signed-off-by: Marco Pracucci <[email protected]>

* Fixed typo in CHANGELOG

Signed-off-by: Marco Pracucci <[email protected]>

* Reverted Makefile changes

Signed-off-by: Marco Pracucci <[email protected]>

* testutil: Enhanced testutil, refactored for our needs. (#2325)

Changed LICENSE as we no longer use version we copied back then.
Most of it was reimplemented.

Why?
* Much richer diff (inspired by testify packages)
* Consistent API
* Less indentation.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* make, ci: Check example alerts and rules in CI (#2318)

* Check example alerts and rules in CI

Signed-off-by: Kemal Akkoyun <[email protected]>

* Add require clean tree

Signed-off-by: Kemal Akkoyun <[email protected]>

* Fix latency alerts (#2316)

Signed-off-by: Kemal Akkoyun <[email protected]>

* Fixed e2e. (#2327)

Sorry, was late when we merged the fix. Funny bug: It would start to fail exactly 12h AFTER 25.03 8:00 GMT

Should be fine now... and in future until changed ;p

Signed-off-by: Bartlomiej Plotka <[email protected]>

* store: added option to reencode and compress postings before storing them to the cache (#2297)

* Added "diff+varint+snappy" codec for postings.

Signed-off-by: Peter Štibraný <[email protected]>
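
The "diff+varint" part of such a codec can be sketched with the standard library alone. This is an illustration under assumptions: the function names are hypothetical, the snappy compression step is left out, and the real Thanos encoding may differ in its details. The idea is that sorted posting IDs have small gaps, and small numbers take few bytes as varints.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// encodeDiffVarint writes each value as the difference to the previous
// one, encoded as an unsigned varint. The input must be sorted ascending.
func encodeDiffVarint(postings []uint64) []byte {
	buf := make([]byte, 0, len(postings)*2)
	tmp := make([]byte, binary.MaxVarintLen64)
	prev := uint64(0)
	for _, p := range postings {
		n := binary.PutUvarint(tmp, p-prev)
		buf = append(buf, tmp[:n]...)
		prev = p
	}
	return buf
}

// decodeDiffVarint reverses encodeDiffVarint by summing up the deltas.
func decodeDiffVarint(data []byte) ([]uint64, error) {
	var out []uint64
	prev := uint64(0)
	for len(data) > 0 {
		delta, n := binary.Uvarint(data)
		if n <= 0 {
			return nil, errors.New("malformed varint")
		}
		prev += delta
		out = append(out, prev)
		data = data[n:]
	}
	return out, nil
}

func main() {
	postings := []uint64{1000, 1003, 1010, 1500, 1501}
	enc := encodeDiffVarint(postings)
	dec, err := decodeDiffVarint(enc)
	fmt.Println(len(enc), dec, err)
}
```

A general-purpose compressor such as snappy would then run over the resulting byte slice.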

* Added option to reencode and compress postings stored in cache

Signed-off-by: Peter Štibraný <[email protected]>

* Expose enablePostingsCompression flag as CLI parameter.

Signed-off-by: Peter Štibraný <[email protected]>

* Use "github.com/pkg/errors" instead of "errors" package.

Signed-off-by: Peter Štibraný <[email protected]>

* remove break

Signed-off-by: Peter Štibraný <[email protected]>

* Removed empty branch

Signed-off-by: Peter Štibraný <[email protected]>

* Added copyright headers.

Signed-off-by: Peter Štibraný <[email protected]>

* Added CHANGELOG.md entry

Signed-off-by: Peter Štibraný <[email protected]>

* Added comments.

Signed-off-by: Peter Štibraný <[email protected]>

* Use Encbuf and Decbuf.

Signed-off-by: Peter Štibraný <[email protected]>

* Fix comments in test file.

Signed-off-by: Peter Štibraný <[email protected]>

* Another comment...

Signed-off-by: Peter Štibraný <[email protected]>

* Removed diffVarintSnappyEncode function.

Signed-off-by: Peter Štibraný <[email protected]>

* Comment on usage with in-memory cache.

Signed-off-by: Peter Štibraný <[email protected]>

* var block

Signed-off-by: Peter Štibraný <[email protected]>

* Removed extra comment.

Signed-off-by: Peter Štibraný <[email protected]>

* Move comment to error message.

Signed-off-by: Peter Štibraný <[email protected]>

* Separated snappy compression and postings reencoding into two functions.
There is now header only for snappy-compressed postings.

Signed-off-by: Peter Štibraný <[email protected]>

* Added comment on using diff+varint+snappy.

Signed-off-by: Peter Štibraný <[email protected]>

* Shorten header

Signed-off-by: Peter Štibraný <[email protected]>

* Lint...

Signed-off-by: Peter Štibraný <[email protected]>

* Changed experimental.enable-postings-compression to experimental.enable-index-cache-postings-compression

Signed-off-by: Peter Štibraný <[email protected]>

* Added metrics for postings compression

Signed-off-by: Peter Štibraný <[email protected]>

* Added metrics for postings decompression

Signed-off-by: Peter Štibraný <[email protected]>

* Reorder metrics

Signed-off-by: Peter Štibraný <[email protected]>

* Fixed comment.

Signed-off-by: Peter Štibraný <[email protected]>

* Fixed comment.

Signed-off-by: Peter Štibraný <[email protected]>

* Use encode/decode labels.

Signed-off-by: Peter Štibraný <[email protected]>

* mixin: Make alert threshold values parametric (#2317)

* Make alert threshold values parametric

Signed-off-by: Kemal Akkoyun <[email protected]>

* Rename variable

Signed-off-by: Kemal Akkoyun <[email protected]>

* Adjust default values for latency thresholds

Signed-off-by: Kemal Akkoyun <[email protected]>

* Update UW logo (#2329)

Signed-off-by: Povilas Versockas <[email protected]>

* block fetcher with errgroup (#2309)

* block fetcher with errgroup

Signed-off-by: arthur yang <[email protected]>

* errorgroup goroutine defer close

Signed-off-by: arthur yang <[email protected]>

* website: fix 404 on root of sections (#2328)

Signed-off-by: Prem Kumar <[email protected]>

* Add mallgroup.com to adopters (#2331)

Signed-off-by: Daniel Rataj <[email protected]>

Co-authored-by: Daniel Rataj <[email protected]>

* store: Binary index header is now production ready and enabled by default (#2330)

* store: Binary index header is now production ready and enabled by default.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Fixed typo.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Add leboncoin company as adopter (#2333)

Signed-off-by: Guillaume Chenuet <[email protected]>

* website: Collapsible menu sections (#2336)

* website: make sidemenu collapsed by default

Signed-off-by: Prem Kumar <[email protected]>

* website: add caret svg in expandable sidemenu

Signed-off-by: Prem Kumar <[email protected]>

* website: expand current section's sidemenu by default

Signed-off-by: Prem Kumar <[email protected]>

* ui: fix store never removed from /stores page bug (#2339)

* ui: fix store never removed from /stores page bug

We need to update `LastCheck` only if the error is nil. That field
is used in the cleanup function to know when to remove the StoreAPI from
the UI. If we always update it, even if an error has happened, that
means that `--store.unhealthy-timeout` is never respected.

Signed-off-by: Giedrius Statkevičius <[email protected]>

* query: fix storeset Update() test

Now let's start with a proper state where LastCheck is not 0 at the
beginning and we have 2 active stores, 3 store statuses just like the
original author had intended.

Signed-off-by: Giedrius Statkevičius <[email protected]>

* fix typo in readme (#2342)

data -> date

Signed-off-by: afirth <[email protected]>

* query: add --store-strict flag (#2337)

* query: add --store-strict flag

Add a new flag called `--store-strict` as agreed per
https://thanos.io/proposals/202001_thanos_query_health_handling.md/

I have updated the proposal to reflect the reality.

Third time's the charm, I believe it :-)

Now the flag is called `--store-strict` which only accepts statically
defined nodes. I guess the code is even simpler now.

I have also fixed one small issue where `%w` was used in
`errors.Errorf`. Couldn't compile Thanos locally with Go 1.14 without
this fix.

Signed-off-by: Giedrius Statkevičius <[email protected]>

* CHANGELOG: fix changelog item

Signed-off-by: Giedrius Statkevičius <[email protected]>

* Register grpc prometheus middleware metrics (#2347)

Signed-off-by: Kemal Akkoyun <[email protected]>

* website: Enabled two scripts to fix Google analytics. (#2346)

* website: Enabled two scripts to fix Google analytics.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Fixed also inline style.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Added Workfront as adopter (#2351)

Signed-off-by: Ryan Orth <[email protected]>

Co-authored-by: Ryan Orth <[email protected]>

* compact: Fixed minor logging issues. (#2353)

Fixes: https://github.com/thanos-io/thanos/issues/2322

Signed-off-by: Bartlomiej Plotka <[email protected]>

* fetcher: Made metaFetcher go routine safe; Fixed multiple bucket UI + fetcher issues. (#2354)

Fixed https://github.com/thanos-io/thanos/issues/2349
Fixed races (we were reusing fetcher by both bucket UI and compaction syncs)...
Fixed logging
Added singleflight to ensure we don't synchronize too often.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* test/e2e: Add timestamp to e2e test log output (#2358)

Signed-off-by: Frederic Branczyk <[email protected]>

* store & compact: For components that operate on blocks - expose the UI on /loaded-blocks (#2357)

Signed-off-by: Bartlomiej Plotka <[email protected]>

* rule: fix query addr parsing (#2288)

* rule: fix query addr parsing

Signed-off-by: Tobiasz Heller <[email protected]>

* CR: support different schemas

Signed-off-by: Tobiasz Heller <[email protected]>

* CR: docs and err

Signed-off-by: Tobiasz Heller <[email protected]>

* CR: improve error handling and more TC

Signed-off-by: Tobiasz Heller <[email protected]>

* mixin: Remove unused jobPrefix field (#2364)

Signed-off-by: Lili Cosic <[email protected]>

* Create release v0.12.0-rc.0 (#2360)

Signed-off-by: Lucas Servén Marín <[email protected]>

* Allow more connection reuse than the default of 2 (#2343)

Signed-off-by: Jakob Kartschall <[email protected]>

* Makefile: ignore GCS in CI (#2368)

We got booted from the GCS account, so skip this in CI for now.

Signed-off-by: Lucas Servén Marín <[email protected]>

* Revert "Makefile: ignore GCS in CI (#2368)" (#2373)

This reverts commit 8591434856ced5803e399b4d9d1bf2d1459c0ee0.

* mixin: Added critical Rules alerts. (#2374)

* mixin: Added critical Rules alerts.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Addressed comments.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* mixin: Made sure Rule alerts are not firing if one replica is failing. (#2375)

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Update S3 endpoint mapping link (#2377)

The link for the AWS Region Endpoint Mappings for S3 was out of date, this PR updates it to point to the new location.

Signed-off-by: João Carvalho <[email protected]>

* Fix2213 0.12 (#2382)

* binaryHeader: Fixed partial write issue for index-header.

Fixes https://github.com/thanos-io/thanos/issues/2213

This was reported as a latency regression, and it also causes a potentially critical issue
for store GW, where manual deletion of the index-header from local storage was required.

This might be considered a blocker for 0.12, so it would be worth porting it to 0.12 TBH @squat.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* binary_reader: ensure fs is synced before renaming

Signed-off-by: Lucas Servén Marín <[email protected]>

Co-authored-by: Bartlomiej Plotka <[email protected]>

* objstore: Added WithExpectedErrs which allows to control instrumentation (e.g not increment failures for expected not found) (#2383)

* objstore: Added WithExpectedErrs to Reader which allows to control instrumentation (e.g not increment failures for expected not found).

This allows us to not wake up oncall in the middle of the night because of an expected, properly handled case (:

Also: Had to move inmem to objstore for testing.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* pkg/objstore: fix NewBucket comments.

This commit fixes the documentation comments for the NewBucket funcs.

Signed-off-by: Lucas Servén Marín <[email protected]>

Co-authored-by: Bartlomiej Plotka <[email protected]>

* pkg/receive: trace TSDB ingestion (#2384)

This commit adds a tracing span around the writing of remote-write
requests into TSDB. This will help us differentiate between the
latencies in the forwarding of requests around the hashring and the
latencies of appending to the database.

This commit also removes the `thanos_` prefix from the forwarding span
to better align with the span naming in the rest of the project.

Signed-off-by: Lucas Servén Marín <[email protected]>

* compact: Made MarkForDeletion less strict; Added more debuggability to block deletion logic, made meta sync explicit. (#2385)

Also:

* Changed order: Now BestEffortCleanAbortedPartialUploads is before DeleteMarkedBlocks.
* Increment markedForDeletion counter only when we actually uploaded block.
* Fixed logging issues.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Compactor: Document reasons and solutions about overlaps (#2191)

* troubleshooting.md: document overlaps

Signed-off-by: Xiang Dai <[email protected]>

* feedback

Signed-off-by: Xiang Dai <[email protected]>

* feedback

Signed-off-by: Xiang Dai <[email protected]>

* add reminder label to stale bot config (#2378)

Signed-off-by: yeya24 <[email protected]>

* fix sharding docs style; fix promtail link (#2379)

Signed-off-by: yeya24 <[email protected]>

* store: Fixed binary header bug that was causing all postings to be kept in memory instead of 1/32 as we meant. (#2390)

* store: Fixed binary header bug that was causing all postings to be kept in memory instead of 1/32 as we meant.

Spotted by @mkabischev! Thanks to you and @d-ulyanov as well! Epic finding +1


Test output before fix:
					testutil.Equals(t, 1, br.version)
					testutil.Equals(t, 2, br.indexVersion)
					testutil.Equals(t, &BinaryTOC{Symbols: headerLen, PostingsOffsetTable: 66}, br.toc)
					testutil.Equals(t, int64(626), br.indexLastPostingEnd)
					testutil.Equals(t, 8, br.symbols.Size())
					testutil.Equals(t, map[string]*postingValueOffsets{
						"": {
							offsets:       []postingOffset{{value: "", tableOff: 4}},
							lastValOffset: 392,
						},
						"a": {
							offsets: []postingOffset{
								{value: "1", tableOff: 9},
								{value: "11", tableOff: 16},
								{value: "12", tableOff: 24},
								{value: "2", tableOff: 32},
								{value: "3", tableOff: 39},
								{value: "4", tableOff: 46},
								{value: "5", tableOff: 53},
								{value: "6", tableOff: 60},
								{value: "7", tableOff: 67},
								{value: "8", tableOff: 74},
								{value: "9", tableOff: 81},
							},
							lastValOffset: 572,
						},
						"longer-string": {
							offsets:       []postingOffset{{value: "1", tableOff: 88}},
							lastValOffset: 622,
						},
					}, br.postings)
					testutil.Equals(t, 0, len(br.postingsV1))
					testutil.Equals(t, 2, len(br.nameSymbols))

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Added CHANGELOG item.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Fixed build errs.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Addressed Lucas comment.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* store: Fixed critical bug where querying certain non-existing values caused an "invalid size" error. (#2393)

The reason we could not reproduce it locally was that, for most non-existing values,
we were lucky: the buffer was still long enough, so we could read and decode some (malformed) variadic type.
In certain rare cases, the buffer was not long enough.
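
A rough sketch of this failure mode, assuming the "variadic type" above refers to a variable-length integer (varint). It uses the standard library's `encoding/binary` rather than the actual binary header reader; `readUvarint` and `errInvalidSize` are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// errInvalidSize is a hypothetical error mirroring the message in the report.
var errInvalidSize = errors.New("invalid size")

// readUvarint decodes one unsigned varint from the front of buf and
// returns the value together with the remaining bytes.
func readUvarint(buf []byte) (uint64, []byte, error) {
	v, n := binary.Uvarint(buf)
	if n <= 0 {
		// n == 0: buffer too short; n < 0: value overflows 64 bits.
		return 0, buf, errInvalidSize
	}
	return v, buf[n:], nil
}

func main() {
	tmp := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(tmp, 300)
	full := tmp[:n] // 300 needs two bytes as a varint

	v, _, err := readUvarint(full)
	fmt.Println("full buffer:", v, err)

	// Cutting the buffer short mid-value is the rare case: decoding fails.
	_, _, err = readUvarint(full[:1])
	fmt.Println("short buffer:", err)
}
```

With enough trailing bytes a decoder like this succeeds even on garbage, which matches the "lucky" case described above; only a buffer that ends mid-value surfaces the error.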

Fixed and spotted thanks to amazing @mkabischev!

* Added more regression tests for binary header.

Without the fix it fails with:
```
            header_test.go:154: header_test.go:154:

                	exp: range not found

                	got: get postings offset entry: invalid size
```

Signed-off-by: Bartlomiej Plotka <[email protected]>

* VERSION: cut v0.12.0-rc.1 (#2396)

Signed-off-by: Lucas Servén Marín <[email protected]>

* mixin: Change critical rule alert to be symptom based (#2398)

This change makes the critical (typically paging) alert more symptom
based, rather than observing data written to disk. Additionally after
this change the alert will only fire if there are actually rules loaded.

In addition to firing when no rules were loaded, the previous alert was
also prone to firing for rules that legitimately do not write data.

Signed-off-by: Frederic Branczyk <[email protected]>

* scripts: Added grpcurl script useful for Thanos debugging. (#2403)

Signed-off-by: Bartlomiej Plotka <[email protected]>

* bucket docs: fix "thanos downsample" remnant (#2409)

and follow formatting of the other bucket commands

Signed-off-by: John Belmonte <[email protected]>

* docs: Added Thanos Go style guide and some development tips. (#2359)

* docs: Added Thanos Go style guide and some development tips.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Addressed comments; added TOC and image.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Added more rules.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Grammarly fixes!

Signed-off-by: Bartlomiej Plotka <[email protected]>

* docs: Fixed table formatting for coding style guide. (#2421)

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Added extra check for sorting time Duration and int strings (#2416)

Signed-off-by: kadern0 <[email protected]>

* docs: Added minor note to single rule. (#2422)

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Fixed TOC. (#2424)

Signed-off-by: Bartlomiej Plotka <[email protected]>

* store dashboard: fix gRPC streamed detail panels (#2426)

Fixes #2425

Signed-off-by: John Belmonte <[email protected]>

* use bytes unit where appropriate on grafana dashboards (#2423)

Signed-off-by: John Belmonte <[email protected]>

* bucket verify: document that compactor should be disabled (#2418)

Signed-off-by: John Belmonte <[email protected]>

* docs: Fixed typo in coding guide. (#2427)

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Added Marco as Thanos Maintainer (#2428)

Also, reordered list alphabetically.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* store: proxy: fix queries never timing out bug (#2411)

* store: proxy: add test for deadlocking problem

Signed-off-by: Giedrius Statkevičius <[email protected]>

* store: proxy: add fix for timeouts

Checking here if the series context has ended is the correct fix here.
We want to check it because if any of the other Series() calls error out
then the context is canceled. So, it is equal to checking for errors
"downstream", in `mergedSeriesSet`.

Also, `handleErr()` here is the correct function to use because in such
a case we want to set `s.err` -- if `io.EOF` still hasn't been received
then it means that StoreAPI still has some data that it wants to send
but hasn't yet.

With this, the previously added test passes.
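
A minimal sketch of the pattern described above, not the proxy code itself: a receive loop that also watches the context, so a stalled stream cannot block forever once the context is canceled. The channel stands in for a gRPC stream, and `recvAll`, `demoStalled` and `demoDrained` are hypothetical names.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// recvAll collects frames until the channel is closed (the analogue of
// io.EOF) or the context ends, whichever comes first.
func recvAll(ctx context.Context, frames <-chan string) ([]string, error) {
	var out []string
	for {
		select {
		case <-ctx.Done():
			// Another call failed or the deadline passed: report it
			// instead of waiting for data that may never arrive.
			return out, ctx.Err()
		case f, ok := <-frames:
			if !ok {
				return out, nil
			}
			out = append(out, f)
		}
	}
}

// demoStalled reads from a stream that never sends, with a canceled context.
func demoStalled() error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	_, err := recvAll(ctx, make(chan string))
	return err
}

// demoDrained reads a stream that sends two frames and then ends cleanly.
func demoDrained() ([]string, error) {
	ch := make(chan string, 2)
	ch <- "a"
	ch <- "b"
	close(ch)
	return recvAll(context.Background(), ch)
}

func main() {
	fmt.Println("stalled stream canceled:", errors.Is(demoStalled(), context.Canceled))
	out, err := demoDrained()
	fmt.Println("drained stream:", out, err)
}
```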

Signed-off-by: Giedrius Statkevičius <[email protected]>

* docs: fixed typo in coding style guide (#2431)

Signed-off-by: Stephan Kirsten <[email protected]>

* docs/release-process: make shell command copyable (#2433)

In general, I think it is easier for users of guides when shell commands
are listed without a preceding `$`, otherwise the commands cannot be
directly copied and pasted into a terminal.

Signed-off-by: Lucas Servén Marín <[email protected]>

* docs/contributing: clean up style guide grammar (#2432)

This commit makes some small grammar fixes to the coding style
guide.

Signed-off-by: Lucas Servén Marín <[email protected]>

* cut v0.12.0 (#2437)

Signed-off-by: Lucas Servén Marín <[email protected]>

* .circleci: use consistent ci image tags (#2440)

We were not using the latest thanos-ci image tag for every part of the
CI pipeline: we were using 0.3.0 for tests but 0.2.0 for all builds.

Signed-off-by: Lucas Servén Marín <[email protected]>

* CHANGELOG.md: fix changelog

The changelog in the release-0.12 branch is correct, but somewhere in
the merge back into master, the changelog was mangled. This puts the
fixes in their correct places.

Signed-off-by: Lucas Servén Marín <[email protected]>

* store: proxy: fix queries never timing out bug (#2411) (#2443)

* store: proxy: add test for deadlocking problem

Signed-off-by: Giedrius Statkevičius <[email protected]>

* store: proxy: add fix for timeouts

Checking here if the series context has ended is the correct fix here.
We want to check it because if any of the other Series() calls error out
then the context is canceled. So, it is equal to checking for errors
"downstream", in `mergedSeriesSet`.

Also, `handleErr()` here is the correct function to use because in such
a case we want to set `s.err` -- if `io.EOF` still hasn't been received
then it means that StoreAPI still has some data that it wants to send
but hasn't yet.

With this, the previously added test passes.

Signed-off-by: Giedrius Statkevičius <[email protected]>

Co-authored-by: Giedrius Statkevičius <[email protected]>

* proposal: Added proposal for new Thanos component: Thanos Frontend. (#2434)

* proposal: Added proposal for new Thanos component: Thanos Frontend.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Added more rationales for separate binary.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Addressed Marco comments.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Addressed lucas comments.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Changed to approved.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Moved to query-frontend command.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Fixed memcached client metrics initialization (#2446)

Signed-off-by: Marco Pracucci <[email protected]>

* store: Added regex-set optimization to ExpandedPostings (#2450)

* Added regex-set optimization to ExpandedPostings

Signed-off-by: Peter Štibraný <[email protected]>

* Fixed capitalization.

Signed-off-by: Peter Štibraný <[email protected]>

* CHANGELOG.md

Signed-off-by: Peter Štibraný <[email protected]>

* Removed unnecessary change.

Signed-off-by: Peter Štibraný <[email protected]>

* Remove whitespace

Signed-off-by: Peter Štibraný <[email protected]>

* Use testutil instead of testify.

Signed-off-by: Peter Štibraný <[email protected]>

* Added copyright header, from original Prometheus querier.go

Signed-off-by: Peter Štibraný <[email protected]>

* Use Thanos copyright header. :facepalm:

Signed-off-by: Peter Štibraný <[email protected]>

* Added · at the end of the sentence. :exploding_head:.

I will randomly add emojis and GitHub emoji markup to commit messages that fix frustrating checks like this one. And intentionally not break the line. Let's see how lint deals with that! Ha.

Signed-off-by: Peter Štibraný <[email protected]>

* docs/contributing: use Before for IsExpired (#2456)

Signed-off-by: Davor Kapsa <[email protected]>

* cmd/thanos: clean gosimple S1039 (#2464)

Signed-off-by: Davor Kapsa <[email protected]>

* docs: Update CONTRIBUTING.md with DCO (#2465)

* docs: Update CONTRIBUTING.md with DCO

Signed-off-by: ranjithkumar007 <[email protected]>

* Update CONTRIBUTING.md

Co-Authored-By: Bartlomiej Plotka <[email protected]>
Signed-off-by: ranjithkumar007 <[email protected]>

Co-authored-by: Bartlomiej Plotka <[email protected]>

* Added tests to reproduce #2459. (#2462)

Related to: https://github.com/thanos-io/thanos/issues/2459

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Added a page for documenting beginner issues (#2461)

* Added some documentation for beginner issues

Signed-off-by: Yash <[email protected]>

* Edited some lines

Signed-off-by: Yash <[email protected]>

* Update docs/operating/troubleshooting.md

Co-Authored-By: Bartlomiej Plotka <[email protected]>
Signed-off-by: Yash <[email protected]>

Co-authored-by: Bartlomiej Plotka <[email protected]>

* pkg/block/fetcher: fix concurrent map usage (#2474)

Fixes: #2471

This commit fixes an issue where multiple goroutines in the block
fetcher filtering were concurrently accessing the same map. The
goroutines were concurrently writing to AND reading from the shared metas
map. This commit guards this concurrent access by giving the
DeduplicateFilter struct a mutex.
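
A minimal sketch of guarding a shared map with a mutex on the owning struct, as described above. It is not the actual fetcher code: `dedupFilter` and its fields and methods are hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"sync"
)

// dedupFilter is a hypothetical stand-in for a filter whose map is
// touched by several goroutines at once.
type dedupFilter struct {
	mtx        sync.Mutex
	duplicates map[string]struct{}
}

// markDuplicate records id; the lock makes concurrent writes safe.
func (f *dedupFilter) markDuplicate(id string) {
	f.mtx.Lock()
	defer f.mtx.Unlock()
	f.duplicates[id] = struct{}{}
}

// isDuplicate reads under the same lock, since reads racing with
// writes on a Go map are unsafe too.
func (f *dedupFilter) isDuplicate(id string) bool {
	f.mtx.Lock()
	defer f.mtx.Unlock()
	_, ok := f.duplicates[id]
	return ok
}

// markAll marks every id from its own goroutine and waits for all of them.
func markAll(f *dedupFilter, ids []string) {
	var wg sync.WaitGroup
	for _, id := range ids {
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			f.markDuplicate(id)
		}(id)
	}
	wg.Wait()
}

func main() {
	f := &dedupFilter{duplicates: map[string]struct{}{}}
	markAll(f, []string{"a", "b", "c"})
	fmt.Println(f.isDuplicate("b"), f.isDuplicate("z"))
}
```

Without the lock, the Go runtime can abort the process with "fatal error: concurrent map writes", which is the crash reported in #2471.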

Signed-off-by: Lucas Servén Marín <[email protected]>

* Reverted addition of deletion mark for partial uploads. (#2472)

Fixes https://github.com/thanos-io/thanos/issues/2459 (quick fix).

This keeps the logic from 0.11.0, which was good enough.

A possible future improvement: https://github.com/thanos-io/thanos/issues/2470

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Remove optimizations for label=~".*" and label!~".*". (#2475)

* Remove optimizations for label=~".*" and label!~".*".

They are not correct.

Signed-off-by: Peter Štibraný <[email protected]>

* CHANGELOG.md

Signed-off-by: Peter Štibraný <[email protected]>

* cut v0.12.1 (#2476)

Signed-off-by: Lucas Servén Marín <[email protected]>

* fix thanos web route prefix register twice (#2489)

Signed-off-by: yeya24 <[email protected]>
Signed-off-by: Lucas Servén Marín <[email protected]>

Co-authored-by: yeya24 <[email protected]>

* Do not lock DNS Provider.Address() while Resolve() is running (#2492)

Signed-off-by: Marco Pracucci <[email protected]>

* Compact: Update compact documentation to better clarify dedupeReplicaLabels. (#2481)

* Update compact documentation to better clarify dedupeReplicaLabels.

Signed-off-by: Johnathan Falk <[email protected]>

* Fix capitalization.

Signed-off-by: Johnathan Falk <[email protected]>

* Gracefully handle additional oneof fields in SeriesResponse (#2501)

* Gracefully handle additional oneof fields in SeriesResponse

Signed-off-by: Marco Pracucci <[email protected]>

* Removed unnecessary continue

Signed-off-by: Marco Pracucci <[email protected]>

* Updated CHANGELOG

Signed-off-by: Marco Pracucci <[email protected]>

* fix typo (#2509)

Signed-off-by: arthur yang <[email protected]>

* Adjust memcached operation buckets (#2504)

Signed-off-by: Kemal Akkoyun <[email protected]>

* pkg/query: remove obsolete 'thanos_store_node_info' metric (#2505)

Signed-off-by: Simon Pasquier <[email protected]>

* Add Community information (#2510)

* Add Community information

Signed-off-by: Povilas Versockas <[email protected]>

* Fixes after review

Signed-off-by: Povilas Versockas <[email protected]>

* Move to contributing menu

Signed-off-by: Povilas Versockas <[email protected]>

* Remove incompleteView field from fetcher response. (#2455)

Signed-off-by: Peter Štibraný <[email protected]>

* Added hints support to store protobuf (#2502)

* Added hints support to store protobuf

Signed-off-by: Marco Pracucci <[email protected]>

* Updated CHANGELOG

Signed-off-by: Marco Pracucci <[email protected]>

* Reworded hints doc

Signed-off-by: Marco Pracucci <[email protected]>

* Removed hints_enabled from SeriesRequest

Signed-off-by: Marco Pracucci <[email protected]>

* Remove spurious newline after rebase

Signed-off-by: Marco Pracucci <[email protected]>

* Leveraging docker layer caching (#2508)

Signed-off-by: ankitjain28may <[email protected]>

* add gofmt -s step to makefile and golangci (#2463)

* gofmt -s files

Signed-off-by: Davor Kapsa <[email protected]>

* golangci: add gofmt to linters

Signed-off-by: Davor Kapsa <[email protected]>

* makefile: add gofmt to format

Signed-off-by: Davor Kapsa <[email protected]>

* Update coding-style-guide.md (#2520)

make `doSomething` a function call.

Signed-off-by: Halil Kaskavalci <[email protected]>

* Let's be nicer about stale things (: (#2517)

Signed-off-by: Bartlomiej Plotka <[email protected]>

* docs/proposals/202003_thanos_rules_federation: initial commit (#2263)

Signed-off-by: Sergiusz Urbaniak <[email protected]>

* cmd: Moved all no-service commands under new tools subcommand. (#2513)

This allows better extensibility for the non-bucket-related tools we plan to add in the future.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Added hints support to BucketStore.Series() (#2516)

* Added hints support to BucketStore.Series()

Signed-off-by: Marco Pracucci <[email protected]>

* Fixed goimport grouping

Signed-off-by: Marco Pracucci <[email protected]>

* Added missing copyright

Signed-off-by: Marco Pracucci <[email protected]>

* Addressed review comments

Signed-off-by: Marco Pracucci <[email protected]>

* Exclude zoom.us from liche (because zoom.us response headers are over 4KB)

Signed-off-by: Marco Pracucci <[email protected]>

* update uswitch logo and branding (#2529)

Signed-off-by: Joseph-Irving <[email protected]>

* *: add metrics to the reloader package (#2521)

Signed-off-by: Simon Pasquier <[email protected]>

* Added LocalStore and realistic data for querier counter reset bug. (#2522) (#2538)

* Added LocalStore and realistic data for querier counter reset bug.

Tries to reproduce: https://github.com/thanos-io/thanos/issues/2401

I would still merge as it is a great test, and allows us to quickly
check data provided by Ben.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Fixed tsdbstore required component type.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Fixed ineffectual set.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Fixed liche.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Fixed unknown store issue.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* docs: fixed broken links in documentation (#2540)

* fix tiny typo

Signed-off-by: Dan Potepa <[email protected]>

* fix link to example manifest files

Signed-off-by: Dan Potepa <[email protected]>

* fixed some broken links

Signed-off-by: Dan Potepa <[email protected]>

* Clear duplicateIDs at the beginning of Filter. (#2544)

* Clear duplicateIDs at the beginning of Filter.

Signed-off-by: Peter Štibraný <[email protected]>

* CHANGELOG.md

Signed-off-by: Peter Štibraný <[email protected]>

* Address review feedback.

Signed-off-by: Peter Štibraný <[email protected]>

* Fix whitespace noise.

Signed-off-by: Peter Štibraný <[email protected]>

* :whale: :neckbeard: :kick_scooter:

Signed-off-by: Peter Štibraný <[email protected]>

* cmd: rule: do not wrap reload endpoint with prefix twice (#2533)

* cmd: rule: do not wrap reload endpoint with '/'

Do not wrap the router with `/` on the `/-/reload` endpoint. Otherwise,
it is inaccessible when no prefix has been specified by the user.

Signed-off-by: Giedrius Statkevičius <[email protected]>

* CHANGELOG: update

Signed-off-by: Giedrius Statkevičius <[email protected]>

* e2e: rule: add test for reloading rules via /-/reload

Add a test-case to the e2e tests for testing whether reloading rules via
/-/reload works.

Signed-off-by: Giedrius Statkevičius <[email protected]>

* VERSION: cut release v0.12.2 (#2545)

Signed-off-by: Lucas Servén Marín <[email protected]>

* ui: bump jQuery version to v3.5.0 (#2549)

Signed-off-by: Prem Kumar <[email protected]>

* Bumped minio-go library to v6.0.53 (#2536)

* Bumped minio-go library to v6.0.53

Signed-off-by: alicek106 <[email protected]>

* Updated CHANGELOG with PR

Signed-off-by: alicek106 <[email protected]>

* Add deleteSeries skeleton to return bad request (#2530)

Signed-off-by: darshanime <[email protected]>

* Revert "Add deleteSeries skeleton to return bad request (#2530)" (#2551)

This reverts commit d0bcbff8375b6384292533ffa84b6408b85b0acb.

* Fixed the timezone url (#2553)

Signed-off-by: Yash <[email protected]>

* Updated to golang v1.14.2 (#2194)

* Update golang:1.14.2

Signed-off-by: Raúl Naveiras <[email protected]>

* Update thanos-ci:go1.14.2-node

It requires a manual process to generate and push this container.

```
make docker-ci DOCKER_CI_TAG=go1.14.2-node
```

Signed-off-by: Raúl Naveiras <[email protected]>

* Update golang:1.14.2 for github actions

Signed-off-by: Raúl Naveiras <[email protected]>

* Update CHANGELOG

Signed-off-by: Raúl Naveiras <[email protected]>

* Fix yaml indentation

Signed-off-by: Raúl Naveiras <[email protected]>

* Added Bartek as next release shepherd. (#2556)

Signed-off-by: Bartlomiej Plotka <[email protected]>

* receive: Add support for TSDB per tenant (#2012)

* receive: Add support for TSDB per tenant

Signed-off-by: Frederic Branczyk <[email protected]>

* pkg/store: Merge SeriesSets of multiple TSDB stores

This is required as the Series gRPC method of the StoreAPI requires the
Series returned to be sorted.
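
A minimal sketch of why a merge is needed, not the multitsdb implementation: each store returns its own sorted set, so concatenating them would break the ordering, while a k-way merge preserves it. Plain strings stand in for series labels, and `mergeSorted` is a hypothetical name.

```go
package main

import "fmt"

// mergeSorted merges any number of individually sorted slices into one
// sorted slice by repeatedly taking the smallest head element.
func mergeSorted(sets ...[]string) []string {
	idx := make([]int, len(sets)) // next unread position in each set
	var out []string
	for {
		best := -1
		for i, s := range sets {
			if idx[i] >= len(s) {
				continue // this set is exhausted
			}
			if best == -1 || s[idx[i]] < sets[best][idx[best]] {
				best = i
			}
		}
		if best == -1 {
			return out // every set is exhausted
		}
		out = append(out, sets[best][idx[best]])
		idx[best]++
	}
}

func main() {
	tenantA := []string{`{job="a"}`, `{job="c"}`}
	tenantB := []string{`{job="b"}`, `{job="d"}`}
	fmt.Println(mergeSorted(tenantA, tenantB))
}
```

The linear scan over heads is fine for a handful of stores; a heap would be the usual choice when the number of sets is large.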

Signed-off-by: Frederic Branczyk <[email protected]>

* pkg/receive: Add multitsdb shipper support

Signed-off-by: Frederic Branczyk <[email protected]>

* Address comments

Signed-off-by: Frederic Branczyk <[email protected]>

* Add more comments on types and functions

Signed-off-by: Frederic Branczyk <[email protected]>

* pkg/store/multitsdb.go: Remove unused struct field

Signed-off-by: Frederic Branczyk <[email protected]>

* pkg/receive/multitsdb.go: Remove unused Close method

TSDBs are implicitly closed by flushing the database, which is ensured
on shutdown, hence there is no need to have the explicit close method.

Signed-off-by: Frederic Branczyk <[email protected]>

* pkg/store/multitsdb.go: Make errors and warnings tenant aware

Signed-off-by: Frederic Branczyk <[email protected]>

* pkg/store/multitsdb.go: Consistent tenant aware errors and warnings

Signed-off-by: Frederic Branczyk <[email protected]>

* cmd/thanos/receive.go: Auto migrate legacy to multitsdb disk layout (#2557)

Signed-off-by: Frederic Branczyk <[email protected]>

* Merge 0.12 into master (#2559)

* Clear duplicateIDs at the beginning of Filter. (#2544)

* Clear duplicateIDs at the beginning of Filter.

Signed-off-by: Peter Štibraný <[email protected]>

* CHANGELOG.md

Signed-off-by: Peter Štibraný <[email protected]>

* Address review feedback.

Signed-off-by: Peter Štibraný <[email protected]>

* Fix whitespace noise.

Signed-off-by: Peter Štibraný <[email protected]>

* :whale: :neckbeard: :kick_scooter:

Signed-off-by: Peter Štibraný <[email protected]>

* cmd: rule: do not wrap reload endpoint with prefix twice (#2533)

* cmd: rule: do not wrap reload endpoint with '/'

Do not wrap the router with `/` on the `/-/reload` endpoint. Otherwise,
it is inaccessible when no prefix has been specified by the user.

Signed-off-by: Giedrius Statkevičius <[email protected]>

* CHANGELOG: update

Signed-off-by: Giedrius Statkevičius <[email protected]>

* e2e: rule: add test for reloading rules via /-/reload

Add a test-case to the e2e tests for testing whether reloading rules via
/-/reload works.

Signed-off-by: Giedrius Statkevičius <[email protected]>

* VERSION: cut release v0.12.2 (#2545)

Signed-off-by: Lucas Servén Marín <[email protected]>

Co-authored-by: Peter Štibraný <[email protected]>
Co-authored-by: Giedrius Statkevičius <[email protected]>

* Revert "Merge 0.12 into master (#2559)" (#2560)

This reverts commit 003d245282bd683826304d25d1719c39d7401629.

Signed-off-by: Lucas Servén Marín <[email protected]>

* querier: Added regression tests for counter missed reset bug. (#2528)

* querier: Added regression tests for counter missed bug.

PR with just tests, not fix yet.

Reproduces: https://github.com/thanos-io/thanos/issues/2401

* Added regression tests for CounterSeriesIterator; Simplified aggregators.
* Fixes edge dedup cases for Next and added tests for deduplication.
* Refactored downsampling tests, added more realistic cases.
* Added check for duplicated chunks during downsampling.
* Removed duplicates for efficiency on promSeriesSet.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Addressed Giedrius comments.

Signed-off-by: Bartlomiej Plotka <[email protected]>

* receive: Use read locks where possible to read tenants (#2563)

Signed-off-by: Frederic Branczyk <[email protected]>
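
A minimal sketch of the read-lock idea in the receive commit above, not the receive code itself: lookups take a shared read lock so they do not block each other, and only mutations take the exclusive lock. `tenantStore` and its methods are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// tenantStore is a hypothetical registry of per-tenant state.
type tenantStore struct {
	mtx     sync.RWMutex
	tenants map[string]int
}

// get takes only a read lock, so many readers can proceed in parallel.
func (s *tenantStore) get(id string) (int, bool) {
	s.mtx.RLock()
	defer s.mtx.RUnlock()
	v, ok := s.tenants[id]
	return v, ok
}

// set takes the exclusive lock because it mutates the map.
func (s *tenantStore) set(id string, v int) {
	s.mtx.Lock()
	defer s.mtx.Unlock()
	s.tenants[id] = v
}

func main() {
	s := &tenantStore{tenants: map[string]int{}}
	s.set("tenant-a", 1)

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.get("tenant-a") // concurrent readers share the read lock
		}()
	}
	wg.Wait()

	v, ok := s.get("tenant-a")
	fmt.Println(v, ok)
}
```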

* receive: Block WAL replay when starting receive component (#2564)

Signed-off-by: Frederic Branczyk <[email protected]>

* docs: Added mention about thanos-remote-read integration. (#2566)

Thanks to G-Research as per: https://cloud-native.slack.com/archives/CL25937SP/p1588687640060200?thread_ts=1588167992.463800&cid=CL25937SP

Signed-off-by: Bartlomiej Plotka <[email protected]>

* query/storeset: do not close the connection if strict mode enabled (#2568)

* query/storeset: do not close the connection if strict mode enabled

Do not close the gRPC connection if establishing a connection has
succeeded but we have failed to get a response to an Info() call. Without
this, and with strict mode enabled, we would in such a case always keep
around a closed connection that no longer works until the whole Thanos
Query process is restarted.

Signed-off-by: Giedrius Statkevičius <[email protected]>

* query/storeset: add test, add CHANGELOG item

Signed-off-by: Giedrius Statkevičius <[email protected]>

* Update gitignore with integration tests directory (#2552)

Signed-off-by: Ranjith Kumar <[email protected]>

* Fixed thanos_compact_garbage_collected_blocks_total metric help (#2572)

Signed-off-by: Marco Pracucci <[email protected]>

* Chunks caching at bucket level (#2532)

* Added generic cache interface.

Signed-off-by: Peter Štibraný <[email protected]>

* Added memcached implementation of Cache.

Signed-off-by: Peter Štibraný <[email protected]>

* Chunks-caching bucket.

Signed-off-by: Peter Štibraný <[email protected]>

* Fix sentences

Signed-off-by: Peter Štibraný <[email protected]>

* Fix sentences

Signed-off-by: Peter Štibraný <[email protected]>

* Fix sentences

Signed-off-by: Peter Štibraný <[email protected]>

* Rename config objects.

Signed-off-by: Peter Štibraný <[email protected]>

* Review feedback.

Signed-off-by: Peter Štibraný <[email protected]>

* Review feedback.

Signed-off-by: Peter Štibraný <[email protected]>

* Added metrics for object size.

Signed-off-by: Peter Štibraný <[email protected]>

* Added requested chunk bytes metric.

Signed-off-by: Peter Štibraný <[email protected]>

* Caching bucket docs.

Signed-off-by: Peter Štibraný <[email protected]>

* Fixed tests.

Signed-off-by: Peter Štibraný <[email protected]>

* Fix test.

Signed-off-by: Peter Štibraný <[email protected]>

* Update docs/components/store.md
Update pkg/store/cache/caching_bucket.go

Co-authored-by: Marco Pracucci <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>

* Dots

Signed-off-by: Peter Štibraný <[email protected]>

* Always set lastBlockOffset.

Signed-off-by: Peter Štibraný <[email protected]>

* Merged cached metric into fetched metric, added labels.

Signed-off-by: Peter Štibraný <[email protected]>

* Added CHANGELOG.md entry

Signed-off-by: Peter Štibraný <[email protected]>

* Reworded help for thanos_store_bucket_cache_fetched_chunk_bytes_total

Signed-off-by: Peter Štibraný <[email protected]>

* Added tracing around getRangeChunkFile method.

Signed-off-by: Peter Štibraný <[email protected]>

* Updated CHANGELOG.md

Signed-off-by: Peter Štibraný <[email protected]>

* Options

Signed-off-by: Peter Štibraný <[email protected]>

* Fix parameter name. (store. got dropped by accident)

Signed-off-by: Peter Štibraný <[email protected]>

* Use embedded Bucket

Signed-off-by: Peter Štibraný <[email protected]>

* Added comments.

Signed-off-by: Peter Štibraný <[email protected]>

* Fixed comment.

Signed-off-by: Peter Štibraný <[email protected]>

* Hide store.caching-bucket.config flags.

Signed-off-by: Peter Štibraný <[email protected]>

* Renamed block to subrange.

Signed-off-by: Peter Štibraný <[email protected]>

* Renamed block to subrange.

Signed-off-by: Peter Štibraný <[email protected]>

* Header

Signed-off-by: Peter Štibraný <[email protected]>

* Added TODO

Signed-off-by: Peter Štibraný <[email protected]>

* Removed TODO, in favor of creating issue.

Signed-off-by: Peter Štibraný <[email protected]>

* Use NopCloser.

Signed-off-by: Peter Štibraný <[email protected]>

Co-authored-by: Marco Pracucci <[email protected]>

* Reword block deletion comments and logs in compactor (#2574)

Signed-off-by: Marco Pracucci <[email protected]>

* Coding Style typos and a few grammar improvements (#2448)

Changes mainly made for consistency, like section headers being in imperative tense: "do this thing" instead of "this is the thing"

Signed-off-by: Stephen Weber <[email protected]>

* quickstart: fix bucket web after recent changes (#2580)

The subcommand is now called `tools bucket web` after the recent
changes.

Without this, the quickstart script outputs:
```
Error parsing commandline arguments: expected command but got "bucket"
thanos: error: expected command but got "bucket"
```

Signed-off-by: Giedrius Statkevičius <[email protected]>

* Fix typo on reload function (#2584)

Signed-off-by: Joel Bastos <[email protected]>

* Refactor of commands and flag parsing for sidecar (#2267)

Signed-off-by: Philip Gough <[email protected]>

* ui: add new React UI from Prometheus (#2412)

* ui: add React UI from upstream Prometheus

Signed-off-by: Adrien Fillon <[email protected]>

* ui: incorporate new changes from Prometheus React UI

Signed-off-by: Prem Kumar <[email protected]>

* ui: adapted the React UI to Thanos

Signed-off-by: Prem Kumar <[email protected]>

Co-authored-by: Adrien Fillon <[email protected]>
Co-authored-by: Giedrius Statkevičius <[email protected]>

* Fix minor typos (#2586)

Signed-off-by: Pierre-Yves Aillet <[email protected]>

* react: update deps (#2589)

* react: graph/panel: revert changes temporarily

Signed-off-by: Giedrius Statkevičius <[email protected]>

* react-app: apply 'Update React vendoring'

Add the commit
https://github.com/prometheus/prometheus/commit/65a19421a42c69e16241eec24c66b98e4c8fa5da
via a 3-way merge.

Signed-off-by: Giedrius Statkevičius <[email protected]>

* ui/react-app: update yarn deps

Should fix security warnings. Ported from
https://github.com/prometheus/prometheus/commit/24ecae995691dabf782a6b4a7464f7aab561b554.

Signed-off-by: Giedrius Statkevičius <[email protected]>

* ui: update bindata

Signed-off-by: Giedrius Statkevičius <[email protected]>

* Makefile: remove --coverage from test run (#2591)

Found out that there is some weird interaction between `jest --coverage`
and `babel-plugin-istanbul`. Maybe related to:
https://github.com/facebook/jest/issues/6827.

From my testing, removing `--coverage` makes this work again. Probably
worth investigating in the future why that happens.

Also, this is really not needed during CI because we do not use the
coverage data anywhere anyway.

Signed-off-by: Giedrius Statkevičius <[email protected]>

* ci: use GitHub Actions to test React UI (#2595)

* ci: test React UI using GitHub actions

Signed-off-by: Prem Kumar <[email protected]>

* ci: remove react-app-test from CircleCI as we now use GH Actions

Signed-off-by: Prem Kumar <[email protected]>

* pkg/ui: bump jQuery to 3.5.0 (#2597)

Signed-off-by: Lucas Servén Marín <[email protected]>

* Added receiver multidb unit tests for basic cases. (#2593)

Unfortunately, all pass. ):

Signed-off-by: Bartlomiej Plotka <[email protected]>

* Fixed make docs; Updated last discrepancies. (#2611)

Signed-off-by: Bartlomiej Plotka <[email protected]>

* mixin: Alert on receive not uploading recent data (#2612)

Signed-off-by: Frederic Branczyk <[email protected]>

* Metadata caching in bucket (#2579)

* Added caching for Iter.

Signed-off-by: Peter Štibraný <[email protected]>

* Added cache for Exists call for meta-files.

Signed-off-by: Peter Štibraný <[email protected]>

* Added cache for reading block metadata files.

Signed-off-by: Peter Štibraný <[email protected]>

* Make caching bucket configurable with different caches for different type of objects.

Signed-off-by: Peter Štibraný <[email protected]>

* Fixed tests.

Signed-off-by: Peter Štibraný <[email protected]>

* Added caching for ObjectSize. Enabled caching of index.

Signed-off-by: Peter Štibraný <[email protected]>

* Lint feedback.

Signed-off-by: Peter Štibraný <[email protected]>

* Use single set of metrics for all operations.

Signed-off-by: Peter Štibraný <[email protected]>

* Constants.

Signed-off-by: Peter Štibraný <[email protected]>

* Use operation specific config. Generic configuration is only for user.

Signed-off-by: Peter Štibraný <[email protected]>

* Fix typo, make lint happy.

Signed-off-by: Peter Štibraný <[email protected]>

* Simplify constants.

Signed-off-by: Peter Štibraný <[email protected]>

* Simplify caching configuration.

Signed-off-by: Peter Štibraný <[email protected]>

* Refactor cache configuration.

Configuration is now passed to the cache when created.

Signed-off-by: Peter Štibraný <[email protected]>

* Review feedback.

Signed-off-by: Peter Štibraný <[email protected]>

* Fix operationRequests and operationHits for getRange.

Signed-off-by: Peter Štibraný <[email protected]>

* Make codec for Iter results configurable.

Signed-off-by: Peter Štibraný <[email protected]>

* Added header.

Signed-off-by: Peter Štibraný <[email protected]>

* Renamed "dir" config to "blocks-iter".

Signed-off-by: Peter Štibraný <[email protected]>

* Bump default values for meta exists/doesntExist ttls.

Signed-off-by: Peter Štibraný <[email protected]>

* Removed example how cache could be configured for index.

Signed-off-by: Peter Štibraný <[email protected]>

* Address review feedback.

Signed-off-by: Peter Štibraný <[email protected]>

* Get now implements streaming reader, and buffers object in memory.

Signed-off-by: Peter Štibraný <[email protected]>

* Added test for partial read.

Signed-off-by: Peter Štibraný <[email protected]>

* Removed unused function.

Signed-off-by: Peter Štibraný <[email protected]>

* Updated the help message for --data-di…