
store/bucket: fix data race #6575

Merged
merged 2 commits into main on Aug 2, 2023
Conversation

GiedriusS
Member

@GiedriusS GiedriusS commented Aug 1, 2023

Add a fix for issue #6545. Unfortunately, we don't have tests running with -race at the moment, but I tested this locally. Somehow the data race introduces inconsistencies in the retrieved data; I couldn't reproduce them anymore after this fix.

@GiedriusS GiedriusS changed the title store: add issue 6545 repro store/bucket: fix data race Aug 1, 2023
@GiedriusS GiedriusS requested a review from yeya24 August 1, 2023 14:38
@GiedriusS GiedriusS marked this pull request as ready for review August 1, 2023 14:39
@pull-request-size pull-request-size bot added size/M and removed size/S labels Aug 1, 2023
The matchers slice is now sorted in each call, but that introduces a data
race because the slice is shared between all calls. Do the sorting once
in the outermost function.

Signed-off-by: Giedrius Statkevičius <[email protected]>
Contributor

@yeya24 yeya24 left a comment


Hi @GiedriusS, can you please explain more why this change fixes the issue? Thanks

@GiedriusS
Member Author

> Hi @GiedriusS, can you please explain more, why this change fixed the issue? Thanks

That would require some digging, but in my ad-hoc tests I cannot reproduce the original problem anymore. I guess I could try to add tests that loop infinitely and see what happens. Maybe this can be done separately? The data race needs to be fixed either way 😄

@yeya24
Contributor

yeya24 commented Aug 1, 2023

@GiedriusS I am wondering if this also fixes #6495

@GiedriusS
Member Author

:O I forgot about that issue. I think this should also fix it, because in my tests I was pressing Execute many times, and very rarely the values were completely different even though the query was the same. I'll work on tests tomorrow.

@yeya24
Contributor

yeya24 commented Aug 1, 2023

Found another race in the chunks reader, but that's a small one and a separate issue. I just opened a separate PR, #6578.

@fpetkovski
Contributor

fpetkovski commented Aug 1, 2023

cc @rabenhorst potentially a fix for #6495.

@yeya24
Contributor

yeya24 commented Aug 1, 2023

Yeah, I think I can reproduce the data race using the test updated in yeya24@be85c85 with multiple matchers.

The data race is at https://github.com/thanos-io/thanos/blob/main/pkg/store/bucket.go#L2208 plus the other places where we read the matchers.
I think this fix is valid; I tested it and it fixes the data race.

goos: darwin
goarch: arm64
pkg: github.com/thanos-io/thanos/pkg/store
BenchmarkBlockSeries/concurrency:_16-10         	==================
WARNING: DATA RACE
Read at 0x00c0001a6088 by goroutine 94:
  github.com/thanos-io/thanos/pkg/store.(*bucketIndexReader).ExpandedPostings.func1()
      /Users/benye/hub/gowork/thanos/pkg/store/bucket.go:2208 +0x58
  sort.insertionSort_func()
      /usr/local/go/src/sort/zsortfunc.go:12 +0xd0
  sort.pdqsort_func()
      /usr/local/go/src/sort/zsortfunc.go:73 +0x2d0
  sort.Slice()
      /usr/local/go/src/sort/slice.go:26 +0x1a4
  github.com/thanos-io/thanos/pkg/store.(*bucketIndexReader).ExpandedPostings()
      /Users/benye/hub/gowork/thanos/pkg/store/bucket.go:2207 +0xb8
  github.com/thanos-io/thanos/pkg/store.(*blockSeriesClient).ExpandPostings()
      /Users/benye/hub/gowork/thanos/pkg/store/bucket.go:969 +0xa0
  github.com/thanos-io/thanos/pkg/store.benchmarkBlockSeriesWithConcurrency.func1()
      /Users/benye/hub/gowork/thanos/pkg/store/bucket_test.go:2585 +0x28c

Previous write at 0x00c0001a6088 by goroutine 100:
  internal/reflectlite.Swapper.func3()
      /usr/local/go/src/internal/reflectlite/swapper.go:42 +0xa0
  sort.insertionSort_func()
      /usr/local/go/src/sort/zsortfunc.go:13 +0x88
  sort.pdqsort_func()
      /usr/local/go/src/sort/zsortfunc.go:73 +0x2d0
  sort.Slice()
      /usr/local/go/src/sort/slice.go:26 +0x1a4
  github.com/thanos-io/thanos/pkg/store.(*bucketIndexReader).ExpandedPostings()
      /Users/benye/hub/gowork/thanos/pkg/store/bucket.go:2207 +0xb8
  github.com/thanos-io/thanos/pkg/store.(*blockSeriesClient).ExpandPostings()
      /Users/benye/hub/gowork/thanos/pkg/store/bucket.go:969 +0xa0
  github.com/thanos-io/thanos/pkg/store.benchmarkBlockSeriesWithConcurrency.func1()
      /Users/benye/hub/gowork/thanos/pkg/store/bucket_test.go:2585 +0x28c

yeya24
yeya24 previously approved these changes Aug 1, 2023
Signed-off-by: Giedrius Statkevičius <[email protected]>
@GiedriusS
Member Author

Added a test for this exact race. The test fails on the main branch almost immediately:

/bin/go test -v -race -timeout 10m -run ^TestExpandedPostingsRace$ github.com/thanos-io/thanos/pkg/store

However, I deployed this to prod and #6545 still occurs. I couldn't reproduce #6495 with this.

I also tried writing this test using BucketStore, but with it I wasn't able to reproduce the race :/ probably the race detector slows down the code so much that the first sort finishes before all the others start.
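A race-regression test of this kind typically fans many goroutines out over the same pre-sorted input and is run under `go test -race`. A generic sketch of that shape (not the actual TestExpandedPostingsRace; `sortOnce` and `runConcurrentReads` are hypothetical helpers):

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// sortOnce stands in for the outermost function that now owns the sort.
func sortOnce(matchers []string) {
	sort.Strings(matchers)
}

// runConcurrentReads fires n concurrent "requests" that all read the
// same pre-sorted slice. Under `go test -race`, this passes only if no
// callee mutates the shared slice.
func runConcurrentReads(matchers []string, n int) bool {
	sortOnce(matchers) // sort exactly once, before the fan-out

	var (
		wg sync.WaitGroup
		mu sync.Mutex
		ok = true
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if !sort.StringsAreSorted(matchers) { // read-only check
				mu.Lock()
				ok = false
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return ok
}

func main() {
	fmt.Println(runConcurrentReads([]string{"b", "a", "c"}, 16)) // prints true
}
```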

@yeya24
Contributor

yeya24 commented Aug 2, 2023

@GiedriusS I will merge this pr first to fix the data race.

Contributor

@yeya24 yeya24 left a comment


Thanks!

@yeya24 yeya24 merged commit eb80318 into main Aug 2, 2023
@yeya24 yeya24 deleted the add_repro branch August 2, 2023 16:57
GiedriusS added a commit to vinted/thanos that referenced this pull request Aug 10, 2023
* store/bucket: remove sort.Slice data race

The matchers slice is now sorted in each call but that introduces a data
race because the slice is shared between all calls. Do the sorting once
on the outermost function.

Signed-off-by: Giedrius Statkevičius <[email protected]>

* store: add test for ExpandPostings() race

Signed-off-by: Giedrius Statkevičius <[email protected]>

---------

Signed-off-by: Giedrius Statkevičius <[email protected]>