Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(memcached): panic on send on closed channel. #7817

Merged
merged 3 commits into from
Dec 1, 2022

Conversation

kavirajk
Copy link
Contributor

@kavirajk kavirajk commented Nov 30, 2022

What this PR does / why we need it:
Our memcached client has race between closing inputCh and submitting work to it (writing to channel) by different goroutines.

This PR fix it by having closed channel to track if inputCh is closed and make it visible to other writter go-routines.

Before:

```
loki$ go test -v ./pkg/storage/chunk/cache/... -run TestMemcached_fetchKeysBatched
=== RUN   TestMemcached_fetchKeysBatched
panic: send on closed channel

goroutine 29 [running]:
github.com/grafana/loki/pkg/storage/chunk/cache.(*Memcached).fetchKeysBatched.func1()
        /home/kavirajk/src/loki/pkg/storage/chunk/cache/memcached.go:167 +0x87
created by github.com/grafana/loki/pkg/storage/chunk/cache.(*Memcached).fetchKeysBatched
        /home/kavirajk/src/loki/pkg/storage/chunk/cache/memcached.go:164 +0x165
FAIL    github.com/grafana/loki/pkg/storage/chunk/cache 0.015s
FAIL
```

After:

```
loki$ go test -v ./pkg/storage/chunk/cache/... -run TestMemcached_fetchKeysBatched
=== RUN   TestMemcached_fetchKeysBatched
--- PASS: TestMemcached_fetchKeysBatched (0.00s)
PASS
ok      github.com/grafana/loki/pkg/storage/chunk/cache 0.013s
```

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Found this panic on lots of internal Loki prod clusters.

Checklist

  • Tests updated
  • CHANGELOG.md updated

Our memcached client has race between closing `inputCh` and submitting work to it (writing to channel) by different goroutines.

This PR fix it by having `closed` channel to track if `inputCh` is closed and make it visible to other writter go-routines.

Before:
```
=== RUN   TestMemcached_fetchKeysBatched
panic: send on closed channel

goroutine 29 [running]:
github.com/grafana/loki/pkg/storage/chunk/cache.(*Memcached).fetchKeysBatched.func1()
	/home/kavirajk/src/loki/pkg/storage/chunk/cache/memcached.go:167 +0x87
created by github.com/grafana/loki/pkg/storage/chunk/cache.(*Memcached).fetchKeysBatched
	/home/kavirajk/src/loki/pkg/storage/chunk/cache/memcached.go:164 +0x165
FAIL	github.com/grafana/loki/pkg/storage/chunk/cache	0.015s
FAIL
```

After:
```
loki$ go test -v ./pkg/storage/chunk/cache/... -run TestMemcached_fetchKeysBatched
=== RUN   TestMemcached_fetchKeysBatched
--- PASS: TestMemcached_fetchKeysBatched (0.00s)
PASS
ok  	github.com/grafana/loki/pkg/storage/chunk/cache	0.013s
```

Signed-off-by: Kaviraj <[email protected]>
Signed-off-by: Kaviraj <[email protected]>
@kavirajk kavirajk force-pushed the kavirajk/fix-panic-memcache-send-on-closed-chan branch from b21bfc0 to 5ae4ab3 Compare November 30, 2022 20:37
Signed-off-by: Kaviraj <[email protected]>
@kavirajk kavirajk marked this pull request as ready for review November 30, 2022 20:40
@kavirajk kavirajk requested a review from a team as a code owner November 30, 2022 20:40
@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-target-branch/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

Copy link
Contributor

@dannykopping dannykopping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great find @kavirajk!

@dannykopping dannykopping merged commit 4efdc13 into main Dec 1, 2022
@dannykopping dannykopping deleted the kavirajk/fix-panic-memcache-send-on-closed-chan branch December 1, 2022 10:40
Abuelodelanada pushed a commit to canonical/loki that referenced this pull request Dec 1, 2022
kavirajk added a commit that referenced this pull request Dec 2, 2022
Follow up to #7817

This PR uses `default` branch instead of `inputCh <- work` to make sure we are writing to closed `inputCh`.

Signed-off-by: Kaviraj <[email protected]>
kavirajk added a commit that referenced this pull request Dec 2, 2022
)

Signed-off-by: Kaviraj <[email protected]>

**What this PR does / why we need it**:
Follow up to #7817

This PR uses `default` branch instead of `inputCh <- work` to make sure
we are writing to closed `inputCh`.

Gist is, `default` run [only when none of the branch is
ready](https://go.dev/tour/concurrency/6). which makes more sense rather
than to have `inputCh <- work` (writing to closed channel on the branch
condition)

These can be explained by these two tiny snippets.
* [with `default`](https://go.dev/play/p/-FspbTZd20I)
* [without `default`](https://go.dev/play/p/Ag4WznOaEq0)

**Which issue(s) this PR fixes**:
Fixes #NA

**Special notes for your reviewer**:
We already have test `TestMemcached_fetchKeysBatched` to catch this
behaviour. In fact this test caught this probabilistic behaviour in some
CI failures. [here ](https://drone.grafana.net/grafana/loki/17853/3/4)
and[ here](https://drone.grafana.net/grafana/loki/17854/3/4)

Thanks @DylanGuedes for catching this CI failures.

**Checklist**

Signed-off-by: Kaviraj <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants