panic: Value has already been added to the cache #1701
Thanks for the report @growler, and sorry you're hitting that panic. Would you be able to share the code you're running when you hit this? I tried to put together a reproducer with the information you provided, but I wasn't able to reproduce the panic (on macOS). If you can't share the code, it would be good to know:
The other thing that would be good to know is whether you're doing anything to the DB concurrently. For instance, iterators are not goroutine-safe, but using multiple iterators in dedicated goroutines is safe.
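As an illustration (not code from this thread), here is a minimal sketch of that safe pattern: one iterator per goroutine. It assumes the NewIter signature from the Pebble release discussed in this thread (no error return); the scanConcurrently helper and worker count are invented for the example.

```go
package example

import (
	"sync"

	"github.com/cockroachdb/pebble"
)

// scanConcurrently gives each worker goroutine its own iterator: a single
// *pebble.Iterator must never be shared across goroutines, but independent
// iterators over the same DB are fine.
func scanConcurrently(db *pebble.DB, workers int) {
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			it := db.NewIter(nil) // dedicated iterator for this goroutine
			defer it.Close()
			for valid := it.First(); valid; valid = it.Next() {
				_ = it.Key()   // only valid until the next positioning call
				_ = it.Value() // copy if the data must outlive the iteration
			}
		}()
	}
	wg.Wait()
}
```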
Hello, thank you for your response. I'm migrating an internal tool, a sort of CRDT database, which is quite big, and the code is a mess at the moment, so it's hard to make a test case. I will try to produce one over the weekend. Otherwise, the workload is quite simple. There are 9 goroutines: 8 workers take jobs from a queue, process them, and store the results in the database. I use an indexed batch here; there are three sets and one merge (I suspected the merge and turned it off, since it only counts some statistics, but it still panics). Average record size is 120 bytes, and keys are 16-80 bytes. One of the key prefixes is a sort of replication log. 1 cleaner performs a range delete about every ten seconds, removing replicated records from the mentioned log. The code heavily uses pooling, e.g.
but as far as I understand the semantics, it's totally safe to reuse a slice once Set returns. Options are default, and iterator options are default as well.
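The pooled code itself isn't reproduced above, so the following is only a hypothetical sketch of the kind of reuse being described: buffers come from a sync.Pool and go back into it as soon as Batch.Set returns, since the batch copies its arguments. The job type, bufPool, and storeResults helper are invented for illustration.

```go
package example

import (
	"sync"

	"github.com/cockroachdb/pebble"
)

// job is a stand-in for whatever the workers pull off the queue.
type job struct {
	key, value []byte
}

// bufPool recycles the byte slices used to build keys and values.
var bufPool = sync.Pool{
	New: func() interface{} { return make([]byte, 0, 128) },
}

// storeResults writes each job through an indexed batch, returning the
// key/value buffers to the pool as soon as Set returns, because the batch
// copies its arguments.
func storeResults(db *pebble.DB, jobs <-chan job) error {
	b := db.NewIndexedBatch()
	defer b.Close()
	for j := range jobs {
		key := append(bufPool.Get().([]byte)[:0], j.key...)
		val := append(bufPool.Get().([]byte)[:0], j.value...)
		if err := b.Set(key, val, nil); err != nil {
			return err
		}
		bufPool.Put(key)
		bufPool.Put(val)
	}
	return b.Commit(pebble.Sync)
}
```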
Okay, I narrowed it down. It seems the problem is indeed with my configuration, which I failed to notice (apparently, building pebble.Options in one file and using it in other files was a bad idea). Here's the panicking code. The panic is triggered by the levels configuration; if you remove the levels, this code works fine. I guess this configuration is wrong for some reason, but a panic is not a very obvious way to report that.
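The panicking program isn't captured in the thread as rendered here, so this is only a rough, hypothetical sketch of a per-level configuration of the kind described (the level count, the Snappy/Zstd split point, and the demo-db path are assumptions, not the reporter's actual values):

```go
package main

import (
	"log"

	"github.com/cockroachdb/pebble"
)

func main() {
	// Per-level options: cheaper compression on the young levels, Zstd on the
	// older, larger ones. This mixed-compression shape is what reportedly
	// triggers the panic once the newer zstd module is pulled in.
	levels := make([]pebble.LevelOptions, 7)
	for i := range levels {
		levels[i] = pebble.LevelOptions{Compression: pebble.SnappyCompression}
		if i >= 4 {
			levels[i].Compression = pebble.ZstdCompression
		}
	}

	db, err := pebble.Open("demo-db", &pebble.Options{Levels: levels})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if err := db.Set([]byte("first"), []byte("value"), pebble.Sync); err != nil {
		log.Fatal(err)
	}
}
```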
btw, if configured with only a single level with ZstdCompression, the code panics even more spectacularly. So I believe ZstdCompression is the primary suspect here. As a side note, does it make sense to have different compression on different levels? With a large enough workload triggering compression while the DB is open, the code fails less often, so I managed to take measurements. At 1 million records (each record is a set of 3-5 keys), Zstd adds ~3 ms average latency but gives ~153 bytes per record vs. ~190 for Snappy, which works out to ~20 GB at the target dataset size. So if Zstd is a working option, I would prefer to stick with it.
I took the code you pasted above and ran it verbatim (pure copy / paste) with:

```console
$ go version
go version go1.18.2 darwin/amd64
$ uname -a
Darwin crlMBP-C02FR0V6MD6TNTM2.local 21.4.0 Darwin Kernel Version 21.4.0: Fri Mar 18 00:45:05 PDT 2022; root:xnu-8020.101.4~15/RELEASE_X86_64 x86_64
```

Here are my laptop specs. I'm also on an Intel MacBook:

```
Model Name: MacBook Pro
Model Identifier: MacBookPro16,1
Processor Name: 8-Core Intel Core i9
Processor Speed: 2.3 GHz
Number of Processors: 1
Total Number of Cores: 8
L2 Cache (per Core): 256 KB
L3 Cache: 16 MB
Hyper-Threading Technology: Enabled
Memory: 32 GB
System Firmware Version: 1731.100.130.0.0 (iBridge: 19.16.14243.0.0,0)
OS Loader Version: 540.100.7~23
```
I tried this too, but I don't see the panic. I put my

Do you think you could do the same with the minimal reproducer you have (those three files)? Just to make sure we're on the same page. I'm just running something like:

```console
$ while true; do date -u; go run .; echo; done
Fri May 20 22:08:09 UTC 2022
first
second
third
Fri May 20 22:08:10 UTC 2022
first
second
third
...
```
Conceptually this seems fine, though I'm not sure if it's a configuration that we've tested in production. It's probably something we can test in our metamorphic test suite (i.e. mixed compression). I'll open a separate issue to add that.
I did a
Here are my go.sum and go.mod
Aha! Progress. I noticed that the version of zstd that you're using is a little newer. I bumped that in my version, and it started panicking in the same way:

```console
$ go get -u github.com/DataDog/zstd
go: upgraded github.com/DataDog/zstd v1.4.5 => v1.5.2
$ diff -u go.mod.old go.mod
--- go.mod.old 2022-05-20 15:28:42.000000000 -0700
+++ go.mod 2022-05-20 15:26:42.000000000 -0700
@@ -5,7 +5,7 @@
require github.com/cockroachdb/pebble v0.0.0-20220415182917-06c9d3be25b3 // crl-release-22.1
require (
- github.com/DataDog/zstd v1.4.5 // indirect
+ github.com/DataDog/zstd v1.5.2 // indirect
github.com/cespare/xxhash/v2 v2.1.1 // indirect
github.com/cockroachdb/errors v1.8.1 // indirect
github.com/cockroachdb/logtags v0.0.0-20190617123548-eb05cc24525f // indirect
$ go mod tidy
$ while true; do date -u; go run .; echo; done
Fri May 20 22:26:50 UTC 2022
first
second
panic: pebble: Value has already been added to the cache: refs=0
goroutine 1 [running]:
github.com/cockroachdb/pebble/internal/cache.(*shard).Set(0xc00019b9f8?, 0x42c7ca5?, 0xc111720?, 0x19f?, 0x147?)
/Users/nickt/go/pkg/mod/github.com/cockroachdb/pebble@v0.0.0-20220415182917-06c9d3be25b3/internal/cache/clockpro.go:136 +0x365
github.com/cockroachdb/pebble/internal/cache.(*Cache).Set(0xc0004fa000?, 0xc111720?, 0xc111740?, 0x149?, 0x14e?)
/Users/nickt/go/pkg/mod/github.com/cockroachdb/pebble@v0.0.0-20220415182917-06c9d3be25b3/internal/cache/clockpro.go:750 +0x46
github.com/cockroachdb/pebble/sstable.(*Reader).readBlock(0xc00057edc0, {0x4078b12?, 0x4800940?}, 0x0, 0x0)
/Users/nickt/go/pkg/mod/github.com/cockroachdb/pebble@v0.0.0-20220415182917-06c9d3be25b3/sstable/reader.go:2365 +0x617
github.com/cockroachdb/pebble/sstable.(*Reader).readIndex(...)
/Users/nickt/go/pkg/mod/github.com/cockroachdb/pebble@v0.0.0-20220415182917-06c9d3be25b3/sstable/reader.go:2231
github.com/cockroachdb/pebble/sstable.(*singleLevelIterator).init(0xc000014380, 0xc00057edc0, {0x0, 0x0, 0x0}, {0x0, 0x0, 0x0}, 0x0)
/Users/nickt/go/pkg/mod/github.com/cockroachdb/pebble@v0.0.0-20220415182917-06c9d3be25b3/sstable/reader.go:269 +0x7e
github.com/cockroachdb/pebble/sstable.(*Reader).NewIterWithBlockPropertyFilters(0xc00057edc0, {0x0, 0x0, 0x0}, {0x0, 0x0, 0x0}, 0x0?)
/Users/nickt/go/pkg/mod/github.com/cockroachdb/pebble@v0.0.0-20220415182917-06c9d3be25b3/sstable/reader.go:2152 +0x14f
github.com/cockroachdb/pebble.(*tableCacheShard).newIters(0xc0004f4460, 0x0?, 0xc0000544d0, 0x0, 0xc00022a108)
/Users/nickt/go/pkg/mod/github.com/cockroachdb/pebble@v0.0.0-20220415182917-06c9d3be25b3/table_cache.go:368 +0x22e
github.com/cockroachdb/pebble.(*tableCacheContainer).newIters(0x0?, 0x0?, 0x0?, 0x0?)
/Users/nickt/go/pkg/mod/github.com/cockroachdb/pebble@v0.0.0-20220415182917-06c9d3be25b3/table_cache.go:117 +0x45
github.com/cockroachdb/pebble.(*levelIter).loadFile(0xc000054480, 0xc000378420, 0x1)
/Users/nickt/go/pkg/mod/github.com/cockroachdb/pebble@v0.0.0-20220415182917-06c9d3be25b3/level_iter.go:346 +0x20f
github.com/cockroachdb/pebble.(*levelIter).First(0xc000054480)
/Users/nickt/go/pkg/mod/github.com/cockroachdb/pebble@v0.0.0-20220415182917-06c9d3be25b3/level_iter.go:498 +0x7c
github.com/cockroachdb/pebble.(*mergingIter).First(0xc000053bb8)
/Users/nickt/go/pkg/mod/github.com/cockroachdb/pebble@v0.0.0-20220415182917-06c9d3be25b3/merging_iter.go:986 +0xd1
github.com/cockroachdb/pebble.(*Iterator).First(0xc000053900)
/Users/nickt/go/pkg/mod/github.com/cockroachdb/pebble@v0.0.0-20220415182917-06c9d3be25b3/iterator.go:1148 +0x108
main.main()
/Users/nickt/Downloads/test/main.go:48 +0x39e
exit status 2
```

Are you able to try dropping the zstd version down to the version Pebble has (v1.4.5)? I will try and build / test Pebble now against the latest version and see if our test suite breaks. Either way, we're probably going to have to dig into why that's causing a panic in the cache - that's a little surprising to me at first glance.
Bumping the zstd version results in various panics. The following is a minimal reproducer that fails reliably:

```console
$ go test -mod=vendor -run TestBytesIterated ./sstable
--- FAIL: TestBytesIterated (5.03s)
    --- FAIL: TestBytesIterated/Zstd (0.00s)
panic: pebble: Value has already been added to the cache: refs=0 [recovered]
	panic: pebble: Value has already been added to the cache: refs=0

goroutine 208 [running]:
testing.tRunner.func1.2({0x44b2ac0, 0xc000300100})
	/nix/store/0bh8m4vqb7c7n4sn73ka53yn45yj3zd0-go-1.18.2/share/go/src/testing/testing.go:1389 +0x24e
testing.tRunner.func1()
	/nix/store/0bh8m4vqb7c7n4sn73ka53yn45yj3zd0-go-1.18.2/share/go/src/testing/testing.go:1392 +0x39f
panic({0x44b2ac0, 0xc000300100})
	/nix/store/0bh8m4vqb7c7n4sn73ka53yn45yj3zd0-go-1.18.2/share/go/src/runtime/panic.go:838 +0x207
github.com/cockroachdb/pebble/internal/cache.(*shard).Set(0xc0000d3b68?, 0x4310c25?, 0x600001710040?, 0x1b?, 0x16?)
	/Users/nickt/Development/pebble/internal/cache/clockpro.go:136 +0x365
github.com/cockroachdb/pebble/internal/cache.(*Cache).Set(0x5772c48?, 0x600001710007?, 0x600001710060?, 0x17?, 0x1c?)
	/Users/nickt/Development/pebble/internal/cache/clockpro.go:750 +0x46
github.com/cockroachdb/pebble/sstable.(*Reader).readBlock(0xc0000dd8c0, {0x0?, 0x17?}, 0x0, 0xc00019e5f8)
	/Users/nickt/Development/pebble/sstable/reader.go:2372 +0x617
github.com/cockroachdb/pebble/sstable.(*singleLevelIterator).readBlockWithStats(0xc00019e380, {0x2?, 0x2?}, 0x430699f?)
	/Users/nickt/Development/pebble/sstable/reader.go:406 +0x2f
github.com/cockroachdb/pebble/sstable.(*singleLevelIterator).loadBlock(0xc00019e380)
	/Users/nickt/Development/pebble/sstable/reader.go:388 +0x22a
github.com/cockroachdb/pebble/sstable.(*singleLevelIterator).firstInternal(0xc00019e380)
	/Users/nickt/Development/pebble/sstable/reader.go:786 +0xe7
github.com/cockroachdb/pebble/sstable.(*singleLevelIterator).First(0x0?)
	/Users/nickt/Development/pebble/sstable/reader.go:768 +0x27
github.com/cockroachdb/pebble/sstable.(*compactionIterator).First(0xc0003c02a0)
	/Users/nickt/Development/pebble/sstable/reader.go:1127 +0x58
github.com/cockroachdb/pebble/sstable.testBytesIteratedWithCompression(0xc000511a00, 0x40551ff?, 0x1, {0xc00002c8a0, 0x5, 0x100000001?}, {0xc0000d3f38, 0x5, 0x3351bd6b84e6e?})
	/Users/nickt/Development/pebble/sstable/reader_test.go:495 +0x223
github.com/cockroachdb/pebble/sstable.TestBytesIterated.func3(0x0?)
	/Users/nickt/Development/pebble/sstable/reader_test.go:531 +0x85
testing.tRunner(0xc000511a00, 0xc000088060)
	/nix/store/0bh8m4vqb7c7n4sn73ka53yn45yj3zd0-go-1.18.2/share/go/src/testing/testing.go:1439 +0x102
created by testing.(*T).Run
	/nix/store/0bh8m4vqb7c7n4sn73ka53yn45yj3zd0-go-1.18.2/share/go/src/testing/testing.go:1486 +0x35f
FAIL	github.com/cockroachdb/pebble/sstable	5.245s
FAIL
```
Looks like bumping to that version breaks a lot of tests. I put together a minimal reproducer in #1704. Let us know how you get on with the older zstd version. That appears to work without issue. We'll continue to dig into why the newer version isn't playing nicely.
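For anyone following along, dropping back to the zstd version that Pebble's own go.mod declares is a standard module downgrade (the version string below is taken from the diff earlier in the thread):

```console
$ go get github.com/DataDog/zstd@v1.4.5
$ go mod tidy
```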
Yes, downgrading has fixed the panic, thank you! I'm not sure what caused zstd to upgrade. In fact, I used your go.mod and go.sum to make a workspace and then just did a
Hello,
I'm evaluating pebble and triggered a panic I don't understand yet.
The case is really simple:
The problem seems to be random; sometimes (about 1 in 10 runs) pebble does not panic. I tried the latest master and crl-release-22.1.
Also, it seems to happen only on my Intel MacBook; Linux (both armv8 and Intel) works fine.
Please let me know how I could help debug it.
crl-release-22.1 (06c9d3b)
b14ad70: