Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

panic (unexpected fault address) on db.meta when rollback #262

Closed
hufengwei opened this issue Feb 24, 2021 · 9 comments
Closed

panic (unexpected fault address) on db.meta when rollback #262

hufengwei opened this issue Feb 24, 2021 · 9 comments
Labels
Milestone

Comments

@hufengwei
Copy link

hufengwei commented Feb 24, 2021

Release: v1.3.5

Platform: windows/amd64

Cause:
According to the Call Stack, it went into panic when reading db.meta0 and db.meta1. By analyzing the code, I cannot find any lock when retrieving meta page references. So it seems that this would be the cause if the meta pages were re-mapped at the moment? Correct me if I am wrong, thanks!

// mmap opens the underlying memory-mapped file and initializes the meta references.
// minsz is the minimum size that the new mmap can be.
func (db *DB) mmap(minsz int) error {
db.mmaplock.Lock()
defer db.mmaplock.Unlock()

...

// Save references to the meta pages.
db.meta0 = db.page(0).meta()
db.meta1 = db.page(1).meta()

...
}

// meta retrieves the current meta page reference.
func (db *DB) meta() *meta {
// We have to return the meta with the highest txid which doesn't fail
// validation. Otherwise, we can cause errors when in fact the database is
// in a consistent state. metaA is the one with the higher txid.
metaA := db.meta0
metaB := db.meta1
if db.meta1.txid > db.meta0.txid {
metaA = db.meta1
metaB = db.meta0
}
...
}

Call Stack:

unexpected fault address 0xa501040\n
fatal error: fault\n
[signal 0xc0000005 code=0x0 addr=0xa501040 pc=0x899ccb]\n\n
goroutine 1136819 [running]:\nruntime.throw(0xc53fb5, 0x5)\n\t/usr/local/go/src/runtime/panic.go:605 +0x9c fp=0xc042b9bc80 sp=0xc042b9bc60 pc=0x42e78c\n
runtime.sigpanic()\n\t/usr/local/go/src/runtime/signal_windows.go:164 +0x14e fp=0xc042b9bcb0 sp=0xc042b9bc80 pc=0x443eae\n
vendor/github.com/etcd-io/bbolt.(*DB).meta(0xc04435e000, 0x0)\n\tvendor/github.com/etcd-io/bbolt/db.go:901 +0x3b fp=0xc042b9bce8 sp=0xc042b9bcb0 pc=0x899ccb\n
vendor/github.com/etcd-io/bbolt.(*DB).hasSyncedFreelist(0xc04435e000, 0x1f7)\n\tvendor/github.com/etcd-io/bbolt/db.go:323 +0x32 fp=0xc042b9bd08 sp=0xc042b9bce8 pc=0x897542\n
vendor/github.com/etcd-io/bbolt.(*Tx).rollback(0xc043f32b60)\n\tvendor/github.com/etcd-io/bbolt/tx.go:280 +0x83 fp=0xc042b9bd40 sp=0xc042b9bd08 pc=0x8a4f93\n
vendor/github.com/etcd-io/bbolt.(*Tx).Commit(0xc043f32b60, 0x0, 0x0)\n\tvendor/github.com/etcd-io/bbolt/tx.go:162 +0x5fe fp=0xc042b9be58 sp=0xc042b9bd40 pc=0x8a49ce\n
vendor/github.com/etcd-io/bbolt.(*DB).Update(0xc04435e000, 0xc052db1f30, 0x0, 0x0)\n\tvendor/github.com/etcd-io/bbolt/db.go:701 +0xf9 fp=0xc042b9bea8 sp=0xc042b9be58 pc=0x898f69\n
vendor/github.com/etcd-io/bbolt.(*batch).run(0xc05404a100)\n\tvendor/github.com/etcd-io/bbolt/db.go:809

@shenlanzifa
Copy link

I got the same problem in v1.3.6

@shenlanzifa
Copy link

it seems like db.meta1 has been freed in b.rootNode.spill(). i guess.

func (n *node) spill() error {
.....
for _, node := range nodes {
// Add node's page to the freelist if it's not new.
if node.pgid > 0 {
tx.db.freelist.free(tx.meta.txid, tx.page(node.pgid))
node.pgid = 0
}
.....

@boreq
Copy link

boreq commented Dec 20, 2022

We are experiencing the same problem in version v1.3.6 as well.

  1. Tx.Commit (tx.go:141) is triggered which calls Tx.root.spill() in tx.go:160.
  2. Tx.root.spill() calls b.rootnode.spill() in bucket.go:570.
  3. We eventually end up in DB.allocate in db.go:986 and fail with mmap allocate error: cannot allocate memory.
  4. This leads to calling tx.rollback() in tx.go:161. It calls tx.db.hasSyncedFreelist which eventually accesses db.meta0 and db.meta1 which are nil.
unexpected fault address 0x2f7004040
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x2 addr=0x2f7004040 pc=0x105b4c228]

goroutine 4288824 [running]:
runtime.throw({0x1060d8aa2?, 0x10587a5bc?})
	runtime/panic.go:1047 +0x40 fp=0x132311c30 sp=0x132311c00 pc=0x10585af70
runtime.sigpanic()
	runtime/signal_unix.go:842 +0x1a0 fp=0x132311c60 sp=0x132311c30 pc=0x105871cf0
go.etcd.io/bbolt.(*DB).meta(0x1300c4080?)
	go.etcd.io/[email protected]/db.go:948 +0x28 fp=0x132311cc0 sp=0x132311c70 pc=0x105b4c228
go.etcd.io/bbolt.(*DB).hasSyncedFreelist(...)
	go.etcd.io/[email protected]/db.go:330
go.etcd.io/bbolt.(*Tx).rollback(0x1323e2000)
	go.etcd.io/[email protected]/tx.go:278 +0x4c fp=0x132311cf0 sp=0x132311cc0 pc=0x105b544ac
go.etcd.io/bbolt.(*Tx).Commit(0x1323e2000)
	go.etcd.io/[email protected]/tx.go:161 +0x1d4 fp=0x132311e10 sp=0x132311cf0 pc=0x105b53f24
go.etcd.io/bbolt.(*DB).Update(0x10646d118?, 0x2b9bb6ee0)
	go.etcd.io/[email protected]/db.go:748 +0xc8 fp=0x132311e80 sp=0x132311e10 pc=0x105b4b2d8
go.etcd.io/bbolt.(*batch).run(0x2cdb2f9c0)
	go.etcd.io/[email protected]/db.go:856 +0x198 fp=0x132311f00 sp=0x132311e80 pc=0x105b4bb88
go.etcd.io/bbolt.(*batch).run-fm()
	<autogenerated>:1 +0x28 fp=0x132311f20 sp=0x132311f00 pc=0x105b57398
sync.(*Once).doSlow(0x2b9bb6fa0?, 0x17a222?)
	sync/once.go:74 +0x104 fp=0x132311f80 sp=0x132311f20 pc=0x105891844
sync.(*Once).Do(...)
	sync/once.go:65
go.etcd.io/bbolt.(*batch).trigger(0x2b9bb6f20?)
	go.etcd.io/[email protected]/db.go:838 +0x48 fp=0x132311fb0 sp=0x132311f80 pc=0x105b4b9c8
go.etcd.io/bbolt.(*batch).trigger-fm()
	<autogenerated>:1 +0x28 fp=0x132311fd0 sp=0x132311fb0 pc=0x105b57338
runtime.goexit()
	runtime/asm_arm64.s:1165 +0x4 fp=0x132311fd0 sp=0x132311fd0 pc=0x10588c474
created by time.goFunc
	time/sleep.go:176 +0x38

Ignoring the panic itself the underlying error which is returned (cannot allocate memory) seems to be related to virtual memory limits on iOS. As far as I understand this may warrant creating another issue as if I am not mistaken bolt requires that the entire file must be mmaped at the same time.

@ahrtr
Copy link
Member

ahrtr commented Dec 21, 2022

@boreq can you still see the panic using #362?

@gibeautc
Copy link

gibeautc commented Dec 30, 2022

Joining the discussion here. We are experiencing this same issue in a released product. In our case this is being run on a small limux SOM with limited resources. It certainly feels like its memory related ( I can recreate it by continuing to add to the database until it hits the 512MB mark, at which point it panics. But what is interesting, is that I can restart my test program on the existing database, and it will grow past the 512MB file size

The system we are using only has 256MB of ram.

It does not always panic and crash, and it seems to be related to how large the records are that are being added. It feels like if the records are small enough, it sometimes passes this threshold without any issues. Other times the program can restart and get past the threshold, and with even larger records being added, it will panic again on every attempt to add more records.

I have small test program that I can provide, but in my case I am only seeing this on certain hardware. (we have other products that use a different SOM with different kernel and more ram.) But am more then willing to try out changes or specific tests if that is helpful.

EDIT: here is some system specs in case that's helpful

cat /proc/meminfo 
MemTotal:         247652 kB
MemFree:            8908 kB
MemAvailable:     200252 kB
cat /proc/cpuinfo 
processor	: 0
model name	: ARMv7 Processor rev 5 (v7l)
BogoMIPS	: 18.00
Features	: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x0
CPU part	: 0xc07
CPU revision	: 5
uname -r
4.4.15-v4.4.15+g017b90c

@ptabor
Copy link
Contributor

ptabor commented Dec 30, 2022

@gibeautc
Interesting. Passing a X MB threshold sounds more like: #285.
Are you using NoFreelistSync ? Does setting it to false helps ?

@gibeautc
Copy link

@ptabor I had not looked at that before, looking at it now, and will try it and report back

@gibeautc
Copy link

gibeautc commented Dec 30, 2022

@ptabor setting the NoFreelistSync to true (it defaults to false, so thats what I was using in previous testing) in the config does not change the behavior, still crashing

Using File: test2.db
Using BE: false
Chunk Size: 1048576
NoFreeListSync Option: true
DB Size(MB): 501
Total Cnt: 4   Delta: 4
DB Size(MB): 502
Total Cnt: 7   Delta: 3
DB Size(MB): 505
DB Size(MB): 505
Total Cnt: 10   Delta: 3
DB Size(MB): 508
Total Cnt: 13   Delta: 3
DB Size(MB): 511
DB Size(MB): 511
Total Cnt: 16   Delta: 3
unexpected fault address 0x46c13040
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x1 addr=0x46c13040 pc=0xbd9ac]

goroutine 1 [running]:
runtime.throw({0xe6ce0, 0x5})
	/usr/local/go/src/runtime/panic.go:992 +0x5c fp=0x856da0 sp=0x856d8c pc=0x4682c
runtime.sigpanic()
	/usr/local/go/src/runtime/signal_unix.go:825 +0x1d0 fp=0x856db8 sp=0x856da0 pc=0x5cf84
go.etcd.io/bbolt.(*DB).meta(0x868000)
	/home/chadg/go/pkg/mod/go.etcd.io/[email protected]/db.go:948 +0x1c fp=0x856dd4 sp=0x856dbc pc=0xbd9ac
go.etcd.io/bbolt.(*DB).hasSyncedFreelist(...)
	/home/chadg/go/pkg/mod/go.etcd.io/[email protected]/db.go:330
go.etcd.io/bbolt.(*Tx).rollback(0x810480)
	/home/chadg/go/pkg/mod/go.etcd.io/[email protected]/tx.go:278 +0x5c fp=0x856de8 sp=0x856dd4 pc=0xc7378
go.etcd.io/bbolt.(*Tx).Commit(0x810480)
	/home/chadg/go/pkg/mod/go.etcd.io/[email protected]/tx.go:161 +0x2a0 fp=0x856e7c sp=0x856de8 pc=0xc6c00
go.etcd.io/bbolt.(*DB).Update(0x868000, 0x856ec4)
	/home/chadg/go/pkg/mod/go.etcd.io/[email protected]/db.go:748 +0x104 fp=0x856eb0 sp=0x856e7c pc=0xbd908
main.Add(0x868000, {0x880000, 0x100000, 0x100000}, 0x0)

@ahrtr
Copy link
Member

ahrtr commented Jan 13, 2023

Fixed in #362.

Please anyone let me know if you still see the panic.

@ahrtr ahrtr closed this as completed Jan 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

6 participants