Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shutting down causes "mremap size mismatch #183

Closed
decentral1se opened this issue Oct 31, 2022 · 22 comments
Closed

Shutting down causes "mremap size mismatch #183

decentral1se opened this issue Oct 31, 2022 · 22 comments
Assignees
Labels
bug Something isn't working

Comments

@decentral1se
Copy link
Member

decentral1se commented Oct 31, 2022

A new issue to dive into this 🀿 πŸŠβ€β™€οΈ Would be great to fix for stability reasons...

#124 (comment)

Quick update from my testing on Orange Pi Zero:

I have not seen any memory crashes on startup. Once it's up-and-running it seems stable, even after 1h30m. I have, however, seen some errors when I shutdown the process (Ctrl+c):

level=warn t=12m39.797272388s event=killed msg="received signal, shutting down" signal=interrupt
level=warn t=12m39.798054875s unit=gossip event="live qry on rxlog exited"
level=error t=12m41.798690257s conn="ssb-ws :8998 listen exited" err="accept tcp [::]:8987: use of closed network connection"
level=debug t=12m41.798912212s event="sbot closing" msg="connections closed"
level=error t=12m41.799388412s event="fatal error" err="sbot: index group shutdown failed: sbot index(contacts) update of backlog failed: margaret: Entry Nulled"
t=12m41.807593941s event=panic location=checkAndLog panicLog=panics/checkAndLog1848513752 err="sbot: index group shutdown failed: sbot index(contacts) update of backlog failed: margaret: Entry Nulled"

Then on the next startup attempts:

go-sbot: failed to instantiate ssb server: while opening memtables error: while opening fid: 1 error: while updating skiplist error: mremap size mismatch: requested: 20 got: 67108864
go-sbot: failed to instantiate ssb server: During db.vlog.open error: while truncating last value log file: /home/pihole/.ssb-go/sublogs/shared-badger/000006.vlog error: mremap size mismatch: requested: 20 got: 67108864
go-sbot: failed to instantiate ssb server: while opening memtables error: while opening fid: 2 error: while updating skiplist error: mremap size mismatch: requested: 20 got: 67108864

Deleting the shared-badger sublogs directory solves the problem.

/cc @mycognosist

@decentral1se
Copy link
Member Author

Maybe related to #81 πŸ€”

@decentral1se decentral1se changed the title Shutting down of go-ssb causes "mremap size mismatch" Shutting down causes "mremap size mismatch" Oct 31, 2022
@decentral1se
Copy link
Member Author

decentral1se commented Oct 31, 2022

Oh dear me: https://discuss.dgraph.io/t/error-mremap-size-mismatch-on-arm64/15333/7

More worryingly: https://discuss.dgraph.io/t/error-mremap-size-mismatch-on-arm64/15333/8

What kind of data loss does this cause? I'm not sure. And it affects arm* (edit: only 32bit?) arch only due to it not being officially supported?

Maybe we can't say go-ssb supports arm* (edit: only 32bit?) atm since there are data loss issues...

@mycognosist
Copy link
Member

😨 😒

Does the version of badger we're using include this recent PR?

dgraph-io/badger#1806

It seems there have been some arm64-related PR's lately but I'm not yet sure if they address our issues:

dgraph-io/ristretto#309

@decentral1se
Copy link
Member Author

Holy cow, even more 😱 dgraph-io/badger#1799

So yeh, I think I can pin to the latest commit and see if that helps? I'll get a PR cooking πŸ§‘β€πŸ³

@decentral1se
Copy link
Member Author

@mycognosist sorry I've got no way of testing this atm... but have got the PRs ready:

You'd need to have a local clone of margaret alongside a clone of go-ssb and then do something like:

cd margaret
git remote add d1 https://github.com/decentral1se/margaret
git fetch -a d1
git checkout upgrade-deps
go build -v ./...
cd ../go-ssb
git remote add d1 https://github.com/decentral1se/ssb
git fetch -a d1
git checkout latest-badger
go build -v ./cmd/go-sbot
go build -v ./cmd/sbotcli

Then you can run the patched go-ssb and see if the shutdown causes the same errors...

No worries if you don't have time for this atm, it's getting quite involved now πŸ˜…

@decentral1se
Copy link
Member Author

decentral1se commented Oct 31, 2022

Hmmm it looks like they once upon a time had support for older arm / 32 bit but that was on Travis CI which doesn't run anymore... it doesn't seem to be officially supported? The fix in dgraph-io/ristretto#309 was for arm64... so potentially this can resolve the bug on an rpi 4 but not for an orange pi zero? (No pressure to test, just leaving some more breadcrumbs for anyone else diving in...)

@decentral1se
Copy link
Member Author

Holy cow, even more scream dgraph-io/badger#1799

Oh wait a sec, that's a false alarm, they are releasing: https://github.com/dgraph-io/badger/releases/tag/v3.2103.3

So yeah in fact, I think that margaret was just using an old version of badger but go-ssb wasn't! Who knows what sort of chaos that was causing in the internals πŸ˜† These WIP PRs can still be tested as is but I can certainly clean them up a bit πŸ™ƒ

@mycognosist
Copy link
Member

@decentral1se

I followed the updated build instructions above (also had to include the dependency replacement line for go-muxrpc). Successful build but crash on execution:

go-sbot: failed to instantiate ssb server: During db.vlog.open error: Error while creating log file in valueLog.open error: while opening file: /home/pihole/.ssb-go/sublogs/shared-badger/000003.vlog error: cannot allocate memory
while mmapping /home/pihole/.ssb-go/sublogs/shared-badger/000003.vlog with size: 2147483646
github.com/dgraph-io/ristretto/z.OpenMmapFileUsing
	/home/pihole/go/pkg/mod/github.com/dgraph-io/[email protected]/z/file.go:59
github.com/dgraph-io/ristretto/z.OpenMmapFile
	/home/pihole/go/pkg/mod/github.com/dgraph-io/[email protected]/z/file.go:86
github.com/dgraph-io/badger/v3.(*logFile).open
	/home/pihole/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/memtable.go:556
github.com/dgraph-io/badger/v3.(*valueLog).createVlogFile
	/home/pihole/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/value.go:519
github.com/dgraph-io/badger/v3.(*valueLog).open
	/home/pihole/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/value.go:619
github.com/dgraph-io/badger/v3.Open
	/home/pihole/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/db.go:368
github.com/ssbc/go-ssb/repo.OpenBadgerDB
	/home/pihole/ssb/repo/multilogs.go:39
github.com/ssbc/go-ssb/sbot.New
	/home/pihole/ssb/sbot/new.go:294
main.runSbot
	/home/pihole/ssb/cmd/go-sbot/main.go:353
main.main
	/home/pihole/ssb/cmd/go-sbot/main.go:528
runtime.main
	/usr/local/go/src/runtime/proc.go:250
runtime.goexit
	/usr/local/go/src/runtime/asm_arm.s:831

@decentral1se
Copy link
Member Author

@mycognosist Aha, progress! πŸ˜†

It seems we need to load in the same -lite options over on https://github.com/ssbc/margaret/blob/7eba19fd7d894937cca650fee643d5c1735d1c07/multilog/roaring/badger/cmd/mbdump/main.go#L30 that we do here in go-ssb https://github.com/ssbc/go-ssb/blob/master/repo/badger_lite.go πŸ€”

Don't have time now but will try tomorrow! Idk if the build tags bubble down into the deps but I will see... i.e. if we pass -lite to go-ssb, will it pass it down to margaret?

@decentral1se
Copy link
Member Author

Took a run at it ssbc/margaret#26

You could probably git fetch -a in your margaret clone on the upgrade-deps branch and git merge lite-build / go build -v -tags lite ./... / replace github.com/ssbc/margaret => ./../margaret (in the go-ssb go.mod) if very determined πŸ€“

@mycognosist
Copy link
Member

@decentral1se

I think I followed all of your instructions correctly but unfortunately I'm still seeing the memory errors.

On first execution:

go-sbot: failed to instantiate ssb server: while opening memtables error: while opening fid: 1 error: while updating skiplist error: mremap size mismatch: requested: 20 got: 134217728

On second execution:

go-sbot: failed to instantiate ssb server: During db.vlog.open error: Open existing file: "/home/pihole/.ssb-go/sublogs/shared-badger/000001.vlog" error: while opening file: /home/pihole/.ssb-go/sublogs/shared-badger/000001.vlog error: cannot allocate memory
while mmapping /home/pihole/.ssb-go/sublogs/shared-badger/000001.vlog with size: 2147483646
github.com/dgraph-io/ristretto/z.OpenMmapFileUsing
	/home/pihole/go/pkg/mod/github.com/dgraph-io/[email protected]/z/file.go:59
github.com/dgraph-io/ristretto/z.OpenMmapFile
	/home/pihole/go/pkg/mod/github.com/dgraph-io/[email protected]/z/file.go:86
github.com/dgraph-io/badger/v3.(*logFile).open
	/home/pihole/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/memtable.go:556
github.com/dgraph-io/badger/v3.(*valueLog).open
	/home/pihole/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/value.go:585
github.com/dgraph-io/badger/v3.Open
	/home/pihole/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/db.go:368
github.com/ssbc/go-ssb/repo.OpenBadgerDB
	/home/pihole/ssb/repo/multilogs.go:39
github.com/ssbc/go-ssb/sbot.New
	/home/pihole/ssb/sbot/new.go:294
main.runSbot
	/home/pihole/ssb/cmd/go-sbot/main.go:353
main.main
	/home/pihole/ssb/cmd/go-sbot/main.go:528
runtime.main
	/usr/local/go/src/runtime/proc.go:250
runtime.goexit
	/usr/local/go/src/runtime/asm_arm.s:831

Deleting the .ssb-go/sublogs/shared-badger directory does not change the situation.

Steps I followed:

cd margaret
git fetch -a
git merge lite-build
go build -v -tags lite ./...
cd ../ssb
# margaret replace edit
vim go.mod
go build -v ./cmd/go-sbot
go build -v ./cmd/sbotcli

decentral1se added a commit to decentral1se/ssb that referenced this issue Nov 1, 2022
@decentral1se
Copy link
Member Author

decentral1se commented Nov 1, 2022

@mycognosist Nice! Pity it is still erroring... one thought is - did you start from scratch? I think the logs might be in a weird state now with the mismatched badger config in go-ssb / margaret? You can keep the private key tho, but wipe the rest? Perhaps muddling around with a -fsck ... / -repair on go-bot might dig you out but I doubt it. Will think about other approaches πŸ€”

@decentral1se
Copy link
Member Author

Think it's time for me to dust of a raspberry pi as well and test this thing πŸ§—β€β™€οΈ πŸ§—β€β™€οΈ πŸ§—β€β™€οΈ

@mycognosist
Copy link
Member

No luck starting from scratch or running repair. I think I should get my RPi4 back online and see if we encounter the same issues on ARM64. Maybe ARM32 is more of a nice-to-have whereas AMD64 and ARM64 are more important targets for support?

@decentral1se
Copy link
Member Author

decentral1se commented Nov 2, 2022

OK, so there is no rpi at hand but I'm running on a debian amd64 machine in my home network and it's been humming away nicely for 2 days without issue. Every time I open up Patchwork on my desktop I see replication running on the go-ssb pub as it pulls in hops=2 data. I've started/stopped/killed etc. and seen no error like the current one mentioned in this issue. Testing continues... up to 929M in ~/.ssb-go so far...

https://github.com/ssbc/go-ssb/blob/master/docs/faq.md#what-platforms-does-go-ssb-support does now reference 32 bit dragons also! Hopefully nobody runs into data loss woes while we're still working this out...

@mycognosist
Copy link
Member

@decentral1se

Brilliant! I'll aim to test on my RPi4 this evening / tomorrow so we have some verification of functionality on arm64.

@mycognosist
Copy link
Member

@decentral1se

Ok, testing now. I assume I can simply build go-sbot from master, now that the changes have been merged? No extra git acrobatics? 😸

@mycognosist
Copy link
Member

Ohhhh just got a tasty segfault when I killed the process:

^Clevel=warn t=38.587321388s event=killed msg="received signal, shutting down" signal=interrupt
level=debug t=38.587509478s unit=blobs event=onCall handler=createWants err="ssb: shutting down now"
level=debug t=38.587543181s event="network listen loop exited"
level=warn t=38.58771342s event="sbot node.Serve returned" err="ssb: shutting down now"
level=warn t=38.58778653s unit=gossip event="live qry on rxlog exited"
level=debug t=38.58923723s peer=<@HEqy.ed25519> conn="serve exited" err="muxrpc: serve failed to read from packer: muxrpc/packer: read packet canceled: ssb: shutting down now"
level=error t=39.588921828s conn="ssb-ws :8998 listen exited" err="accept tcp [::]:8987: use of closed network connection"
level=debug t=39.589049659s event="sbot closing" msg="connections closed"
level=debug t=39.589071881s event="sbot closing" msg="waited for indexes to close"
panic: runtime error: invalid memory address or nil pointer dereference
	panic: Unclosed iterator at time of Txn.Discard.
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x5c5a98]

goroutine 134 [running]:
github.com/dgraph-io/badger/v3.(*Txn).Discard(0x40001db218?)
	/home/glyph/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/txn.go:523 +0xac
panic({0x8bf740, 0x10a00e0})
	/usr/local/go/src/runtime/panic.go:884 +0x20c
github.com/dgraph-io/badger/v3.(*memTable).IncrRef(...)
	/home/glyph/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/memtable.go:231
github.com/dgraph-io/badger/v3.(*DB).getMemTables(0x4002262000)
	/home/glyph/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/db.go:696 +0x128
github.com/dgraph-io/badger/v3.(*Txn).NewIterator(0x4009f03600, {0x64, 0x1, 0x0, 0x0, 0x0, 0x0, {0x0, 0x0, 0x0}, ...})
	/home/glyph/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/iterator.go:485 +0xb8
github.com/ssbc/go-ssb/graph.(*BadgerBuilder).Follows.func1(0x4002262000?)
	/home/glyph/ssb/graph/builder.go:249 +0xb4
github.com/dgraph-io/badger/v3.(*DB).View(0x8cd9a0?, 0x40001db658)
	/home/glyph/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/txn.go:806 +0x9c
github.com/ssbc/go-ssb/graph.(*BadgerBuilder).Follows(0x400013eb60, {{0xc9, 0xef, 0x90, 0x33, 0x4f, 0x62, 0x3d, 0xc0, 0xc9, ...}, ...})
	/home/glyph/ssb/graph/builder.go:248 +0xe4
github.com/ssbc/go-ssb/graph.(*BadgerBuilder).recurseHops(0x400013eb60, 0x40118bb330, 0x40001dbc98, {{0x81, 0xa4, 0x30, 0xeb, 0x3d, 0xf4, 0x1a, ...}, ...}, ...)
	/home/glyph/ssb/graph/builder.go:454 +0x3f0
github.com/ssbc/go-ssb/graph.(*BadgerBuilder).recurseHops(0x400013eb60, 0x40118bb330, 0x40001dbc98, {{0x1c, 0x4a, 0xb2, 0xf7, 0x8d, 0x13, 0xea, ...}, ...}, ...)
	/home/glyph/ssb/graph/builder.go:461 +0x458
github.com/ssbc/go-ssb/graph.(*BadgerBuilder).recurseHops(0x400013eb60, 0x40118bb330, 0x40001dbc98, {{0xc1, 0x98, 0x54, 0xd3, 0xc2, 0x87, 0x46, ...}, ...}, ...)
	/home/glyph/ssb/graph/builder.go:461 +0x458
github.com/ssbc/go-ssb/graph.(*BadgerBuilder).Hops(0x400013eb60, {{0xc1, 0x98, 0x54, 0xd3, 0xc2, 0x87, 0x46, 0xb6, 0xce, ...}, ...}, ...)
	/home/glyph/ssb/graph/builder.go:367 +0x108
github.com/ssbc/go-ssb/sbot.(*graphReplicator).makeUpdater.func1()
	/home/glyph/ssb/sbot/replicate.go:82 +0x98
github.com/ssbc/go-ssb/sbot.debounce({0xb27fb0, 0x40023710b0}, 0xb2d05e00, {0xb26070, 0x4000197100}, 0x4000088d80)
	/home/glyph/ssb/sbot/replicate.go:143 +0x1c8
created by github.com/ssbc/go-ssb/sbot.(*Sbot).newGraphReplicator
	/home/glyph/ssb/sbot/replicate.go:73 +0x330

Going to start from scratch and see how I go.

@decentral1se
Copy link
Member Author

@mycognosist

Ok, testing now. I assume I can simply build go-sbot from master, now that the changes have been merged? No extra git acrobatics? smile_cat

Yep, can build directly from master now πŸ‘

Ohhhh just got a tasty segfault when I killed the process:

Unable to reproduce that segfaulting myself πŸ€” Root of it seems to be:

panic: runtime error: invalid memory address or nil pointer
panic: Unclosed iterator at time of Txn.Discard.
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x5c5a98]

Doesn't seem to much clarity in what I can dig up on that dgraph-io/badger#522 https://discuss.dgraph.io/t/panic-unclosed-iterator-at-time-of-txn-discard/2855 (?) dgraph-io/badger#1250 (more ?) - unclear to me.

@decentral1se
Copy link
Member Author

decentral1se commented Nov 5, 2022

@mycognosist Still blowing up? Any further luck? πŸ™

@decentral1se decentral1se changed the title Shutting down causes "mremap size mismatch" Raspberry Pi / Orange Pi Zero stability Nov 7, 2022
@decentral1se decentral1se changed the title Raspberry Pi / Orange Pi Zero stability Shutting down causes "mremap size mismatch Nov 7, 2022
@decentral1se decentral1se added the bug Something isn't working label Nov 9, 2022
@mycognosist
Copy link
Member

@decentral1se

I hadn't seen any further blow-ups while testing in the UK. Still need to setup my Pis here in the new spot...need more plug points πŸ˜…

@decentral1se
Copy link
Member Author

OK let's close this off for now and if we see it again, can re-open. Thanks for all the testing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Development

No branches or pull requests

2 participants