tapgarden: batch restart fixes #941

Merged
merged 3 commits into from
Jun 13, 2024

Conversation

jharveyb
Contributor

Should fix #940.

Likely caused by changes in #866.

@jharveyb jharveyb force-pushed the batch_restart_fixes branch from cb117ac to 3159ec9 Compare June 10, 2024 03:55
Member

@guggero guggero left a comment

Thanks for the fix, LGTM 🎉

@Liongrass
Contributor

I ran this pull request as part of litd. I might need some help verifying that I really did run this code, though.

First I checked out this branch on the litd repo:
git checkout 0-19-staging

I'm on commit 980ee938aa1e7a45633216aff8775ca8d05c12aa

Then I modified the go.mod file. I changed the /taproot-assets line to github.com/lightninglabs/taproot-assets 3159ec9448670552a2c39ae94d7a1d6b5c3178e2

Then I ran go mod tidy. The go.mod file now read:
github.com/lightninglabs/taproot-assets v0.3.3-0.20240610035446-3159ec944867

Then I compiled litd with make install

I then ran litd with litd
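One way to check that the build really contains the intended commit: a Go pseudo-version ends with the first 12 hex characters of the commit it was generated from, so the version written to go.mod by go mod tidy can be compared against the full hash that was pinned. The sketch below is only an illustration; the helper name pseudoVersionMatchesCommit is made up for this example and is not part of tapd or litd.

```go
package main

import (
	"fmt"
	"strings"
)

// pseudoVersionMatchesCommit is a hypothetical helper. It reports whether a
// Go module pseudo-version ends with the first 12 hex characters of the
// given full commit hash.
func pseudoVersionMatchesCommit(version, commit string) bool {
	// The commit prefix is the last dash-separated field of the version.
	idx := strings.LastIndex(version, "-")
	if idx < 0 || len(commit) < 12 {
		return false
	}
	suffix := strings.ToLower(version[idx+1:])

	return len(suffix) == 12 &&
		strings.HasPrefix(strings.ToLower(commit), suffix)
}

func main() {
	// Values taken from the go.mod line and the commit pinned above.
	version := "v0.3.3-0.20240610035446-3159ec944867"
	commit := "3159ec9448670552a2c39ae94d7a1d6b5c3178e2"

	fmt.Println(pseudoVersionMatchesCommit(version, commit)) // prints: true
}
```

The same pseudo-version also appears in the commit= field of the SRVR startup log line, which is a second hint that the binary was built from this branch.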

Unfortunately, litd crashed:

2024-06-10 09:35:18.120 [DBG] GRPC: [core] [Channel #143 SubChannel #144] Subchannel created
2024-06-10 09:35:18.120 [DBG] GRPC: [core] [Channel #143] Channel Connectivity change to CONNECTING
2024-06-10 09:35:18.121 [INF] RPCS: Trader server is now active
2024-06-10 09:35:18.122 [DBG] STAT: Setting the pool sub-server as running
2024-06-10 09:35:18.122 [INF] SSVR: Opening sqlite3 database at: /home/ubuntu/.tapd/data/testnet/tapd.db
2024-06-10 09:35:18.130 [DBG] GRPC: [core] [Channel #143 SubChannel #144] Subchannel Connectivity change to CONNECTING
2024-06-10 09:35:18.131 [DBG] GRPC: [core] [Channel #143 SubChannel #144] Subchannel picks a new address "test.pool.lightning.finance:12010" to connect
2024-06-10 09:35:18.131 [DBG] GRPC: [core] [pick-first-lb 0xc003ea52c0] Received SubConn state update: 0xc003ea53e0, {ConnectivityState:CONNECTING ConnectionError:<nil>}
2024-06-10 09:35:18.131 [DBG] RPCS: [/lnrpc.Lightning/SubscribeChannelEvents] requested
2024-06-10 09:35:18.131 [DBG] RPCS: [/lnrpc.Lightning/ChannelAcceptor] requested
2024-06-10 09:35:18.160 [INF] TADB: Applying migrations from version=18
2024-06-10 09:35:18.173 [INF] SSVR: Configuring testnet.universe.lightning.finance:10029 as initial Universe federation server
2024-06-10 09:35:18.174 [INF] SRVR: Version: 0.3.2-alpha commit=v0.3.3-0.20240610035446-3159ec944867, build=production, logging=default, debuglevel=info
2024-06-10 09:35:18.175 [INF] SRVR: Active network: testnet3
2024-06-10 09:35:18.175 [INF] RPCS: Validating RPC requests based on macaroon at: /home/ubuntu/.tapd/data/testnet/admin.macaroon
2024-06-10 09:35:18.186 [INF] GRDN: Starting ChainPlanter
2024-06-10 09:35:18.194 [INF] GRDN: Retrieved 2 non-finalized batches from DB
2024-06-10 09:35:18.194 [INF] GRDN: Launching ChainCaretaker(024613161ec23c8140dc613e2d2b01d806c523a562db206b435afc9e7fdd7ff090)
2024-06-10 09:35:18.198 [INF] GRDN: Starting asset custodian
2024-06-10 09:35:18.200 [INF] GRDN: Starting re-org watcher
2024-06-10 09:35:18.198 [INF] GRDN: BatchCaretaker(024613161ec23c8140dc613e2d2b01d806c523a562db206b435afc9e7fdd7ff090), advancing from state=BatchStateFrozen to state=BatchStateBroadcast
2024-06-10 09:35:18.199 [INF] GRDN: Gardener for ChainPlanter now active!
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1a7ac14]

goroutine 3411 [running]:
github.com/lightninglabs/taproot-assets/tapgarden.(*BatchCaretaker).stateStep(0xc0002d5340, 0x60?)
	github.com/lightninglabs/[email protected]/tapgarden/caretaker.go:577 +0xb4
github.com/lightninglabs/taproot-assets/tapgarden.(*BatchCaretaker).advanceStateUntil(0xc0002d5340, 0x1, 0x3)
	github.com/lightninglabs/[email protected]/tapgarden/caretaker.go:292 +0x210
github.com/lightninglabs/taproot-assets/tapgarden.(*BatchCaretaker).assetCultivator(0xc0002d5340)
	github.com/lightninglabs/[email protected]/tapgarden/caretaker.go:351 +0x23e
created by github.com/lightninglabs/taproot-assets/tapgarden.(*BatchCaretaker).Start.func1 in goroutine 1
	github.com/lightninglabs/[email protected]/tapgarden/caretaker.go:155 +0x67

@dstadulis
Collaborator

@Roasbeef will review

It's uncertain whether this PR genuinely fixes #940, but #941 is known to be a genuine fix.

@jharveyb
Contributor Author

jharveyb commented Jun 12, 2024

So the issue seems to be that you have a batch on disk that is in state BatchStateFrozen, with an empty GenesisPacket.

On restart, we try to serialize that packet without first checking that it is non-nil.

This could happen if you crashed while minting with v0.3.3, as there the batch was frozen before funding and sealing.

This could also happen on main if you crashed while finalizing the batch, since we write the batch to disk as Frozen before funding and sealing have started and succeeded.

On restart, if the batch has state Frozen, later code in the caretaker assumes that it has a GenesisPacket.

The open question IMO is what to do with a batch in state Frozen that still needs changes (like funding and sealing). My intent for this PR is to have that handled before the caretaker starts so that we don't need a bunch of extra checks in the caretaker.
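To make the failure mode concrete, here is a minimal sketch of the kind of guard described above: a batch that is already Frozen but has no genesis packet is rejected with an error instead of being handed to code that dereferences the packet. All types and names here (MintingBatch, FundedPsbt, Packet, checkResumable, BatchStatePending) are simplified, hypothetical stand-ins, not the actual tapgarden definitions.

```go
package main

import (
	"errors"
	"fmt"
)

// BatchState is a simplified stand-in for the minting batch states.
type BatchState int

const (
	BatchStatePending BatchState = iota // hypothetical pre-freeze state
	BatchStateFrozen
	BatchStateBroadcast
)

// Packet and FundedPsbt are hypothetical placeholders for the funded
// genesis transaction. A nil pointer means "not funded yet".
type Packet struct{}

type FundedPsbt struct {
	Pkt *Packet
}

// MintingBatch is a trimmed-down, hypothetical view of a batch on disk.
type MintingBatch struct {
	State         BatchState
	GenesisPacket *FundedPsbt
}

var errMissingGenesisPacket = errors.New("frozen batch has no genesis packet")

// checkResumable returns an error for a batch that later code would
// otherwise crash on by dereferencing a nil genesis packet.
func checkResumable(b *MintingBatch) error {
	if b == nil {
		return errors.New("nil batch")
	}

	// Batches that were never frozen are not expected to be funded.
	if b.State < BatchStateFrozen {
		return nil
	}

	if b.GenesisPacket == nil || b.GenesisPacket.Pkt == nil {
		return errMissingGenesisPacket
	}

	return nil
}

func main() {
	// A batch written as Frozen right before a crash, never funded.
	crashed := &MintingBatch{State: BatchStateFrozen}

	// A batch that was funded before the restart.
	funded := &MintingBatch{
		State:         BatchStateFrozen,
		GenesisPacket: &FundedPsbt{Pkt: &Packet{}},
	}

	fmt.Println(checkResumable(crashed))
	fmt.Println(checkResumable(funded))
}
```

Running a check like this once before the caretaker starts keeps the nil handling in one place, which matches the stated intent of avoiding extra checks inside the caretaker itself.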

@jharveyb jharveyb force-pushed the batch_restart_fixes branch from 5aceb0c to 76ec8ac Compare June 12, 2024 00:19
@Liongrass
Contributor

Awesome, I'm running this patch, so far no issues!

@Liongrass
Contributor

But running tapcli --tlscertpath ~/.lit/tls.cert --rpcserver=localhost:8443 --network=testnet assets mint batches still results in a crash:
[tapcli] unable to list batches: rpc error: code = Unavailable desc = error reading from server: EOF

2024-06-12 03:26:26.579 [INF] LITD: Handling gRPC request: /mintrpc.Mint/ListBatches
2024-06-12 03:26:26.583 [DBG] RPCS: [/mintrpc.Mint/ListBatches] requested
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x47b9fa]

goroutine 27193 [running]:
github.com/btcsuite/btcd/btcec/v2/schnorr.SerializePubKey(...)
	github.com/btcsuite/btcd/btcec/[email protected]/schnorr/pubkey.go:47
github.com/lightninglabs/taproot-assets/taprpc.MarshalScriptKey({0x0?, 0x0?})
	github.com/lightninglabs/[email protected]/taprpc/marshal.go:111 +0x4f
github.com/lightninglabs/taproot-assets.marshalSeedling(0xc00191ea00)
	github.com/lightninglabs/[email protected]/rpcserver.go:4010 +0x37e
github.com/lightninglabs/taproot-assets/fn.MapErr[...]({0xc006750740?, 0x2, 0xc003713350}, 0x352f2e0?)
	github.com/lightninglabs/[email protected]/fn/func.go:71 +0x5f
github.com/lightninglabs/taproot-assets.marshalSeedlings(0xc0090f2630)
	github.com/lightninglabs/[email protected]/rpcserver.go:4063 +0x14b
github.com/lightninglabs/taproot-assets.marshalMintingBatch(0xc00191e980, 0x0)
	github.com/lightninglabs/[email protected]/rpcserver.go:3947 +0x45d
github.com/lightninglabs/taproot-assets.marshalVerboseBatch({0x37c35b0, 0xc0025b9620}, 0xc006750620, 0x0, 0x0)
	github.com/lightninglabs/[email protected]/rpcserver.go:3859 +0x38
github.com/lightninglabs/taproot-assets.(*rpcServer).ListBatches.func1(0x5?)
	github.com/lightninglabs/[email protected]/rpcserver.go:860 +0x2b
github.com/lightninglabs/taproot-assets/fn.MapErr[...]({0xc00abb9470?, 0x5, 0x0}, 0xc003713600?)
	github.com/lightninglabs/[email protected]/fn/func.go:71 +0x5f
github.com/lightninglabs/taproot-assets.(*rpcServer).ListBatches(0xc000683810, {0x37c35b0, 0xc0025b9620}, 0xc002b75c00)
	github.com/lightninglabs/[email protected]/rpcserver.go:856 +0x3c5
github.com/lightninglabs/taproot-assets/taprpc/mintrpc._Mint_ListBatches_Handler.func1({0x37c35b0?, 0xc0025b9620?}, {0x2226ea0?, 0xc002b75c00?})
	github.com/lightninglabs/[email protected]/taprpc/mintrpc/mint_grpc.pb.go:335 +0xcb
github.com/lightningnetwork/lnd/rpcperms.(*InterceptorChain).CreateServerOpts.(*InterceptorChain).middlewareUnaryServerInterceptor.func7({0x37c35b0, 0xc0025b9620}, {0x2226ea0, 0xc002b75c00}, 0xc004703400, 0xc0048012c0)
	github.com/lightningnetwork/[email protected]/rpcperms/interceptor.go:832 +0x111
google.golang.org/grpc.getChainUnaryHandler.func1({0x37c35b0, 0xc0025b9620}, {0x2226ea0, 0xc002b75c00})
	google.golang.org/[email protected]/server.go:1163 +0xb2
github.com/lightningnetwork/lnd/rpcperms.(*InterceptorChain).CreateServerOpts.(*InterceptorChain).MacaroonUnaryServerInterceptor.func5({0x37c35b0, 0xc0025b9620}, {0x2226ea0, 0xc002b75c00}, 0xc004703400?, 0xc002b86980)
	github.com/lightningnetwork/[email protected]/rpcperms/interceptor.go:689 +0x7d
google.golang.org/grpc.getChainUnaryHandler.func1({0x37c35b0, 0xc0025b9620}, {0x2226ea0, 0xc002b75c00})
	google.golang.org/[email protected]/server.go:1163 +0xb2
github.com/lightningnetwork/lnd/rpcperms.(*InterceptorChain).CreateServerOpts.(*InterceptorChain).rpcStateUnaryServerInterceptor.func3({0x37c35b0, 0xc0025b9620}, {0x2226ea0, 0xc002b75c00}, 0xc004703400, 0xc002b86880)
	github.com/lightningnetwork/[email protected]/rpcperms/interceptor.go:781 +0xfc
google.golang.org/grpc.getChainUnaryHandler.func1({0x37c35b0, 0xc0025b9620}, {0x2226ea0, 0xc002b75c00})
	google.golang.org/[email protected]/server.go:1163 +0xb2
github.com/lightningnetwork/lnd/rpcperms.(*InterceptorChain).CreateServerOpts.errorLogUnaryServerInterceptor.func1({0x37c35b0?, 0xc0025b9620?}, {0x2226ea0?, 0xc002b75c00?}, 0xc004703400, 0xc0048012c0?)
	github.com/lightningnetwork/[email protected]/rpcperms/interceptor.go:605 +0x52
google.golang.org/grpc.NewServer.chainUnaryServerInterceptors.chainUnaryInterceptors.func1({0x37c35b0, 0xc0025b9620}, {0x2226ea0, 0xc002b75c00}, 0xc004703400, 0x78?)
	google.golang.org/[email protected]/server.go:1154 +0x85
github.com/lightninglabs/taproot-assets/taprpc/mintrpc._Mint_ListBatches_Handler({0x238fb80, 0xc000876330}, {0x37c35b0, 0xc0025b9620}, 0xc002ac9700, 0xc000873160)
	github.com/lightninglabs/[email protected]/taprpc/mintrpc/mint_grpc.pb.go:337 +0x143
google.golang.org/grpc.(*Server).processUnaryRPC(0xc0003d2f00, {0x37c35b0, 0xc0025b9530}, {0x37d4ba0, 0xc0008cb520}, 0xc002d790e0, 0xc0008b5b00, 0x4ba3858, 0x0)
	google.golang.org/[email protected]/server.go:1343 +0xdd1
google.golang.org/grpc.(*Server).handleStream(0xc0003d2f00, {0x37d4ba0, 0xc0008cb520}, 0xc002d790e0)
	google.golang.org/[email protected]/server.go:1737 +0xc47
google.golang.org/grpc.(*Server).serveStreams.func1.1()
	google.golang.org/[email protected]/server.go:986 +0x86
created by google.golang.org/grpc.(*Server).serveStreams.func1 in goroutine 75
	google.golang.org/[email protected]/server.go:997 +0x136

@jharveyb jharveyb force-pushed the batch_restart_fixes branch from 76ec8ac to 3b3b486 Compare June 12, 2024 18:31
@jharveyb
Contributor Author

But running tapcli --tlscertpath ~/.lit/tls.cert --rpcserver=localhost:8443 --network=testnet assets mint batches still results in a crash: [tapcli] unable to list batches: rpc error: code = Unavailable desc = error reading from server: EOF

I updated this branch, can you try it out again? I think marshalling for batches made before #866 was broken by #866. Should be fixed now.

I had added some assumptions in the marshalling code that all seedlings would have a script key set, which wouldn't be true for batches already stored in the DB before that PR.

Also added an extra check at the beginning of the caretaker so we exit gracefully in case of issues resuming on restart.
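As a rough illustration of the marshalling issue, the sketch below treats the script key as optional and only serializes it when it is set, so seedlings stored before script keys were recorded still marshal cleanly. The types and the function here (ScriptKey, Seedling, RPCSeedling, marshalSeedling) are simplified, hypothetical stand-ins and do not mirror the real rpcserver.go or taprpc code.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// ScriptKey is a hypothetical stand-in holding a serialized public key.
type ScriptKey struct {
	PubKey []byte
}

// Seedling is a hypothetical stand-in for a pending asset in a batch.
// ScriptKey is nil for seedlings stored before script keys were recorded.
type Seedling struct {
	AssetName string
	ScriptKey *ScriptKey
}

// RPCSeedling is a hypothetical RPC representation of a seedling.
// ScriptKey is hex encoded and left empty when the seedling has none.
type RPCSeedling struct {
	Name      string
	ScriptKey string
}

// marshalSeedling converts a seedling without assuming a script key exists.
func marshalSeedling(s Seedling) RPCSeedling {
	out := RPCSeedling{Name: s.AssetName}

	// Only touch the key when it is actually present.
	if s.ScriptKey != nil && len(s.ScriptKey.PubKey) > 0 {
		out.ScriptKey = hex.EncodeToString(s.ScriptKey.PubKey)
	}

	return out
}

func main() {
	// A seedling from an older batch, stored without a script key.
	old := Seedling{AssetName: "legacy-asset"}

	// A seedling created after script keys were added.
	fresh := Seedling{
		AssetName: "new-asset",
		ScriptKey: &ScriptKey{PubKey: []byte{0x02, 0xab}},
	}

	fmt.Printf("%+v\n", marshalSeedling(old))
	fmt.Printf("%+v\n", marshalSeedling(fresh))
}
```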

@Liongrass
Contributor

Yes I'm able to run tapcli assets mint batches without any issues on commit 3b3b4864778889cc1542062402e19cac1f66b44b
thank you!

@jharveyb jharveyb requested a review from ffranr June 13, 2024 15:36
Member

@Roasbeef Roasbeef left a comment

LGTM 🐴

}

	if b.cfg.Batch.GenesisPacket == nil ||
		b.cfg.Batch.GenesisPacket.Pkt == nil {
Member

👍

@Roasbeef Roasbeef merged commit 1b65b46 into main Jun 13, 2024
14 checks passed
@guggero guggero deleted the batch_restart_fixes branch June 14, 2024 10:08
Labels
Projects
Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

[bug]: SIGSEGV after BatchCaretaker initializes non-finalized batches
5 participants