[bug]: Tapd crashed during minting some unexpected non-final batches #1125

Closed
btcwer opened this issue Sep 17, 2024 · 7 comments

Labels: bug, needs triage

btcwer commented Sep 17, 2024

Background

Tapd crashed shortly after starting up in a Bitcoin testnet environment. The logs below show that the crash comes from a nil pointer dereference.

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x12de1fd]

Your environment

Tapd version: 0.4.1-alpha
LND version: 0.18.2-beta
BTC Core version: v27.0.0, with full node on testnet3
OS: Ubuntu 20.04.6 LTS

Steps to reproduce

Tapd's logs are as follows:

./tapd --network=testnet --debuglevel=trace --lnd.host=127.0.0.1:10009 --lnd.macaroonpath=/home/bittap/.lnd/data/chain/bitcoin/testnet/admin.macaroon --lnd.tlspath=/home/bittap/.lnd/tls.cert --databasebackend=postgres --postgres.host=127.0.0.1 --postgres.port=5432 --postgres.user=postgres --postgres.password=Abc666666 --postgres.dbname=tapd
2024-09-13 13:57:01.552 [WRN] CONF: open /home/bittap/.tapd/tapd.conf: no such file or directory
2024-09-13 13:57:01.552 [INF] CONF: Attempting to establish connection to lnd...
2024-09-13 13:57:01.558 [INF] CONF: lnd connection initialized
2024-09-13 13:57:01.558 [INF] CONF: Opening postgres database at: postgres://postgres:****@127.0.0.1:5432/tapd?sslmode=disable
2024-09-13 13:57:01.558 [INF] TADB: Using SQL database 'postgres://postgres:****@127.0.0.1:5432/tapd?sslmode=disable'
2024-09-13 13:57:01.564 [INF] TADB: Attempting to apply migration(s) (current_db_version=21, latest_migration_version=21)
2024-09-13 13:57:01.564 [INF] TADB: Database version after migration: 21
2024-09-13 13:57:01.564 [INF] CONF: Configuring testnet.universe.lightning.finance:10029 as initial Universe federation server
2024-09-13 13:57:01.565 [INF] TSVR: Version: 0.4.1-alpha commit=, build=production, logging=default, debuglevel=trace
2024-09-13 13:57:01.565 [INF] TSVR: Active network: testnet3
2024-09-13 13:57:01.565 [INF] RPCS: Validating RPC requests based on macaroon at: /home/bittap/.tapd/data/testnet/admin.macaroon
2024-09-13 13:57:01.568 [INF] GRDN: Starting ChainPlanter
2024-09-13 13:57:01.577 [INF] TSVR: Shutdown complete

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x12de1fd]

goroutine 1 [running]:
github.com/lightninglabs/taproot-assets/commitment.(*TapCommitment).CommittedAssets(0x0?)
        /home/bittap/bittap/source/taproot-assets-0.4.1/commitment/tap.go:529 +0x1d
github.com/lightninglabs/taproot-assets/tapdb.marshalMintingBatch({0x1f13dc0, 0xc00026dcc0}, {0x1f38668, 0xc0003898f0}, {0x4, 0x3, {0xc00013ac80, 0x278, 0x280}, {0x1, ...}, ...})
        /home/bittap/bittap/source/taproot-assets-0.4.1/tapdb/asset_minting.go:1211 +0x5e5
github.com/lightninglabs/taproot-assets/tapdb.(*AssetMintingStore).FetchNonFinalBatches.func1.1({0x4, 0x3, {0xc00013ac80, 0x278, 0x280}, {0x1, 0x1}, {0x3, 0x1}, 0x2bd796, ...})
        /home/bittap/bittap/source/taproot-assets-0.4.1/tapdb/asset_minting.go:1017 +0x5b
github.com/lightninglabs/taproot-assets/fn.MapErr[...]({0xc00018c280?, 0x2, 0xc000380005}, 0xc00046d408?)
        /home/bittap/bittap/source/taproot-assets-0.4.1/fn/func.go:83 +0xf8
github.com/lightninglabs/taproot-assets/tapdb.(*AssetMintingStore).FetchNonFinalBatches.func1({0x1f38668, 0xc0003898f0})
        /home/bittap/bittap/source/taproot-assets-0.4.1/tapdb/asset_minting.go:1020 +0x137
github.com/lightninglabs/taproot-assets/tapdb.(*TransactionExecutor[...]).ExecTx(0x1efe060, {0x1f13dc0, 0xc00026dcc0}, {0x1f016a0, 0xc00015643b}, 0xc0002a5ca0)
        /home/bittap/bittap/source/taproot-assets-0.4.1/tapdb/interfaces.go:241 +0x1c2
github.com/lightninglabs/taproot-assets/tapdb.(*AssetMintingStore).FetchNonFinalBatches(0xc0005ce700, {0x1f13dc0, 0xc00026dcc0})
        /home/bittap/bittap/source/taproot-assets-0.4.1/tapdb/asset_minting.go:1005 +0xc5
github.com/lightninglabs/taproot-assets/tapgarden.(*ChainPlanter).Start.func1()
        /home/bittap/bittap/source/taproot-assets-0.4.1/tapgarden/planter.go:332 +0xc5
sync.(*Once).doSlow(0x0?, 0xc000136ba0?)
        /usr/local/go/src/sync/once.go:74 +0xc2
sync.(*Once).Do(...)
        /usr/local/go/src/sync/once.go:65
github.com/lightninglabs/taproot-assets/tapgarden.(*ChainPlanter).Start(0xc0005c39d0?)
        /home/bittap/bittap/source/taproot-assets-0.4.1/tapgarden/planter.go:318 +0x50
github.com/lightninglabs/taproot-assets.(*Server).initialize(0xc00026dbd0, 0xc000136a80)
        /home/bittap/bittap/source/taproot-assets-0.4.1/server.go:172 +0x8b2
github.com/lightninglabs/taproot-assets.(*Server).RunUntilShutdown(0xc00026dbd0, 0xc0001360c0)
        /home/bittap/bittap/source/taproot-assets-0.4.1/server.go:325 +0x551
main.main()
        /home/bittap/bittap/source/taproot-assets-0.4.1/cmd/tapd/main.go:78 +0x5be

Cause

For some reason there were unfinished minting batches in the DB. Tapd had been running properly for a month, but it crashed during a recent restart. I don't know why this case ends in a crash instead of being handled properly.
[Screenshot 2024-09-13 224036]

Solution

The function marshalMintingBatch() in tapdb/asset_minting.go (the file named in the stack trace above) should validate assetRoot, like below:

		assetRoot := batch.RootAssetCommitment
		if assetRoot != nil {
			assetsInBatch := assetRoot.CommittedAssets() // crashes here without the nil check
			...
		}
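
To illustrate why the top frame `CommittedAssets(0x0?)` panics: in Go, calling a method on a nil pointer is allowed, and the panic only happens once the method reads a field through that pointer. The following is a minimal sketch of the proposed guard, not tapd's code. The types `commitment` and `batch`, the function `assetsInBatch`, and the error `errMissingRoot` are all hypothetical stand-ins.

```go
package main

import (
	"errors"
	"fmt"
)

// commitment is a hypothetical stand-in for a commitment type.
// It is NOT tapd's TapCommitment.
type commitment struct {
	assets []string
}

// committedAssets reads a field through the receiver, so it panics
// with a nil pointer dereference when c is nil.
func (c *commitment) committedAssets() []string {
	return c.assets
}

// batch is a hypothetical stand-in for a stored minting batch whose
// root commitment may be missing.
type batch struct {
	root *commitment
}

// errMissingRoot is an assumed sentinel error for this sketch.
var errMissingRoot = errors.New("batch has no root commitment")

// assetsInBatch checks for a nil root and returns an error instead of
// letting the caller hit a nil pointer dereference.
func assetsInBatch(b batch) ([]string, error) {
	if b.root == nil {
		return nil, errMissingRoot
	}
	return b.root.committedAssets(), nil
}

func main() {
	// A batch without a root yields an error, not a panic.
	_, err := assetsInBatch(batch{})
	fmt.Println(err)

	// A batch with a root yields its assets.
	assets, err := assetsInBatch(batch{root: &commitment{assets: []string{"a"}}})
	fmt.Println(assets, err)
}
```

Whether a missing root should be skipped silently (as in the snippet above) or reported as an error is a design choice; the sketch returns an error so the caller can decide.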
@jharveyb
Contributor

Thanks for the detailed issue!

I initially thought this could be an issue with the on-disk state of one of those Broadcast batches, but I think it's simpler. A Broadcast batch must have a root Commitment, but the err here is unchecked due to reuse of the err variable (which the linter will not flag as unchecked rn).

Could you try out this branch with that DB? Or apply the patch in some other way if you prefer.

https://github.com/lightninglabs/taproot-assets/tree/batch_marshal_fixes
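
As an illustration of the error-reuse pattern described above, here is a minimal sketch of how a reused `err` variable can hide a failure. It is not the tapd code in question. The helpers `loadRoot` and `loadName` and the functions `buggy` and `fixed` are hypothetical names invented for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// root is a hypothetical value that must be present for later steps.
type root struct{ n int }

// loadRoot fails when ok is false (hypothetical helper).
func loadRoot(ok bool) (*root, error) {
	if !ok {
		return nil, errors.New("no root")
	}
	return &root{n: 1}, nil
}

// loadName always succeeds (hypothetical helper).
func loadName() (string, error) {
	return "batch", nil
}

// buggy overwrites err before checking it, so a failure in loadRoot
// is lost and a nil root is returned together with a nil error.
func buggy(ok bool) (*root, string, error) {
	r, err := loadRoot(ok)
	name, err := loadName() // reuses err; the earlier value is discarded
	if err != nil {
		return nil, "", err
	}
	return r, name, nil
}

// fixed checks each error before the variable is reused.
func fixed(ok bool) (*root, string, error) {
	r, err := loadRoot(ok)
	if err != nil {
		return nil, "", err
	}
	name, err := loadName()
	if err != nil {
		return nil, "", err
	}
	return r, name, nil
}

func main() {
	r, _, err := buggy(false)
	fmt.Println(r == nil, err) // nil root, but no error reported

	_, _, err = fixed(false)
	fmt.Println(err) // the failure is surfaced
}
```

The compiler accepts `buggy` because `err` is read later in the function, which is why this kind of mistake is easy to miss.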

@dstadulis dstadulis moved this from 🆕 New to 👀 In review in Taproot-Assets Project Board Sep 17, 2024
@dstadulis dstadulis added this to the v0.4.2 milestone Sep 17, 2024
@dstadulis
Collaborator

@btcwer thank you for such a well-written review

@btcwer
Author

btcwer commented Sep 18, 2024

> Thanks for the detailed issue!
>
> I initially thought this could be an issue with the on-disk state of one of those Broadcast batches, but I think it's simpler. A Broadcast batch must have a root Commitment, but the err here is unchecked due to reuse of the err variable (which the linter will not flag as unchecked rn).
>
> Could you try out this branch with that DB? Or apply the patch in some other way if you prefer.
>
> https://github.com/lightninglabs/taproot-assets/tree/batch_marshal_fixes

I applied this patch, but it didn't work. When marshalMintingBatch() failed, the daemon still quit.

./tapd --network=testnet --debuglevel=trace --lnd.host=127.0.0.1:10009 --lnd.macaroonpath=/home/bittap/.lnd/data/chain/bitcoin/testnet/admin.macaroon --lnd.tlspath=/home/bittap/.lnd/tls.cert --databasebackend=postgres --postgres.host=127.0.0.1 --postgres.port=5432 --postgres.user=postgres --postgres.password=Abc666666 --postgres.dbname=tapd
2024-09-18 01:56:38.442 [WRN] CONF: open /home/bittap/.tapd/tapd.conf: no such file or directory
2024-09-18 01:56:38.442 [INF] CONF: Attempting to establish connection to lnd...
2024-09-18 01:56:38.448 [INF] CONF: lnd connection initialized
2024-09-18 01:56:38.448 [INF] CONF: Opening postgres database at: postgres://postgres:****@127.0.0.1:5432/tapd?sslmode=disable
2024-09-18 01:56:38.448 [INF] TADB: Using SQL database 'postgres://postgres:****@127.0.0.1:5432/tapd?sslmode=disable'
2024-09-18 01:56:38.454 [INF] TADB: Attempting to apply migration(s) (current_db_version=21, latest_migration_version=21)
2024-09-18 01:56:38.454 [INF] TADB: Database version after migration: 21
2024-09-18 01:56:38.456 [INF] CONF: Configuring testnet.universe.lightning.finance:10029 as initial Universe federation server
2024-09-18 01:56:38.456 [INF] TSVR: Version: 0.4.1-alpha commit=, build=production, logging=default, debuglevel=trace
2024-09-18 01:56:38.456 [INF] TSVR: Active network: testnet3
2024-09-18 01:56:38.456 [INF] RPCS: Validating RPC requests based on macaroon at: /home/bittap/.tapd/data/testnet/admin.macaroon
2024-09-18 01:56:38.460 [INF] GRDN: Starting ChainPlanter
2024-09-18 01:56:38.469 [ERR] TSVR: Shutting down because error in main method: unable to initialize RPC server: unable to start asset minter: unable to parse batch: invalid commitment to asset sprouts: batch 02231d758f0132fb3d25e950534996a2c0449a1cf9452275d3aa2d7663792c5ce3
2024-09-18 01:56:38.469 [INF] TSVR: Shutdown complete

unable to initialize RPC server: unable to start asset minter: unable to parse batch: invalid commitment to asset sprouts: batch 02231d758f0132fb3d25e950534996a2c0449a1cf9452275d3aa2d7663792c5ce3

@jharveyb
Contributor

Yes, I expected a quit to still happen; just wanted to make sure the panic was now prevented.

Can you inspect the DB entries for that batch? A first step would be to see if a TX was made, and if so if it ever got confirmed.

Also, could you provide more details about that batch, such as whether you created it with an older version, whether it had a large number of assets, etc.?

Perhaps we should adjust the minter to skip batches that were stored in a bad state, but I'd prefer finding out how we got there.
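
A rough sketch of what skipping bad batches could look like, purely as an illustration of the idea and not a proposal for the actual minter: parse every stored batch, keep the ones that parse, and collect the failures so they can be logged. All names here (`rawBatch`, `mintingBatch`, `parse`, `loadBatches`) are hypothetical, and `errors.Join` requires Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
)

// rawBatch is a hypothetical stand-in for a batch row read from the DB.
type rawBatch struct {
	key     string
	hasRoot bool
}

// mintingBatch is a hypothetical stand-in for a parsed batch.
type mintingBatch struct {
	key string
}

// parse rejects batches that are stored without a root commitment.
func parse(b rawBatch) (mintingBatch, error) {
	if !b.hasRoot {
		return mintingBatch{}, fmt.Errorf("batch %s: missing root commitment", b.key)
	}
	return mintingBatch{key: b.key}, nil
}

// loadBatches parses every stored batch. Batches that fail to parse
// are skipped, and their errors are joined into one returned error so
// the caller can log them and continue with the good batches.
func loadBatches(raw []rawBatch) ([]mintingBatch, error) {
	var (
		good []mintingBatch
		errs []error
	)
	for _, b := range raw {
		mb, err := parse(b)
		if err != nil {
			errs = append(errs, err)
			continue
		}
		good = append(good, mb)
	}
	// errors.Join returns nil when errs is empty.
	return good, errors.Join(errs...)
}

func main() {
	good, err := loadBatches([]rawBatch{
		{key: "a", hasRoot: true},
		{key: "b", hasRoot: false},
	})
	fmt.Println(len(good), err)
}
```

The trade-off is the one raised in the comment: skipping keeps the daemon running, but it can also hide how the batch ended up in a bad state.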

@dstadulis
Collaborator

@btcwer Would you be able to provide the requested information? #1125 (comment)

@dstadulis
Collaborator

will close if we can't get more info

@dstadulis
Collaborator

Won't panic now. Will reopen if we get more information from @btcwer

@github-project-automation github-project-automation bot moved this from 👀 In review to ✅ Done in Taproot-Assets Project Board Oct 17, 2024