Reduce number of database transactions to increase throughput #5186
Labels: htlcswitch, I/O, invoices, optimization, P2 (should be fixed if one has time), payments (related to invoices/payments)
One result of recent benchmarking is that the use of fsync to flush database writes to disk could very well be the number one factor influencing node performance. The actual throughput of the disk (MB/s) appears to be hardly relevant, because the write time is dwarfed by the sync latency.
On a Google Cloud persistent disk (SSD), the syncs/sec score on a file is about 400. That puts a hard cap on the maximum transaction rate of a node.
Syncs/sec can be measured with the fio tool (look at the reported IOPS):
fio --rw=write --ioengine=sync --fdatasync=1 --size=200m --bs=4k --name=mytest
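For a quick cross-check without fio, the same kind of measurement can be sketched in a few lines of Go. This is only a rough approximation under stated assumptions: the function name measureSyncRate and its parameters are made up for illustration, it uses fsync (via os.File.Sync) rather than fdatasync, and it writes to the default temp directory, which may be a RAM-backed filesystem where syncs are nearly free. Point it at a directory on the disk you actually want to test.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// measureSyncRate (hypothetical helper) writes `writes` blocks of blockSize
// bytes to a temporary file in dir, syncing after every write, and returns
// the observed number of syncs per second.
func measureSyncRate(dir string, writes, blockSize int) (float64, error) {
	f, err := os.CreateTemp(dir, "synctest-*")
	if err != nil {
		return 0, err
	}
	// Deferred calls run in reverse order: close first, then remove.
	defer os.Remove(f.Name())
	defer f.Close()

	block := make([]byte, blockSize)
	start := time.Now()
	for i := 0; i < writes; i++ {
		if _, err := f.Write(block); err != nil {
			return 0, err
		}
		// Sync flushes the file to stable storage (fsync).
		if err := f.Sync(); err != nil {
			return 0, err
		}
	}
	elapsed := time.Since(start)
	return float64(writes) / elapsed.Seconds(), nil
}

func main() {
	// An empty dir means the default temp directory.
	rate, err := measureSyncRate("", 200, 4096)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%.0f syncs/sec\n", rate)
}
```

The result depends heavily on the filesystem and device, so treat it as a sanity check next to the fio number rather than a replacement for it.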
A bbolt update transaction requires two sync calls. bbolt uses global locks, which means that each update transaction on a database blocks all other transactions for at least the time it takes to execute those two sync calls.
In lnd, there are three databases that are actively used: channel.db, wallet.db and sphinxreplay.db. These databases can be locked independently, which is better for performance than a single database would be. Still, it would be better to further isolate independent data in separate database files. An example could be to create a database per channel.
Furthermore, it could be worthwhile to consolidate multiple transactions into one, either via batching or by combining existing transactions, and thereby reduce the number of those expensive sync calls.
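To illustrate why consolidating transactions helps, here is a minimal sketch that counts commits for the same set of writes applied one transaction at a time versus in one combined transaction. It is a toy model, not bbolt and not lnd code: the store type, its update and batch methods, and the htlc-N keys are all hypothetical, and a commit counter stands in for the sync calls that a real commit would issue.

```go
package main

import "fmt"

// store is a toy stand-in for a database file (it is not bbolt). It only
// counts commits, the step where the expensive sync calls would happen.
type store struct {
	data    map[string]string
	commits int
}

// update runs fn in its own transaction and commits immediately.
func (s *store) update(fn func(data map[string]string)) {
	fn(s.data)
	s.commits++
}

// batch runs all fns in a single transaction and commits once.
func (s *store) batch(fns []func(data map[string]string)) {
	if len(fns) == 0 {
		return
	}
	for _, fn := range fns {
		fn(s.data)
	}
	s.commits++
}

// makeWrites builds n independent writes (hypothetical keys and values).
func makeWrites(n int) []func(data map[string]string) {
	writes := make([]func(data map[string]string), 0, n)
	for i := 0; i < n; i++ {
		key := fmt.Sprintf("htlc-%d", i)
		writes = append(writes, func(data map[string]string) {
			data[key] = "settled"
		})
	}
	return writes
}

// commitsIndividually applies n writes one transaction at a time.
func commitsIndividually(n int) int {
	s := &store{data: map[string]string{}}
	for _, w := range makeWrites(n) {
		s.update(w)
	}
	return s.commits
}

// commitsBatched applies n writes in one combined transaction.
func commitsBatched(n int) int {
	s := &store{data: map[string]string{}}
	s.batch(makeWrites(n))
	return s.commits
}

func main() {
	fmt.Println("individual commits:", commitsIndividually(10))
	fmt.Println("batched commits:", commitsBatched(10))
}
```

In this model ten writes cost ten commits individually and one commit when batched. A real batcher also has to decide how long to wait for more writes and how to handle a failing write inside a batch, which this sketch leaves out.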
Below is an overview of the transactions that are currently required to complete a payment on two nodes that have a direct channel. It looks like there is a lot of potential to reduce the tx count.
Sender db update transactions:
channeldb.(*PaymentControl).InitPayment (batched)
htlcswitch.(*persistentSequencer).NextID
channeldb.(*PaymentControl).RegisterAttempt (batched)
htlcswitch.(*circuitMap).CommitCircuits (batched)
htlcswitch.(*circuitMap).OpenCircuits
lnwallet.(*LightningChannel).SignNextCommitment
htlcswitch.(*circuitMap).DeleteCircuits (batched)
lnwallet.(*LightningChannel).ReceiveRevocation
htlcswitch/hop.(*OnionProcessor).DecodeHopIterators (batched)
channeldb.(*OpenChannel).SetFwdFilter
lnwallet.(*LightningChannel).RevokeCurrentCommitment
htlcswitch.(*networkResultStore).storeResult (batched)
htlcswitch.(*circuitMap).DeleteCircuits (batched)
routing.(*missionControlStore).AddResult
channeldb.(*PaymentControl).SettleAttempt (batched)
lnd.(*preimageBeacon).AddPreimages (batched)
lnwallet.(*LightningChannel).RevokeCurrentCommitment
lnwallet.(*LightningChannel).SignNextCommitment
htlcswitch.(*circuitMap).DeleteCircuits (batched)
lnwallet.(*LightningChannel).ReceiveRevocation
hop.(*OnionProcessor).DecodeHopIterators (batched)
channeldb.(*OpenChannel).SetFwdFilter
htlcswitch.(*Switch).ackSettleFail (batched)
Receiver db update transactions:
lnwallet.(*LightningChannel).RevokeCurrentCommitment
lnwallet.(*LightningChannel).SignNextCommitment / btcwallet.(*BtcWallet).SignOutputRaw
lnwallet.(*LightningChannel).SignNextCommitment / channeldb.(*OpenChannel).AppendRemoteCommitChain
htlcswitch.(*circuitMap).DeleteCircuits (batched)
lnwallet.(*LightningChannel).ReceiveRevocation
lightning-onion.(*Router).generateSharedSecret
htlcswitch/hop.(*OnionProcessor).DecodeHopIterators (batched)
lightning-onion.(*Router).generateSharedSecret
invoices.(*InvoiceRegistry).AddInvoice
invoices.(*InvoiceRegistry).UpdateInvoice
channeldb.(*OpenChannel).SetFwdFilter
lnwallet.(*LightningChannel).SignNextCommitment / btcwallet.(*BtcWallet).SignOutputRaw
lnwallet.(*LightningChannel).SignNextCommitment / channeldb.(*OpenChannel).AppendRemoteCommitChain
htlcswitch.(*circuitMap).DeleteCircuits (batched)
lnwallet.(*LightningChannel).ReceiveRevocation
htlcswitch/hop.(*OnionProcessor).DecodeHopIterators (batched)
channeldb.(*OpenChannel).SetFwdFilter
lnwallet.(*LightningChannel).RevokeCurrentCommitment
To find out what the dynamic behavior of batched transactions does to the fsync rate, the tool bpftrace can be used. It includes a script, syncsnoop.bt, that captures the sync calls.
To get a rate, the following command can be used:
syncsnoop.bt | grep lnd | pv -rl -i 10 > /dev/null
Running this tool with the https://github.com/bottlepay/lightning-benchmark benchmark (config lnd-bbolt-keysend), I get the following results on my machine:
Transactions/second: 18
Fsyncs/second: 400
That is about 22 fsyncs per payment for sender and receiver together. It is fewer than the 41 update transactions listed above for a single payment would suggest, but it still feels like there is potential to reduce this.
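The per-payment figure is simply the fsync rate divided by the payment rate. The small sketch below reproduces that arithmetic and, as a purely hypothetical what-if, shows what the payment rate would be if the fsyncs per payment were halved while the disk stayed at 400 syncs/sec. The function names are made up for illustration, and the halved figure is an assumption, not a measurement.

```go
package main

import "fmt"

// fsyncsPerPayment divides the observed fsync rate by the payment rate.
func fsyncsPerPayment(fsyncsPerSec, paymentsPerSec float64) float64 {
	return fsyncsPerSec / paymentsPerSec
}

// paymentsPerSec gives the payment rate a sync-bound node could reach
// for a given number of fsyncs per payment.
func paymentsPerSec(fsyncsPerSec, perPayment float64) float64 {
	return fsyncsPerSec / perPayment
}

func main() {
	// Measured values from the benchmark run: 400 fsyncs/s, 18 payments/s.
	fmt.Printf("%.1f fsyncs per payment\n", fsyncsPerPayment(400, 18))

	// Hypothetical: 11 fsyncs per payment at the same sync rate.
	fmt.Printf("%.1f payments/sec\n", paymentsPerSec(400, 11))
}
```

This assumes throughput is limited only by the sync rate, which is the working hypothesis of this issue rather than an established fact.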