wire: cache the non-witness serialization of MsgTx to memoize part of… #1376
There's a bug here in that it doesn't detect mutations in the underlying |
@jcvernaleo (as per #1530)
Here's a recent run late in the chain: From this we can see that when checking signatures, we actually spend most of our time just encoding the tx again to compute the sighash. Here's a flame graph for another view: |
I need a refresher on this but if memory serves right, doing away with the channel stuff in binaryFreeList and using a sync.Pool got rid of a big chunk of the time taken.
But yeah if this is something that doesn't need to happen in the first place with memoization that'd be much better. |
Related PR re binary free list: #1426 |
… TxHash In this commit, we add a new field to the `MsgTx` struct: `cachedSeralizedNoWitness`. As we decode the main transaction, we use an `io.TeeReader` to copy over the non-witness bytes into this new field. As a result, we can fully cache all tx serialization when computing the TxHash. This has been shown to show up on profiles during IBD. Caching this value allows us to optimize TxHash calculation across the entire daemon as a whole.
c73570b to 0924825
Rebased!
Dug a bit, and I realize the issue is in the test itself, it mutates transactions, with stuff like |
Pushed a tx implementing the above, don't think it's ideal though...worried about weird edge cases where transaction generation code uses a tx as a scratch pad and makes multiple versions to sign, with this it'll give the wrong Also added a commit that'll re-use the non-witness serialization for |
I looked in both btcd and lnd to see if we modify wire.MsgTx after calculating the hash, but couldn't find any instances. The code looks good, but I think we need to be extra careful and see if there are any instances across our codebases.
@@ -877,6 +877,10 @@ func (msg *MsgTx) Serialize(w io.Writer) error {
// Serialize, however even if the source transaction has inputs with witness
// data, the old serialization format will still be used.
func (msg *MsgTx) SerializeNoWitness(w io.Writer) error {
	if msg.cachedSeralizedNoWitness != nil {
		w.Write(msg.cachedSeralizedNoWitness)
	}
missing return?
// the rawTxTeeReader here, as these are segwit specific bytes.
var (
	flag [1]byte
	hasWitneess bool
s/hasWitneess/hasWitness
While I like the elegance of using an io.TeeReader (TIL by the way), I'm not sure if this would break a bunch of things outside of btcd, namely in lnd and related projects.
What if we make the caching optional with an additional struct member boolean (or if memory footprint is a concern, use a bit within the Version field) that turns the caching on?
That way we could both benefit from the optimization within btcd while not breaking any assumptions for outside code.
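One way the opt-in variant could look, as a rough sketch only: `toyTx` and its `cacheEnabled` field are hypothetical, and nothing here reflects an actual btcd API.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// toyTx is a hypothetical stand-in for wire.MsgTx.
type toyTx struct {
	LockTime uint32

	// cacheEnabled is the opt-in switch (assumed name). The zero value keeps
	// the old behaviour, so existing callers are unaffected.
	cacheEnabled bool
	cached       []byte
}

// serialize re-encodes on every call unless the caller opted in to caching.
func (tx *toyTx) serialize() []byte {
	if tx.cacheEnabled && tx.cached != nil {
		return tx.cached
	}

	raw := make([]byte, 4)
	binary.LittleEndian.PutUint32(raw, tx.LockTime)
	if tx.cacheEnabled {
		tx.cached = raw
	}
	return raw
}

func main() {
	// Default: mutations are always reflected.
	plain := &toyTx{LockTime: 1}
	plain.serialize()
	plain.LockTime = 2
	fmt.Println(plain.serialize()[0])

	// Opted in: the caller accepts that later mutations are not seen.
	cachedTx := &toyTx{LockTime: 1, cacheEnabled: true}
	cachedTx.serialize()
	cachedTx.LockTime = 2
	fmt.Println(cachedTx.serialize()[0])
}
```

The trade-off is that code which turns the flag on takes responsibility for never mutating the transaction afterwards, while everything else keeps today's semantics.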
// Write out the actual number of inputs as this won't be the very byte
// series after the versino of segwit transactions.
if WriteVarInt(&rawTxBuf, pver, count); err != nil {
Need to add err := WriteVarInt().
}

// Write out the actual number of inputs as this won't be the very byte
// series after the versino of segwit transactions.
nit: s/versino/version
// this transaction without witness data. When we decode a transaction,
// we'll write out the non-witness bytes to this so we can quickly
// calculate the TxHash later if needed.
cachedSeralizedNoWitness []byte
nit: s/cachedSeralizedNoWitness/cachedSerializedNoWitness/
// useful to be able to get the correct txid after mutating a transaction's
// state.
func (msg *MsgTx) WipeCache() {
	msg.cachedSeralizedNoWitness = nil
Don't we also need to call this in the helper methods like AddTxIn() and AddTxOut()? Or is the assumption that TxHash() would in practice only be called once the transaction is fully built, so the cache doesn't need to be invalidated?
I'm mostly worrying about uses of MsgTx outside of btcd, where I'm not sure we can 100% guarantee that we're always using this pattern...
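A sketch of what invalidating in the helper methods would and would not cover, using a hypothetical `toyTx` with a single `outputs` slice instead of the real `MsgTx` fields:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// toyTx is a hypothetical stand-in for wire.MsgTx.
type toyTx struct {
	outputs []int64
	cached  []byte
}

// wipeCache drops the cached serialization.
func (tx *toyTx) wipeCache() { tx.cached = nil }

// addOutput mutates through a helper, so it can invalidate the cache itself.
func (tx *toyTx) addOutput(v int64) {
	tx.outputs = append(tx.outputs, v)
	tx.wipeCache()
}

// serialize encodes every output as 8 little-endian bytes and caches the
// result until the next wipeCache call.
func (tx *toyTx) serialize() []byte {
	if tx.cached != nil {
		return tx.cached
	}
	raw := make([]byte, 0, 8*len(tx.outputs))
	var tmp [8]byte
	for _, v := range tx.outputs {
		binary.LittleEndian.PutUint64(tmp[:], uint64(v))
		raw = append(raw, tmp[:]...)
	}
	tx.cached = raw
	return raw
}

func main() {
	tx := &toyTx{}
	tx.addOutput(1)
	fmt.Println(len(tx.serialize()))

	// Helper-based mutation: the cache is invalidated automatically.
	tx.addOutput(2)
	fmt.Println(len(tx.serialize()))

	// Direct field mutation bypasses the helper and leaves stale bytes
	// behind until the caller wipes the cache explicitly.
	tx.outputs[0] = 7
	fmt.Println(tx.serialize()[0])
	tx.wipeCache()
	fmt.Println(tx.serialize()[0])
}
```

Since the fields are exported and routinely written directly, helper-level invalidation only narrows the problem; it cannot rule out stale hashes for outside callers.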
Not a bad idea, agree that this could have a lot of unintended consequences. The other approach uses |
Closing in favor of #2023; see this comment for some details: #2023 (comment). TL;DR: I think we need to remove/optimize the binary free list instead.