-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Replace-by-Fee if the previous sent checkpoint has not been included for long time #176
Conversation
submitter/relayer/relayer.go
Outdated
tag btctxformatter.BabylonTag | ||
version btctxformatter.FormatVersion | ||
submitterAddress sdk.AccAddress | ||
submittedCheckpoints map[uint64]*types.CheckpointInfo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to keep submission information about all the past checkpoints? I feel this is just using extra memory.
From my understanding, we only need to keep information about the last submission that we sent and the epoch number it corresponds to. Then,
- If the last
SEALED
epoch remains the same as our last submission and the resend interval has not passed: Do nothing - If the last
SEALED
epoch remains the same as our last submission and the resend interval has passed: Bump the fee of the last submission - If the last
SEALED
epoch changed, then create a new checkpoint and reset the variables to the new checkpoints and epochs.
Unless there's another utility of having a list of the submitted checkpoints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. After a second thought, indeed there's no need to keep a list of the submitted checkpoints as the checkpoint is submitted in the monotonically increasing order. Further, the next checkpoint will not be submitted if the previous one is still Sealed.
submitter/relayer/relayer.go
Outdated
log.Logger.Debugf("Skip submitting the raw checkpoint for epoch %v", ckpt.Ckpt.EpochNum) | ||
|
||
epoch := ckpt.Ckpt.EpochNum | ||
ckptInfo, hasSubmitted := rl.submittedCheckpoints[epoch] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, we only care about the last epoch, so I guess we can remove information about any epoch' < epoch
etc. Overall, I suggest we do not keep a mapping, but only the last submitted checkpoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic can be converted to:
- Get the last checkpoint submitted.
- If the epoch of the last checkpoint submitted does not match the latest sealed checkpoint of Babylon, we have a new checkpoint to submit and we submit it. Set the last checkpoint submitted to the new one.
- If the epoch matches, check if the resend interval has passed. If yes, bump the fee of the checkpoint and set the last checkpoint submitted to the new one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
types/ckpt_info.go
Outdated
Tx *wire.MsgTx | ||
ChangeAddress btcutil.Address | ||
Size uint64 // the size of the BTC tx | ||
UtxoAmount uint64 // the amount of the UTXO used in the BTC tx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't those values be calculated by the transaction itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For calculating the Size
, we need the UTXO and the priv key etc., see here. So I thought it would be easier to just save size
and utxoAmount
in the BtcTxInfo
submitter/relayer/relayer.go
Outdated
tag btctxformatter.BabylonTag | ||
version btctxformatter.FormatVersion | ||
submitterAddress sdk.AccAddress | ||
resendIntervals uint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resendIntervals uint | |
resendIntervalSeconds uint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another way (that might be a bit cleaner) is
resendIntervals uint | |
resendIntervals time.Duration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that changing the interval to time.Duration
is cleaner and more flexible, but it increases the chance of having invalid configurations as it requires the time unit specified, for example, 100s
vs. 100
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that's a big issue, we already do that in other places as well (e.g. the API).
submitter/relayer/relayer.go
Outdated
@@ -37,11 +39,12 @@ func New( | |||
resendIntervals uint, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resendIntervals uint, | |
resendIntervalSeconds uint, |
Thanks, @vitsalis and @SebastianElvis for your comments. I have replaced the submitted checkpoint list with the last submitted checkpoint as we agree that there's no need to keep all the historical checkpoints. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than some minor stuff, looks good! How did you test this? Once this is merged is it ready for deployment?
submitter/relayer/relayer.go
Outdated
log.Logger.Debugf("The checkpoint for epoch %v was sent more than %v seconds ago but not included on BTC, resending the checkpoint", | ||
ckptEpoch, rl.resendIntervalSeconds) | ||
|
||
err := rl.resendCheckpointToBTC(rl.lastSubmittedCheckpoint) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest that this method returns the checkpoint and we set rl.lastSubmittedCheckpoint
here.
submitter/relayer/relayer.go
Outdated
if rl.lastSubmittedCheckpoint == nil || rl.lastSubmittedCheckpoint.Epoch < ckptEpoch { | ||
log.Logger.Debugf("Submitting a raw checkpoint for epoch %v for the first time", ckptEpoch) | ||
|
||
err := rl.convertCkptToTwoTxAndSubmit(ckpt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest this method returns the checkpoint and we set rl.lastSubmittedCheckpoint
here.
Thanks, @vitsalis for your suggestions! I tested this PR in local deployment using Anyway, I think we can deploy it in the devnet first and see how it goes. Is it what the devnet is for? Can we create scenarios where the BTC mempool is busy? |
submitter/relayer/relayer.go
Outdated
// resend tx1 of the checkpoint | ||
tx1Fee := rl.GetTxFee(ckptInfo.Tx1.Size) | ||
tx1 := ckptInfo.Tx1 | ||
tx1.Tx.TxOut[1].Value = int64(ckptInfo.Tx1.UtxoAmount - tx1Fee) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how does this work ? Doesn't changing output value require signing transaction once again before sending?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, this is a nice catch. We definitely need to sign the tx again
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need to check if the new fee is larger than the previous fee. If it is not, then the transaction won't be accepted as RBF only works if we set a higher fee.
Probably the easiest way to test (at least partially) it would be with
|
did a quick experiment with this and got logs in form of:
So maybe there is some node config option which needs to be set up ? |
// if the resend interval has passed, should resend | ||
durSeconds := uint(time.Since(rl.lastSubmittedCheckpoint.Ts).Seconds()) | ||
if durSeconds >= rl.resendIntervalSeconds { | ||
log.Logger.Debugf("The checkpoint for epoch %v was sent more than %v seconds ago but not included on BTC, resending the checkpoint", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we known that at this point the checkpoint was really not included in btc ? Is this implicit, due too the fact that babylon still perceive this checkpoint as SEALED ?
If yes, I would add a comment explaining that and also mentioning that it is not perfect way, but it is also possible that submission succeeded and was yet not reported to Babylon.
The only foolproof way would be to have separate go routine which checks wheter sent checkpoint transactions have been included into the block. And tracking those submission from k blocks to defend against rollbacks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, we can only be sure that the checkpoint has not been reported without knowing whether the BTC txs have been included on BTC. Maybe for now it's not an issue because we run both the reporter and the submitter by ourselves (we will know which part is broken).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, but given that typically the resend interval is in the range of hours, this will not be an issue. Even if the resend interval was low, we would only bump the fee (to a probably similar value or one that more closely relates to the latest reality). If the transaction has been included and we're not aware of it yet (because Babylon still considers it as SEALED), we will bump the fee of an existing checkpoint so our transaction will be rejected.
Another set of experiment results:
Conclusion: We need to somehow make sure that we provide enough fee for replacement.
Conclusion: It seems the we need to re-sign transaction after changing fee. I kinda anticipated that, as it would be weird if one could just change tx in mempool without proper signature. |
How did you trigger this experiment? Given that we have replace by fee enabled we should be able to trigger the replacement I think, so this needs some more investigation. Overall, I suggest we update the e2e tests as well as verify that this works based on some local experiments (e.g. through the local deployment?) as we want to deploy this on the mainnet node soon. We will of course initially try it on the devnet with a |
So the experiment I did was exeactly that:
I setup up It is enough to trigger re-submission. As mentioned in next comment this error was kinda red herring due to test enviroment:
|
Thanks @KonradStaniec for the quick experiments and conclusion. The error is much likely caused by not signing the changed tx. Will fix this and test it locally to make sure the re-sending works |
@gitferry I have pushed commit which shows test which I would expect to work. I saw you wanted to have an example before you eddited the comment 😅 |
Hi @vitsalis and @KonradStaniec, thanks for your previous feedback. Now the PR passed the e2e tests and is ready for review again. Some changes I made:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only bumping the second transaction needs to take care of the effective fee, not only the fee of the second transaction.
return errors.New("checkpoint is not Sealed") | ||
log.Logger.Errorf("The checkpoint for epoch %v is not sealed", ckptEpoch) | ||
// we do not consider this case as a failed submission but a software bug | ||
// TODO: add metrics for alerting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could add this alert metric as part of this PR given that metrics infrastructure is already in place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I missed something, what do you mean by the metrics infrastructure already in place? I suppose we still need another PR to export BTC usage metrics to the dashboard
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep sure we can do it in another PR
log.Logger.Errorf("The checkpoint for epoch %v is lower than the last submission for epoch %v", | ||
ckptEpoch, lastSubmittedEpoch) | ||
// we do not consider this case as a failed submission but a software bug | ||
// TODO: add metrics for alerting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
submitter/relayer/relayer.go
Outdated
func (rl *Relayer) resendSecondTxOfCheckpointToBTC(btcTxInfo *types.BtcTxInfo) (*types.BtcTxInfo, error) { | ||
// re-estimate the tx fee based on the current load | ||
// TODO: re-estimate the fee for the real network | ||
// fee := rl.GetTxFee(btcTxInfo.Size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we resend only the second transaction, then this does not bump the fee as we want it to.
When having chained transactions that are not included, the miners see their fee as not their individual fee, but as their effective fee.
For example if we have transactions A->B
and A
has fee a
and B
has fee b
, then the effective fee of both transactions is (a + b) / (size(A) + size(B))
. Therefore, if we want to bump the fee of only the second transaction, the fee should be bumped as a' - a + b' - b
, where a'
and b'
are the new fees that were calculated. Now we only bump the fee by b' - b
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could either resend both transactions or implement the more complex logic I described above. I'm ok with both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very nice explanation, thanks! I think only bumping the second tx can also be effective if we estimate the fee of the second tx by taking both size(tx1)
and size(tx2)
into account. For example, bumpedFee := rl.GetTxFee(tx1_size + tx2_size)
.
Further, I think bumping only the second tx has several benefits compared with bumping the two:
- the logic would be simpler because if we bump the two we need to let
tx2'
referencetx1'
nottx1
- we can have less cost in the case where
tx1
is already included in a BTC block buttx2
isn't. In this case, we cannot replace either of them if we bump the two txs becausetx2'
referencestx1'
, nottx1
. But, if we only bump the second tx, we could replacetx2
and savetx1
.
submitter/relayer/relayer.go
Outdated
// TODO: re-estimate the fee for the real network | ||
// fee := rl.GetTxFee(btcTxInfo.Size) | ||
// hack: make sure the fee higher than the previous fee for testing purpose | ||
fee := btcTxInfo.Fee * 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems that the above is leftover.
submitter/relayer/relayer.go
Outdated
// ensure that the fee is not greater than the output value | ||
fee = outputValue | ||
} | ||
tx.TxOut[1].Value = int64(outputValue - fee) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If fee == outputValue
we will have an output with 0 satoshi. Is this allowed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, it is allowed. Recall that we have set TxOut[0].Value=0
already for storing data in OP_RETURN
. Also, see this
submitter/relayer/relayer.go
Outdated
// if not segwit, add signature; otherwise, add witness | ||
segwit, err := isSegWit(utxo.Addr) | ||
if err != nil { | ||
panic(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we panic instead of returning the error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning the error indicates an unsupported address type in the UTXO. Yep, returning the error is better. Thanks!
submitter/relayer/utils.go
Outdated
// if not segwit, add signature; otherwise, add witness | ||
segwit, err := isSegWit(utxo.Addr) | ||
if err != nil { | ||
panic(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why panic instead of returning error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Only a slight concern about the precision. Let's wait for @KonradStaniec input as well as this is an important change.
return errors.New("checkpoint is not Sealed") | ||
log.Logger.Errorf("The checkpoint for epoch %v is not sealed", ckptEpoch) | ||
// we do not consider this case as a failed submission but a software bug | ||
// TODO: add metrics for alerting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep sure we can do it in another PR
submitter/relayer/relayer.go
Outdated
func (rl *Relayer) resendSecondTxOfCheckpointToBTC(btcTxInfo *types.BtcTxInfo) (*types.BtcTxInfo, error) { | ||
// re-estimate the tx fee based on the current load considering the size of both tx1 and tx2 | ||
// here we double the fee for simplicity as the size of the two txs is roughly the same | ||
fee := rl.GetTxFee(btcTxInfo.Size) * 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest we be more precise and not just double the fee. We can do rl.GetTxFee(btcTxInfo1.Size + btcTxInfo2.Size)
outputValue := uint64(utxo.Amount.ToUnit(btcutil.AmountSatoshi)) | ||
if outputValue < fee { | ||
// ensure that the fee is not greater than the output value | ||
fee = outputValue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A concern might be that in times of congestion we might get stuck trying to resubmit a low UTXO transaction because we won't have enough funds to pay for the fees. But not sure if we can do much about it without introducing complex logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it will happen often because when we pick UTXO we pick the highest one. Maybe this can be addressed if we periodically aggregate UTXOs as you suggested earlier. We'll see whether it's a real concern from forthcoming metrics in our BTC dashboard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general looks good ! Few general remarks for the future:
SendCheckpointToBTC
is starting becoming a bit too complex for my taste and handles too much. ImoresendSecondTxOfCheckpointToBTC
should be extracted to separate function and handled in main event loop.- before doing rbf, we could check if previous tx is in mempool still. This way we could make sure we are really sending tx with larger fee
All this things can be done in next pr-s imo
Thanks @vitsalis and @KonradStaniec for your final reviews. As I'm merging the PR, is it ready for testing in our dev net? What should I do? |
This PR fixes #175.
A previous PR #170 enables the RBF feature but is not used. This PR allows the use of RBF if a previous checkpoint has been sent but not included on BTC for a long period of time. Particularly, we save the meta information (i.e., timestamp, BTC txs) of each submitted checkpoint. Whenever a submitted-but-not-included-for-long checkpoint is found, we will resend the checkpoint using the same BTC txs used in the last submission. Note that we will re-estimate the tx fee and change the value part in the second output of each BTC tx.
Please note that this PR does not include end-to-end tests. @KonradStaniec could you please take a look and see how we can simulate some rough scenarios (e.g., BTC txs stuck in the mempool) and add relevant e2e test?