
fix: Fix watching already mined transactions #997

Merged · 2 commits merged into alloy-rs:main from popzxc-fix-watch on Jul 8, 2024

Conversation

@popzxc (Contributor) commented on Jun 30, 2024:

Potentially fixes #941

Changes the provider and Heartbeat so that they account for recently mined transactions.

Motivation

I ran into a problem where .watch() would get stuck even though the transaction was already mined on the node.
When I checked the code, I noticed that we only start looking for the transaction in blocks that arrive after we subscribe to it.

Solution

  • Check if the transaction is already mined in the Provider trait.
    • ⚠️ There is an edge case where the transaction is already mined but a custom number of confirmations is required. I didn't find a way to extract the block number from an abstract transaction receipt, so I just left a TODO there. I hope this edge case can be tackled separately, though; the solution here covers all the "simple" cases.
  • Add a lookbehind buffer to Heartbeat, so that we can check whether the transaction was confirmed recently.
  • Make sure that the heartbeat is spawned before we submit the first transaction.

The test added in this PR was failing before the fix; now it passes.
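
To make the first bullet concrete, here is a minimal, self-contained sketch of the fast path, using simplified synchronous stand-ins for the async Provider API (the type and function names are illustrative, not alloy's):

```rust
use std::collections::HashMap;

type TxHash = [u8; 32];

enum Pending {
    Ready(TxHash),    // resolved immediately, no heartbeat needed
    Watching(TxHash), // handed off to the Heartbeat's watch loop
}

// If a receipt already exists, resolve immediately instead of waiting for a
// future block to mention the hash (which is exactly how watching used to
// get stuck).
fn watch(known_receipts: &HashMap<TxHash, u64>, hash: TxHash) -> Pending {
    match known_receipts.get(&hash) {
        Some(_mined_in_block) => Pending::Ready(hash),
        None => Pending::Watching(hash),
    }
}
```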

PR Checklist

  • Added Tests
  • Added Documentation
  • Breaking changes

@mattsse (Member) left a comment:

thanks, I believe this should work.

left some comments/qs

```rust
}
}
self.past_blocks.push_back((*block_height, block.transactions.hashes().copied().collect()));
error!(?self.past_blocks, "lookbehind");
```
Member:

why is this an error?

@popzxc (Author):

Prototyping artifact, fixed.

```rust
return Ok(PendingTransaction::ready(*config.tx_hash()));
}
// TODO: There exists a corner case, if we subscribe to an already mined transaction and
// require a custom number of confirmations. The `TransactionReceipt` trait
```
Member:

this sounds like something we want to add to the trait.
we can do that separately though

@popzxc (Author):

Implemented via #1003, thanks.
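
For context, the capability the TODO asks for is essentially a block-number accessor on the abstract receipt. A hypothetical shape of that extension (illustrative only; not necessarily what #1003 landed):

```rust
// Hypothetical trait: just enough receipt metadata to compute how many
// confirmations an already-mined transaction has.
trait ReceiptBlockInfo {
    /// Height of the block this receipt was included in, if mined.
    fn block_number(&self) -> Option<u64>;
}

// With that accessor, the "already mined + custom confirmations" edge case
// reduces to simple arithmetic against the current tip.
fn confirmations(receipt: &impl ReceiptBlockInfo, tip: u64) -> Option<u64> {
    receipt
        .block_number()
        .map(|mined_at| tip.saturating_sub(mined_at) + 1)
}
```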

```rust
warn!(%block_height, last_height, "non-continuous chain");
}
}
self.past_blocks.push_back((*block_height, block.transactions.hashes().copied().collect()));
```
Member:

this doesn't handle reorgs, right?
here we'd append blocks with an existing height?
and we could find a tx confirmed in a reorged block in L473

so I think this should replace the block with the same number if it exists
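
A small sketch of that suggestion, on a simplified version of the buffer (the names are stand-ins): overwrite the stale entry when a height is seen again instead of appending a duplicate.

```rust
type TxHash = [u8; 32];

// On a reorg, the node re-announces a block at a height we already recorded;
// keeping both entries would let a watcher match a tx from the dead branch.
fn record_block(past_blocks: &mut Vec<(u64, Vec<TxHash>)>, height: u64, txs: Vec<TxHash>) {
    if let Some(entry) = past_blocks.iter_mut().find(|(h, _)| *h == height) {
        entry.1 = txs; // same height seen again: replace the reorged block
    } else {
        past_blocks.push((height, txs));
    }
}
```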

@popzxc (Author):

@mattsse That's right!

However, it turns out the issue with reorgs goes much deeper.
Once a reorg occurs, we must also drop all the existing subscriptions received at block >= *block_height.
That part is easy, but it doesn't end there.
ChainStreamPoller doesn't seem to account for reorgs at all, e.g. here it always expects the chain to be continuous.
To fix that, we probably need to compare the block tip against next_yield here, and if block_number < next_yield, perform some kind of search for the last non-diverging block (so we probably need to poll not just for the block number, but for the block header, so that hashes can be used for comparison); a rough sketch follows below.
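
A rough sketch of that divergence search, with hypothetical names (`next_yield` mirrors the discussion; the hash-lookup callback stands in for polling headers):

```rust
use std::collections::BTreeMap;

type BlockHash = [u8; 32];

// Given the next height the poller planned to yield and the freshly polled
// tip, decide where to resume. If the tip went backwards, walk back through
// locally remembered hashes until the node's view matches ours again.
fn resume_height(
    next_yield: u64,
    polled_tip: u64,
    local_hashes: &BTreeMap<u64, BlockHash>,
    remote_hash_at: impl Fn(u64) -> BlockHash,
) -> u64 {
    if polled_tip + 1 >= next_yield {
        return next_yield; // chain advanced or kept pace: nothing special
    }
    // polled_tip < next_yield - 1: the chain got shorter, i.e. a reorg.
    let mut height = polled_tip;
    while height > 0 {
        if local_hashes.get(&height) == Some(&remote_hash_at(height)) {
            break; // last block on which both views agree
        }
        height -= 1;
    }
    height + 1 // re-yield everything after the last common ancestor
}
```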

There is also an issue with the poller stream sporadically ending on transient network errors, but that's a separate problem.

I've pushed a WIP commit with reorg handling in the Heartbeat, plus a test that is very flaky due to the issues above.
PTAL, and let me know how you want to proceed.

If it works, I suggest the following:

  • I will polish the commit so that the heartbeat supports reorgs, but will remove the test for now.
  • We will merge this PR as-is, and I will create an issue for reorg handling in ChainStreamPoller.
  • I will create a separate pull request for that issue and restore the test for reorg detection. (I cannot fully commit to doing it due to limited capacity, but given that I've already done some research and the problem seems interesting, I'll try my best to deliver it this week.)

Member:

thanks,
suggestions sgtm!

@popzxc (Author):

I've removed the parts that are not relevant for this PR and created an issue for the remaining work.
PTAL.

@mattsse (Member) left a comment:

one more q re reap_reorg

Comment on lines 466 to 469:

```rust
/// Reap transactions overridden by the reorg.
fn reap_reorg(&mut self, new_height: u64) {
```
Member:

what does this do?
needs some additional docs; it's not obvious what this does.

looks like this drops the channel? which would mean that on reorg, the listener is unable to obtain the receipt because the watcher dropped?

@popzxc (Author):

> what does this do?

It erases all the subscriptions for the blocks that were affected by the reorg. Basically, it works the same way as reap_timeouts above.

> looks like this drops the channel?

It does, similarly to how it does on timeout. If the block with your tx was reorganized, you are not guaranteed to receive a receipt for it, so IMHO it makes sense to drop it.

A dedicated error type would be nice (even if just for the timeout case), but that's a change to public interfaces, and I'm not sure it belongs in this PR (I can submit it in a separate PR, though).
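
For illustration, a heavily simplified sketch of the reap-on-reorg idea described above (the struct and field names are stand-ins, not Heart's actual internals; the real watcher also owns the notification channel that gets closed on drop):

```rust
use std::collections::HashMap;

type TxHash = [u8; 32];

// A watcher remembers the height at which its tx was seen confirmed, if any.
struct TxWatcher {
    confirmed_at: Option<u64>,
}

struct Heart {
    watchers: HashMap<TxHash, TxWatcher>,
}

impl Heart {
    /// Drop every watcher whose tx was confirmed at or above the height where
    /// the new chain begins; those confirmations may no longer be valid.
    /// Mirrors how timed-out watchers are reaped by reap_timeouts.
    fn reap_reorg(&mut self, new_height: u64) {
        self.watchers
            .retain(|_, w| w.confirmed_at.map_or(true, |h| h < new_height));
    }
}
```

Note that the thread below questions this drop-on-reorg policy and suggests moving affected watchers back to unconfirmed instead, since the tx may well be re-included in the new chain.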

@popzxc (Author):

Added more information to the method doc-comment.

Member:

> so IMHO it makes sense to drop it.

I don't think this makes sense, because there's a high chance this tx will also be included in the new chain.

@popzxc (Author):

I see, so would putting it back to unconfirmed make sense then?

Comment on lines 408 to 409:

```rust
/// Lookbehind blocks in form of mapping block number -> vector of transaction hashes.
past_blocks: VecDeque<(u64, Vec<B256>)>,
```
Member:

this should probably be a hashmap of hashes?

otherwise, with >200 txs per block, we'll spend some time searching those vecs

@popzxc (Author):

You mean HashMap<u64, HashSet<B256>>?
I agree about HashSet, but for the wrapper IMHO we have a nice push_back / pop_front flow here, so I'd suggest making it VecDeque<(u64, HashSet<B256>)>. WDYT?

@popzxc (Author):

Changed to VecDeque<(u64, HashSet<B256>)>.
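
A minimal sketch of that final shape, with an illustrative capacity cap (the constant and method names are hypothetical, not alloy's):

```rust
use std::collections::{HashSet, VecDeque};

type TxHash = [u8; 32];

// Illustrative lookbehind depth; not the value used in alloy.
const MAX_LOOKBEHIND: usize = 12;

// Ordered by height, so the oldest block is always at the front: new blocks
// are push_back'd, stale ones pop_front'd, and per-block membership checks
// are O(1) thanks to the HashSet.
struct Lookbehind {
    past_blocks: VecDeque<(u64, HashSet<TxHash>)>,
}

impl Lookbehind {
    fn record(&mut self, height: u64, txs: HashSet<TxHash>) {
        self.past_blocks.push_back((height, txs));
        while self.past_blocks.len() > MAX_LOOKBEHIND {
            self.past_blocks.pop_front(); // evict the oldest block
        }
    }

    /// Returns the height of the recent block containing `hash`, if any.
    fn confirmed_in(&self, hash: &TxHash) -> Option<u64> {
        self.past_blocks
            .iter()
            .find(|(_, txs)| txs.contains(hash))
            .map(|(height, _)| *height)
    }
}
```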

Comment on lines 545 to 555:

```rust
// Check that the chain is continuous.
if *last_height + 1 != *block_height {
    // Remove all the transactions that were reset by the reorg.
    warn!(%block_height, last_height, "reorg detected");
    self.reap_reorg(*block_height);
    // Remove past blocks that are now invalid.
    self.past_blocks.retain(|(h, _)| h < block_height);
```
Member:

maybe block_height <= last_height is more accurate here?

@popzxc (Author):

No, block_height is the first block of the "new" chain after the reorg, so transactions that belong to this block should also be removed; we push the block with block_height right below. (For example, if last_height is 10 and the newly polled block_height is 9, everything recorded at heights 9 and 10 is stale, and the new block 9 is pushed next.)

@mattsse (Member) left a comment:

this should behave as expected now.

but ngl, handling all of this is more complex than it should be -.-

@mattsse merged commit e6b3edd into alloy-rs:main on Jul 8, 2024
22 checks passed
@popzxc deleted the popzxc-fix-watch branch on July 8, 2024, 12:36
ben186 pushed a commit to ben186/alloy that referenced this pull request on Jul 27, 2024:
* fix: Fix watching already mined transactions

* fix: Account for reorgs in Heart
Merging this pull request may close: [Bug] Execution halting on .watch() for pending transactions.