Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Epoch Sync] Fix GC error spam and invalid height problem #12288

Merged
merged 1 commit into from
Oct 28, 2024

Conversation

robin-near
Copy link
Contributor

  • GC prints error because the head is at genesis. Return early in that case. Otherwise while we're doing epoch sync because head remains at genesis, this keeps spamming the logs.
  • See Header sync incorrectly disregards peers at startup #11930 for a header sync problem; a recently received block is marked as invalid because its height is too far ahead of where our head is (genesis). We fix this by not marking blocks as invalid if it's just because of height. The reason why we mark a block as invalid is because we think that the block can never be valid (e.g. due to some state disagreement) so we should not bother trying it again, but if invalid height is the issue, this is counterproductive.

Closes #11930
Closes #11936

@robin-near robin-near requested a review from a team as a code owner October 23, 2024 04:25
Copy link

codecov bot commented Oct 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.60%. Comparing base (a9dc3c1) to head (91f0da1).
Report is 24 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #12288      +/-   ##
==========================================
- Coverage   71.60%   71.60%   -0.01%     
==========================================
  Files         836      836              
  Lines      167856   167861       +5     
  Branches   167856   167861       +5     
==========================================
- Hits       120195   120193       -2     
- Misses      42408    42417       +9     
+ Partials     5253     5251       -2     
Flag Coverage Δ
backward-compatibility 0.16% <0.00%> (-0.01%) ⬇️
db-migration 0.16% <0.00%> (-0.01%) ⬇️
genesis-check 1.24% <0.00%> (-0.01%) ⬇️
integration-tests 38.82% <100.00%> (+0.02%) ⬆️
linux 71.25% <100.00%> (-0.01%) ⬇️
linux-nightly 71.19% <100.00%> (-0.01%) ⬇️
macos 54.30% <83.33%> (+<0.01%) ⬆️
pytests 1.56% <0.00%> (-0.01%) ⬇️
sanity-checks 1.36% <0.00%> (-0.01%) ⬇️
unittests 65.36% <83.33%> (-0.01%) ⬇️
upgradability 0.21% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -754,7 +754,9 @@ impl Chain {
// the block might not have been what the block producer originally produced. Either way, it's
// OK if we miss some cases here because this is just an optimization to avoid reprocessing
// known invalid blocks so the network recovers faster in case of any issues.
if error.is_bad_data() && !matches!(error, Error::InvalidSignature) {
if error.is_bad_data()
&& !matches!(error, Error::InvalidSignature | Error::InvalidBlockHeight(_))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor point but just checking, in our sequence of checks that we do for block, how far down the list do we have InvalidBlockHeight? Because if it's quite up in the stack of validation checks, this can be a source of spam as we don't mark such blocks as invalid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's pretty early. However, the is_bad_data() check here is not for the purpose of preventing spam. It's for the purpose of dealing with a class of bugs where we the network gets stuck and the recovery is slowed down by repeatedly processing invalid blocks. Invalid height is not quite relevant here because these invalid blocks are talking about recent blocks. But honestly, I can't say that I still have enough context to reason about this. It's possible that this isn't even relevant anymore due to stateless validation. Either way, logically it is incorrect to mark a block as invalid just because its height is too high due to where the receiving node currently is, so we should stop doing that anyway.

@robin-near robin-near added this pull request to the merge queue Oct 28, 2024
Merged via the queue into near:master with commit 7de36d1 Oct 28, 2024
29 checks passed
@robin-near robin-near deleted the esync14 branch October 28, 2024 06:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants