Some BlockFetch decline decisions are not revisited after completeFetchBatch
#1147
Comments
nfrisby added the consensus (issues related to ouroboros-consensus) and bug (Something isn't working) labels on Oct 18, 2019
I added the Bug label. That might be an overstatement, but at the very least Duncan agreed on Slack that the current behavior is not as prompt as we'd prefer.
@nfrisby What's the status on this?
mrBliss added the networking label and removed the consensus (issues related to ouroboros-consensus) label on Nov 7, 2019
I responded on Slack. Summary: Duncan had planned to do it, though I have a commit on my branch that is more "surgical" than what he had discussed, if we want to go with that for now.
dcoutts added a commit that referenced this issue on Jan 6, 2020:
Fixes #1147

The point is that we should only base decisions (to do or not do a thing, as opposed to things that only affect how it is done) on things that are in the fingerprint that changes. The nub of the bug was that we were deciding not to do this request based on `peerFetchReqsInFlight == 0`, which is not part of the fingerprint. By using `PeerFetchStatusReady Set.empty` instead, we're making a decision on a slightly different condition, but one that's within the info covered by the fingerprint.

The condition is different in that instead of saying the requests are completely done, it says the set of outstanding block replies is empty. The distinction is the message after the last block in a range request, but this difference really doesn't matter at all: our decision is simply about avoiding too much network concurrency, so it doesn't matter if there is one remaining small message still to arrive.
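For illustration, here is a minimal sketch of the two conditions the commit contrasts. The types below are simplified stand-ins, not the actual ouroboros-network definitions (the real `PeerFetchStatus` and in-flight tracking types carry more information); the point is only which state each condition consults.

```haskell
module DeclineConditionSketch where

import           Data.Set (Set)
import qualified Data.Set as Set

-- Simplified stand-ins for the real BlockFetch types.
data PeerFetchStatus point
  = PeerFetchStatusReady (Set point)   -- ready; the set holds block replies still expected
  | PeerFetchStatusBusy                -- at or above the in-flight limit
  deriving (Eq, Show)

newtype PeerFetchInFlight = PeerFetchInFlight
  { peerFetchReqsInFlight :: Int       -- whole range requests still outstanding
  }

-- Old condition: the peer only stops counting against the concurrency limit
-- once its request count drops to zero.  That count is not in the decision
-- fingerprint, so the MsgBatchDone that zeroes it never wakes the decision
-- logic.
idleUnderOldCondition :: PeerFetchInFlight -> Bool
idleUnderOldCondition inflight = peerFetchReqsInFlight inflight == 0

-- New condition: the peer stops counting once its status is "ready with no
-- outstanding block replies".  The status is part of the fingerprint, so any
-- change to it triggers a fresh round of decisions.
idleUnderNewCondition :: Eq point => PeerFetchStatus point -> Bool
idleUnderNewCondition status = status == PeerFetchStatusReady Set.empty
```

The two predicates disagree only in the window between the last `MsgBlock` of a range request and its `MsgBatchDone`, which, as the commit message says, is irrelevant for a decision that only exists to limit network concurrency.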
dcoutts added the byron (Required for a Byron mainnet: replace the old core nodes with cardano-node) and priority high labels and removed the priority medium label on Jan 7, 2020
While developing infra-slot network latencies (PR #1131), I'm seeing a violation of our current variant of Chain Growth. This Issue is complete when that test case is no longer a failure.

The current repro, which might not work directly on future `master`, is as follows in `Test.Dynamic.Praos`. (If it unexpectedly does not fail on a branch with latencies, my only suggestion is to let the latency seed vary and hope it discovers the failure.)

The specific failure happens because network latency introduces a non-negligible duration (never before seen in the test suite) between `MsgBlock` and `MsgBatchDone`. A particular node makes a `FetchDeclineConcurrencyLimit` decision (due to `peerFetchReqsInFlight = 1`) after receiving `MsgBlock` but before receiving `MsgBatchDone` (which sets `peerFetchReqsInFlight = 0`). The BlockFetch decision re-evaluation trigger currently does not include `peerFetchReqsInFlight`: `MsgBlock` would trigger a re-evaluation, but `MsgBatchDone` does not. Because of that decision, nodes `c1` and `c2` do not adopt the chain that `c0` forged, which is the longest chain in the network. That's a violation of our currently-very-strict variant of Chain Growth.

Apparent ways to resolve this Issue (the sketch after this list illustrates how the fingerprint gates re-evaluation):

- Ensure `MsgBatchDone` triggers re-evaluation (i.e. expand the content of `FetchStateFingerprint` to include the `peerFetchReqsInFlight` fields; and perhaps similarly for other bases for decline decisions?).
- Base the decline decision on the `PeerStatus`, which is already part of the `FetchStateFingerprint`.
- Weaken the test `Property` to anticipate this.
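To make the trigger mechanism concrete, here is a minimal sketch of a fingerprint-gated decision loop. The types and names below are invented simplifications, not the actual ouroboros-network `FetchStateFingerprint` or decision logic: the decision code only re-runs when the fingerprint changes, so any state a decline decision reads but the fingerprint omits (such as `peerFetchReqsInFlight`) cannot by itself cause the decision to be revisited.

```haskell
module FingerprintTriggerSketch where

import Control.Concurrent.STM (STM, atomically, check)

-- A simplified stand-in for the fetch-state fingerprint: only state captured
-- here can wake the decision loop.  Nothing below tracks
-- peerFetchReqsInFlight, so a MsgBatchDone that only changes that counter
-- leaves the fingerprint unchanged and the loop keeps sleeping.
data Fingerprint peerStatus = Fingerprint
  { fpCandidates   :: Int           -- stand-in for the candidate-chain summary
  , fpPeerStatuses :: [peerStatus]  -- per-peer fetch status
  } deriving (Eq, Show)

-- Sleep until the observable state maps to a *different* fingerprint than the
-- one we last acted on, then run the decision logic again.
fetchDecisionLoop
  :: Eq peerStatus
  => STM (Fingerprint peerStatus)       -- read the current fingerprint
  -> (Fingerprint peerStatus -> IO ())  -- (re)make the fetch decisions
  -> Fingerprint peerStatus             -- fingerprint we last acted on
  -> IO a
fetchDecisionLoop readFingerprint decide lastFp = do
  fp <- atomically $ do
    fp <- readFingerprint
    check (fp /= lastFp)   -- retry until something in the fingerprint changes
    pure fp
  decide fp
  fetchDecisionLoop readFingerprint decide fp
```

Under this framing, the first option above corresponds to adding the reqs-in-flight counters to the fingerprint type, while the second option (the one the commit earlier in this thread takes) keeps the fingerprint as-is and rewrites the decline condition to consult only fields the fingerprint already covers.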