
Test with infra-slot message latency #1131

Closed
wants to merge 10 commits

Conversation

@nfrisby nfrisby commented Oct 14, 2019

This Draft PR is my first attempt at injecting network latency into the test-consensus tests while simultaneously ensuring that every mini protocol message arrives in the same slot in which it was sent. The purpose is to permute the interleaving of various events without adding new difficult-to-predict mechanisms for Common Prefix violations.

This PR is partial progress towards Issue IntersectMBO/ouroboros-consensus#802. This intermediate milestone is motivated by the test plan in (pending) PR #1128.

@nfrisby nfrisby requested a review from mrBliss October 14, 2019 14:55

nfrisby commented Oct 14, 2019

Currently, this implementation only requires the CS (ChainSync) and BF (BlockFetch) mini protocols to settle down. The transaction submission mini protocols never seem to quiet down -- if I wait for those mini protocol instances to settle down, the test execution livelocks.

Edit: It now waits for all three mini protocols to settle. I had a bug where the state forgot now-dead live pipes after blocking until the next slot instead of forgetting them before blocking. That caused a deadlock, since "unforgotten" dead live pipes can prevent the next slot from starting.
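
For illustration, here is a minimal sketch of that ordering in plain STM, with hypothetical names (PipeState, pipeDead, pipeEmpty, endOfSlot are illustrative, not the PR's actual definitions). Dead pipes are forgotten before the wait, otherwise a dead pipe that will never drain keeps the slot from ending. The sketch is simplified to a momentary "all channels empty" check; the real code additionally requires the channels to stay quiet for a threshold period.

import Control.Concurrent.STM
import qualified Data.Map.Strict as Map

data PipeState = PipeState { pipeDead :: Bool, pipeEmpty :: Bool }

endOfSlot :: TVar (Map.Map Int PipeState) -> IO ()
endOfSlot livePipesVar = do
  -- 1. forget pipes whose mini protocol instances have already terminated
  atomically $ modifyTVar' livePipesVar (Map.filter (not . pipeDead))
  -- 2. only then wait for every remaining live pipe to drain
  atomically $ readTVar livePipesVar >>= check . all pipeEmpty . Map.elems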

@nfrisby nfrisby added consensus issues related to ouroboros-consensus testing labels Oct 14, 2019
@nfrisby nfrisby force-pushed the nfrisby/test-infraslot-delays branch from 8413f0c to ae84f1a Compare October 14, 2019 15:01

nfrisby commented Oct 14, 2019

It seems like a good idea to open a specific Issue for this PR, but it also seems like that should wait until PR 1128 is resolved.

@nfrisby nfrisby changed the title Test with infra-slot delays Test with infra-slot "paper message" delays Oct 14, 2019
@nfrisby nfrisby changed the title Test with infra-slot "paper message" delays Test with infra-slot message latency Oct 14, 2019
=> ResourceRegistry m
-> NumSlots -- ^ Number of slots
-> DiffTime -- ^ Slot duration
-> (SlotNo -> m ()) -- ^ Blocks until slot is finished
Contributor

This comment confused me. After a slot is finished, this function is called and the next slot will not start until the call (successfully) terminates.

Contributor Author

Your sentence conveys a lot of the right intuition. However, I have a pedantic objection: one slot finishes exactly when the next starts; there is no duration in between.

I hope to rework this function (see other PR comments), but I will at the very least improve the names and comments.

Contributor Author

How about now?
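
For what it's worth, a tiny sketch of the semantics being discussed, using hypothetical names (runSlots and endOfSlotHook are illustrative, not the actual newTestBlockchainTime): the hook runs at the slot boundary, and slot s+1 begins exactly when the hook for slot s returns.

import Control.Concurrent (threadDelay)
import Control.Monad (forM_)

newtype SlotNo = SlotNo Int

runSlots :: Int -> Int -> (SlotNo -> IO ()) -> IO ()
runSlots numSlots slotDurationMicro endOfSlotHook =
  forM_ [0 .. numSlots - 1] $ \s -> do
    threadDelay slotDurationMicro   -- the slot's duration elapses
    endOfSlotHook (SlotNo s)        -- slot s+1 starts exactly when this returns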

}
onSlotChange btime $ \s -> do
  blockUntilQuiescent livePipesVar quiescenceThreshold
  atomically $ writeTVar latestDoneSlot s
Contributor

It would be nicer if we didn't need to go through latestDoneSlot to tell the btime to advance. I suppose that otherwise blockUntilQuiescent would have to move to where the btime is created. Maybe the creation of the btime can be done in runNodeNetwork instead? Is there any reason left to do it in runTestNetwork?

Contributor Author

My main reasons for now:

  • Test.Dynamic.General is not the only call-site for newTestBlockchainTime
  • This current approach is still pretty awkward. I'm not even sure "the slot length" is relevant anymore.
  • So I thought we'd figure it out at this call-site before I attempt to alter a function with multiple call-sites.

Contributor Author

The only reason I can think of for leaving it in runTestNetwork is that it's free to introduce further details (beyond the latestDoneSlot interactions) that runNodeNetwork does not care about, so defining btime outside seems right.

ouroboros-consensus/test-consensus/Test/Dynamic/Network.hs (outdated, resolved)
sig2 <- get
if sig1 == sig2 then pure () else go sig2

-- | Create a pipe backed by a 'LazyTMVar', add it to the live pipes, and
Contributor

s/LazyTMVar/StrictTMVar?

Contributor Author

I only used LazyTMVar because that's what was already being used. Since these mvars are simulating a network, making them strict seems in a sense more accurate. Though "time" is somewhat weird here due to io-sim, so I don't think it ultimately matters -- maybe only for tracing error calls.

I'll change to StrictTMVar and see if someone else objects.

Contributor

Ha, no, I think you're already using a StrictTMVar, since you're using Control.Monad.Class.MonadSTM.Strict. A partial type signature on buffer confirms this. So my comment only applies to the docstring 🙂

Contributor Author

It now uses TQueues, though the function's comment is now pretty big and does not mention the exact data type.
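
As an illustration of the TQueue-backed approach, here is a minimal sketch in plain IO/STM (the PR itself runs in the io-sim monad classes; newDelayedPipe, send, and recv are illustrative names, and the latency handling is a simplification since successive delays accumulate on the receive side). The latency travels with each message and is observed by the receiver, so message order is preserved.

import Control.Concurrent (threadDelay)
import Control.Concurrent.STM
import System.Random (randomRIO)

-- | One direction of a pipe: a send action and a receive action.
newDelayedPipe :: (Int, Int) -> IO (a -> IO (), IO a)
newDelayedPipe latencyRangeMicro = do
  buffer <- newTQueueIO
  let send x = do
        delay <- randomRIO latencyRangeMicro  -- pick this message's latency
        atomically $ writeTQueue buffer (delay, x)
      recv = do
        (delay, x) <- atomically $ readTQueue buffer
        threadDelay delay                     -- the receiver sees the message late
        pure x
  pure (send, recv)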

@nfrisby nfrisby force-pushed the nfrisby/test-infraslot-delays branch from 1badc07 to 2dc06f8 Compare October 16, 2019 16:08

nfrisby commented Oct 16, 2019

OK: I fixed bugs and squashed it down to a single commit. I still haven't addressed your prior comments. FYI I intend to improve the organization, but I wanted to push up the first version that seems to be correct even if not yet good :)


nfrisby commented Oct 18, 2019

I rebased onto master, improved the organization of the commits, simplified the code some, and commented it some. The Test.Dynamic.* tests passed 10,000 QuickCheck iterations as of 4e1180c.

I still plan to break some of the new code (the "live pipes" stuff) out into its own module.

@nfrisby nfrisby force-pushed the nfrisby/test-infraslot-delays branch 2 times, most recently from 18268c2 to 4a59748 Compare October 21, 2019 13:20
@nfrisby nfrisby marked this pull request as ready for review October 21, 2019 13:23

nfrisby commented Oct 21, 2019

I believe these commits are ready for review.

* 9dc45513 - TOSQUASH comments (40 seconds ago) <Nicolas Frisby>
* 4a59748f - TODROP: anticipate Issue 1147 (8 minutes ago) <Nicolas Frisby>
* 93d15eb5 - test-consensus: add Issue 1147 repro (8 minutes ago) <Nicolas Frisby>
* 1d3ba876 - test-consensus: improve the shrinking in Test.Dynamic.General (8 minutes ago) <Nicolas Frisby>
* 7238a43d - test-consensus: add infra-slot delays (8 minutes ago) <Nicolas Frisby>
* 5ad27a46 - ouroboros-consensus: generalize newTestBlockchainTime (3 days ago) <Nicolas Frisby>
* d55cb13a - test-consensus: introduce NodeNetworkArgs (3 days ago) <Nicolas Frisby>
*   d3e71fac - (origin/master, origin/bors/staging, origin/HEAD) Merge input-output-hk/ouroboros-network#1122 (4 days ago) <iohk-bors[bot]>

mrBliss previously approved these changes Oct 24, 2019

@mrBliss mrBliss left a comment

Nice work 👍 Is it correct that this helped us find one bug, namely #1147?

@nfrisby nfrisby requested a review from mrBliss October 24, 2019 13:11

nfrisby commented Oct 24, 2019

Yep, these tests revealed Issue 1147 👍

Related: I re-requested review just now only to clear your Approval. I haven't made any changes yet, but the commits currently include a workaround for Issue 1147 that does not belong on master as-is. I'd switch the PR back to Draft status if I could.


nfrisby commented Oct 24, 2019

OK, I think I addressed your latest comments and then some.

This PR is blocked by a proper fix for Issue 1147 -- its current commits contain a workaround. Once a fix is on master, I'll rebase (and squash) and then we can try to merge.

@mrBliss mrBliss dismissed their stale review October 24, 2019 14:47

Block this PR until #1147 is fixed

In particular, it no longer assumes that all slots have the same duration.
We add random network latencies but crucially also ensure that each test slot
cannot end until all network channels have been empty for some duration that
reasonably dwarfs any computational delays. The goal is for the node network to
always reach a steady state before the next slot begins, despite random network
latencies. (A sketch of this quiescence check follows the commit notes below.)
  * stylish-haskell fixes to this PR's diff

  * typos

  * change vestigial mb prefix to li

  * add ioSimSecondsToDiffTime for self-documenting purposes

  * refactor `diffCtor` for maintainability
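
A minimal sketch of that quiescence check, in plain IO/STM rather than the io-sim classes the PR uses (getSignature is a hypothetical stand-in for sampling an activity counter of all live pipes): the slot may only end once the signature has stayed unchanged across a full threshold-long wait.

import Control.Concurrent (threadDelay)
import Control.Concurrent.STM (STM, atomically)

blockUntilQuiescent
  :: Eq sig
  => STM sig  -- ^ sample the current activity signature of the live pipes
  -> Int      -- ^ quiescence threshold, in microseconds
  -> IO ()
blockUntilQuiescent getSignature threshold = atomically getSignature >>= go
  where
    go sig1 = do
      threadDelay threshold
      sig2 <- atomically getSignature
      if sig1 == sig2 then pure () else go sig2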
@nfrisby nfrisby force-pushed the nfrisby/test-infraslot-delays branch from f47d83f to 1ed2a89 Compare November 8, 2019 20:04
@nfrisby nfrisby force-pushed the nfrisby/test-infraslot-delays branch from 1ed2a89 to 0d2fa18 Compare November 10, 2019 16:31

nfrisby commented Nov 10, 2019

Status: I just rebased and tweaked the commits. I think 0d2fa18 is mergeable, but the last commit is a "surgical" minimal-diff simplification of what Duncan is planning to do, if I understand correctly.


nfrisby commented Jan 6, 2020

@dcoutts On my local copy of this PR, I reverted 0d2fa18, which is the old, demonstrative commit for the Issue 1147 bug, and experimentally added the following commit which you suggested this morning on Slack.

$ git show aefa2a4a
commit aefa2a4add464f7e330b0611f709c8e41cc1099d
Author: Nicolas Frisby <[email protected]>
Date:   Mon Jan 6 07:01:09 2020 -0800

    new fix

diff --git a/ouroboros-network/src/Ouroboros/Network/BlockFetch/Decision.hs b/ouroboros-network/src/Ouroboros/Network/BlockFetch/Decision.hs
index 692c35a1..ab3db0e9 100644
--- a/ouroboros-network/src/Ouroboros/Network/BlockFetch/Decision.hs
+++ b/ouroboros-network/src/Ouroboros/Network/BlockFetch/Decision.hs
@@ -920,7 +920,7 @@ fetchRequestDecision FetchDecisionPolicy {
              inFlightBytesLowWatermark
              inFlightBytesHighWatermark
 
-  | peerFetchReqsInFlight == 0
+  | peerFetchStatus == PeerFetchStatusReady Set.empty
   , let maxConcurrentFetchPeers = case fetchMode of
                                     FetchModeBulkSync -> maxConcurrencyBulkSync
                                     FetchModeDeadline -> maxConcurrencyDeadline

The 1147 repro failed :( I haven't yet investigated why -- just FYI.

Edit: Here's a trace of a failure. It's the original repro, but it required a different RNG seed for the message latencies. So: the topology is c0 <-> c1 <-> c2, c0 joins and leads in s24, c1 and c2 join in s29, and c0 and c2 lead in s29 (but c2 hasn't sync'd anything before leading, so its new block is too short and so will be discarded eventually). The failure is that c1 (and therefore c2) do not adopt the block that c0 forges in s29 (well, at least they don't during s29). Here is a trace of s29 showing the Forges and Switches of all three nodes as well as the BF decisions and traffic for just c1.

TICK SlotNo 29
CoreId 0 TraceForgeEvent (SlotNo 29) (SimpleBlock {simpleHeader = SimpleHeader {simpleHeaderHash = 613437f7, simpleHeaderStd = SimpleStdHeader {simplePrev = BlockHash 25b1ae66, simpleSlotNo = SlotNo 29, simpleBlockNo = BlockNo 2, simpleBodyHash = e4e0e620, simpleBlockSize = 82}, simpleHeaderExt = SimplePraosExt {simplePraosExt = PraosFields {praosSignature = SignedKES {getSig = SigMockKES 192071407222283496340282674609397094613 (SignKeyMockKES (VerKeyMockKES 0) 29 1000000)}, praosExtraFields = PraosExtraFields {praosCreator = CoreId 0, praosRho = CertifiedVRF {certifiedNatural = 268989868028154701625513075929944158380, certifiedProof = CertMockVRF 0}, praosY = CertifiedVRF {certifiedNatural = 22537260893503605589691117142147148122, certifiedProof = CertMockVRF 0}}}}}, simpleBody = SimpleBody {simpleTxs = [UnsafeTx (fromList [(b378192f,0)]) [("c",644),("a",356)],UnsafeTx (fromList [(b378192f,2)]) [("a",678),("c",322)],UnsafeTx (fromList [(b378192f,1)]) [("c",670),("b",330)]]}})
CoreId 0 SwitchToChain (Just (BlockNo 1),At (Block {blockPointSlot = SlotNo 24, blockPointHash = 25b1ae66})) (Just (BlockNo 2),At (Block {blockPointSlot = SlotNo 29, blockPointHash = 613437f7}))
CoreId 1 []
CoreId 2 TraceForgeEvent (SlotNo 29) (SimpleBlock {simpleHeader = SimpleHeader {simpleHeaderHash = e44ad819, simpleHeaderStd = SimpleStdHeader {simplePrev = GenesisHash, simpleSlotNo = SlotNo 29, simpleBlockNo = BlockNo 1, simpleBodyHash = 6720d660, simpleBlockSize = 3}, simpleHeaderExt = SimplePraosExt {simplePraosExt = PraosFields {praosSignature = SignedKES {getSig = SigMockKES 281117183420071284315832754297023637025 (SignKeyMockKES (VerKeyMockKES 2) 29 1000000)}, praosExtraFields = PraosExtraFields {praosCreator = CoreId 2, praosRho = CertifiedVRF {certifiedNatural = 188088890777148388162503948613687488904, certifiedProof = CertMockVRF 2}, praosY = CertifiedVRF {certifiedNatural = 43892829808500798462927256150386975282, certifiedProof = CertMockVRF 2}}}}}, simpleBody = SimpleBody {simpleTxs = []}})
CoreId 1 []
CoreId 1 [TraceLabelPeer (CoreId 0) (Left FetchDeclineChainNotPlausible)]
CoreId 2 SwitchToChain (Nothing,Origin) (Just (BlockNo 1),At (Block {blockPointSlot = SlotNo 29, blockPointHash = e44ad819}))
CoreId 1 [TraceLabelPeer (CoreId 0) (Left FetchDeclineChainNotPlausible)]
CoreId 1 [TraceLabelPeer (CoreId 0) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (CoreId 2) (Left FetchDeclineChainNotPlausible)]
CoreId 1 [TraceLabelPeer (CoreId 2) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (CoreId 0) (Right [At (Block {blockPointSlot = SlotNo 24, blockPointHash = 25b1ae66})])]
CoreId 1 Send CoreId 0 MsgRequestRange ChainRange (At (Block {blockPointSlot = SlotNo 24, blockPointHash = 25b1ae66})) (At (Block {blockPointSlot = SlotNo 24, blockPointHash = 25b1ae66}))
CoreId 1 [TraceLabelPeer (CoreId 0) (Left FetchDeclineInFlightThisPeer),TraceLabelPeer (CoreId 2) (Right [At (Block {blockPointSlot = SlotNo 29, blockPointHash = e44ad819})])]
CoreId 1 Send CoreId 2 MsgRequestRange ChainRange (At (Block {blockPointSlot = SlotNo 29, blockPointHash = e44ad819})) (At (Block {blockPointSlot = SlotNo 29, blockPointHash = e44ad819}))
CoreId 1 Recv CoreId 0 MsgStartBatch
CoreId 1 Recv CoreId 0 MsgBlock SimpleBlock {simpleHeader = SimpleHeader {simpleHeaderHash = 25b1ae66, simpleHeaderStd = SimpleStdHeader {simplePrev = GenesisHash, simpleSlotNo = SlotNo 24, simpleBlockNo = BlockNo 1, simpleBodyHash = 6720d660, simpleBlockSize = 3}, simpleHeaderExt = SimplePraosExt {simplePraosExt = PraosFields {praosSignature = SignedKES {getSig = SigMockKES 79153084487575395325654052123023531118 (SignKeyMockKES (VerKeyMockKES 0) 24 1000000)}, praosExtraFields = PraosExtraFields {praosCreator = CoreId 0, praosRho = CertifiedVRF {certifiedNatural = 192073865295245739852082027563274242533, certifiedProof = CertMockVRF 0}, praosY = CertifiedVRF {certifiedNatural = 8736354422877217896250586508442215061, certifiedProof = CertMockVRF 0}}}}}, simpleBody = SimpleBody {simpleTxs = []}}
CoreId 1 SwitchToChain (Nothing,Origin) (Just (BlockNo 1),At (Block {blockPointSlot = SlotNo 24, blockPointHash = 25b1ae66}))
CoreId 1 [TraceLabelPeer (CoreId 0) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (CoreId 2) (Left FetchDeclineChainNotPlausible)]
CoreId 1 [TraceLabelPeer (CoreId 0) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (CoreId 2) (Left FetchDeclineChainNotPlausible)]
CoreId 1 Recv CoreId 2 MsgStartBatch
CoreId 1 [TraceLabelPeer (CoreId 2) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (CoreId 0) (Left (FetchDeclineConcurrencyLimit FetchModeBulkSync 2))]
CoreId 1 Recv CoreId 2 MsgBlock SimpleBlock {simpleHeader = SimpleHeader {simpleHeaderHash = e44ad819, simpleHeaderStd = SimpleStdHeader {simplePrev = GenesisHash, simpleSlotNo = SlotNo 29, simpleBlockNo = BlockNo 1, simpleBodyHash = 6720d660, simpleBlockSize = 3}, simpleHeaderExt = SimplePraosExt {simplePraosExt = PraosFields {praosSignature = SignedKES {getSig = SigMockKES 281117183420071284315832754297023637025 (SignKeyMockKES (VerKeyMockKES 2) 29 1000000)}, praosExtraFields = PraosExtraFields {praosCreator = CoreId 2, praosRho = CertifiedVRF {certifiedNatural = 188088890777148388162503948613687488904, certifiedProof = CertMockVRF 2}, praosY = CertifiedVRF {certifiedNatural = 43892829808500798462927256150386975282, certifiedProof = CertMockVRF 2}}}}}, simpleBody = SimpleBody {simpleTxs = []}}
CoreId 1 [TraceLabelPeer (CoreId 2) (Left FetchDeclineChainNotPlausible),TraceLabelPeer (CoreId 0) (Left (FetchDeclineConcurrencyLimit FetchModeBulkSync 2))]
CoreId 1 Recv CoreId 0 MsgBatchDone
CoreId 1 Recv CoreId 2 MsgBatchDone
TICK SlotNo 30

I suspect node 2 is irrelevant, so here is the same trace with the node 2 mentions removed.

TICK SlotNo 29
CoreId 0 TraceForgeEvent (SlotNo 29) (SimpleBlock {simpleHeader = SimpleHeader {simpleHeaderHash = 613437f7, simpleHeaderStd = SimpleStdHeader {simplePrev = BlockHash 25b1ae66, simpleSlotNo = SlotNo 29, simpleBlockNo = BlockNo 2, simpleBodyHash = e4e0e620, simpleBlockSize = 82}, simpleHeaderExt = SimplePraosExt {simplePraosExt = PraosFields {praosSignature = SignedKES {getSig = SigMockKES 192071407222283496340282674609397094613 (SignKeyMockKES (VerKeyMockKES 0) 29 1000000)}, praosExtraFields = PraosExtraFields {praosCreator = CoreId 0, praosRho = CertifiedVRF {certifiedNatural = 268989868028154701625513075929944158380, certifiedProof = CertMockVRF 0}, praosY = CertifiedVRF {certifiedNatural = 22537260893503605589691117142147148122, certifiedProof = CertMockVRF 0}}}}}, simpleBody = SimpleBody {simpleTxs = [UnsafeTx (fromList [(b378192f,0)]) [("c",644),("a",356)],UnsafeTx (fromList [(b378192f,2)]) [("a",678),("c",322)],UnsafeTx (fromList [(b378192f,1)]) [("c",670),("b",330)]]}})
CoreId 0 SwitchToChain (Just (BlockNo 1),At (Block {blockPointSlot = SlotNo 24, blockPointHash = 25b1ae66})) (Just (BlockNo 2),At (Block {blockPointSlot = SlotNo 29, blockPointHash = 613437f7}))
CoreId 1 []
CoreId 1 []
CoreId 1 [TraceLabelPeer (CoreId 0) (Left FetchDeclineChainNotPlausible)]
CoreId 1 [TraceLabelPeer (CoreId 0) (Left FetchDeclineChainNotPlausible)]
CoreId 1 [TraceLabelPeer (CoreId 0) (Left FetchDeclineChainNotPlausible)]
CoreId 1 [TraceLabelPeer (CoreId 0) (Right [At (Block {blockPointSlot = SlotNo 24, blockPointHash = 25b1ae66})])]
CoreId 1 Send CoreId 0 MsgRequestRange ChainRange (At (Block {blockPointSlot = SlotNo 24, blockPointHash = 25b1ae66})) (At (Block {blockPointSlot = SlotNo 24, blockPointHash = 25b1ae66}))
CoreId 1 [TraceLabelPeer (CoreId 0) (Left FetchDeclineInFlightThisPeer)]
CoreId 1 Recv CoreId 0 MsgStartBatch
CoreId 1 Recv CoreId 0 MsgBlock SimpleBlock {simpleHeader = SimpleHeader {simpleHeaderHash = 25b1ae66, simpleHeaderStd = SimpleStdHeader {simplePrev = GenesisHash, simpleSlotNo = SlotNo 24, simpleBlockNo = BlockNo 1, simpleBodyHash = 6720d660, simpleBlockSize = 3}, simpleHeaderExt = SimplePraosExt {simplePraosExt = PraosFields {praosSignature = SignedKES {getSig = SigMockKES 79153084487575395325654052123023531118 (SignKeyMockKES (VerKeyMockKES 0) 24 1000000)}, praosExtraFields = PraosExtraFields {praosCreator = CoreId 0, praosRho = CertifiedVRF {certifiedNatural = 192073865295245739852082027563274242533, certifiedProof = CertMockVRF 0}, praosY = CertifiedVRF {certifiedNatural = 8736354422877217896250586508442215061, certifiedProof = CertMockVRF 0}}}}}, simpleBody = SimpleBody {simpleTxs = []}}
CoreId 1 SwitchToChain (Nothing,Origin) (Just (BlockNo 1),At (Block {blockPointSlot = SlotNo 24, blockPointHash = 25b1ae66}))
CoreId 1 [TraceLabelPeer (CoreId 0) (Left FetchDeclineChainNotPlausible)]
CoreId 1 [TraceLabelPeer (CoreId 0) (Left FetchDeclineChainNotPlausible)]
CoreId 1 [TraceLabelPeer (CoreId 0) (Left (FetchDeclineConcurrencyLimit FetchModeBulkSync 2))]
CoreId 1 [TraceLabelPeer (CoreId 0) (Left (FetchDeclineConcurrencyLimit FetchModeBulkSync 2))]
CoreId 1 Recv CoreId 0 MsgBatchDone
TICK SlotNo 30

I haven't revisited the BF logic in detail, but note the likely suspects FetchDeclineInFlightThisPeer and FetchDeclineConcurrencyLimit. From what I remember of Issue 1147, I suspect one of those decisions (probably the latter, ConcurrencyLimit) should be revisited after the MsgBatchDone and before the onset of s30 but apparently is not. HTH.


dcoutts commented Jan 10, 2020

Thanks! I still think my basic approach should work, but I noticed a probable bug in my solution. I've updated branch dcoutts/issue-1147. I think it's worth checking if that solves these failure cases.


nfrisby commented Jan 10, 2020

> Thanks! I still think my basic approach should work, but I noticed a probable bug in my solution. I've updated branch dcoutts/issue-1147. I think it's worth checking if that solves these failure cases.

It fails with the same trace as in my previous comment :( Here's what I tested with:

$ git cherry-pick f3d43680 924e818e a269cfcf
[nfrisby/test-infraslot-delays 634f8bc6] Fix for block fetch decline decisions not being revisited appropriately
 Author: Duncan Coutts <[email protected]>
 Date: Mon Jan 6 13:50:57 2020 +0000
 1 file changed, 1 insertion(+), 1 deletion(-)
[nfrisby/test-infraslot-delays 59796bc2] Improve comment doc on PeerFetchStatusReady constructor
 Author: Duncan Coutts <[email protected]>
 Date: Mon Jan 6 13:54:40 2020 +0000
 1 file changed, 4 insertions(+)
[nfrisby/test-infraslot-delays 737716b1] Try and fix it properly this time.
 Author: Duncan Coutts <[email protected]>
 Date: Fri Jan 10 10:05:48 2020 +0000
 2 files changed, 12 insertions(+), 2 deletions(-)
$ git lg -10
* 737716b1 - (HEAD -> nfrisby/test-infraslot-delays) Try and fix it properly this time. (31 seconds ago) <Duncan Coutts>
* 59796bc2 - Improve comment doc on PeerFetchStatusReady constructor (31 seconds ago) <Duncan Coutts>
* 634f8bc6 - Fix for block fetch decline decisions not being revisited appropriately (31 seconds ago) <Duncan Coutts>
* 3d459374 - DO NOT MERGE tracing for advising Duncan (55 seconds ago) <Nicolas Frisby>
* 9f2e1483 - DISCARD narrow tests (4 days ago) <Nicolas Frisby>
* 16f4ca7e - Revert "TODROP: anticipate Issue 1147" (4 days ago) <Nicolas Frisby>
* 0d2fa187 - (origin/nfrisby/test-infraslot-delays) TODROP: anticipate Issue 1147 (9 weeks ago) <Nicolas Frisby>
$

I just pushed 3d45937 to origin/nfrisby/issue-229-issue-1147-for-duncan. Does that build locally for you, @dcoutts? Because of 9f2e148, test-consensus only runs the 1147 repro, and because of 3d45937 it dumps a bunch of trace events to stderr.


nfrisby commented May 19, 2020

I'm closing this PR. The upcoming work for Praos is higher priority and will supersede it.

Status: PR #1705 resolved the BlockFetch bug that this PR had revealed (Issue #1147).

@nfrisby nfrisby closed this May 19, 2020