Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refine the BF logic for updating the PeerFetchState with async block adds #1845

Closed
nfrisby opened this issue Mar 24, 2020 · 1 comment · Fixed by #1850
Closed

Refine the BF logic for updating the PeerFetchState with async block adds #1845

nfrisby opened this issue Mar 24, 2020 · 1 comment · Fixed by #1850
Assignees
Labels
block-fetch Issues related to block fetch component. bug Something isn't working consensus issues related to ouroboros-consensus
Milestone

Comments

@nfrisby
Copy link
Contributor

nfrisby commented Mar 24, 2020

While developing a new approach to the ThreadNet tests, I noticed something unexpected in the BlockFetch threads' Tracer output.

As of PR #1709, the BF client adds blocks to CDB asynchronously instead of synchronously. Since the addFetchedBlock call in the BF client now returns immediately, the BF client now updates its PeerFetchStatus (to indicate the fetched block is no-longer in flight) before the CDB updates its current chain (to include the new fetched block) (or is it ChainDB.getIsFetched and not the current chain?).

As a result I'm seeing the node's BF fetch logic thread immediately add a duplicate fetch request for the same block. Once the BF client updates its peer status, the relevant inputs to the fetch logic thread now look the same as they did before it originally made the fetch request: the (fetched) block is not in flight and it is not in the CDB's current chain -- it's about to be, but the fetch logic doesn't know that.

This is causing the BF client in the ThreadNet to fetch every (?) block twice. This Issue is resolved when the the BF client no longer causes the node's fetch logic thread to repeat its requests.

@duncan shared some thoughts here https://input-output-rnd.slack.com/archives/GCW2S19CM/p1585076440303700

@nfrisby nfrisby added bug Something isn't working consensus issues related to ouroboros-consensus labels Mar 24, 2020
@mrBliss
Copy link
Contributor

mrBliss commented Mar 25, 2020

@nfrisby Changing the following line: https://github.com/input-output-hk/ouroboros-network/blob/e71aa72adc7e876cda4ff19b5cc0900167fa9804/ouroboros-consensus/src/Ouroboros/Consensus/NodeKernel.hs#L338 to

  addFetchedBlock _pt = void . ChainDB.addBlock chainDB

should fix it. This will block until the block has been written to disk and processed by chain selection (until the block is from the future).

Note that doing things async gave us a 20-30% boost in bulk sync speed. This change will undo that. Maybe a better compromise is to add the ability to wait until a block has been written to the VolatileDB so that getIsFetched will say yes for the block, but not until the block has been processed by chain selection. I'd hope this fixes the double fetches and doesn't give up too much performance in the process.

@dcoutts dcoutts added the block-fetch Issues related to block fetch component. label Mar 25, 2020
@mrBliss mrBliss self-assigned this Mar 25, 2020
@mrBliss mrBliss added this to the S9 2020-03-26 milestone Mar 25, 2020
iohk-bors bot added a commit that referenced this issue Mar 25, 2020
1850: BlockFetch: wait until the block has been written to disk r=mrBliss a=mrBliss

Fixes #1845.

Co-authored-by: Thomas Winant <[email protected]>
@iohk-bors iohk-bors bot closed this as completed in acea7f8 Mar 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
block-fetch Issues related to block fetch component. bug Something isn't working consensus issues related to ouroboros-consensus
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants