
Compensate for block processing time in DBFT timer #1959

Closed
roman-khimov opened this issue Sep 24, 2020 · 1 comment · Fixed by #1960
Labels
Discussion Initial issue state - proposed but not yet accepted

Comments

@roman-khimov
Contributor

Introduction
One of the core Neo consensus characteristics is its target inter-block time (SecondsPerBlock/MillisecondsPerBlock). In general we want every block to be offset from the previous one by exactly the configured time span. But even leaving view changes aside, this is usually not the case on a real network. There are various sources of additional delay, but here we want to concentrate on one particular aspect: block processing.

How it works now
DBFT's InitializeConsensus() sets the timer using the configured value adjusted by block_received_time. block_received_time is set in OnPersistCompleted(), i.e. after the block has been processed, and very little happens between setting block_received_time and the ChangeTimer() invocation. I would even suggest that removing this adjustment wouldn't affect the node in any noticeable way (backup timeouts are not adjusted at all, just to mention).
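
A simplified paraphrase of the current C# logic (not the verbatim neo source, but the shape of it):

```csharp
private void InitializeConsensus(byte viewNumber)
{
    // ...
    if (viewNumber == 0)
    {
        // block_received_time was set in OnPersistCompleted(), i.e. after the
        // block was fully processed, so this span is almost always tiny and
        // the adjustment is nearly a no-op.
        TimeSpan elapsed = TimeProvider.Current.UtcNow - block_received_time;
        TimeSpan perBlock = TimeSpan.FromMilliseconds(Blockchain.MillisecondsPerBlock);
        ChangeTimer(elapsed >= perBlock ? TimeSpan.Zero : perBlock - elapsed);
    }
    else
    {
        // Backup timeouts on view changes get no adjustment at all.
        ChangeTimer(TimeSpan.FromMilliseconds(Blockchain.MillisecondsPerBlock << (viewNumber + 1)));
    }
}
```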

Related work
#626 tried to use in-block timestamps as a reference, but as there are no guarantees about clock synchronization between nodes, those timestamps are not really reliable; we can only use local clocks for any decisions.

#819 approached this problem very aggressively by moving the block_received_time assignment right after InitializeConsensus(), but that created even more problems (any view change immediately makes this timestamp way off, for example).

Block processing and consensus
One thing we've noticed during our testing is that under load block processing time can grow to the point where it exceeds the target time between blocks. That's not very surprising, but what's more interesting is that the current consensus logic doesn't respond to it in any way. If we're using 1 second as the interval and a block takes 3 seconds to persist, the timer is still set to 1 second, even though it would make more sense to start creating the next block immediately. Even under moderate load, block processing time can be big enough to affect the interval between blocks.

The proposal
We can move the block_received_time assignment to the earliest point where we definitely know the new block exists, that is, to CheckCommits(). Subsequent consensus initialization via OnPersistCompleted() will then use that time as a reference the same way it does now, adjusting the configured timeout value.

One additional check needs to be added, though, and that is a height check, because OnPersistCompleted() can also be called without the node ever going through CheckCommits(). So, to avoid using a wrong timestamp, we also save the height of the block created in CheckCommits(), as sketched below.
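
A minimal sketch of the proposal (hypothetical field and helper names, not the actual #1960 patch):

```csharp
private DateTime block_received_time;
private uint block_received_index; // height of the block created in CheckCommits()

private void CheckCommits()
{
    if (EnoughCommits()) // hypothetical: M or more Commit messages for the current view
    {
        Block block = context.CreateBlock();
        // Earliest point where we know the new block exists: remember both
        // when it happened and at which height.
        block_received_time = TimeProvider.Current.UtcNow;
        block_received_index = block.Index;
        RelayBlock(block); // hypothetical: hand the block over for persisting
    }
}

private void OnPersistCompleted(Block block)
{
    // OnPersistCompleted() also fires for blocks received over the network,
    // i.e. without CheckCommits() ever running; fall back to "now" then.
    if (block.Index != block_received_index)
        block_received_time = TimeProvider.Current.UtcNow;
    // InitializeConsensus() subtracts (now - block_received_time) from the
    // configured interval, which now includes the block processing time.
    InitializeConsensus(0);
}
```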

I think this change is safe, backwards compatible and applicable to both Neo versions. It doesn't give us a perfect block time, but it moves us a little closer to that by accounting for one particular source of delay. The more loaded the network, the more visible the effect of this patch will be.

It also relates to TPS performance. When block processing time doesn't exceed the target interval, the change is hardly noticeable, although it makes the node produce slightly more (and smaller) blocks. When block processing time exceeds the target interval, it improves TPS a little by making the node work instead of waiting for the timer.

Potential future improvements
Theoretically, we could move the block_received_time assignment into the PrepareRequest handler to try to eliminate most of the network delays associated with PrepareResponse and Commit collection, but that's more invasive and dangerous, thus requiring more thorough testing under different conditions (neo-resilience might help here).

Neo Version

  • Neo 2
  • Neo 3

Where in the software does this update apply?

  • Consensus
@roman-khimov roman-khimov added the Discussion Initial issue state - proposed but not yet accepted label Sep 24, 2020
@AnnaShaleva
Member

Testing setup

To test the proposed changes, we've used the current master branch of Neo and the patched branch. For the Golang node, we've tested the master branch of neo-go against a similarly patched branch. The following node setups were used for both implementations:

  1. Single-node network with 1000ms per block under the load of
    • 30 worker threads (maximum RPS)
    • 1 worker thread with fixed request rate of 1000 RPS
  2. Four-node network with an attached RPC node, 5000ms per block, under the load of
    • 30 worker threads (maximum RPS)
    • 1 worker thread with fixed request rate of 200 RPS
  3. Four-node network with an attached RPC node, 15000ms per block, under the load of
    • 30 worker threads (maximum RPS)
    • 1 worker thread with fixed request rate of 200 RPS

In these setups, we're mostly interested in the delta time between accepted blocks (milliseconds per block), measured as the difference between the corresponding blocks' timestamps. Specifically, we're looking at how close this value is to the one provided in the node configuration. This configured value is denoted as the target value on the plots below.
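
For clarity, the metric is just pairwise timestamp differences; a sketch, assuming the block timestamps have been collected into a millisecond-precision list (Neo 3 block timestamps are in milliseconds natively, Neo 2 ones would need a seconds-to-ms conversion first):

```csharp
using System.Collections.Generic;

static IEnumerable<ulong> MsPerBlock(IReadOnlyList<ulong> blockTimestamps)
{
    // Delta time for block i is its timestamp minus the previous block's.
    for (int i = 1; i < blockTimestamps.Count; i++)
        yield return blockTimestamps[i] - blockTimestamps[i - 1];
}
```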

Benchmarks

For the single-node network, both bench setups show the expected results. The proposed changes bring the delta time between blocks very close to 1000 ms for both the C# and Golang nodes, which is about 10% closer to the target value for the C# node in both the 30wrk and 1000rate setups:

[plot: ms_per_block_single_30_wrk_1000_ms]

[plot: ms_per_block_single_1000_rate_1000_ms]

The four-node network tests show the same tendency of reduced delta time between blocks. It's worth mentioning that these results are less stable because we're restricted by the hardware. Still, we see an improvement of several percent in the four-node test with the 5000 ms per block target value:

[plot: ms_per_block_four_nodes_30_wrk_5000_ms]

[plot: ms_per_block_four_nodes_200_rate_5000_ms]

The picture is even better with a target value of 15000 milliseconds per block:

[plot: ms_per_block_four_nodes_30_wrk_15000_ms]

[plot: ms_per_block_four_nodes_200_rate_15000_ms]

It's also worth emphasizing that the proposed changes have almost no effect on the average TPS values.

roman-khimov added a commit to nspcc-dev/dbft that referenced this issue Sep 25, 2020
Make block timing more accurate under load. BlockIndex is needed to
distinguish between consensus-produced block and the one added via regular
block addition mechanism.

See neo-project/neo#1959 and neo-project/neo#1960.
AnnaShaleva added a commit to AnnaShaleva/neo that referenced this issue Sep 25, 2020