Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UponRule deduplication issue in QBFT #1493

Closed
corverroos opened this issue Nov 29, 2022 · 0 comments
Closed

UponRule deduplication issue in QBFT #1493

corverroos opened this issue Nov 29, 2022 · 0 comments
Assignees
Labels
bug Something isn't working

Comments

@corverroos
Copy link
Contributor

corverroos commented Nov 29, 2022

Problem to be solved

qbft unit test failed due to qbft sanity check: bug: duplicate rule, but different round. See https://github.com/ObolNetwork/charon/actions/runs/3575841940/jobs/6012896513.

This is however incorrect since it is expected. The flow was:

        qbft_internal_test.go:269: 00:11.041 1 => pre_prepare@12
        qbft_internal_test.go:146: 00:11.042 1 => pre_prepare@12 => 2 (dropped)
        qbft_internal_test.go:146: 00:11.042 1 => pre_prepare@12 => 3 (dropped)
        qbft_internal_test.go:243: 00:11.042 1 => pre_prepare@12 -> 4@12 ~= justified_pre_prepare
        qbft_internal_test.go:269: 00:11.042 4 => prepare@12
        qbft_internal_test.go:146: 00:11.043 4 => prepare@12 => 2 (dropped)
        qbft_internal_test.go:243: 00:11.046 1 => pre_prepare@12 -> 1@12 ~= justified_pre_prepare
        qbft_internal_test.go:269: 00:11.046 1 => prepare@12
        qbft_internal_test.go:146: 00:11.047 1 => prepare@12 => 4 (dropped)
        qbft_internal_test.go:240: 00:12.032 2@12 change to 13 ~= round_timeout
        qbft_internal_test.go:269: 00:12.032 2 => round_change@13
        qbft_internal_test.go:146: 00:12.033 2 => round_change@13 => 4 (dropped)
        qbft_internal_test.go:146: 00:12.033 2 => round_change@13 => 1 (dropped)
        qbft_internal_test.go:240: 00:12.037 3@12 change to 13 ~= round_timeout
        qbft_internal_test.go:269: 00:12.037 3 => round_change@13
        qbft_internal_test.go:146: 00:12.038 3 => round_change@13 => 1 (dropped)
        qbft_internal_test.go:240: 00:12.042 4@12 change to 13 ~= round_timeout
        qbft_internal_test.go:269: 00:12.042 4 => round_change@13
        qbft_internal_test.go:146: 00:12.043 4 => round_change@13 => 3 (dropped)
        qbft_internal_test.go:243: 00:12.043 4 => round_change@13 -> 2@13 ~= quorum_round_changes
        qbft_internal_test.go:269: 00:12.043 2 => pre_prepare@13
        qbft_internal_test.go:146: 00:12.044 2 => pre_prepare@13 => 3 (dropped)
        qbft_internal_test.go:243: 00:12.044 2 => pre_prepare@13 -> 2@13 ~= justified_pre_prepare
        qbft_internal_test.go:243: 00:12.044 2 => pre_prepare@13 -> 4@13 ~= justified_pre_prepare
        qbft_internal_test.go:269: 00:12.044 4 => prepare@13
        qbft_internal_test.go:146: 
            	Error Trace:	/home/runner/work/charon/charon/core/qbft/qbft_internal_test.go:351
            	            				/home/runner/work/charon/charon/core/qbft/qbft_internal_test.go:146
            	Error:      	qbft sanity check: bug: duplicate rule, but different round
            	Test:       	TestQBFT/drop_30%_const
  • Peer 1 was on round 12 and had a justified_pre_prepare from itself for that round.

  • The other peers timed-out and went to round 13.

  • It didn't receive any round 13 round change message (all dropped), so was still on round 12.

  • Peer 2 got f+1 round changes, so sent a pre_prepare for round 13.

  • Peer 1 paniced when it received this. It should not.

This is not correct:

		if prevRound != msgRound {
			// Upon rules are either for the current round,
			// or for a future round followed by a round change (which clears this map).
			panic("bug: duplicate rule, but different round")
		}

Proposed solution

Since messages are justified, any rule can be triggered for a future (or a previous) round at any point.
Rather include round in the dedup key:

map[struct{UponRule,Round}]bool
@OisinKyne OisinKyne added the bug Something isn't working label Nov 29, 2022
@corverroos corverroos self-assigned this Dec 2, 2022
obol-bulldozer bot pushed a commit that referenced this issue Dec 6, 2022
Since some messages are justified, justified rules can be triggered for a future (or a previous) round at any point. So include round when deduping.

category: bug
ticket: #1493
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants