fixed an error in use smartbft #5024

pfi79 · 2024-10-11T08:44:00Z

fixed an error after deleting a node when it participated in block signing immediately before deletion

Recently the TestAddAndRemoveNodeWithoutStop test has been terminating with an error very often in unit tests.

It went like this:

prepared a new config with node 5 removed from the cluster
send it to the orderers
The 5th node participated in the consensus to accept this block
deleted the 5th node
sent a simple transaction to check that everything was working.
In the very first stage of consensus, the smartBFT library takes the previous commit with signatures and checks for correctness. It sends it to the fabric function VerifyConsenterSig. In the previous commit the signature of node 5 is present, but in the current config it is not. This causes an error here.

My suggestion for a fix:

keep the previous set of nodes in memory
if we don't find a signature in the current set and the block is equal and/or less than the current block number, check in the previous set

Perhaps there is a better and more beautiful solution.

…gning immediately before deletion Signed-off-by: Fedor Partanskiy <[email protected]>

yacovm

In the very first stage of consensus, the smartBFT library takes the previous commit with signatures and checks for correctness.

If I remember correctly, we only add the previous signatures, when we have leader rotation enabled, and in Fabric we don't have leader rotation enabled. If the leader doesn't send the previous signatures, then the followers won't verify them.

However in the tests, the decisions per leader is 0... it is therefore odd that the previous signatures verification is the cause this behavior.

Can you perhaps add a stack trace dump in the place where we verify the consenter signature?

Just to give some background - we introduced the previous commit signatures in order to know how to prune the blacklist. However, we don't use leader rotation nor the blacklist in Fabric, so before we go ahead and fix something, let's be 100% sure of it's cause.

My suggestion for a fix:

keep the previous set of nodes in memory

if we don't find a signature in the current set and the block is equal and/or less than the current block number, check in the previous set

Regarding your fix - I don't think we should leniently just check the previous set if we don't find a signature in the current set. Moreover, in your commit, you check the previous set if you don't find an identity, and not a signature. This function is used also to collect commit messages, so we should not carelessly change it like this.

I think there is a more "holistic" fix:

When verifying a block, we inspect the Fabric header number and look at the block number. If the last block we have committed (we check the runtime config) is the previous block to the block we're verifying, then we proceed as usual.
Else, if it is a block we have not committed yet (I know this is not possible, but for completeness, I am writing this case) we return an error.
Else, it's a block from the past, so we fetch the block with this number from the ledger, and look at its last config block. We then fetch that config block, and proceed to use it, to verify the requested signature.

The above "fix" is very easy to test, as it only changes a single function and preserves the intended verification logic.

pfi79 · 2024-10-11T12:31:12Z

Can you perhaps add a stack trace dump in the place where we verify the consenter signature?

easily

goroutine 126 [running]:
github.com/hyperledger/fabric/orderer/consensus/smartbft.(*Verifier).VerifyConsenterSig(0xc000032e80, {0x5, {0xc000447340, 0x6f, 0x6f}, {0xc0009a2b80, 0x77, 0x77}}, {{0xc000c2c000, 0xa5cf, ...}, ...})
	/Users/f/go/src/github.com/hyperledger/fabric/orderer/consensus/smartbft/verifier.go:218  0xb14
github.com/hyperledger-labs/SmartBFT/internal/bft.(*View).verifyPrevCommitSignatures(0xc0000f6600, {0xc000e9a240, 0x4, 0x4}, 0x0)
	/Users/f/go/src/github.com/hyperledger/fabric/vendor/github.com/hyperledger-labs/SmartBFT/internal/bft/view.go:631  0x7c6
github.com/hyperledger-labs/SmartBFT/internal/bft.(*View).verifyProposal(0xc0000f6600, {{0xc000efe580, 0x75, 0x75}, {0xc00063e230, 0x49, 0x49}, {0xc0009567b0, 0x4, 0x4}, ...}, ...)
	/Users/f/go/src/github.com/hyperledger/fabric/vendor/github.com/hyperledger-labs/SmartBFT/internal/bft/view.go:588  0xb45
github.com/hyperledger-labs/SmartBFT/internal/bft.(*View).processProposal(0xc0000f6600)
	/Users/f/go/src/github.com/hyperledger/fabric/vendor/github.com/hyperledger-labs/SmartBFT/internal/bft/view.go:386  0x5e5
github.com/hyperledger-labs/SmartBFT/internal/bft.(*View).doPhase(0xc0000f6600)
	/Users/f/go/src/github.com/hyperledger/fabric/vendor/github.com/hyperledger-labs/SmartBFT/internal/bft/view.go:291  0x46
github.com/hyperledger-labs/SmartBFT/internal/bft.(*View).run(0xc0000f6600)
	/Users/f/go/src/github.com/hyperledger/fabric/vendor/github.com/hyperledger-labs/SmartBFT/internal/bft/view.go:277  0x215
github.com/hyperledger-labs/SmartBFT/internal/bft.(*View).Start.func1()
	/Users/f/go/src/github.com/hyperledger/fabric/vendor/github.com/hyperledger-labs/SmartBFT/internal/bft/view.go:140  0x1c
created by github.com/hyperledger-labs/SmartBFT/internal/bft.(*View).Start in goroutine 85
	/Users/f/go/src/github.com/hyperledger/fabric/vendor/github.com/hyperledger-labs/SmartBFT/internal/bft/view.go:139  0x20d

pfi79 · 2024-10-11T12:34:12Z

In the very first stage of consensus, the smartBFT library takes the previous commit with signatures and checks for correctness.

If I remember correctly, we only add the previous signatures, when we have leader rotation enabled, and in Fabric we don't have leader rotation enabled. If the leader doesn't send the previous signatures, then the followers won't verify them.

However in the tests, the decisions per leader is 0... it is therefore odd that the previous signatures verification is the cause this behavior.

Can you perhaps add a stack trace dump in the place where we verify the consenter signature?

Just to give some background - we introduced the previous commit signatures in order to know how to prune the blacklist. However, we don't use leader rotation nor the blacklist in Fabric, so before we go ahead and fix something, let's be 100% sure of it's cause.

My suggestion for a fix:

keep the previous set of nodes in memory

if we don't find a signature in the current set and the block is equal and/or less than the current block number, check in the previous set

Regarding your fix - I don't think we should leniently just check the previous set if we don't find a signature in the current set. Moreover, in your commit, you check the previous set if you don't find an identity, and not a signature. This function is used also to collect commit messages, so we should not carelessly change it like this.

I think there is a more "holistic" fix:

When verifying a block, we inspect the Fabric header number and look at the block number. If the last block we have committed (we check the runtime config) is the previous block to the block we're verifying, then we proceed as usual.

Else, if it is a block we have not committed yet (I know this is not possible, but for completeness, I am writing this case) we return an error.

Else, it's a block from the past, so we fetch the block with this number from the ledger, and look at its last config block. We then fetch that config block, and proceed to use it, to verify the requested signature.

The above "fix" is very easy to test, as it only changes a single function and preserves the intended verification logic.

error itself:

2024-10-11 15:32:51.289 MSK 181e WARN [orderer.consensus.smartbft.consensus] processProposal -> 1 received bad proposal from 1: failed verifying consenter signature of 5: node with id of 5 doesn't exist channel=testchannel

pfi79 · 2024-10-11T12:46:30Z

In the very first stage of consensus, the smartBFT library takes the previous commit with signatures and checks for correctness.

If I remember correctly, we only add the previous signatures, when we have leader rotation enabled, and in Fabric we don't have leader rotation enabled. If the leader doesn't send the previous signatures, then the followers won't verify them.
However in the tests, the decisions per leader is 0... it is therefore odd that the previous signatures verification is the cause this behavior.
Can you perhaps add a stack trace dump in the place where we verify the consenter signature?
Just to give some background - we introduced the previous commit signatures in order to know how to prune the blacklist. However, we don't use leader rotation nor the blacklist in Fabric, so before we go ahead and fix something, let's be 100% sure of it's cause.

My suggestion for a fix:

keep the previous set of nodes in memory

if we don't find a signature in the current set and the block is equal and/or less than the current block number, check in the previous set

Regarding your fix - I don't think we should leniently just check the previous set if we don't find a signature in the current set. Moreover, in your commit, you check the previous set if you don't find an identity, and not a signature. This function is used also to collect commit messages, so we should not carelessly change it like this.
I think there is a more "holistic" fix:

When verifying a block, we inspect the Fabric header number and look at the block number. If the last block we have committed (we check the runtime config) is the previous block to the block we're verifying, then we proceed as usual.

Else, if it is a block we have not committed yet (I know this is not possible, but for completeness, I am writing this case) we return an error.

Else, it's a block from the past, so we fetch the block with this number from the ledger, and look at its last config block. We then fetch that config block, and proceed to use it, to verify the requested signature.

The above "fix" is very easy to test, as it only changes a single function and preserves the intended verification logic.

error itself:
2024-10-11 15:32:51.289 MSK 181e WARN [orderer.consensus.smartbft.consensus] processProposal -> 1 received bad proposal from 1: failed verifying consenter signature of 5: node with id of 5 doesn't exist channel=testchannel

Total:

The processProposal() function is executed in the COMMITTED phase
after PrePrepare is generated and received, verifyProposal is executed, where PrevCommitSignatures are passed to
after that everything is passed to the verifyPrevCommitSignatures function without any conditions.
there each signer with a signature is verified by calling VerifyConsenterSig from the fabric
and already VerifyConsenterSig ends with an error.

pfi79 · 2024-10-11T12:50:10Z

@yacovm you can see for yourself.

put the stopper here
and run this test in debug

out of 10 runs, one of them will definitely work.

pfi79 · 2024-10-11T13:18:28Z

Regarding your fix - I don't think we should leniently just check the previous set if we don't find a signature in the current set. Moreover, in your commit, you check the previous set if you don't find an identity, and not a signature. This function is used also to collect commit messages, so we should not carelessly change it like this.

I think there is a more "holistic" fix:

When verifying a block, we inspect the Fabric header number and look at the block number. If the last block we have committed (we check the runtime config) is the previous block to the block we're verifying, then we proceed as usual.

Else, if it is a block we have not committed yet (I know this is not possible, but for completeness, I am writing this case) we return an error.

Else, it's a block from the past, so we fetch the block with this number from the ledger, and look at its last config block. We then fetch that config block, and proceed to use it, to verify the requested signature.

The above "fix" is very easy to test, as it only changes a single function and preserves the intended verification logic.

I have the block number check.

Do you want me to get the config and participant set in the same place?

yacovm · 2024-10-11T14:06:57Z

I don't think we should be sending the previous commit signatures if we don't need them. Why do that? It's just excess CPU cycles, network bandwidth and disk storage.

I think we should not attach them if leader rotation is disabled.

pfi79 · 2024-10-11T14:16:28Z

I don't think we should be sending the previous commit signatures if we don't need them. Why do that? It's just excess CPU cycles, network bandwidth and disk storage.

I think we should not attach them if leader rotation is disabled.

What if at the time the 5th node is removed, the leader is the 5th node?

leader change
the previous signatures would be
The 5th node can participate.

and we'll be in the same situation again

yacovm · 2024-10-11T15:02:27Z

I don't think we should be sending the previous commit signatures if we don't need them. Why do that? It's just excess CPU cycles, network bandwidth and disk storage.
I think we should not attach them if leader rotation is disabled.

What if at the time the 5th node is removed, the leader is the 5th node?

leader change

the previous signatures would be

The 5th node can participate.

and we'll be in the same situation again

If we're not attaching signatures of the previous block then how can we have a problem in verifying the signatures of the previous block which we do not attach?

pfi79 · 2024-10-11T15:04:57Z

I don't think we should be sending the previous commit signatures if we don't need them. Why do that? It's just excess CPU cycles, network bandwidth and disk storage.
I think we should not attach them if leader rotation is disabled.

What if at the time the 5th node is removed, the leader is the 5th node?

leader change

the previous signatures would be

The 5th node can participate.

and we'll be in the same situation again

If we're not attaching signatures of the previous block then how can we have a problem in verifying the signatures of the previous block which we do not attach?

during the change of leadership?

pfi79 · 2024-10-11T15:08:25Z

I don't think we should be sending the previous commit signatures if we don't need them. Why do that? It's just excess CPU cycles, network bandwidth and disk storage.
I think we should not attach them if leader rotation is disabled.

What if at the time the 5th node is removed, the leader is the 5th node?

leader change

the previous signatures would be

The 5th node can participate.

and we'll be in the same situation again

If we're not attaching signatures of the previous block then how can we have a problem in verifying the signatures of the previous block which we do not attach?

Or is it only needed when rotation is on?
and when rotation is disabled it is not necessary, even when the leader has changed?

yacovm · 2024-10-11T15:10:26Z

exactly, it's only used when periodical leader rotation is active. A regular leader failover has nothing to do with it.

pfi79 · 2024-10-11T15:15:16Z

exactly, it's only used when periodical leader rotation is active. A regular leader failover has nothing to do with it.

Okay.
then I'll refine the task.
to leave adding signatures of the previous comitee only if leader rotation is enabled and DecisionsInView==0

right?

yacovm · 2024-10-11T15:22:52Z

exactly, it's only used when periodical leader rotation is active. A regular leader failover has nothing to do with it.

Okay. then I'll refine the task. to leave adding signatures of the previous comitee only if leader rotation is enabled and DecisionsInView==0

right?

I don't understand how decisions in view is related to what we're talking about. Did you mean DecisionsPerLeader?

pfi79 · 2024-10-11T15:31:59Z

I don't understand how decisions in view is related to what we're talking about. Did you mean DecisionsPerLeader?

DecisionsPerLeader > 0 - indicator that rotation is enabled

DecisionsInView==0 - indicator that the leader has just been changed
Or is the second one unnecessary?

yacovm · 2024-10-11T15:45:53Z

I don't understand how decisions in view is related to what we're talking about. Did you mean DecisionsPerLeader?

DecisionsPerLeader > 0 - indicator that rotation is enabled

DecisionsInView==0 - indicator that the leader has just been changed Or is the second one unnecessary?

The former is a configuration parameter.
The latter is a byproduct of the consensus protocol.

So we need to consider the former, not the latter.

pfi79 · 2024-10-11T15:47:33Z

I don't understand how decisions in view is related to what we're talking about. Did you mean DecisionsPerLeader?

DecisionsPerLeader > 0 - indicator that rotation is enabled
DecisionsInView==0 - indicator that the leader has just been changed Or is the second one unnecessary?

The former is a configuration parameter. The latter is a byproduct of the consensus protocol.

So we need to consider the former, not the latter.

hyperledger-labs/SmartBFT#599

take a look, please.

pfi79 requested a review from a team as a code owner October 11, 2024 08:44

fixed an error after deleting a node when it participated in block si…

3cc8f0f

…gning immediately before deletion Signed-off-by: Fedor Partanskiy <[email protected]>

pfi79 force-pushed the fix-error-remove-node-bft branch from d7c45ac to 3cc8f0f Compare October 11, 2024 08:51

yacovm requested changes Oct 11, 2024

View reviewed changes

pfi79 closed this Oct 11, 2024

pfi79 deleted the fix-error-remove-node-bft branch October 11, 2024 15:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fixed an error in use smartbft #5024

fixed an error in use smartbft #5024

pfi79 commented Oct 11, 2024

yacovm left a comment •

edited

Loading

pfi79 commented Oct 11, 2024

pfi79 commented Oct 11, 2024

pfi79 commented Oct 11, 2024

pfi79 commented Oct 11, 2024

pfi79 commented Oct 11, 2024

yacovm commented Oct 11, 2024

pfi79 commented Oct 11, 2024

yacovm commented Oct 11, 2024

pfi79 commented Oct 11, 2024

pfi79 commented Oct 11, 2024

yacovm commented Oct 11, 2024

pfi79 commented Oct 11, 2024

yacovm commented Oct 11, 2024

pfi79 commented Oct 11, 2024 •

edited

Loading

yacovm commented Oct 11, 2024

pfi79 commented Oct 11, 2024

fixed an error in use smartbft #5024

fixed an error in use smartbft #5024

Conversation

pfi79 commented Oct 11, 2024

yacovm left a comment • edited Loading

Choose a reason for hiding this comment

pfi79 commented Oct 11, 2024

pfi79 commented Oct 11, 2024

pfi79 commented Oct 11, 2024

pfi79 commented Oct 11, 2024

pfi79 commented Oct 11, 2024

yacovm commented Oct 11, 2024

pfi79 commented Oct 11, 2024

yacovm commented Oct 11, 2024

pfi79 commented Oct 11, 2024

pfi79 commented Oct 11, 2024

yacovm commented Oct 11, 2024

pfi79 commented Oct 11, 2024

yacovm commented Oct 11, 2024

pfi79 commented Oct 11, 2024 • edited Loading

yacovm commented Oct 11, 2024

pfi79 commented Oct 11, 2024

yacovm left a comment •

edited

Loading

pfi79 commented Oct 11, 2024 •

edited

Loading