BLS Consensus Structure #637

Open
Qiao-Jin opened this issue Aug 27, 2021 · 33 comments

Comments

@Qiao-Jin
Contributor

Qiao-Jin commented Aug 27, 2021

Summary or problem description

Since we decided to introduce on-chain random numbers into the Neo system, there has been a lot of work on this topic, such as neo-project/neo#1657, neo-project/neo#2456, neo-project/neo#2019, neo-project/neo#2477, neo-project/neo#2481, neo-project/neo#2476, neo-project/neo#2470, #590, #596, etc.

However, the random number function currently in use was introduced in #596 and simply generates a pseudo-random number with a C# random number generator. We know this is only a temporary scheme and needs to be replaced sooner or later.

 

Do you have any solution you want to propose?

One choice is the BLS random number introduced in neo-project/neo#1657 (comment). In this solution, the BLS signatures of multiple nodes over a certain message (say, prevHash) are used as seeds to create a nonce that serves as the random number. The nonce cannot be determined by any single node, or by any group of fewer than a certain number (m) of nodes. On the other hand, the signatures of any m nodes can be used to calculate the unique nonce.
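To illustrate the "any m nodes yield the same nonce" property, here is a minimal Python sketch. It is not the BLS scheme itself: it assumes a toy group (integers modulo a small safe prime) in place of BLS12-381 pairings, a trusted dealer in place of distributed key sharing, and hypothetical helper names. It only shows how Lagrange interpolation in the exponent makes the combined value independent of which m shares are used.

```python
import hashlib
import random

# Toy parameters (assumption, NOT secure): P is a safe prime, Q = (P - 1) / 2.
P = 2039
Q = 1019


def hash_to_group(msg: bytes) -> int:
    """Map a message to an element of the order-Q subgroup of squares mod P."""
    x = int.from_bytes(hashlib.sha256(msg).digest(), "big") % P
    h = pow(x, 2, P)
    return h if h > 1 else 4  # avoid the degenerate elements 0 and 1


def deal_shares(secret: int, m: int, n: int, rng: random.Random) -> dict:
    """Shamir-share `secret` so that any m of the n shares determine it."""
    coeffs = [secret % Q] + [rng.randrange(Q) for _ in range(m - 1)]
    return {
        i: sum(c * pow(i, k, Q) for k, c in enumerate(coeffs)) % Q
        for i in range(1, n + 1)
    }


def sign_share(share: int, msg: bytes) -> int:
    """Partial 'signature': the hashed message raised to the share."""
    return pow(hash_to_group(msg), share, P)


def lagrange_at_zero(i: int, ids: list) -> int:
    """Lagrange coefficient for index i, evaluated at x = 0, modulo Q."""
    num, den = 1, 1
    for j in ids:
        if j != i:
            num = num * (-j) % Q
            den = den * (i - j) % Q
    return num * pow(den, -1, Q) % Q


def combine(sig_shares: dict) -> int:
    """Combine m partial signatures into the unique group signature."""
    ids = list(sig_shares)
    result = 1
    for i, s in sig_shares.items():
        result = result * pow(s, lagrange_at_zero(i, ids), P) % P
    return result
```

With n = 7 and m = 3, combining the partial signatures of nodes (1, 2, 3) or of nodes (2, 5, 7) gives the same value, which is why no single subset of signers controls the result.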

There are 2 tasks in this option:

  1. Is there an available BLS algorithm?
  2. How should the BLS shared private keys be distributed?

For the first one, we have already developed a BLS algorithm in this repo: https://github.com/Qiao-Jin/BLSTest.

For the second one, we would like to introduce some updates to the current consensus algorithm as follows:

  1. Pre-consensus steps, WHICH ARE ONLY EXECUTED AT EPOCH START IF THE LIST OF VALIDATORS CHANGES IN THE NEW EPOCH
    [image]

(1) Each validator should broadcast a consensus payload, BLSSecretKey, which includes 2 sections:

a. Shared BLS private keys for the other validators, each encrypted with the receiver's ECC public key (this feature will be introduced in another issue),
b. Corresponding BLS public keys, which can be used to check the BLS private keys & BLS signatures.

Meanwhile, each validator would also collect & check the BLSSecretKey payloads from the others and keep a local list of trusted validators.
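How a receiver could check a private share against published public values is not spelled out here. One standard way is Feldman-style commitments, sketched below in Python as an assumption rather than as the proposal's actual mechanism. The toy group (a small safe prime instead of BLS12-381) and all function names are hypothetical.

```python
import random

# Toy parameters (assumption, NOT secure): G generates the order-Q subgroup mod P.
P, Q, G = 2039, 1019, 4


def deal(secret: int, m: int, n: int, rng: random.Random):
    """Return per-node shares plus public commitments to the polynomial."""
    coeffs = [secret % Q] + [rng.randrange(1, Q) for _ in range(m - 1)]
    shares = {
        i: sum(c * pow(i, k, Q) for k, c in enumerate(coeffs)) % Q
        for i in range(1, n + 1)
    }
    commitments = [pow(G, c, P) for c in coeffs]  # safe to broadcast
    return shares, commitments


def share_is_valid(i: int, share: int, commitments: list) -> bool:
    """Check G^share against the commitments evaluated at index i."""
    expected = 1
    for k, c in enumerate(commitments):
        expected = expected * pow(c, pow(i, k, Q), P) % P
    return pow(G, share, P) == expected
```

A node that receives a share failing this check would simply leave the sender out of its local trusted list.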

(2) After a certain time (say, one block), each validator should broadcast a consensus payload, BLSTrustee, which includes a list of signatures of PrevHash made with the shared private keys it trusts. This payload can be regarded as a "vote" for the private key sharers. Meanwhile, each validator would also collect & check the BLSTrustee payloads from the others (signature & public key checking, etc.).

(3) After a certain time (i.e. one block), the speaker will broadcast a consensus payload, BLSPrepareRequest, which includes a node list of length (f + 1). Typically this list should be the f + 1 validators with the most "votes" among the BLSTrustee payloads the speaker has collected.
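The speaker's selection in step (3) can be sketched as a simple tally. This Python snippet is only an illustration: the input shape (voter index mapped to the set of trusted indices) and the tie-break by lowest validator index are assumptions, not part of the proposal.

```python
from collections import Counter
from typing import Dict, List, Set


def select_signers(trustee_votes: Dict[int, Set[int]], n: int) -> List[int]:
    """Pick the f + 1 validators with the most BLSTrustee votes."""
    f = (n - 1) // 3
    tally = Counter()
    for trusted in trustee_votes.values():
        tally.update(trusted)
    # Most votes first; ties broken by lowest index (assumption).
    ranked = sorted(tally, key=lambda v: (-tally[v], v))
    return sorted(ranked[: f + 1])
```

For example, with 4 validators (f = 1) the function returns a list of 2 validator indices.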

(4) The other validators will vote on this BLSPrepareRequest by sending a BLSPrepareResponse, and will broadcast a BLSCommit upon receiving enough BLSPrepareResponses.

(5) Eventually the BLS signer list is decided once enough BLSCommits have been received.

In these steps, the changeview & recovery strategy will also be adopted, similar to the current consensus logic. Consensus will continue to the next step if and only if the BLS signer list is agreed upon. Please note that this is consensus on the BLS signer list and will not affect the validator list.

  2. Consensus steps for each block

(1) The speaker will broadcast a PrepareRequest which contains the transaction list, while the other validators reply with a PrepareResponse. PrepareRequest and PrepareResponse will include BLS signatures, which are used for verification instead of the original ECC signature.

(2) Each validator will still broadcast a Commit message upon receiving enough PrepareRequest/Response messages. The Commit message will contain the ECC signature of the block header as well as the calculated nonce.

(3) Each validator will broadcast the new block upon receiving enough Commit messages, as before. The changeview & recovery strategies remain the same.

 

The main changes in this strategy:

a. Adding consensus steps for the BLS signer list
b. Adding the BLS signature to PrepareRequest/Response, and moving the original ECC signature to Commit messages

The detailed consensus timeline is shown below:

[image]

And the logical steps of BLS consensus:

[image]

 

Where in the software does this update apply to?

  • Consensus
  • Plugins
@roman-khimov
Contributor

  1. What if elected node misses pre-consensus setup (it can be down)?
  2. What if elected node is restarted from the genesis (which happens from time to time)?
  3. Can we derive shared things from on-chain data (validators can save something in the NEO contract for example)?

Not an easy thing, it touches a lot of core things, so we need to be very careful here.

@Qiao-Jin
Contributor Author

Qiao-Jin commented Sep 13, 2021

  1. What if elected node misses pre-consensus setup (it can be down)?

As we assume that at least 2/3 of the consensus nodes act normally during consensus, this rule should hold in this stage as well.

  2. What if elected node is restarted from the genesis (which happens from time to time)?

In these steps, the changeview & recovery strategy will also be adopted, similar to the current consensus logic, which means that nodes that have fallen behind can sync the consensus messages of these steps too.

  3. Can we derive shared things from on-chain data (validators can save something in the NEO contract for example)?

That's actually what we came up with at the start, but this would probably need modifications to neo-core (i.e. more native contracts & syscalls), so we switched to another direction instead.

@roman-khimov
Contributor

As we assume that at least 2/3 of the consensus nodes act normally during consensus, this rule should hold in this stage as well.

I don't think the protocol allows for that in the key distribution phase (pre-consensus, joint random secret sharing). It works fine with the 2/3 assumption after pre-consensus, but if some node is missing during pre-consensus we can't really proceed. Try to see what happens if there is no data from node C in neo-project/neo#1657, or if it intentionally lies and sends broken data to the other parties.

This seems to be a critical problem to me because in general we expect BFT. I've actually tried to find some alternatives to this BLS scheme. There are some, but they all suffer from this weakness: having just a set of 256r1 keys (one for each node, as we already have) is not enough, and to set things up correctly all participants must behave correctly initially.

Can we derive shared things from on-chain data (validators can save something in the NEO contract for example)?

That's actually what we came up with at the start, but this would probably need modifications to neo-core (i.e. more native contracts & syscalls), so we switched to another direction instead.

It looks to me like a long-term state sharing problem. Doing it on-chain is probably more appropriate exactly because it easily solves the "restart from genesis" case: the node picks up everything from the chain and just works.

@Qiao-Jin
Contributor Author

Qiao-Jin commented Nov 2, 2021

@shargon @vncoelho @igormcoelho Could you please share your opinions on this issue?

@shargon
Member

shargon commented Nov 2, 2021

Not an easy thing, it touches a lot of core things, so we need to be very careful here.

I think that it's a big and difficult change just for random numbers, something that projects may not use for thousands of blocks, yet it will be processed in all of them. I think that each project can create its own random algorithm. It's not a great feature. What is the use case? Lotteries only?

@Jim8y
Contributor

Jim8y commented Nov 2, 2021

@shargon heavy and complex as it is, it is a price worth paying. Random numbers are critical for NFT, Metaverse, and GameFi. Eth2.0 is even investing in VDF computing hardware for random numbers.

@vncoelho
Member

vncoelho commented Nov 2, 2021

I could try to generate a new mathematical model for it, like the Mixed Integer Programming Model we introduced for dBFT 2.0 and 3.0 with double speakers.

I believe it is a complex change and we need time. However, random numbers are useful for plenty of applications.

I also believe we need double speakers as soon as possible in order to improve stability.

Let me discuss with @igormcoelho and return to you soon.

@vncoelho
Member

vncoelho commented Nov 4, 2021

We started the discussions here and are trying to understand it more carefully before we advance.
@igormcoelho will also soon reply here.


Why is the BLSSigners list of voted signers f+1?
What happens if f of these elected ones are offline?


  • " Adding BLS signature into PrepareRequest/Response, and move original ECC signature to Commit messages". What is this about moving original ECC signature to Commit? Currently we use the original header ECC signed by validators in the Commit.

@igormcoelho
Contributor

Very interesting issue @Qiao-Jin and interesting points from @roman-khimov, let me try to understand this idea better. I agree that random numbers are a fundamental feature.
Is this process going to replace dbft 2.0 or just operate after new node elections?
From what I understood, all nodes should be prepared to continue, otherwise they won't be able to participate in this BLS scheme, right? The issue with 3f+1 here is that the "f" that may be broken during this pre-phase may not be the "f" that become broken during consensus time... so starting with bad nodes is not good in my perspective; there must be some fix for that.
Regarding nodes being restarted, they must save important data like this to disk, in order to be recovered later. And in the worst case scenario, nodes could fall back to the legacy number generator if the situation gets critical, but I believe that the disk saving feature and a guarantee of good starting nodes could fix that.

@roman-khimov
Contributor

Regarding nodes being restarted, they must save important data like this to disk, in order to be recovered later.

The disk can crash in an unrecoverable way, the server can physically burn to ashes; all of these scenarios must be handled appropriately.

@igormcoelho
Contributor

igormcoelho commented Nov 4, 2021

Fully agree @roman-khimov, we need to handle all scenarios. Perhaps we first need to be clear about the usefulness of guaranteed random generation.
(U1) for smart contracts
(U2) to guarantee that ALL proposed blocks (during spork events) will have same random nonce

For me, one of the most important features is (U2). I don't know if this was @Qiao-Jin's original intention... but it may impact our recovery strategies:

R1) keep signatures on disk, for "light crashes"
R2) fall back to legacy random, if there are not enough BLS-capable nodes (and indicate this in the block header) - this may break (U2)
R3) request early elections, as non BLS-capable nodes may be considered "permanently failed"

I don't know if you all discussed R3, but anyway we need to keep generating blocks even during an election period, so this may need (R2) anyway.

@Qiao-Jin
Contributor Author

Qiao-Jin commented Nov 5, 2021

Why is the BLSSigners list of voted signers f+1?

This is because if the signer limit is set to f or less, there is a chance that f malicious consensus nodes (speaker included) could collude and obtain the random number before consensus.

What happens if f of these elected ones are offline?

You can see that there are 2 stages in the new algorithm: the pre-consensus stage and the consensus stage. In the pre-consensus stage, shared BLS private keys are exchanged between nodes, and a node is elected if and only if at least 2f + 1 nodes vote for it. Consequently, an elected set of (f + 1) nodes is also approved by at least (2f + 1) nodes together in the pre-consensus stage. That a node is elected in the pre-consensus stage doesn't necessarily mean that the node itself is elected, but that its shared private keys are approved by at least (2f + 1) nodes. So it doesn't matter if the elected nodes crash in the consensus stage, as their shared private keys have already been received by other consensus nodes, can be exchanged between different nodes, and can be used to generate BLS signatures in the dbft stage.

So consensus will fail if and only if all consensus nodes that received the shared secret keys of the elected nodes in the pre-consensus stage fail in the consensus stage. But that is inconsistent with the current assumption. The number of nodes that have received the shared secret keys of the elected nodes in the pre-consensus stage must be greater than or equal to (2f + 1) - (at most f malicious nodes) = f + 1 (otherwise we cannot get an elected set & pass the pre-consensus stage). If all of them fail in the consensus stage, the number of "good" nodes in the consensus stage would be less than 2f + 1.
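The counting argument above can be checked with concrete numbers. The following Python helper is just a worked example under the usual n = 3f + 1 assumption; the function and key names are hypothetical.

```python
def quorum_numbers(n: int) -> dict:
    """Worked numbers for the pre-consensus safety argument, assuming n = 3f + 1."""
    f = (n - 1) // 3
    approvals = 2 * f + 1               # nodes that approved the elected set
    honest_holders = approvals - f      # at most f of the approvers are malicious
    good_if_holders_fail = n - honest_holders  # good nodes left if all of them fail
    return {
        "f": f,
        "signers": f + 1,
        "approvals": approvals,
        "honest_holders": honest_holders,
        "good_if_holders_fail": good_if_holders_fail,
    }
```

For n = 7 this gives f = 2, at least 3 honest key holders, and only 4 good nodes left if all 3 fail, which is below the 2f + 1 = 5 that dbft already assumes.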

@Qiao-Jin
Contributor Author

Qiao-Jin commented Nov 5, 2021

What is this about moving original ECC signature to Commit? Currently we use the original header ECC signed by validators in the Commit.

Currently, as the block content is already fixed at the prepare-request step, the ECC signature can be put into the prepare-request & response payloads by consensus nodes. However, in the new algorithm, we don't know the hash of the block at the prepare-request & response step, as we won't know the block nonce (the final BLS signature) until BLS signatures have been exchanged between nodes.

So in the new algorithm the BLS signature is inserted into the prepare-request & response payloads for 2 purposes: (1) BLS signature exchange, (2) payload verification. Once a node receives (2f + 1) prepare requests & responses, it will finally know the hash of the next block and put the ECC signature into the commit payload it broadcasts. The multi-signature will be put into the block payload itself once a consensus node receives enough commit payloads and broadcasts the corresponding block.
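The dependency described here (block hash depends on the nonce, and the nonce depends on the combined BLS signature) can be sketched in Python. The header layout and the rule for deriving a 64-bit nonce from the signature are assumptions made for illustration; they are not Neo's actual serialization.

```python
import hashlib


def nonce_from_signature(combined_bls_sig: bytes) -> int:
    """Assumption: the 64-bit nonce is the first 8 bytes of SHA-256(signature)."""
    return int.from_bytes(hashlib.sha256(combined_bls_sig).digest()[:8], "big")


def header_hash(prev_hash: bytes, merkle_root: bytes, nonce: int) -> bytes:
    """Simplified header hash (assumption): the hash covers the nonce."""
    return hashlib.sha256(prev_hash + merkle_root + nonce.to_bytes(8, "big")).digest()
```

Because `header_hash` needs the nonce, a node can only produce its ECC signature over the header after the BLS shares in PrepareRequest/PrepareResponse have been combined, which is why that signature moves to the Commit message.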

@Qiao-Jin
Contributor Author

Qiao-Jin commented Nov 5, 2021

Is this process going to replace dbft 2.0 or just operate after new node elections?

Yes, this is a replacement. Some features in the original dbft steps are also changed (i.e. the signature).

all nodes should be prepared to continue, otherwise they wont be able to participate in this BLS scheme

Yes, actually things are similar in dbft: if a node is not prepared, it cannot participate in dbft.

the "f" that may be broken during this pre-phase may not be the "f" that become broken during consensus time

Yes, they are probably different, but it will not matter as long as the shared secret keys of the elected nodes have been spread over different nodes. It is their shared secret keys that are used in the dbft stage, not the nodes themselves. So these "elected nodes" can fail without interrupting the dbft stage, as long as there are no fewer than 2f + 1 "good" nodes in the dbft stage. For a detailed safety proof, please refer to the last paragraph of #637 (comment).

Regarding nodes being restarted, they must save important data like this to disk, in order to be recovered later.

Yes, we can treat the newly added BLS consensus payloads in the same way as the current consensus payloads, i.e. relaying, storage, view changing, recovery.

@vncoelho
Member

vncoelho commented Nov 8, 2021

Thanks @Qiao-Jin, for your careful explanation and dedication to this advancement.

  • During the pre-consensus:
    • All 3f + 1 will possibly share their BLSSecretKey payloads, encrypting their BLS private keys to their own ECC public keys (which are known as validators)? What will guarantee that the corresponding BLS private key is from a specific node i? Will we have a public list of BLS Public Keys associated with nodes?
    • On step (2), every node will itself sign an array with its vote, forming the BLSTrustee? It does not need to be f+1, right (it can be less or more, even 3f+1)? Because the BLSPrepareRequest will count and select, at least, f+1?

@Qiao-Jin
Contributor Author

Qiao-Jin commented Nov 9, 2021

@vncoelho Thanks for your time & interest! Below are my thoughts upon these questions:

All 3f + 1 will possibly share their BLSSecretKey payloads, encrypting their BLS private keys to their own ECC public keys (which are known as validators)?

No, the BLS shared secret keys are encrypted with the others' public keys. For example, let's say that there are altogether 4 validators: A, B, C, D. A would need to broadcast 3 sets of keys, P(a -> b), P(a -> c) & P(a -> d), for the other validators B, C and D respectively. P(a -> b) is encrypted with the public ECC key of B so that only B can see this BLS shared key. Similarly, P(a -> c) would be encrypted with C's public ECC key and P(a -> d) with D's. On the other hand, the BLSSecretKey payload that contains P(a -> b), P(a -> c) & P(a -> d) would be signed with A's private ECC key so that B, C and D can verify this payload and be sure that it is from A.
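The shape of such a payload can be sketched as follows. This is a hypothetical Python structure, not the real BLSSecretKey format: the field names are invented, and `encrypt_for` and `sign` are caller-supplied stand-ins for ECC encryption to the receiver's public key and ECC signing with the sender's private key.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class BLSSecretKeyPayload:
    sender: str
    encrypted_shares: Dict[str, bytes]  # receiver -> share only that receiver can read
    public_keys: Dict[str, bytes]       # receiver -> public key to check that share
    sender_signature: bytes             # proves the payload came from `sender`


def build_payload(
    sender: str,
    shares: Dict[str, bytes],
    public_keys: Dict[str, bytes],
    encrypt_for: Callable[[str, bytes], bytes],
    sign: Callable[[bytes], bytes],
) -> BLSSecretKeyPayload:
    """Encrypt one share per receiver, then sign the whole bundle."""
    encrypted = {
        receiver: encrypt_for(receiver, share)
        for receiver, share in shares.items()
        if receiver != sender
    }
    body = b"".join(encrypted[r] for r in sorted(encrypted))
    return BLSSecretKeyPayload(sender, encrypted, public_keys, sign(body))
```

For validator A in the example, `shares` would hold the entries for B, C and D, and the resulting payload carries three ciphertexts plus A's signature.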

[image]

Will we have a public list of BLS Public Keys associated with nodes?

Yes, it is included in the BLSSecretKey payload. It is used by other nodes to verify the corresponding secret keys and signatures.

On step (2), every node will itself sign an array with its vote, forming the BLSTrustee? It does not need to be f+1, right (it can be less or more, even 3f+1)? Because the BLSPrepareRequest will count and select, at least, f+1?

Yes exactly.

@vncoelho
Member

vncoelho commented Jan 5, 2023

What is the current status, @Qiao-Jin @doubiliu?

Now that @erikzhang pushed BLS12-381.
What is the idea for integrating BLS to consensus validators signatures?

@Jim8y
Contributor

Jim8y commented Jan 5, 2023

Now that @erikzhang pushed BLS12-381. What is the idea for integrating BLS to consensus validators signatures?

This is halted for now, along with the random number. The community has concerns about adding BLS to the consensus. We might push it forward later.

@vncoelho
Member

vncoelho commented Jan 5, 2023

Community has concerns?

Which community, @Liaojinghui? Maybe I missed some discussions.
As far as I know, it has been successfully added to the Ethereum Beacon Chain, allowing consensus to communicate more compactly.

If BLS12-381 is now easily accessible, I believe it is more straightforward for us now. We just need more tests on the curve itself, if the existing ones are not enough.

@Jim8y
Contributor

Jim8y commented Jan 5, 2023

@vncoelho the problems with BLS are basically: 1. it requires a trusted setup process, and a malicious node may block the DKG process; that is also the major reason why the solutions discussed in this issue are so complex. 2. updating the consensus is not only a technical problem but, more importantly, a financial problem.

@vncoelho
Member

vncoelho commented Jan 5, 2023

1. it requires trusted setup process

This is for m-n signatures, @Liaojinghui.
In the case of consensus, the validators' privkeys do not change often (just when new nodes join). We just need to do that once.

malicious node may block the DKG process, that is also the major reason why solutions discussed in this issue is so complex.

This is quite easy to detect, because each node can check the pairings. If someone is propagating a wrong pairing, we have very clear proof of it.

@Jim8y
Contributor

Jim8y commented Jan 5, 2023

@vncoelho I know what you mean, but validators will change, therefore we need to run the BLS setup regularly. Indeed we can detect a malicious key, but still, it is possible that they delay the setup process so that new validators are unable to start the consensus on time... as you can see from the discussion here: #637 (comment)

@vncoelho
Member

vncoelho commented Jan 5, 2023

@Liaojinghui, the process you see as too complex is not.
It is just a set of key exchanges by signing an integer, which represents the node's position in the aggregated keys.
Validators do not change so often.
We can even have all the committee already registered and we just filter the valid keys. After PostPersist we all know which CN are valid.
Thus, if the CN change inside the committee itself, we do not need a new setup.


Even if we are considering dBFT 3.0 with one consensus node open to the community, we can still use a hybrid version.
We check the Community Node with ECDSA and the other ones with BLS.

@Jim8y
Contributor

Jim8y commented Jan 5, 2023

If you think validator updates are not a problem, then all issues are gone, and we have plenty of time to address all problems that may happen...

@Jim8y
Contributor

Jim8y commented Jan 5, 2023

Oh BTW,

We can even have all the committee already registered and we just filter the valid keys. After PostPersist we all know which CN are valid.

This is impossible. DKG requires nodes to have an index, so it is impossible to do a general setup.

@vncoelho
Member

vncoelho commented Jan 5, 2023

@Liaojinghui, I really do not see this as an issue because it has a solution.
We know the nodes who signed, and we can even design it with an m-to-m scheme rather than threshold signatures, because the number of NEO nodes is small.

@Jim8y
Contributor

Jim8y commented Jan 5, 2023

@Liaojinghui, I really do not see this as an issue because it has a solution. We know the nodes who signed, and we can even design it with an m-to-m scheme rather than threshold signatures, because the number of NEO nodes is small.

We do have one solution, which is presented in this issue. I do wish to add BLS to the consensus; that is exactly what I wish to use to generate random numbers.

@vncoelho
Member

vncoelho commented Jan 5, 2023

This is impossible. DKG requires nodes to have an index, so it is impossible to do a general setup.

@Liaojinghui, as I understood it, the index has a corresponding point in G_2, let's say MK_3 for when node_3 signed.
MK_3 is created by signing BLS_privKey_3 and a random integer.
By knowing who signed, we use the corresponding pre-agreed MK_i values to verify.

@vncoelho
Member

vncoelho commented Jan 5, 2023

@Liaojinghui, maybe this is a good reference: https://medium.com/cryptoadvance/bls-signatures-better-than-schnorr-5a7fe30ea716
If you want to check "Subgroup multisignature scheme (m-of-n multisig)"

Upon registering they will need to present their BLS pubkey.
Thus, we would ask all CN to sign the updated index.
Only when the process is done (all nodes sign all (ai⋅pk_i)×H(P, i)) would the new node be enabled to join.
Meanwhile, the CN keep running normally because the new CN is not completely registered.

@vncoelho
Member

vncoelho commented Jan 5, 2023

But, in fact, @Liaojinghui, this is an upgrade that is not urgent, because our number of nodes is not large enough to obtain a good trade-off in the benefit of using BLS.

Furthermore, if we move to the idea of Community Node with Double Speakers possibility, we will need a Hybrid mode to make the process even easier.

@Jim8y
Contributor

Jim8y commented Jan 5, 2023

@Liaojinghui, maybe this is a good reference: https://medium.com/cryptoadvance/bls-signatures-better-than-schnorr-5a7fe30ea716 If you want to check "Subgroup multisignature scheme (m-of-n multisig)"

Upon registering they will need to present their BLS pubkey. Thus, we would ask all CN to sign the updated index. Only when the process is done (all nodes sign all (ai⋅pk_i)×H(P, i)) would the new node be enabled to join. Meanwhile, the CN keep running normally because the new CN is not completely registered.

Actually, if we could leverage the chain, it would be much easier. But as you can find out in this issue, we are trying our best to avoid touching the chain.

@vncoelho
Member

vncoelho commented Jan 5, 2023

I did not understand what you mean by Leveraging the Chain and Not Touching.

@Jim8y
Contributor

Jim8y commented Jan 5, 2023

I did not understand what you mean by Leveraging the Chain and Not Touching.

Never mind, that was a previous discussion of ours, and I don't even remember where it happened... too long ago.
