-
Notifications
You must be signed in to change notification settings - Fork 59
ABCI++ RFC #254
ABCI++ RFC #254
Changes from 1 commit
ec1a4fb
d9a57bd
ef2c4be
493e37d
644d517
d30bc99
fd57d67
d312734
472715d
b98e524
6d9ece0
f5a0aa1
b3f46fa
d8dffc2
baf8f31
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,220 @@ | ||||||
# RFC 004: ABCI++ | ||||||
|
||||||
## Changelog | ||||||
|
||||||
- January 11, 2020: initialized | ||||||
|
||||||
## Author(s) | ||||||
|
||||||
- Dev (@valardragon) | ||||||
- Sunny (@sunnya97) | ||||||
|
||||||
## Context | ||||||
|
||||||
ABCI is the interface between the consensus engine and the application. | ||||||
It defines when the application can talk to consensus during the execution of a blockchain. | ||||||
At the moment, the application can only act at one phase in consensus, immediately after a block has been finalized. | ||||||
|
||||||
This restriction on the application prohibits numerous features for the application, including many scalability improvements that are now better understood than whan ABCI was first written. | ||||||
sunnya97 marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
For example, many of the scalability proposals can be boiled down to "Make the miner / block proposers / validators do work, so the network does not have to". | ||||||
This includes optimizations such as tx-level signature aggregation, state transition proofs, etc. | ||||||
Furthermore, many new security properties cannot be achieved in the current paradigm, as the application cannot enforce validators do more than just finalize txs. | ||||||
This includes features such as threshold cryptography, and guaranteed IBC connection attempts. | ||||||
We propose introducing three new phases to ABCI to enable these new features. | ||||||
|
||||||
#### Prepare Proposal phase | ||||||
|
||||||
This phase aims to allow the block proposer to perform more computation, to reduce load on all other full nodes, and lite clients in the network. | ||||||
ValarDragon marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
It is intended to enable features such as batch optimizations on the transaction data (e.g. signature aggregation, zk rollup style validity proofs, etc.), enabling stateless blockchains with validator provided authentication paths, etc. | ||||||
|
||||||
This new phase will only be executed by the block proposer. The application will take in the block header and raw transaction data output by the consensus engine's mempool. It will then return block data that is prepared for gossip on the network, and additional fields to include into the block header. | ||||||
|
||||||
#### Process Proposal Phase | ||||||
|
||||||
This phase aims to allow applications to determine validity of a new block proposal, and execute computation on the block data, prior to the blocks finalization. | ||||||
It is intended to enable applications to reject block proposals with invalid data, and to enable alternate pipelined execution models. (Such as Ethereum-style immediate execution) | ||||||
|
||||||
This phase will be executed by all full nodes upon receiving a block, though on the application side it can do more work in the even that the current node is a validator. | ||||||
ValarDragon marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
|
||||||
#### Vote Extension Phase | ||||||
|
||||||
This phase aims to allow applications to require their validators do more than just validate blocks. | ||||||
Example usecases of this include validator determined price oracles, validator guaranteed IBC connection attempts, and validator based threshold crypto. | ||||||
|
||||||
This adds an app-determined data field that every validator must include with their vote, and these will thus appear in the header. | ||||||
|
||||||
We include a more detailed list of features / scaling improvements that are blocked, and which new phases resolve them at the end of this document. | ||||||
|
||||||
<image src="images/abci.png" style="float: left; width: 40%;" /> <image src="images/abci++.png" style="float: right; width: 40%;" /> | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe a silly question, but why are there three new phases and four new arrows? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Its because now Tendermint has to do a query to the application to verify the additional application data in a vote. However, its kind of odd to call that a new phase, since its a part of 'vote extension' |
||||||
On the top is the existing definition of ABCI, and on the bottom is the proposed ABCI++. | ||||||
|
||||||
## Proposal | ||||||
|
||||||
Below we suggest an API to add these three new phases. | ||||||
In this document, sometimes the final round of voting is referred to as precommit for clarity in how it acts in the Tendermint case. | ||||||
|
||||||
### Prepare Proposal | ||||||
|
||||||
*Note, APIs in this section will change after Vote Extensions, we list the adjusted APIs further in the proposal.* | ||||||
|
||||||
Prepare proposal allows the block proposer to perform application-dependent work in a block, to lower the amount of work the rest of the network must do. batch optimizations to a block, which is a key optimization for scaling. This introduces the following method | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
(Or otherwise standardize the way that these function names are used in this section) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also, there's a sentence fragment in the middle of this paragraph There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Although As a result, all validators have to go through the block processing cycle, only to find that the block cannot be accepted. In this case, the only validator to hold accountable is the faulty proposer. I hope that I have overlooked something here. Is there any protection mechanism against malicious proposers that misuse There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this is all resolved by signatures. The proposer today can already propose whatever bytes they want. It is up to other nodes in the network to parse that these bytes are valid. So If you mean that if other nodes expect a batch optimization (e.g. signature aggregation) and the proposer doesn't do it, what do they do? In that case they should reject the proposal (and potentially slash that proposer depending on economics desired). They can reject pretty quickly, since this is a parsing / signature verification problem. |
||||||
```rust | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sooooo... not to be That Person, but the rest of our RFCs use Go syntax for things like this. This particular example is pretty straightforward, but there's some stuff in Rust syntax that isn't obvious to non-Rust programmers (e.g., Or even in proto? I think all ABCI methods/types need to ultimately be specified in proto, right? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Happy to switch this to |
||||||
fn PrepareProposal(Block) -> BlockData | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Above in
While it seems that this returns the prepared BlockData (which gives the app the power to modify BlockData as it pleases) there doesn't seem to be a way for the app to add additional data to the header? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for pointing that out. The API for that is postponed to Vote Extensions, let me move that description accordingly. (This was separated to ease a potential implementation order, since its relatively straightforward without altering the header, but when altering the header many types change) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
This would be a great thing to note in the RFC body as well! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So this is NOT the actual proposed API, correct? The actual proposal is below, in the vote extension section? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm a bit confused by this section. What is |
||||||
``` | ||||||
where `BlockData` is a type alias for however data is internally stored within the consensus engine. In Tendermint Core today, this is `[]Tx`. | ||||||
|
||||||
The application may read the entire block proposal, and mutate the block data field. Mutated transactions will still get removed from the mempool later on, as the mempool rechecks all transactions after a block is executed. | ||||||
ValarDragon marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
|
||||||
### Process Proposal | ||||||
|
||||||
Process proposal sends the block data to the state machine, prior to running the last round of votes on the state machine. This enables features such as allowing validators to reject a block according to whether state machine deems it valid, and changing block execution pipeline. | ||||||
|
||||||
We introduce three new methods, | ||||||
```rust | ||||||
fn VerifyHeader(header: Header, isValidator: bool) -> ResponseVerifyHeader {...} | ||||||
fn ProcessProposal(block: Block) -> ResponseProcessProposal {...} | ||||||
fn RevertProposal(height: usize, round: usize) {...} | ||||||
``` | ||||||
where | ||||||
```rust | ||||||
struct ResponseVerifyHeader { | ||||||
accept_header: bool, | ||||||
evidence: Vec<Evidence> | ||||||
} | ||||||
struct ResponseProcessProposal { | ||||||
accept_block: bool, | ||||||
evidence: Vec<Evidence> | ||||||
} | ||||||
``` | ||||||
|
||||||
Upon receiving a block header, every validator runs `VerifyHeader(header, isValidator)`. The reason for why `VerifyHeader` is split from `ProcessProposal` is due to the later sections for Preprocess Proposal and Vote Extensions, where there may be application dependent data in the header that must be verified before accepting the header. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's still not clear to me why you have split up these functions? When will There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The block header is received before the entirety of the block data. IIRC, the flow at the moment is you receive the block header, then you verify the commit, and then acquire block parts. However verifying the commit's application data requires an ABCI request. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. At the moment, you receive the proposal which contains the I think we could sync up on this though. I've had the idea for some time that we should restructure how the block is sent across i.e. we can still break it down into parts but send the header (and possibly the commit and evidence) as one complete part (sent first) and the txs split across into the remaining parts. Bucky first mentioned this in an issue somewhere. I believe it might have been when we talked about removing validator address from the commit sigs This would allow us to deduplicate some of the bytes that get sent and saved as well as break up There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh so is the commit for the prior block not verified until you've already gotten the entire block? That seems like a potential DOS concern. Happy to sync up on this! I've spent a bit of time analyzing the different erasure encoding schemes for block gossip protocols in the past, which may be useful to consider in any API redesigns here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I believe so |
||||||
If the returned `ResponseVerifyHeader.accept_header` is false, then the validator must precommit nil on this block, and reject all other precommits on this block. `ResponseVerifyHeader.evidence` is appended to the validators local `EvidencePool`. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To me this implies that applications can now add evidence to the consensus engine which will then be committed in blocks. I think this should be made more explicit alongside with what are the advantages / use cases of it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also, why can evidence be added in both There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is the evidence? An invalid header and invalid proposal? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh, I see. There is no guarantee that it will get committed into blocks, I'll try to clarify the text for this. The usecase is essentially if an application wants to slash a proposer who proposes something invalid that the network should never agree upon. (This makes sense when the application wants to guarantee extremely high throughput, and considers getting in the way of this a slashable offence) The actual evidence is left for the application to construct, and theres multiple designs they could do. (If they had a VM execution model, then the standard watchtower like fault detection / proof suffices. Otherwise, they can fall back to more of an economic security type slash. E.g. have each validator produce a signature on this evidence claiming its bad along with the node whose at fault, and if theres a sufficient number of signatures, then slash the corresponding proposer) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Opened a discussion here if we wish to continue: #263 |
||||||
|
||||||
Upon receiving an entire block proposal (in the current implementation, all "block parts"), every validator runs `ProcessProposal(block)`. If the returned `ResponseProcessProposal.accept_block` is false, then the validator must precommit nil on this block, and reject all other precommits on this block. `ResponseProcessProposal.evidence` is appended to the validators local `EvidencePool`. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I though the idea was that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It must return whether the block is valid or not before the final round of voting. (So it runs in parallel with prevoting, and must return prior to precommitting) This is to enable validity checks to occur on the tx data. The application can have an async process that continues to run in parallel with consensus after this method has returned though. |
||||||
|
||||||
Once a validator knows that consensus has failed to be achieved for a given block, it must run `RevertProposal(block.height, block.round)`, in order to signal to the application to revert any potentially mutative state changes it may have made. In Tendermint, this occurs when incrementing rounds. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not sure we need this method. Can't application revert proposal if it doesn't receive There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree. It is not clear when the revert should happen. If in the new round the same block (modulo time, round number) is proposed, we might not even need to revert. If the values of time and round are used to process a proposal then we have to redo. However, if the round number is used in process proposal, this seems problematic anyway: we have the problem that in Tendermint two validators may decide in different rounds, and the "canonic" decision round is only fixed when the canonical commit is fixed in the next block. If the canonical commit is for a different round than my decision round, I might need to revert quite late. Not sure that this can be done in a sound way. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a great point. A consequence of this is that immediate execution cannot happen safely if the execution depends on time or round number (/cc @liamsi ) I think it makes sense to remove There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Definitely agree.
Could you explain this a little more? Also, generally, immediate execution is not safe in the sense that your are applying state before consensus has been reached There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Why would this be the case?
IMO, execution or rather computing the state root for inclusion in a proposed block, should only depend on included Tx.
It is not really applied (as in persisted in the state machine) when the block is proposed, the state root after applying the state transitions (txs) is included in the block with those Tx. The state machine actually applies and persisits the state after consensus still (it's only that the nodes reached consensus on the already updated state root as well). Am I missing something obvious? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree with @liamsi that the safe approach is that computing depend on Tx only. However, from discussions I had, my impression was that there are use-cases where round and/or time should also be used. I just wanted to raise that this adds complications. We need to be precise and explicit about at what point processing is started/reverted/persisted. I don't think it is obvious, and different people might have slightly different ideas where/when this should happen. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am lost as to why round or time is needed here as well. It seems we can do away with the Revert API and solely rely on data that is not subject to subjectivity of validators (i.e. round and time). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right, but block execution can depend on the latest time, an example is timestamp based inflations, or smart contracts using the latest time. I don't know of a use case for using the round number in the state machine though, so perhaps thats not needed. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For these kinds of applications, would be the time of the previous block work as well? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That should work if they adjust their time-based computation (e.g. inflation calculations) to occur at the beginning of the block rather than at the end. |
||||||
|
||||||
RFC: How do we handle the scenario where honest node A finalized on round x, and honest node B finalized on round x + 1? (e.g. when 2f precommits are publicly known, and a validator precommits themself but doesn't broadcast, but they increment rounds) Is this a real concern? The state root derived could change if everyone finalizes on round x+1, not round x, as the state machine can depend non-uniformly on timestamp. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. RFC flag (How do we make all validators agree on what round they ran process proposal on. This primarily matters for immediate execution) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You might bold "RFC" here, since these comment flags don't show up in certain view modes and they get kind of buried in all the other discussion. By the way, the context for all my formatting comments is that we're planning on sending this around the community a bit, and I want to make sure everything as as easy to parse as possible! 😁 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm lost as to why this even matters. Why would a node have to revert if it finalized at a different round? Why is that part of the state root? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It would matter if the round number is used in the execution of the block in an immediate execution scheme. Seems from later comments that this won't matter though |
||||||
|
||||||
The application is expected to cache the block data for later execution. | ||||||
|
||||||
The `isValidator` flag is set according to whether the current node is a validator of a full node. This is intended to allow for beginning validator dependent computation that will be included later in vote extensions. (An example of this is threshold decryptions of ciphertexts) | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Probably not too important, but I would imagine the application already knows who the validators are (and if it is a validator) as it needs to keep track for validator updates and staking and so forth There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. At least at the moment, I don't think the application knows if this node is a validator, since I'm not aware of it being given the consensus public key over ABCI There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right, well ABCI needs to have access to all the validators public keys else how does it update validator power in
I guess the application might hide whether the particular node corresponds to one of these validators
ValarDragon marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
|
||||||
### DeliverTx rename to FinalizeBlock | ||||||
|
||||||
After implementing `ProcessProposal`, txs no longer need to be delivered during the block execution phase. Instead, they are already in the state machine. Thus `BeginBlock, DeliverTx, EndBlock` can all be replaced with a single ABCI method for `ExecuteBlock`. Internally the application may still structure its method for executing the block as `BeginBlock, DeliverTx, EndBlock`. However, it is overly restrictive to enforce that the block be executed after it is finalized. There are multiple other, very reasonable pipelined execution models one can go for. So instead we suggest calling this succession of methods `FinalizeBlock`. We propose the following API | ||||||
|
||||||
Replace the `BeginBlock, DeliverTx, EndBlock` ABCI methods with the following method | ||||||
```rust | ||||||
fn FinalizeBlock() -> ResponseFinalizeBlock | ||||||
``` | ||||||
where `ResponseFinalizeBlock` has the following API, in terms of what already exists | ||||||
```rust | ||||||
struct ResponseFinalizeBlock { | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We should also include There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. And perhaps also app hash for our folks who want to remain with delayed execution models 😄 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's to tell tendermint which blocks to keep and which can be deleted:
https://docs.tendermint.com/master/spec/abci/abci.html#commit See also: https://github.com/tendermint/tendermint/blob/master/UPGRADING.md#block-retention But I do not know much more either 😬 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yeah that's a pretty accurate description. Tendermint will just prune blocks below that height There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'll add it accordingly! Is this node local (e.g. generated from their app.toml), or is part of the honesty assumption that a node satisfies this? |
||||||
updates: ResponseEndBlock, | ||||||
tx_results: Vec<ResponseDeliverTx> | ||||||
} | ||||||
``` | ||||||
`ResponseEndBlock` should then be renamed to `ConsensusEngineUpdates` and `ResponseDeliverTx` should be renamed to `ResponseTx`. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is exactly goes in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I believe There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Renamed |
||||||
|
||||||
### Vote Extensions | ||||||
|
||||||
Vote Extensions allow applications to force their validators to do more than just validate within consensus. This is done by allowing the application to add more data to their votes, in the final round of voting. (Namely the precommit) | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We should be wary about how this may affect timeouts in consensus There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agreed, I think this new delay is 1 IPC round trip delays + application vote extension time. Should we add knobs for the application to indicate how long they expect the vote extensions should take in ms? AFAIU, consensus timeouts aren't super tuned, like they are currently set as an estimate of whats reasonable, but aren't all consistently based on parameters like IPC overhead, gossip delay, application execution time. (Though we are probably going a bit in this direction for variable block time) |
||||||
This additional application data will then appear in the block header. | ||||||
|
||||||
First we discuss the API changes to the vote struct directly | ||||||
```rust | ||||||
fn ExtendVote(height: u64, round: u64) -> (UnsignedAppVoteData, SelfAuthenticatingAppData) | ||||||
fn VerifyVoteExtension(signed_app_vote_data: Vec<u8>, self_authenticating_app_vote_data: Vec<u8>) -> bool | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are vote extensions expected to affect application state? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not when they are created, but yes for when they are included in the header. E.g. if they are self-authenticating, an application may not want them in the header but instead within the block data. This is the case for threshold decryption. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Right but this is will be included in the following header then? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes they will be included in the following header. (But they may be batch optimized due to ProcessProposal) |
||||||
``` | ||||||
|
||||||
There are two types of data that the application can enforce validators to include with their vote. | ||||||
There is data that the app needs the validator to sign over in their vote, and there can be self-authenticating vote data. Self-authenticating here means that the application upon seeing these bytes, knows its valid, came from the validator and is non-malleable. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't really understand these data types - can you explain them more? What is an example of a type of self-authenticating vote data? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, please 🙏 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Added an example for self-authenticating vote data, using threshold random beacons as the example! (Its also needed / more important for threshold decryption, but I thought the random beacon may be a simpler example.) I think I'll do a second pass on that writing to explain what a threshold random beacon is in place though, since I assumed its known in whats currently written. |
||||||
|
||||||
The `CanonicalVote` struct will acommodate the `UnsignedAppVoteData` field by adding another string to its encoding, after the `chain-id`. This should not interfere with existing hardware signing integrations, as it does not affect the constant offset for the `height` and `round`, and the vote size does not have an explicit upper bound. (So adding this unsigned app vote data field is equivalent from the HSM's perspective as having a superlong chain-ID) | ||||||
|
||||||
RFC: Please comment if you think it will be fine to have elongate the message the HSM signs, or if we need to explore pre-hashing the app vote data. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. RFC: (Essentially) can the HSM input be arbitrarily long, or do we require pre-hashing data. |
||||||
|
||||||
The flow of these methods is that when a validator has to precommit, Tendermint will first produce a precommit canonical vote without the application vote data. It will then pass it to the application, which will return unsigned application vote data, and self authenticating application vote data. It will bundle the `unsigned_application_vote_data` into the canonical vote, and pass it to the HSM to sign. Finally it will package the self-authenticating app vote data, and the `signed_vote_data` together, into one final Vote struct to be passed around the network. | ||||||
|
||||||
#### Changes to Prepare Proposal Phase | ||||||
|
||||||
There are many use cases where the additional data from vote extensions can be batch optimized. | ||||||
This is mainly of interest when the votes include self-authenticating app vote data that be batched together, or the unsigned app vote data is the same across all votes. | ||||||
To allow for this, we change the PrepareProposal API to the following | ||||||
|
||||||
```rust | ||||||
fn PrepareProposal(Block, UnbatchedHeader) -> (BlockData, Header) | ||||||
``` | ||||||
|
||||||
where `UnbatchedHeader` essentially contains a "RawCommit", the `Header` contains a batch-optimized `commit` and an additional "Application Data" field in its root. This will involve a number of changes to core data structures, which will be gone over in the ADR. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just to verify my understanding here: the input currently If so, if you pass in the Also, I would suggest to rename the input to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Re: Variable naming, I think thats a much better naming, will switch to that! Or perhaps would The There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Who will validate the batch optimized commit? Tendermint or the application? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Both Tendermint and the application have to validate components of the batch-optimized commit. Tendermint will verify each of the signatures on the votes, whereas the application will verify the validity of the vote extensions. The application data allows for arbitrary inclusion of other application information in the header. (This is actually useful even without vote extensions, e.g. for LazyLedger intermediate state roots)
|
||||||
The `Unbatched` header and `rawcommit` will never be broadcasted, they will be completely internal to consensus. | ||||||
|
||||||
#### IPC communication affects | ||||||
ValarDragon marked this conversation as resolved.
Show resolved
Hide resolved
ValarDragon marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
|
||||||
For brevity in exposition above, we did not discuss the trade-offs that may occur in interprocess communication delays that these changs will introduce. | ||||||
These new ABCI methods add more locations where the application must communicate with the consensus engine. | ||||||
In most configurations, we expect that the consensus engine and the application will be either statically or dynamically linked, so all communication is a matter of at most adjusting the memory model the data is layed out within. | ||||||
This memory model conversion is typically considered negligible, as delay here is measured on the order of microseconds at most, whereas we face milisecond delays due to cryptography and network overheads. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We should probably do some performance benchmarking here and make sure that the ABCI++ implementation will be comparably performant to the current implementation. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We definitely should, though these IPC concerns at least don't come up in the current case where Tendermint and the SDK are ran in process. I think they'd only come up and be of concern if people used non-golang state machines and did not want to deal with linking the binaries. (Or conversely, used tendermint-rs and didn't want to statically link the binary) When statically /dynamically linked, the overheads are typically pretty negligible, though should be checked empirically. (e.g. slowest cgo function call overhead I could find was 200ns, can't find benchmarks for converting the memory models unfortunately) |
||||||
Thus we ignore the overhead in the case of linked libraries. | ||||||
|
||||||
In the case where the consensus engine and the application are ran out of process, delays can easily become on the order of miliseconds based upon the data sent, thus its important to consider whats happening here. | ||||||
We go through this phase by phase. | ||||||
|
||||||
##### Prepare proposal IPC overhead | ||||||
|
||||||
This requires a round of IPC communication, where both directions are quite large. Namely the proposer communicating an entire block to the application. | ||||||
However, this can be mitigated by splitting up `PrepareProposal` into two distinct, async methods, one for the block IPC communication, and one for the Header IPC communication. | ||||||
|
||||||
Then for chains where the block data does not depend on the header data, the block data IPC communication can proceed in parallel to the prior block's voting phase. (As a node can know whether or not its the leader in the next round) | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This parallelism smells like a recipe for a race condition. We'll need to be careful here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agreed, this being important only depends on IPC overhead being high. (I still don't really see a reason to ever run tendermint and the app out of process, since you can statically link them) |
||||||
|
||||||
Furthermore, this IPC communication is expected to be quite low relative to the amount of p2p gossip time it takes to send the block data around the network, so this is perhaps a premature concern until more sophisticated block gossip protocols are implemented. | ||||||
|
||||||
##### Process Proposal IPC overhead | ||||||
|
||||||
This phase changes the amount of time available for the consensus engine to deliver a block's data to the state machine. | ||||||
Before, the block data for block N would be delivered to the state machine upon receiving a commit for block N and then be executed. | ||||||
The state machine would respond after executing the txs and before prevoting. | ||||||
The time for block delivery from the consensus engine to the state machine after this change is the time of receiving block proposal N to the to time precommit on proposal N. | ||||||
It is expected that this difference is unimportant in practice, as this time is in parallel to one round of p2p communication for prevoting, which is expected to be significantly less than the time for the consensus engine to deliver a block to the state machine. | ||||||
|
||||||
##### Vote Extension IPC overhead | ||||||
|
||||||
This has a small amount of data, but does incur an IPC round trip delay. This IPC round trip delay is pretty negligible as compared the variance in vote gossip time. (the IPC delay is typically on the order of 10 microseconds) | ||||||
|
||||||
## Status | ||||||
|
||||||
Proposed | ||||||
|
||||||
## Consequences | ||||||
|
||||||
### Positive | ||||||
|
||||||
* Enables a large number of new features for applications | ||||||
|
||||||
### Negative | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd add one more negative here: I anticipate that things like alternate execution models will lead to a proliferation of new ways for applications to mess things up. Consider how many times we already have users send us consensus errors that stem from non-deterministic applications, for example. Creating more flexibility here will enable a large number of new features, but it will also enable a large number of new ways for things to go wrong. I think that tradeoff is probably worth it, but we'll want to be able to help users debug issues in their applications. Specifically, we'll want an easy way to tell if a problem is a bug in Tendermint consensus or just an issue in the way that the application is using ABCI. We already have tooling like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There were some discussions in the past around writing a test suite to ensure application compliance to ABCI expectations. (Especially around tx replay protection, which we kept on getting user issues about the time) I wonder if we something along those lines that tries to test expected guarantees would help There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh, and one more potential negative - requiring applications to revert state may require them to use data stores that have ACID guarantees/transactions/some way to rollback changes. I think many applications do not do this at the moment. |
||||||
|
||||||
* This is a breaking change to all existing ABCI clients, however this should be a thin | ||||||
ValarDragon marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
* Vote Extensions adds more complexity to core Tendermint Data Structures | ||||||
|
||||||
### Neutral | ||||||
|
||||||
* IPC overhead considerations change, but mostly for the better | ||||||
|
||||||
## References | ||||||
|
||||||
Reference for IPC delay constants: http://pages.cs.wisc.edu/~adityav/Evaluation_of_Inter_Process_Communication_Mechanisms.pdf | ||||||
|
||||||
### Short list of blocked features / scaling improvements with required ABCI++ Phases | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is pretty cool! |
||||||
|
||||||
| Feature | PrepareProposal | ProcessProposal | Vote Extensions | | ||||||
| :--- | :---: | :---: | :---: | | ||||||
| Tx based signature aggregation | X | | | | ||||||
| SNARK proof of valid state transition | X | | | | ||||||
| Validator provided authentication paths in stateless blockchains | X | | | | ||||||
| Immediate Execution | | X | | | ||||||
| Simple soft forks | | X | | | ||||||
| Validator guaranteed IBC connection attempts | | | X | | ||||||
| Validator based price oracles | | | X | | ||||||
| Immediate Execution with increased time for block execution | X | X | X | | ||||||
| Threshold Encrypted txs | X | X | X | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The context should make note of what the new (renamed?)
Finalize Block
step indicates. It should probably also make note of what happens to the old phases (prevote & precommit).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prevote and precommit aren't phases over ABCI. (ABCI doesn't know anything about them)
I'll add comments for Finalize block!