Sept 10, 2019
Hang Yin, Shunfan Zhou, Jun Jiang
Nowadays the security of permissionless blockchains is typically guaranteed by state replication over consensus algorithms. Though this approach works well for blockchains, it also means everything on the blockchain is public, which brings a problem: confidential information can't be handled by the blockchain.
Several projects have addressed the privacy problem. Monero and Zcash implemented private transactions with ring signatures and zk-SNARKs, but this approach is limited to cryptocurrencies and is hard to extend to general-purpose smart contracts. MPC (multi-party computation) can theoretically run arbitrary programs without revealing intermediate state to the participants, but it also introduces a performance overhead of
Currently, pure software solutions are not viable. A new approach is to utilize special hardware. A TEE (Trusted Execution Environment) is a special area in some processors that provides a higher level of security, including isolated execution, code integrity, and state confidentiality (https://en.wikipedia.org/wiki/Trusted_execution_environment). A naive TEE as a computing platform has several shortcomings, such as the lack of an availability guarantee. Ekiden [ref: Ekiden] fixed these problems by introducing a TEE-blockchain hybrid architecture and implemented high-performance confidential smart contracts. However, contracts in Ekiden are isolated, meaning the contracts cannot interoperate with each other or with external blockchains.
In this paper, we present Phala Network, a novel cross-chain interoperable confidential smart contract network based on Polkadot. We introduce an Event Sourcing / CQRS architecture into a TEE-blockchain hybrid system to achieve cross-chain interoperability for confidential smart contracts. We further design a Libra-Polkadot bridge to implement a privacy-preserving Libra Coin with a confidential contract.
Intel SGX is a popular implementation of TEE (Trusted Execution Environment). It runs code inside a special "Enclave" so that the execution of the code is deterministic, i.e., not affected by other processes or the underlying operating system, and the intermediate states are not leaked. In a properly set up system, Intel SGX can defend against attacks from the OS layer and the hardware layer (https://www.intel.com/content/www/us/en/architecture-and-technology/software-guard-extensions.html).
Next, the generated attestation quote is sent to the Intel Remote Attestation Service. Intel will sign the quote iff the signing credentials are valid. As each credential is uniquely bound to an Intel CPU unit, fake attestation quotes will never pass the Remote Attestation Service check.
Finally, the attestation quote signed by Intel serves as the proof of successful execution. It proves that specific code has been run inside an SGX enclave and has produced a certain output, which implies the confidentiality and the correctness of the execution. The proof can be published and validated by anyone with generic hardware.
Intel SGX and the Remote Attestation protocol are the foundation of confidential contracts. Besides Intel SGX, there are alternative implementation choices such as AMD SEV and ARM TrustZone.
Phala Network aims to build a platform for general-purpose privacy-preserving Turing-Complete smart contracts. The basic requirements for such a platform could be as follows.
- Confidentiality: Unlike the existing blockchains for smart contracts, Phala Network avoids the leakage of any input, output, or intermediate state of a confidential contract. Only authorized queries to the contract will be answered.
- Code Integrity: Anyone can verify that an output is produced by a specific smart contract published on the blockchain.
- State Consistency: Anyone can verify that an execution happened at a certain blockchain height, which implies the output of the execution is subject to a certain chainstate.
- Availability: There must not be a single point of failure such as disconnection of the miner.
The existing TEE solutions, e.g., Intel SGX, can only prevent the leakage of sensitive information during the execution of isolated programs, and provide no guarantee on availability or verification of input data. Thus it requires a carefully-designed infrastructure to integrate TEE into blockchain to meet the requirements above.
We are going to introduce the design of Phala Network and how it fulfills the above requirements in the following sections.
A typical smart contract can be regarded as a state machine of a current state
Since the state transition process happens inside the enclave, all of its intermediate states remain invisible to the outside. We can further encrypt the reached state and the input event to prevent attackers from inferring the internal state of the contract through event replay.
Let
where
Unlike the existing smart contract, a confidential contract doesn't expose any information outside the enclave by default. To answer authorized queries, we introduce a query function
The confidential contract must first validate the identity of the user and then respond to her query. Apart from the queries from users, the contract may also accept a special query producing side effects. The side effects include the egressing data that can be posted back to the blockchain by miners.
In this design, the executor (the enclave) is stateless, which greatly simplifies the design of the system. The events on the blockchain then become the canonical source of the inputs to the contract, which corresponds to the Event Sourcing design pattern. We further apply the idea of Command Query Responsibility Segregation (CQRS) in the design of the protocol.
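The stateless-executor idea can be illustrated with a small sketch. It is not Phala's implementation: the `Transfer` event, the ledger contract, and the authorization rule in `query` are hypothetical, and encryption and the enclave boundary are left out. It only shows that the state is a pure function of the ordered events, and that queries never modify it.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical event type for an illustrative ledger contract.
@dataclass(frozen=True)
class Transfer:
    sender: str
    receiver: str
    amount: int

def transition(state: dict, event: Transfer) -> dict:
    """Pure state transition: returns a new state and never mutates the old one."""
    if event.amount <= 0 or state.get(event.sender, 0) < event.amount:
        return state  # invalid events leave the state unchanged
    new_state = dict(state)
    new_state[event.sender] -= event.amount
    new_state[event.receiver] = new_state.get(event.receiver, 0) + event.amount
    return new_state

def replay(initial: dict, events: Iterable[Transfer]) -> dict:
    """Command side: fold the ordered on-chain events into the current state."""
    state = initial
    for event in events:
        state = transition(state, event)
    return state

def query(state: dict, caller: str, account: str) -> Optional[int]:
    """Query side: read-only, answers only authorized callers (assumed rule: owner only)."""
    if caller != account:
        return None
    return state.get(account, 0)

genesis = {"alice": 10}
chain_events = [
    Transfer("alice", "bob", 4),
    Transfer("bob", "carol", 1),
    Transfer("carol", "alice", 5),  # rejected: carol only holds 1
]
final_state = replay(genesis, chain_events)
```

Because `replay` depends only on its inputs, any executor that sees the same events reaches the same state, which is what makes the enclave replaceable.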
There are a few roles involved in the protocol.
- Users invoke, query and deploy smart contracts. Users interact with smart contracts via Blockchain and Worker Nodes. They can verify the blockchain as well as the cryptographic evidence on the blockchain independently by running a light client or full node. Special hardware is not needed.
- Worker Nodes run confidential contracts on TEE-compatible hardware. Worker Nodes are off-chain. In each node, a special program called `pRuntime` is deployed to the enclave. The runtime has a built-in VM to run contracts. It also cooperates with the blockchain to support contracts throughout their full life cycle. Worker Nodes can be further divided into three roles:
    - Genesis Node helps bootstrap the network and set up the cryptographic configuration. There is only one Genesis Node and it is destroyed after the launch of Phala Network.
    - Gatekeepers manage the secrets to ensure the availability and security of the network. Gatekeepers are dynamically elected on the blockchain and they stake a large amount of Phala token. They are rewarded for being online and may be slashed in case of misbehavior, because a certain number of Gatekeepers must be running at any time.
    - Miners execute the confidential contracts. They get paid by selling their computing resources to the users. Unlike Gatekeepers, Miners need to stake only a small amount of the Phala token and can join and exit the network as they want.
- Remote Attestation Service is a public service that validates whether a Worker Node has deployed `pRuntime` correctly. The cryptographic evidence produced by the service can prove that a certain output is produced by `pRuntime` running inside a TEE. IAS is Intel SGX's remote attestation service implementation.
- Blockchain is the backbone of Phala Network. It stores the identities of the Worker Nodes, the published confidential contracts, the encrypted contract state, and the invocation transactions from users and other blockchains. When plugged into a Polkadot parachain slot, it is capable of interoperating with other blockchains through the Polkadot relay chain.
All the Worker Nodes are required to be registered on the blockchain before participating in mining or Gatekeeper election.
Remote attestation provides a building block to verify the execution of certain code inside the enclave, as well as its output. Running such a remote attestation on each execution is inefficient and can be avoided. In Phala Network we adopt a better protocol in which the attestation is required only once, during the registration.
With the identity registered, a TLS-like channel between the requester and the target `pRuntime` can be established. The identities published on the blockchain can serve as the PKI to avoid MitM attacks. Note that not only can the client talk to `pRuntime` securely, but a secure channel between two runtimes is also possible.
In a well-established TLS connection, the two parties can trust each other without further need of remote attestation. Nobody can pretend to be a registered worker node because the corresponding `pRuntime` is the only party that has the private key to establish the TLS connection. This trick improves the efficiency and flexibility of code execution in the runtime. It is also widely used in the Phala Network protocol.
The metadata submitted to the blockchain also contains sufficient information to locate the access endpoint of the worker node, for example, libp2p multiaddrs.
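A minimal sketch of the "attest once, then rely on the registered identity" idea follows. The `Registry` and `AttestationReport` types, the field names, and the boolean `signed_by_ras` (a stand-in for actually verifying the attestation service's signature) are all assumptions made for illustration, not the on-chain format.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical measurement of the approved runtime build.
EXPECTED_MEASUREMENT = hashlib.sha256(b"pRuntime build (hypothetical)").hexdigest()

@dataclass(frozen=True)
class AttestationReport:
    measurement: str     # hash of the code loaded into the enclave
    public_key: bytes    # identity key generated inside the enclave
    endpoint: str        # where to reach the node, e.g. a libp2p multiaddr
    signed_by_ras: bool  # stand-in for a verified attestation service signature

class Registry:
    """Toy model of the on-chain worker registry."""

    def __init__(self) -> None:
        self._workers = {}
        self.attestation_checks = 0

    def register(self, report: AttestationReport) -> str:
        # The attestation is checked here, once, at registration time.
        self.attestation_checks += 1
        if not report.signed_by_ras:
            raise ValueError("attestation is not signed by the attestation service")
        if report.measurement != EXPECTED_MEASUREMENT:
            raise ValueError("enclave runs unexpected code")
        worker_id = hashlib.sha256(report.public_key).hexdigest()[:16]
        self._workers[worker_id] = report
        return worker_id

    def identity(self, worker_id: str):
        # Later connections only look up the registered key and endpoint.
        report = self._workers[worker_id]
        return report.public_key, report.endpoint

registry = Registry()
report = AttestationReport(
    EXPECTED_MEASUREMENT, b"worker-public-key", "/ip4/127.0.0.1/tcp/30333", True
)
worker_id = registry.register(report)
first_lookup = registry.identity(worker_id)
second_lookup = registry.identity(worker_id)
```

Any number of later lookups reuse the registered key without triggering another attestation check.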
Genesis Node assists the launch of the blockchain until it finishes:
- Before the genesis block, the Genesis Node runs `pRuntime.Bootstrap` to generate a key pair as the identity of the node, and a symmetric key, namely Genesis Identity $I_g(pk_g, sk_g)$ and Genesis Key $k_g$. The runtime reveals $pk_g$ but keeps $sk_g$ and $k_g$ private.
- Start the blockchain with $pk_g$. In this stage $pk_g$ is published in the genesis block and is used by other worker nodes to establish secure channels to the Genesis Node. $k_g$ is kept inside the Genesis Node and is used to store the secrets necessary to run the network on the blockchain. A list of the initial Phala token distribution is hard-coded in the genesis block.
- The blockchain is in the pre-launch phase after the genesis block. The governance module is enabled, but other modules, including confidential contracts, are still disabled until the Gatekeepers are elected.
- Worker Nodes that want to participate in the Gatekeeper election can follow the Worker Node Registration scheme to register their identities on the blockchain. Then $n_{gk}$ (a chain parameter between tens and hundreds) Gatekeepers will be elected during the pre-launch phase. This can be done via an on-chain Polkadot-style NPoS validator election (see Appendix II for details).
- When the election is finished, the Gatekeepers send a request to the Genesis Node for $k_g$ through a TLS connection. The Genesis Node only answers requests from the elected Gatekeepers.
- The Genesis Node retires and destroys itself when all the Gatekeepers are ready.
So far
Periodical key rotation is necessary for forward secrecy. The latest key at epoch
Both Miners and Gatekeepers have to follow the "Worker Node Registration" scheme to join the network. To ensure the service quality, all worker nodes must stake a certain amount of the Phala token and could be slashed if they fail to meet the responsiveness requirements. We will discuss the details of staking and monitoring in the "Responsiveness Monitoring" section.
As Gatekeepers store the root key and need to be always online, they have to meet a higher standard and need to stake a larger amount. They are rewarded for staying online and could be slashed otherwise.
The bytecode of the compiled contract is published on the blockchain and then loaded by a user-specified miner. The Gatekeepers generate a symmetric encryption key for each newly published contract. The key is shared with the corresponding miner for state encryption. More specifically:
- The developer publishes the contract to the blockchain.
- Once Gatekeepers notice the contract, they generate a corresponding contract key $k_c$ for contract state encryption.
- Gatekeepers save $k_c$ to the blockchain as a part of the chainstate, encrypted with the Root Key $k_r$.
- The developer finds an available miner to load the contract. The developer can either run his own miner so that no extra fee is needed, or rent one from a resource market (see "Economic Design Paper" for more details).
- The miner runtime connects to a Gatekeeper through a TLS connection and asks for $k_c$ by `pRuntime.GetContractKey`.
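The key flow in the steps above can be sketched as follows. Everything here is illustrative: the `Gatekeeper` class, its method names, and the `miner_is_registered` flag are hypothetical, and `toy_xor_cipher` is an insecure stand-in (a SHA-256 keystream XOR) used only because the standard library has no authenticated cipher. A real system would use a vetted AEAD scheme.

```python
import hashlib
import secrets

def toy_xor_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """INSECURE stand-in for a real cipher; the same call encrypts and decrypts."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))

class Gatekeeper:
    """Toy Gatekeeper holding the root key k_r; chainstate is a shared dict."""

    def __init__(self, root_key: bytes, chainstate: dict) -> None:
        self.root_key = root_key
        self.chainstate = chainstate

    def on_contract_published(self, contract_id: str) -> None:
        # Generate k_c and store it on chain, encrypted under k_r.
        contract_key = secrets.token_bytes(32)
        nonce = secrets.token_bytes(16)
        sealed = toy_xor_cipher(self.root_key, nonce, contract_key)
        self.chainstate[contract_id] = (nonce, sealed)

    def get_contract_key(self, contract_id: str, miner_is_registered: bool) -> bytes:
        # Only registered miners (assumed to be checked via the secure channel) get k_c.
        if not miner_is_registered:
            raise PermissionError("miner is not registered")
        nonce, sealed = self.chainstate[contract_id]
        return toy_xor_cipher(self.root_key, nonce, sealed)

chainstate = {}
root_key = secrets.token_bytes(32)
gk_a = Gatekeeper(root_key, chainstate)
gk_b = Gatekeeper(root_key, chainstate)

gk_a.on_contract_published("contract-1")
key_from_a = gk_a.get_contract_key("contract-1", miner_is_registered=True)
key_from_b = gk_b.get_contract_key("contract-1", miner_is_registered=True)
```

Since the sealed key lives in the chainstate, any Gatekeeper holding the root key can serve it, so the miner does not depend on the Gatekeeper that created the key.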
The miner's `pRuntime` will use
The Gatekeepers are re-elected periodically according to the election rule (Appendix II). Let
- The old Gatekeeper set $G_{n-1}$ validates the new set $G_n$.
- $G_{n-1}$ decrypts the state of the Gatekeepers with the old root key $k_r^{(n-1)}$.
- $G_n$ generates a new root key $k_r^{(n)}$ and retrieves the state from $G_{n-1}$.
- $G_n$ encrypts the state with $k_r^{(n)}$ and stores it on the blockchain.
- The runtime of the retired Gatekeepers halts and destroys $k_r^{(n-1)}$.
- $G_n$ generates new contract keys and notifies the miners to rotate the keys. The miners' runtime will then ask $G_n$ for the contract key by `pRuntime.GetContractKey`.
Root key and contract key rotation ensures the forward secrecy of the confidential state (for both the Gatekeeper state and the contract state). The keys for the obsolete data are destroyed once the rotation is done.
We adopt an Event Sourcing / CQRS style architecture for the contract execution. Read queries and write commands are segregated.
The contract state is determined by the write commands, which have multiple sources: user invocations, blockchain events, and ingress messages from the relay chain. In a naive design, we require all write commands to be recorded explicitly on the blockchain. The commands are denoted as Ingressive Events to `pRuntime`. As the events on the blockchain are ordered naturally, the blockchain becomes a canonical source of events.
To invoke a contract, the user needs to generate a shared secret key with the miner who runs the contract. This can be done via a non-interactive Diffie-Hellman key exchange scheme using her private key and the miner's registered public key (https://www.sqi.at/resources/Schindler-2019-CIW-Distributed-Key-Generation-with-Ethereum-Smart-Contracts.pdf). The key is used for future communication, and the user can then submit the encrypted payload to the blockchain. The invocation events are processed by `pRuntime` once they arrive at the miner node.
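The non-interactive key agreement can be illustrated with textbook Diffie-Hellman over a prime field. This is a toy sketch under stated assumptions: the 127-bit modulus `P`, the base `G`, and the SHA-256 key derivation are chosen only to keep the example short and are far too weak for real use. A deployment would use a standard elliptic-curve group from a vetted library.

```python
import hashlib
import secrets

# Toy parameters (assumption): a 127-bit Mersenne prime and a small base.
P = 2**127 - 1
G = 3

def keygen():
    """Return a (private, public) key pair."""
    private = secrets.randbelow(P - 2) + 1
    return private, pow(G, private, P)

def shared_key(my_private: int, their_public: int) -> bytes:
    """Derive a symmetric key from my private key and the peer's public key."""
    secret = pow(their_public, my_private, P)
    return hashlib.sha256(secret.to_bytes(16, "big")).digest()

# The miner's public key is already registered on chain; the user only reads it.
miner_private, miner_public = keygen()
user_private, user_public = keygen()

# No round trip is needed: each side combines its own private key with the
# other side's published public key and arrives at the same symmetric key.
user_side_key = shared_key(user_private, miner_public)
miner_side_key = shared_key(miner_private, user_public)
```

The user can encrypt the invocation payload with `user_side_key` and attach `user_public`, so the miner's runtime can derive the same key when the event arrives.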
As invocation payloads are included in a block, the blockchain is naturally a canonical source of events. All the contract invocations initiated by users, smart contracts, and other blockchains are timestamped and treated equally by the executor. It therefore makes a unified interface for contract interoperability.
The downside of the architecture is that the confirmation of the commands happens after the confirmation of the block. The performance of the blockchain becomes the bottleneck for contract invocations. However, the read-only queries are made into the runtime directly and the performance is not bounded by the blockchain. This is possible because the queries don't modify the contract state.
Miners are responsible for maintaining the communication between `pRuntime` and the blockchain. A monitoring scheme is needed to ensure the connectivity. In the worst case (e.g. miner shutdown) the contract execution can be resumed by another miner.
A single miner is sufficient to run a contract. Though miners are incentivized to run contracts in the long term, a miner may still occasionally become unresponsive due to a network or power outage. In such a case another miner can recover the saved state from the blockchain and resume execution.
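A rough sketch of such a recovery is shown below. The `Chain` and `Miner` classes, the checkpoint interval, and the trivial additive contract are hypothetical, and state encryption is omitted. The sketch only shows a second miner loading the latest saved state and replaying the events recorded after it.

```python
from dataclasses import dataclass, field

@dataclass
class Chain:
    """Toy blockchain: ordered invocation events plus saved state snapshots."""
    events: list = field(default_factory=list)
    checkpoints: dict = field(default_factory=dict)  # height -> state

def apply_event(state: int, event: int) -> int:
    # Illustrative contract: the state is a running total of the inputs.
    return state + event

class Miner:
    def __init__(self, chain: Chain) -> None:
        self.chain = chain
        self.state = 0
        self.height = 0  # number of events already applied

    def recover(self) -> None:
        """Load the most recent snapshot posted to the chain, if any."""
        if self.chain.checkpoints:
            self.height = max(self.chain.checkpoints)
            self.state = self.chain.checkpoints[self.height]

    def sync(self, checkpoint_every: int = 2) -> None:
        """Apply all pending events, posting a snapshot at each interval."""
        while self.height < len(self.chain.events):
            self.state = apply_event(self.state, self.chain.events[self.height])
            self.height += 1
            if self.height % checkpoint_every == 0:
                self.chain.checkpoints[self.height] = self.state

chain = Chain(events=[5, 3, 7])
miner_a = Miner(chain)
miner_a.sync()                 # miner A is at height 3; a snapshot exists at height 2

chain.events.extend([1, 4])    # new invocations arrive while miner A is offline

miner_b = Miner(chain)
miner_b.recover()              # start from the snapshot at height 2
miner_b.sync()                 # replay the events after the snapshot
```

Miner B ends in the same state that an uninterrupted miner would have reached, because the events after the snapshot are replayed in chain order.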
As mentioned in section "Execute the Contract", one of the side-effects produced by `pRuntime` is the periodical contract state update. The dumped state is encrypted by
Both Gatekeepers and miners are required to stay responsive to keep Phala Network functional. Gatekeepers have to maintain a high level of responsiveness because the root key is kept inside the Gatekeepers' runtime and must be available at any time. As long as Gatekeepers can serve the contract keys to miners, the availability of the contracts can be guaranteed. Unresponsive miners are therefore not as harmful to the system as unresponsive Gatekeepers.
We adopt a Polkadot-like unresponsiveness detection algorithm (https://research.web3.foundation/en/latest/polkadot/slashing/amounts/#unresponsiveness). Both Gatekeepers and miners produce side-effects by their runtime. They have to at least post the state updates periodically to the blockchain within an interval. So all the submitted side-effects can be used as a counter of the online activities. Then we can determine if a node
Security fallbacks:
- Alternative TEE hardware: Though we use Intel SGX as the reference for the current design, we don't make any assumptions about the hardware. Potential TEE hardware includes AMD SEV, ARM TrustZone, and some open-source implementations in progress. Once we support any alternatives, the different TEEs can work together transparently.
- Contract key backup by secret sharing scheme: To avoid catastrophes where Intel SGX breaks entirely (e.g. Intel bans all the Remote Attestation requests from our side), we can utilize a secret sharing scheme to distribute the Root Key to the Gatekeepers, or maybe to two generations of the Gatekeepers. In such a case, we can wait for the deployment of an alternative TEE system. Then the secret holders can collaborate to ingest the key to recover the execution of Phala Network.
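The text does not name a specific secret sharing scheme. As one possible illustration, the sketch below uses Shamir's secret sharing over a prime field, which is an assumption on our side. The field modulus, the threshold of 3, and the 5 share holders are arbitrary example values.

```python
import secrets

# Assumed field modulus: a 127-bit Mersenne prime (the secret must be smaller).
PRIME = 2**127 - 1

def split(secret: int, threshold: int, shares: int):
    """Split `secret` so that any `threshold` of the `shares` points recover it."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        # Horner evaluation of the random polynomial whose constant term is the secret.
        acc = 0
        for coeff in reversed(coeffs):
            acc = (acc * x + coeff) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, shares + 1)]

def recover(points) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # Modular inverse via Fermat's little theorem, valid because PRIME is prime.
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret

root_key = secrets.randbelow(PRIME)          # stand-in for the Root Key
gatekeeper_shares = split(root_key, threshold=3, shares=5)
recovered = recover(gatekeeper_shares[:3])   # any three holders suffice
```

With a threshold below the number of holders, the key survives the loss of some Gatekeepers, while fewer than the threshold learn nothing about it.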
Optimizations:
- Event batching & async state commitment
- State storage pruning
- Layer 2 state sharing