Add remote signing to substrate client #4689
Comments
We have proposed a solution for this based on Intel SGX TEEs:
Remarks to your OP:
|
We're about to do a new VRF, likely called VRedJubJub, that we'll need to support as well, and of course BLS signatures. These do not add as much complexity here, but Ledger devices cannot produce SNARKs and maybe cannot do BLS signatures. |
Just for your information: Zondax is receiving a grant from us to work on a flexible TrustZone-based HSM stack |
Changing this in substrate will be very involved, as it introduces a completely different pattern of what the keystore is and how it works. The way it works right now is that the keystore is a single entity in the system (either in memory or saved on disk), holding different types of keys for different tasks. When a component needs to sign something, it asks the keystore for the appropriate keys and uses them to sign the data. This is a direct, non-blocking API, and the key holds all information needed for signing directly in memory; though discouraged, you can keep the key around and reuse it. This issue, however, proposes a completely different approach to how signing works. Rather than the keystore handing out the keys, you'd have to submit whatever you'd like to have signed to it and wait for the result, making it an async and indirect API. While not impossible, a range of crates depend on the keystore directly and a range of others imply this pattern (e.g. GRANDPA). Switching these is a pretty large task, touching a lot of code, much of which is sync right now and would become async as a result, with (probably) a big tail of things to change in response to that ;) . |
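To make the contrast between the two patterns concrete, here is a minimal sketch using only the Rust standard library. It is not Substrate code: `InMemoryKeystore`, `SignerHandle`, `SignRequest` and `spawn_signer` are hypothetical names, the "signature" is a toy XOR rather than real cryptography, and the indirect call blocks on a channel where a real implementation would more likely return a future.

```rust
use std::sync::mpsc;
use std::thread;

/// Stand-in for a signature; a real signer would return sr25519/ed25519 bytes.
type Signature = Vec<u8>;

/// Direct pattern: the caller has the key material and signs in place.
struct InMemoryKeystore {
    key: u8, // hypothetical one-byte "key", for illustration only
}

impl InMemoryKeystore {
    fn sign(&self, msg: &[u8]) -> Signature {
        // Toy "signature": XOR every byte with the key. NOT cryptography.
        msg.iter().map(|b| b ^ self.key).collect()
    }
}

/// Indirect pattern: the caller submits a request and waits for the reply.
struct SignRequest {
    msg: Vec<u8>,
    reply: mpsc::Sender<Signature>,
}

/// Handle held by consensus code; the key never leaves the signer thread.
#[derive(Clone)]
struct SignerHandle {
    requests: mpsc::Sender<SignRequest>,
}

impl SignerHandle {
    /// Submit a message and block until the signer answers.
    /// Returns `None` if the signer has gone away.
    fn sign(&self, msg: &[u8]) -> Option<Signature> {
        let (reply, response) = mpsc::channel();
        self.requests
            .send(SignRequest { msg: msg.to_vec(), reply })
            .ok()?;
        response.recv().ok()
    }
}

/// Spawn a signer thread that owns the keystore and serves requests
/// until every `SignerHandle` has been dropped.
fn spawn_signer(keystore: InMemoryKeystore) -> SignerHandle {
    let (requests, inbox) = mpsc::channel::<SignRequest>();
    thread::spawn(move || {
        for req in inbox {
            // Ignore send errors: the requester may have stopped waiting.
            let _ = req.reply.send(keystore.sign(&req.msg));
        }
    });
    SignerHandle { requests }
}
```

Calling `spawn_signer(InMemoryKeystore { key: 0x5a }).sign(b"vote")` yields the same bytes as signing directly, but every caller now has to cope with latency and with the signer being unavailable, which is where the knock-on changes described above come from.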
We'd prefer doing this by features, not adding some new |
I don't think that it will be that involved on the Substrate side of signing. It is right that we need some changes here and there. However, aura, grandpa and babe are already async. The trait can just return a Offchain signing (imonline) shouldn't also be that hard, we need to call into the host anyway and use As everything uses the |
I don't know if https://github.com/iqlusioninc/armistice is relevant, but it may be good to track if you're working on this stuff |
Some implementations might actually be synchronous, such as those based on an on-chip TEE. |
I think keystore and signer should be two different independent entities. Actually the concept of a software-based keystore may not always be required.. Substrate should ideally deal with a signer only. This signer may later rely on a keystore or not. My recommendation is to aim for an asynchronous design to cope with latency issues. Even in the case of fast TEEs, it can affect performance if signing operations require context switches, syscalls, etc. With respect to the work we did at Zondax in Tendermint, yes we used the HSM in Ledger devices (deserializing votes, checking with a monotonic counter, etc.). Latency in these devices is in the order of tens of milliseconds so an asynchronous approach was very important regardless of running in-process or remotely. We are now working on a completely new design for Kusama/Polkadot/Substrate with a very much hardened datacenter-quality external device, running in a TEE plus in some models we even have access to an integrated HSM. While running a "lean" node would be possible, it means adding a bigger attack surface that we strongly would like to avoid. Anyway, I am not sure if this discussion is still active.. though having seen the changes here https://github.com/paritytech/substrate/pull/4925/files. I think a good and quick step forward would be to: This way interested parties can provide clean alternative implementations. 3- Ideally make |
IMO, once the signer/keystore have been fully decoupled and made async, third-party implementations can define their own API, comm protocol, in-process vs remote approach, etc. I think this is the most flexible approach. There is still one more complex but important issue. At the moment, signers operate on blobs, so they cannot really know what is being signed. In some cases, signers may even receive hashes of the actual content. This severely limits how smart a signer can be, meaning it is not possible to track state and design adequate double signing protection schemes. I would need to dig more into the current substrate implementation, but I wonder if there are a few convenient places that could be extended to provide more information at the moment of signing, or whether this is at the moment scattered all over the code. Otherwise, I can already see that, at least from my project's perspective, the keystore is actually not the point where we need to plug in; it is rather just before GRANDPA/BABE/etc decide to sign and still have a structured object. |
@brenzi @jleni this seems like a perfect use-case for a formally-verified microkernel, such as seL4. The microkernel could provide software-based isolation between untrusted components, such as the network stack, and trusted components, such as the signer implementation. One major caveat is that the main framework that I know of for using seL4, CAmkES, only supports systems where all resources are statically allocated. Ideally, the trusted code should not use dynamic memory allocation, but I am not sure if this is practical. |
@demimarie-parity Very interesting! But wouldn't this require self-hosted signer HW? Even if cloud services would offer seL4 VPS, why would you trust them? They still have access to all memory. Am I missing something? |
@brenzi You are not. That is one reason why self-hosted signer hardware should be preferred. The biggest caveat is that not everyone can provide the level of physical security required, and most cannot provide the needed protection against DDoS attacks. Could @kirushik chip in? Using seL4 has a few caveats:
|
To elaborate: From my perspective, the only advantage of a TEE and/or HSM is protection against attackers with physical access. I believe that equally important, if not more important, is privilege separation a la QubesOS. While Substrate is a substantial attack surface, we can remove much of the rest. |
Working with QubesOS and Redox might be a good idea as well. |
I disagree with this, TEEs do not have much to do with physical access. Both TEEs and HSMs can provide different (better?) guarantees than QubesOS (basically a Xen hypervisor without ASLR or NX). I will not write extensively here, to avoid going off-topic, given this issue is mostly about providing an API for teams to provide their preferred security solution. Happy to organize or a Riot channel about this though! Nevertheless, as there are MANY valid alternatives and approaches, I would strongly suggest to make the architecture as flexible as possible so different solutions can be integrated over time. |
To advance this a bit further, especially after merging #4925, here's my line of thinking when it comes to implementing client support for remote signing:
pub trait Signer {
    fn supported_keys(
        &self,
        id: KeyTypeId,
    ) -> Result<Vec<CryptoTypePublicPair>, BareCryptoStoreError>;

    fn sign_with(
        &self,
        id: KeyTypeId,
        key: &CryptoTypePublicPair,
        msg: &[u8],
        at_blockhash: &[u8],
    ) -> Result<Vec<u8>, BareCryptoStoreError>;
}

/// Type of the client signer.
#[derive(Clone, Debug)]
pub enum SignerType {
    Local,
    RemoteClient,
    RemoteServer,
}

pub struct LocalSigner {
    keystore: Store,
}

impl LocalSigner {
    fn new(keystore: Store) -> LocalSigner {
        LocalSigner { keystore }
    }
}

impl Signer for LocalSigner {
    fn sign_with(
        &self,
        id: KeyTypeId,
        key: &CryptoTypePublicPair,
        msg: &[u8],
        _at_blockhash: &[u8],
    ) -> Result<Vec<u8>, BareCryptoStoreError> {
        self.keystore.sign_with(id, key, msg)
    }

    fn supported_keys(
        &self,
        id: KeyTypeId,
    ) -> Result<Vec<CryptoTypePublicPair>, BareCryptoStoreError> {
        self.keystore.supported_keys(id, vec![])
    }
}
I would like to get some feedback on the above to move this forward. |
Why do you want to introduce a new trait? The
You don't need to pass |
You're absolutely right. After working on the code for a bit, it is apparent to me that the separation of Signer and Keystore doesn't make sense. I am reverting the work I did by keeping Keystore as-is and going to introduce |
Could you expand on this a bit please? how would the key type be relevant to the blob sent for signing? |
If you see the |
I've increasingly realized that block seals should probably use the |
@burdges I would love to see that change be made sooner rather than later, but I am not sure if it is practical right now. We can always make it at the next hard fork. |
Hi, when can we have a remote signer for substrate session keys? I just re-read the description and everything is still very relevant. Let's prioritize this? |
We've several major projects that shall further change the session key crypto: beefy, including optimized signing, sassafras, including ring VRFs and ephemeral block signing keys, new session certificates for slashing reform, post-quantum options, and equivocation prevention. All development is path dependent. It's possible, if complex, to implement remote signers for these after they're working, but it's impossible to implement & deploy these once everyone expects a remote signer. |
It should also be mentioned that Zondax has a working external signer that allows the management of session keys inside of an ARM TrustZone; however, it seems Parity has no interest in supporting this officially yet (see #10423 for details). |
The following proposes the addition of remote signing functionality within the substrate client.
Context
The security of Proof of Stake networks lies in the hands of validators: without the security these entities provide, the whole system falls apart. A validator is responsible for running their nodes in a stable, reliable, consistent, and secure manner. This responsibility also includes managing their signing keys, which let the network know that they were the ones who verified that the activity they put on the network is non-byzantine.
As a validator, the current paradigm of storing hot session keys in the client leaves much to be desired in terms of security. Although session keys cannot lead to direct access of funds, a compromise of the validator host (and the session keys within it) can lead to a complete loss of funds for a validator and the funds of those nominating them. Furthermore, there is a greedy incentive to compromise these keys, as up to 10% of the slash can be rewarded to those who report it. While key rotation helps mitigate this to an extent, a more elegant solution for key storage and signing will be required in the long run.
Separating out the storage and signing interface of session keys from the validator host client would allow validators to create more robust and flexible operations, while providing additional layers of defense against possible attack vectors. A full compromise of the validator host shouldn't enable conditions where the validator can be slashed. Separating out the storage of session keys would mean adding the ability to have a remote signing interface, which gives a flexible means of having a remote signing server - one which ideally has double signing protection and HSM, TEE, Ledger, and TSS support. This addition increases the cost of compromising validator operations, something that creates a more resilient and secure network in the long run.
Remote Signing Server
The following proposes the approaches of one remote signing server, although the interfaces exposed by the substrate client should allow for multiple implementations to exist. The signing server proposed here would live as a rust module in a separate repository - these considerations are for reference and context.
A remote signing server should be flexible to account for a diversity of key management approaches, including TEE, HSM, cloud HSM, Ledger, and encrypted software based key storage. Additionally, the remote signing server should be able to support multiple substrate based chains. This essentially acts as a single API for all key management and signing.
Approach
The signing server should run as a separate process on a physical on-premise host. Cloud-based hosting should be considered as well, although it is less preferred.
An approach would be to have the remote server use an inverse connection, where the remote signer makes an outbound encrypted connection to the validator host listening at the multiaddr URL specified by the substrate client cli flag --keystore-server <URL>. The remote signer would not be open to any inbound traffic, reducing its attack surface. It's the signer's responsibility then to keep the connection open to the substrate client. After making an initial connection, the remote signing server listens for RPC requests from the validator host, handles them by creating the appropriate signature or payload, and sends the response back to the validator host.
RPC API Spec
Requests and responses from the substrate client to the signing server should be tagged appropriately to differentiate how and what to sign. These would be specific to the module that is requesting them, such as GRANDPA or BABE.
One could imagine the following types of RPC requests/responses:
GrandpaPrevoteRequest / GrandpaPrevoteResponse
GrandpaPrecommitRequest / GrandpaPrecommitResponse
BabeVRFRequest / BabeVRFResponse
BabeAuthorRequest / BabeAuthorResponse
The specifics of these should be a point of discussion as how to minimize the changes needed in the substrate client.
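As a starting point for that discussion, the tagged request/response pairs could be modelled as two enums. This is only a sketch: the variant names follow the list above, but every field (`set_id`, `round`, `slot`, and so on), the `Kind` enum, the `Refused` variant and the `answers` helper are assumptions made for illustration and are not taken from Substrate's actual GRANDPA or BABE types.

```rust
/// Which consensus message a request or response refers to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    GrandpaPrevote,
    GrandpaPrecommit,
    BabeVrf,
    BabeAuthor,
}

/// Hypothetical request bodies; the fields are illustrative only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SignerRequest {
    GrandpaPrevote { set_id: u64, round: u64, target_number: u64, target_hash: [u8; 32] },
    GrandpaPrecommit { set_id: u64, round: u64, target_number: u64, target_hash: [u8; 32] },
    BabeVrf { epoch: u64, slot: u64 },
    BabeAuthor { slot: u64, header_hash: [u8; 32] },
}

/// Hypothetical response bodies, one per request kind plus a refusal.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SignerResponse {
    GrandpaPrevote { signature: Vec<u8> },
    GrandpaPrecommit { signature: Vec<u8> },
    BabeVrf { output: Vec<u8>, proof: Vec<u8> },
    BabeAuthor { signature: Vec<u8> },
    /// The signer declined, e.g. because signing would equivocate.
    Refused { reason: String },
}

impl SignerRequest {
    pub fn kind(&self) -> Kind {
        match self {
            SignerRequest::GrandpaPrevote { .. } => Kind::GrandpaPrevote,
            SignerRequest::GrandpaPrecommit { .. } => Kind::GrandpaPrecommit,
            SignerRequest::BabeVrf { .. } => Kind::BabeVrf,
            SignerRequest::BabeAuthor { .. } => Kind::BabeAuthor,
        }
    }
}

impl SignerResponse {
    /// `None` for a refusal, which may answer any kind of request.
    pub fn kind(&self) -> Option<Kind> {
        match self {
            SignerResponse::GrandpaPrevote { .. } => Some(Kind::GrandpaPrevote),
            SignerResponse::GrandpaPrecommit { .. } => Some(Kind::GrandpaPrecommit),
            SignerResponse::BabeVrf { .. } => Some(Kind::BabeVrf),
            SignerResponse::BabeAuthor { .. } => Some(Kind::BabeAuthor),
            SignerResponse::Refused { .. } => None,
        }
    }

    /// The client accepts a response only if it is a refusal or has the
    /// same kind as the request it was sent for.
    pub fn answers(&self, request: &SignerRequest) -> bool {
        self.kind().map_or(true, |kind| kind == request.kind())
    }
}
```

Keeping the structured fields in the request, rather than sending an opaque blob, is what would let a signing server inspect what it is asked to sign before answering.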
Configuration
Configuration of the signing server can be done via a config file that gets loaded upon starting the remote signing server. As one design goal is to have flexible ways of storing keys, this will be used for specifying the key provider (what is storing the keys), type of key, validator host, and so forth.
The following is a non-exhaustive list of some possible configuration parameters:
validator
    name
    chain: kusama, polkadot, dev, flaming-fir, parachain-id, etc
    validator-multiaddr
    validator-connection
key
    purpose: babe, grandpa, authority-discovery, etc
    key-provider: YubiHSM2, TEE, Ledger, AWS CloudHSM, etc
    key-type: sr25519, ed25519, bls12-381, etc
CLI
The remote signing server would likely have a cli interface for setup, debugging, and deployment.
One could imagine the following possible commands:
generate
    --val-name: the name of the validator for which the key belongs
    --purpose: with options grandpa, babe, aura, etc
    --key-type: with options sr25519, ed25519, bls12-381
    --key-provider: with options soft, yubihsm, sgx, ledger, or others
add
    --val-name: the name of the validator for which the key belongs
    --purpose: with options grandpa, babe, aura, etc
    --key-type: with options sr25519, ed25519, bls12-381
    --key-provider: with options soft, yubihsm, sgx, ledger, or others
rotate-keys
ping
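The command set above could be modelled and parsed roughly as follows. This is a sketch under stated assumptions, not a proposed implementation: it uses only the standard library, `Command`, `KeyArgs`, `KeyType` and `parse` are hypothetical names, and `--purpose` and `--key-provider` are kept as free-form strings because the lists above are open-ended ("etc", "or others").

```rust
use std::str::FromStr;

/// Key types listed for --key-type above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KeyType {
    Sr25519,
    Ed25519,
    Bls12381,
}

impl FromStr for KeyType {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, String> {
        match s {
            "sr25519" => Ok(KeyType::Sr25519),
            "ed25519" => Ok(KeyType::Ed25519),
            "bls12-381" => Ok(KeyType::Bls12381),
            other => Err(format!("unknown key type: {other}")),
        }
    }
}

/// Flags shared by `generate` and `add`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct KeyArgs {
    pub val_name: String,
    pub purpose: String,      // grandpa, babe, aura, etc
    pub key_type: KeyType,
    pub key_provider: String, // soft, yubihsm, sgx, ledger, or others
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Command {
    Generate(KeyArgs),
    Add(KeyArgs),
    RotateKeys,
    Ping,
}

/// Parse `--flag value` pairs; all four flags are required.
fn parse_key_args(args: &[&str]) -> Result<KeyArgs, String> {
    let (mut val_name, mut purpose, mut key_type, mut key_provider) = (None, None, None, None);
    let mut it = args.iter();
    while let Some(flag) = it.next() {
        let value = it.next().ok_or_else(|| format!("missing value for {flag}"))?;
        match *flag {
            "--val-name" => val_name = Some(value.to_string()),
            "--purpose" => purpose = Some(value.to_string()),
            "--key-type" => key_type = Some(value.parse::<KeyType>()?),
            "--key-provider" => key_provider = Some(value.to_string()),
            other => return Err(format!("unknown flag: {other}")),
        }
    }
    Ok(KeyArgs {
        val_name: val_name.ok_or("--val-name is required")?,
        purpose: purpose.ok_or("--purpose is required")?,
        key_type: key_type.ok_or("--key-type is required")?,
        key_provider: key_provider.ok_or("--key-provider is required")?,
    })
}

/// Parse a full command line (without the program name).
pub fn parse(args: &[&str]) -> Result<Command, String> {
    let (command, rest) = args.split_first().ok_or("no command given")?;
    match *command {
        "generate" => parse_key_args(rest).map(Command::Generate),
        "add" => parse_key_args(rest).map(Command::Add),
        "rotate-keys" if rest.is_empty() => Ok(Command::RotateKeys),
        "ping" if rest.is_empty() => Ok(Command::Ping),
        other => Err(format!("unknown or malformed command: {other}")),
    }
}
```

For example, `parse(&["generate", "--val-name", "alice", "--purpose", "grandpa", "--key-type", "ed25519", "--key-provider", "soft"])` returns a `Command::Generate` carrying those four values, while an unrecognised key type is rejected with an error.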
Key Providers
The following describes some key providers and some benefits and trade offs they may provide.
HSMs
HSMs, or hardware security modules, allow you to store keys in a secure manner within hardware. They use tamper proof secure elements that prevent key extraction and allow payloads to be signed without ever exposing the private keys to the host. Since the generated keys never leave the device, even if the validator host is compromised, an attacker would not be able to access these keys.
One issue with most HSMs, however, is that they are dumb signing oracles: they will sign whatever they receive without verifying it. Thus an HSM alone doesn't provide much security compared to soft signing in terms of equivocation. If the validator host is compromised, an attacker can still request signatures, even though they cannot extract the keys themselves. This approach is thus most useful with a remote signing server that also has double signing protection.
TEE
A remote signer operating within a TEE such as SGX or TrustZone gives increased security compared to filestore-based storage.
Here's one approach as to how this can be used in this type of situation.
Ledger
Ledgers work very well amidst HSM-like solutions, as they are programmable (and thus double signing protection can be built into the software). They are also cheap, highly available, and easily accessible. In production datacenters, these can work surprisingly well.
Substrate Client
One would need to modify the Substrate client to account for fetching keys and signatures externally.
A first thing that needs to be done is to implement an RPC server for sending and fetching requests. This would involve either creating a new module, keystore-server, or modifying the keystore module to include this.
The RPC server would start to run when an additional cli flag, --keystore-server <CONNECTION_SECRET>, is given to a substrate client. When this flag is given, the RPC server will begin to listen for a request from the remote signing server to initiate a handshake. CONNECTION_SECRET will be needed to start the handshake, and from an operator's perspective, this should be handled with a secrets management service like HashiCorp Vault. Additionally, another flag, --keystore-server-url <URL>, could be specified as a specific url or port that the RPC server listens on.
If the substrate node is started with the --keystore-server flag enabled, it would wait until a handshake is made before it starts producing and finalizing blocks.
Additionally, changes would need to be made to the substrate client to change how keys are fetched and signatures created compared to how it exists currently. One approach here would be to modify the keystore in the client to contain abstractions over this happening either in the client or on the remote server. This would contain the interface that both the client signer (perhaps within the keystore) and the external signer implement. Either a new keystore-server or a modified existing keystore will have the responsibility of generating the requests to send to the external signing server. Changes in the consensus modules will need to be made to delegate the creation of those requests to keystore/keystore-server.
Double Signing Protection
Although adding a remote signer can add a layer of security compared to the current status quo, if the validator host were to be compromised, the attacker can still initiate a double sign by invoking the remote signing server. In order to mitigate this, double signing protection should eventually get built into the remote signing server. If the substrate client is compromised, the signing server should be able to prevent equivocation, or anything that ends in the corresponding extreme level of slashing for the validator.
In order to do this, the remote signing server would need to keep track of state so that it cannot be made to produce or finalize conflicting blocks.
In Tezos, double signing protection is done by keeping track of a high watermark for endorsements and block headers. The high watermark is the highest level to have been baked so far and no block header or endorsement will be signed at a lower block level than the previous block or endorsement.
In Cosmos, this is done by keeping track of the last Height, Round, Step (HRS). When trying to sign a new block, it will only sign any that have a higher HRS.
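The watermark idea used by both systems can be sketched in a few lines. This follows the Cosmos-style Height, Round, Step ordering described above; how GRANDPA rounds or BABE slots would map onto such a triple is an open question and is not addressed here. `Hrs`, `Watermarks`, `Refusal` and `check_and_advance` are hypothetical names, and the state is kept in memory only, whereas a real signer would have to persist the watermark durably before releasing a signature.

```rust
use std::collections::HashMap;

/// Height, round, step. The derived ordering is lexicographic in field
/// order, so height is compared first, then round, then step.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Hrs {
    pub height: u64,
    pub round: u64,
    pub step: u8,
}

/// Why a signing request was refused.
#[derive(Debug, PartialEq, Eq)]
pub enum Refusal {
    NotAboveWatermark { last: Hrs, requested: Hrs },
}

/// Highest HRS signed so far, tracked per public key.
#[derive(Default)]
pub struct Watermarks {
    last_signed: HashMap<Vec<u8>, Hrs>,
}

impl Watermarks {
    /// Allow signing only if `requested` is strictly above the last HRS
    /// signed with this key, and record it as the new watermark.
    pub fn check_and_advance(
        &mut self,
        public_key: &[u8],
        requested: Hrs,
    ) -> Result<(), Refusal> {
        if let Some(&last) = self.last_signed.get(public_key) {
            if requested <= last {
                return Err(Refusal::NotAboveWatermark { last, requested });
            }
        }
        // In-memory only: a real signer must write this to durable storage
        // before the signature leaves the device.
        self.last_signed.insert(public_key.to_vec(), requested);
        Ok(())
    }
}
```

Because the check is strict, a second request at the same HRS is refused even if it carries identical content; a compromised validator host can therefore ask for signatures, but cannot obtain two for the same position.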
Thus, the following will need to be constructed individually:
High Availability
Having both remote signing and double signing protection can help give way to high availability (active/active) setups that would increase the resiliency of the network and validator operations. One possibility this unlocks is an MPC HA keystore server with m-of-n threshold signatures required to produce the signature for the validator host. This depends on #11, but ultimately creates an extremely robust setup where the cost of compromising a validator becomes substantially higher, and the opportunity to do so substantially lower, than in the current status quo.
v1
A first version of this would have minimal functionality, likely using session keys like they are now, but isolated within a remote signing server. HSM interfaces as well as double signing protection should be next steps.
Discussion
keystore be modified for this logic?