Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Preparing light client structure [v2] #150

Merged
merged 12 commits into from
May 28, 2018
Merged

Preparing light client structure [v2] #150

merged 12 commits into from
May 28, 2018

Conversation

svyatonik
Copy link
Contributor

Following #142 + conversation with @rphmeier .
I've tried to make specialized PolkadotApi implementation (just a stub for now), which should work on light client by requesting execution proofs from full nodes.
I've tried to reuse as much service/client/network code as possible, as suggested by @arkpar in #142 . Though having light implementation of Client is not a priority right now (if I understand you correctly, @rphmeier), there's still light version of Client (stub actually).

Brief overview:

  • extracted CallExecutor trait with call(at, method, call_data) method. Two implementations - LocalCallExecutor, which uses locally-available state data and RemoteCallExecutor which fetches execution proof from remote nodes && checks it locally;
  • since there are no tries right now, merkle proofs are not available => execution proof is the whole state atm. This will be changed later, after tries are landed. So the proof check is: (1) check that root is the same as in the at block (2) check that the local execution result is the same. So it isn't a light client actually right now, since it fetches a lot of data from remote node;
  • client backend is now shared between Client and CallExecutor implementations => it is now stored as Arc in both Client and CallExecutor;
  • introduced RemotePolkadotApi implementation of PolkadotApi, which takes advantage of remote execution. Still not sure that 2nd implementation is required or we can make single PolkadotApi for Client, working solely with CallExecutor - so it is a subject to change;
  • 'light backend' should fetch required data from remote nodes using provided Fetcher. Since there's separate layer for executing calls (CallExecutor), it should only fetch a data for RPC calls (when properly used), but still it should be a fully-functional backend;
  • OnDemand is an implementation of Fetcher, which is now able to fetch execution proofs only. Later, block body, storage, tx inclusion proofs should be added;
  • Client::authorities_at implementation, which used direct state access is replaced with runtime method call => added this method to the test-runtime implementation => test-runtime genesis file change;
  • slightly changed sync protocol for light nodes - body is not queried when new block is announced. Client::check_justification works because remote authorities_at call is available on light clients;
  • as a result of ^^^, headers synchronization works right now (only headers + justification is queried when new block is announced + check_justification works as intended [given that there's no merkle proofs, though]).

Some TODOs (excluding described above):

  • separate tx pool implementation - if I understand correctly, we should store only local transactions on light clients. Plus, current implementation never prunes txs from the pool when they're included in the block, because this relies on accessing the block body;
  • only connecting to full nodes from light client nodes. Or maybe at least reserving some slots for full nodes.

@svyatonik svyatonik added the A0-please_review Pull request needs code review. label May 8, 2018
@svyatonik
Copy link
Contributor Author

P.S.:

  • light client backend uses in-memory blockhain backend as a storage now (i.e. it will resync from start after restarting);
  • there are big 'technical' changes about moving prepare_genesis closure content into GenesisBuilder trait method build - logic is not changed here;
  • not sure if collator + validator nodes should also be considered as FULL nodes (i.e. if they're storing state data locally) - would appreciate any advice (there's TODO for this in the code);

@@ -124,6 +115,10 @@ pub fn run<I, T>(args: I) -> error::Result<()> where
info!("Starting validator.");
role = service::Role::VALIDATOR;
}
else if matches.is_present("light") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else if should be on preceding line

@@ -211,21 +211,23 @@ pub struct Service {

impl Service {
/// Create and start a new instance.
pub fn new<C>(
pub fn new<A, C>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not convinced that this service needs to support light client in any form since it's only for validators (who are definitely full nodes)

Copy link
Contributor

@rphmeier rphmeier May 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there should be a hierarchy client backend traits where we can bound things by Backend: LocalBackend to filter out light nodes or Backend: RemoteBackend to accept light nodes only.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added here. Had to add another pair of traits (LocalPolkadotApi + RemotePolkadotApi), since consensus service depends on PolkadotApi trait (and I didn't wanted to make PolkadotApi generic).


/// Creates light client and register protocol with the network service
pub fn new_light(config: Configuration) -> Result<Service<client::light::Backend, client::RemoteCallExecutor<client::light::Backend, CodeExecutor>>, error::Error> { // TODO: light
Service::new(move |_, executor, genesis_builder: GenesisBuilder| {
Copy link
Contributor

@rphmeier rphmeier May 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for an actual service, the logic needs to be fairly different for a light client.

for one, we need to introduce a feedback loop between block import and the sync service, where we can fetch things like validator set changes and state proofs before continuing sync.

and we should cut out all the validator stuff from light clients, it's not necessary to support

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't result of import_block is enough? I mean - we can do anything (like check for validators set changes there (in import_block)) there && just return error to sync if something goes wrong. So this will affect only small portion inside Client implementation && leave the Service as is.


let (remote_result, remote_proof) = self.fetcher.execution_proof(block_hash, method, call_data)?;

// code below will be replaced with proper proof check once trie-based proofs will be possible
Copy link
Contributor

@rphmeier rphmeier May 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are possible now, but I'm fine with deferring to another PR if you like

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah - better to start adding support for tries after this PR (overall structure) is approved

self.executor.clone()
/// Get the set of authorities at a given block.
pub fn authorities_at(&self, id: &BlockId) -> error::Result<Vec<AuthorityId>> {
self.executor.call(id, "authorities",&[])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is your plan for having light clients memoize this and code changes? How will that fit into the abstraction?

Under the current consensus algorithm, we can expect certain runtimes to make a digest entry every time the authorities or code change for the next block. Under the planned consensus algorithm this change will be applied at some point beyond finality of the signal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now: detect change in import_block method (digest is available there for light clients) => reset cache (not sure yet - where cache must be placed - in OnDemand service, or at the LightBackend/LightCallExecutor).

Then: not sure, but I guess the same strategy will work, but cache reset must be scheduled, instead of executing right now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we will probably want to keep all authority sets and scheduled transitions in the DB until they are no longer necessary, and then have some way to look up by block ID

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK - will add some meta-db + make cache backed by this db then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw - the question from the future PRs :) do we need the same data on full nodes, or we're only considering archive nodes right now (=> all states are available => cache is not required)?

Copy link
Contributor

@rphmeier rphmeier May 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right now let's consider only archive nodes. but the scheduling logic should be mostly the same for light and full nodes.

@@ -26,7 +26,8 @@ use primitives::block::{self, Id as BlockId, HeaderHash};
use blockchain::{self, BlockStatus};
use state_machine::backend::{Backend as StateBackend, InMemory};

fn header_hash(header: &block::Header) -> block::HeaderHash {
/// Compute block header hash.
pub fn header_hash(header: &block::Header) -> block::HeaderHash {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method is redundant with blake2_256(), isn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, replaced with direct call


fn storage(&self, _key: &[u8]) -> Result<Option<Vec<u8>>, Self::Error> {
// TODO [light]: fetch from remote node
Err(error::ErrorKind::NotAvailableOnLightClient.into())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this is really not the right approach. Making round trips for every storage entry we want to read <<<< making a single round trip for an entire call.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a StateBackend which is generated from an execution proof; then all the methods can be implemented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's impl StateBackend for OnDemandState - stub right now. storage is not intended to be used during calls - as mentioned in description:

'light backend' should fetch required data from remote nodes using provided Fetcher. Since there's separate layer for executing calls (CallExecutor), it should only fetch a data for RPC calls (when properly used), but still it should be a fully-functional backend;

match core.remove(peer, response.id) {
Some(request) => {
// we do not bother if receiver has been dropped already
let _ = request.sender.send(response);
Copy link
Contributor

@rphmeier rphmeier May 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should check the validity of the response here and reassign if the response is not correct. otherwise we are just relying that our peers are correct and are vulnerable to sybil attack.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could have each request come with an associated attempts counter, and users can specify a default maximum when creating the OnDemand service. When dispatching a request they can use that default or provide their own maximum.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe better to add this logic to the LightCallExecutor / LightBackend - remote execution proof check = local execution (am I right?). And that is performed in LightCallExecutor right now. So fetch => execute locally => disconnect if fail.

Copy link
Contributor

@rphmeier rphmeier May 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, as long as there's a convenient way to disconnect the peer from the LightCallExecutor -- that will require passing peer info along with the response.

Copy link
Contributor

@rphmeier rphmeier May 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little different than the OnDemand service in parity-ethereum, which handles data validation and rescheduling internally. But then it has the nice property that only good data comes out of the service. This architecture is also useful for implementing a network protocol that implements heterogeneous requests in a single packet, where inputs of later requests are outputs of earlier ones (see PIP definition).

note that the eventual goal for substrate is to use a network protocol like that -- if there is another way to do it without adding more round trips I am fine with that as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no - I think this would change the structure significantly, so let it be in this PR - I'll update the code here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added FetchChecker trait to the light client module (where Fetcher is also defined). It is responsible for checking remote response and converting it from 'remote' to 'local' form (like dropping merkle proofs, because they're not required after check is performed).
So implementation of Fetcher (i.e. OnDemand) could stay in network crate && do not refer directly to blockchain/client && only check+convert remote response using this trait (implemented in light module).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add attempts counter in next PRs, if you don't mind - still want to achieve some overall structure stability in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see FetchChecker anywhere in the diff.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None => return,
};

let mut request = self.pending_requests.pop_front().expect("checked in loop condition; qed");
Copy link
Contributor

@rphmeier rphmeier May 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we just loop for mut x in self.pending_requests.drain()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there might be no peers available or number of peers is < than number of requests (currently 1 request <-> 1 peer at a time)

@@ -34,8 +34,10 @@ extern crate error_chain;
#[cfg(test)]
extern crate substrate_keyring as keyring;

use client::backend::Backend;
use client::Client;
pub mod light;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: since light clients are meant to be first-class we should also put the full client implementation in a submodule

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, will do

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried to do this, but actually full client could be backed by db backend (which is already in the separate crate, named db, not the full) and in-memory backend (which I guess could be renamed to TestBackend). So there's actually nothing to move to the full submodule right now, except for db crate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

impl<B: LocalBackend> PolkadotApi for Client<B, LocalCallExecutor<B, NativeExecutor<LocalDispatch>>> looks like it could go in a full module.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right. missed that you're talking about Api, not about the client

@svyatonik svyatonik added A4-gotissues and removed A0-please_review Pull request needs code review. labels May 18, 2018
@svyatonik svyatonik added A0-please_review Pull request needs code review. and removed A4-gotissues labels May 21, 2018
message: message::RemoteCallRequest,
}

impl Response {
Copy link
Contributor

@rphmeier rphmeier May 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should just be a Future. wait comes for free then, along with a bunch of other things which are useful for composing async calls.

/// Light client data fetcher.
pub trait Fetcher: Send + Sync {
/// Fetch method execution proof.
fn execution_proof(&self, block: HeaderHash, method: &str, call_data: &[u8]) -> error::Result<(Vec<u8>, Vec<Vec<u8>>)>;
Copy link
Contributor

@rphmeier rphmeier May 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should return a Future. It will still be easy to use it in a synchronous way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe CallExecutor should return a future too. This will make it compose nicely with asynchronous RPC servers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed Fetcher to return Future. Left CallExecutor as is for now

@rphmeier rphmeier added A5-grumble and removed A0-please_review Pull request needs code review. labels May 22, 2018
@rphmeier
Copy link
Contributor

rphmeier commented May 22, 2018

we should store only local transactions on light clients

light clients should also not propagate non-local transactions, or any block data. They should take up "corners" of the gossip topology where they only leech data.

@svyatonik svyatonik added A0-please_review Pull request needs code review. and removed A5-grumble labels May 22, 2018
@gavofyork gavofyork added A8-looksgood and removed A0-please_review Pull request needs code review. labels May 24, 2018
@gavofyork
Copy link
Member

need another merge...

@svyatonik svyatonik merged commit 6ba747c into master May 28, 2018
@svyatonik svyatonik deleted the light_init branch May 28, 2018 07:28
JoshOrndorff pushed a commit to moonbeam-foundation/substrate that referenced this pull request Apr 21, 2021
…ytech#150)

* created npm package and added it to tests

* install types bundle

* also export definitions

Co-authored-by: Joshy Orndorff <[email protected]>

* remove exitreasons>
liuchengxu pushed a commit to chainx-org/substrate that referenced this pull request Aug 23, 2021
liuchengxu added a commit to chainx-org/substrate that referenced this pull request Aug 23, 2021
* Refactor xsupport pallet

* Optimize using try_for_each

* Rename to as_string and as_string_hex

* .

* Rename to as_addr()

* Remove useless Str
liuchengxu added a commit to autonomys/substrate that referenced this pull request Jun 3, 2022
Simplify the properties generation in ChainSpec
Wolmin added a commit to Quantum-Blockchains/substrate that referenced this pull request Nov 15, 2022
* Add toolchain file + gh action file

* Add build action

* Adjust test workflow for client only

* Delete testing action

* Change step name
helin6 pushed a commit to boolnetwork/substrate that referenced this pull request Jul 25, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants