Conversation
substrate/network/src/consensus.rs
Outdated
		trace!(target:"sync", "Error broadcasting statement notification: {:?}", e);
	}
} else {
	trace!(target:"sync", "Ignored statement from unregistered peer {}", peer_id);
extra tab
polkadot/consensus/src/service.rs
Outdated
	}
}

fn run_bft<C, P>(bft_service: &BftService<P, C>, network: Arc<net::ConsensusService>, client: &C, handle: reactor::Handle, header: &Header) -> Result<(), <P as bft::ProposerFactory>::Error> where
nit: more idiomatic as a free function
polkadot/consensus/src/service.rs
Outdated
	bft_service.build_upon(&header, input, output, handle.clone())
}

fn process_message(msg: net::BftMessage, authorities: &[AuthorityId], parent_hash: HeaderHash) -> Option<bft::Communication> {
this too
impl Drop for Service {
	fn drop(&mut self) {
		if let Some(thread) = self.thread.take() {
			thread.join().expect("The service thread has panicked");
i worry that this could cause shutdown to deadlock
The service now explicitly shuts down the network and the client event stream, which should cancel all futures and stop the event loop.
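A minimal, std-only sketch of the shutdown-before-join pattern being described here (field and type names are illustrative, not the actual `Service` layout): the worker is told to stop before `join`, so `drop` cannot deadlock.

```rust
use std::sync::mpsc;
use std::thread;

// Illustrative fields only; the real Service owns a network handle and a
// client event stream rather than a plain channel.
struct Service {
    exit: Option<mpsc::Sender<()>>,
    thread: Option<thread::JoinHandle<()>>,
}

impl Drop for Service {
    fn drop(&mut self) {
        // Signal shutdown first (here: by dropping the sender, which
        // disconnects the channel), so the join below cannot deadlock.
        drop(self.exit.take());
        if let Some(thread) = self.thread.take() {
            thread.join().expect("The service thread has panicked");
        }
    }
}

fn main() {
    let (exit_tx, exit_rx) = mpsc::channel::<()>();
    let thread = thread::spawn(move || {
        // The worker blocks until the exit sender is dropped.
        let _ = exit_rx.recv();
    });
    let _service = Service { exit: Some(exit_tx), thread: Some(thread) };
    // `_service` is dropped at the end of main: shutdown is signalled, then the thread is joined.
}
```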
polkadot/consensus/src/service.rs
Outdated
impl<E> Sink for BftSink<E> {
	type SinkItem = bft::Communication;
	type SinkError = E;
could we leave a TODO to replace this with the `!` type when that's stabilized?
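For reference, a small stand-alone sketch of that idea, using an empty `Never` enum as a placeholder until `!` is stable; none of these names are from the real `BftSink` or the futures `Sink` trait.

```rust
/// Placeholder for the `!` (never) type until it is stabilized: an empty enum
/// that can never be constructed, so this error can never actually occur.
#[derive(Debug)]
pub enum Never {}

// Illustrative stand-in, not the real BftSink.
struct BftSink;

impl BftSink {
    // TODO: replace the error type with `!` once the never type is stable.
    fn send(&mut self, _msg: &str) -> Result<(), Never> {
        Ok(())
    }
}

fn main() {
    let mut sink = BftSink;
    sink.send("proposal")
        .expect("Never has no values, so this error can never happen");
}
```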
polkadot/consensus/src/service.rs
Outdated
debug!("Error starting BFT agreement: {:?}", e); | ||
} | ||
}).map_err(|e| debug!("BFT agreement error: {:?}", e)); | ||
if let Err(_e) = core.run(start_bft.into_future()) { |
instead of doing this in a loop, couldn't we do `let bft_starter = client.import_notification_stream().for_each(|notification| ...)` and `core.run(bft_starter)`?
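A sketch of the suggested shape, assuming futures 0.1 and tokio-core 0.1 and using a stub stream in place of `client.import_notification_stream()`:

```rust
// Assumed dependencies: futures = "0.1", tokio-core = "0.1".
extern crate futures;
extern crate tokio_core;

use futures::{stream, Stream};
use tokio_core::reactor::Core;

fn main() {
    let mut core = Core::new().expect("failed to create event loop");

    // Stand-in for `client.import_notification_stream()`; the real stream
    // yields import notifications, not block numbers.
    let notifications = stream::iter_ok::<_, ()>(vec![1u64, 2, 3]);

    // `for_each` drives the whole stream, so there is no outer loop and no
    // `into_future()` (which would only wait for the first item).
    let bft_starter = notifications.for_each(|num| {
        println!("starting BFT agreement on block #{}", num);
        Ok(())
    });

    core.run(bft_starter).expect("notification stream errored");
}
```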
let mut storage = Default::default();
let key: AccountId = Keyring::One.into();

let genesis_config = GenesisConfig {
isn't this duplicated from the CLI?
moved, not duplicated
polkadot/service/src/lib.rs
Outdated
let events = thread_client.import_notification_stream().map(|notification| {
	thread_network.on_block_imported(&notification.header);
});
if let Err(_) = core.run(events.into_future()) {
same, i think that this is a misuse of `into_future()` and the loop isn't strictly required
substrate/bft/src/lib.rs
Outdated
/// Signal that a valid block with the given header has been imported.
///
/// If the local signing key is an authority, this will begin the consensus process to build a
/// block on top of it. If the executor fails to run the future, an error will be returned.
pub fn build_upon(&self, header: &Header) -> Result<(), P::Error> {
pub fn build_upon<InStream, OutSink, E>(&self, header: &Header, input: InStream, output: OutSink, executor: E) -> Result<(), P::Error> where
may as well skip the executor as a parameter if it isn't owned by the service anymore.
Skip the executor as a parameter of what? This function?
yeah. this function takes an executor as a parameter just so it can call `executor.execute` at the end. might as well just do that at the call sites and save some complexity
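A simplified, std-only illustration of the refactor being suggested; `Work`, `build_upon_with_executor` and `build_upon` are stand-ins, not the real bft API. The point is to return the work instead of taking an executor just to hand it off.

```rust
// Purely illustrative names; `Work` stands in for the agreement future that
// `build_upon` would produce.
struct Work(String);

// Before: an executor is passed in only so the function can hand the work off.
fn build_upon_with_executor<E: FnOnce(Work)>(header: &str, execute: E) {
    execute(Work(format!("agreement on {}", header)));
}

// After: return the work and let each call site decide how to execute it.
fn build_upon(header: &str) -> Work {
    Work(format!("agreement on {}", header))
}

fn main() {
    build_upon_with_executor("block #1", |work| println!("spawning {}", work.0));

    // The call site now owns the spawning decision.
    let work = build_upon("block #2");
    println!("spawning {}", work.0);
}
```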
substrate/client/src/client.rs
Outdated
	/// Block was broadcasted on the network.
	NetworkBroadcast,
	/// Block that was collated by this node.
	Own,
what does this mean in the context of the 2-phase commit?
This means that the block body was generated on this machine and was not received from the network.
the block body is almost always generated on another machine, but any block bodies coming directly from the consensus process have already been evaluated
Added a variant for that.
	/// Block is part of the initial sync with the network.
	NetworkInitialSync,
	/// Block was broadcasted on the network.
	NetworkBroadcast,
aren't all blocks broadcast over the network in some sense? doesn't it make more sense to provide hints about whether a block is the head of the chain or not?
The head hint is provided as a separate parameter to the `import` function. This is for something else, mainly for logging and possibly skipping certain checks for local blocks.
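A rough sketch of how these hints might be modelled, with the origin enum kept separate from the is-new-best flag; the variant set and names here are illustrative, not necessarily the client's exact definition.

```rust
/// Illustrative origin hint; the exact variant set in the real client may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
enum BlockOrigin {
    /// Block is part of the initial sync with the network.
    NetworkInitialSync,
    /// Block was broadcast on the network outside of initial sync.
    NetworkBroadcast,
    /// Block came from the local consensus process and was already evaluated.
    ConsensusBroadcast,
    /// Block body was produced on this machine.
    Own,
}

/// The "new best block" hint travels separately, so the origin is mainly
/// useful for logging and for skipping checks on locally produced blocks.
fn import(origin: BlockOrigin, is_new_best: bool) {
    println!("importing block (origin: {:?}, new best: {})", origin, is_new_best);
    if origin == BlockOrigin::ConsensusBroadcast || origin == BlockOrigin::Own {
        // A body that came from local consensus has already been evaluated;
        // some checks could be skipped here.
    }
}

fn main() {
    import(BlockOrigin::NetworkInitialSync, false);
    import(BlockOrigin::NetworkBroadcast, true);
    import(BlockOrigin::ConsensusBroadcast, true);
    import(BlockOrigin::Own, true);
}
```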
	header: header,
	is_new_best: is_new_best,
};
if let Err(e) = self.import_notification_sink.lock().try_send(notification) {
this fails if the buffer is full; seems dangerous to discard the notification in this case. maybe use the `Sink::send` API and wait for the future to resolve?
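For illustration, a sketch of the suggested `Sink::send` pattern, assuming futures 0.1 and using a bounded channel as a stand-in for the notification sink: `send` returns a future that applies backpressure instead of dropping the item the way `try_send` does.

```rust
// Assumed dependency: futures = "0.1".
extern crate futures;

use futures::sync::mpsc;
use futures::{Future, Sink, Stream};

fn main() {
    // Bounded channel standing in for the import notification sink/stream.
    let (sink, stream) = mpsc::channel::<&'static str>(8);

    // `Sink::send` returns a future that resolves only once the item has been
    // accepted, so a full buffer applies backpressure instead of silently
    // dropping the notification the way `try_send` does.
    let sink = sink.send("block imported").wait().expect("receiver dropped");
    drop(sink);

    let received: Vec<_> = stream.collect().wait().expect("receiver never errors");
    println!("notifications delivered: {:?}", received);
}
```

The trade-off raised in the reply below is exactly that waiting on this future would make block import block on slow notification consumers.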
Import is more important and should not block on the notification handler. Normally this should never happen, but if it does, skipping a notification is fine since there will be a new block eventually anyway. The handlers are not supposed to expect a notification for every block; it is quite possible that the node goes to sleep or loses connectivity for a while, and when it comes back online there won't be any notifications for the blocks that arrived in the meantime.
> skipping a notification is fine since there will be a new block eventually anyway

this is reasoning about "eventually" a little too long-term for me to be comfortable using this in practice.
there won't be a new block if not enough authorities start the agreement process (e.g. if 1/3 are offline and one of the "honest" nodes skips this notification, the agreement would only complete when that node restarts)
Increased the buffer to 65536 for now. Unfortunately it is preallocated.
still dangerous but probably workable for now. we should write a fix after this PR -- maybe a 50ms timer where the same chain head being observed twice in a row starts consensus?
I'll address that in the following PR, probably by replacing `multiqueue` with a vector of mpsc channels.
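A std-only sketch of the "vector of mpsc channels" fan-out idea; the real implementation would presumably use futures channels so that each receiver is a stream driven by the event loop.

```rust
use std::sync::mpsc;

// Each subscriber gets its own channel; the broadcaster just keeps the senders.
struct Broadcaster<T: Clone> {
    sinks: Vec<mpsc::Sender<T>>,
}

impl<T: Clone> Broadcaster<T> {
    fn new() -> Self {
        Broadcaster { sinks: Vec::new() }
    }

    /// Every subscriber gets its own channel, so one slow or full consumer
    /// does not force messages to be dropped for everyone else.
    fn subscribe(&mut self) -> mpsc::Receiver<T> {
        let (tx, rx) = mpsc::channel();
        self.sinks.push(tx);
        rx
    }

    /// Send a clone of the message to every subscriber, pruning channels
    /// whose receiver has been dropped.
    fn broadcast(&mut self, msg: T) {
        self.sinks.retain(|tx| tx.send(msg.clone()).is_ok());
    }
}

fn main() {
    let mut notifications = Broadcaster::new();
    let consensus = notifications.subscribe();
    let network = notifications.subscribe();

    notifications.broadcast("new block");

    assert_eq!(consensus.recv().unwrap(), "new block");
    assert_eq!(network.recv().unwrap(), "new block");
}
```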
if self.peers.contains_key(&peer_id) {
	// TODO: validate signature?
	if let Err(e) = self.bft_message_sink.try_send(message) {
		trace!(target:"sync", "Error broadcasting BFT message notification: {:?}", e);
dropping messages when the buffer is full will almost certainly cause consensus to fail
The network handler should not block. If the buffer is full, that means the system simply lacks the performance to handle the messages and consensus will fail anyway. Blocking here will just move the congestion to a lower level.
I disagree -- we could get 65 messages all at once and then one will be dropped. But if those are the last messages we were going to receive then it's fine to let the agreement logic process them slowly. It isn't fine to drop the last commit message, for example.
Increased the buffer to 65536 for now. Unfortunately it is preallocated.
worrying but workable for PoC-1. This is something that should fail on OOM and not before
I'll address that in the following PR, probably by replacing `multiqueue` with a vector of mpsc channels.
substrate/network/src/error.rs
Outdated
@@ -23,7 +25,7 @@ error_chain! {
	}

	links {
		Client(client::error::Error, client::error::ErrorKind);
		Client(client::error::Error, client::error::ErrorKind) #[doc="Blockchain error"];
is it always a blockchain error?
looks reasonable to me
polkadot/api/src/lib.rs
Outdated
@@ -140,7 +140,7 @@ pub trait PolkadotApi {

/// A checked block ID used for the substrate-client implementation of CheckedBlockId;
#[derive(Debug, Clone, Copy)]
pub struct CheckedId(BlockId);
pub struct CheckedId(pub BlockId);
making this pub kind of defeats the purpose of the interface
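One way to preserve the interface without the `pub` field, sketched with a stand-in `BlockId` type: keep the inner id private and expose only a crate-internal constructor plus a read-only accessor, so a `CheckedId` can only come out of the checking code.

```rust
// `BlockId` is a stand-in type here; only the shape of the API matters.
#[derive(Debug, Clone, Copy)]
pub struct BlockId(pub u64);

/// The inner id stays private, so a CheckedId can only be produced by the
/// checking code, and callers get read-only access.
#[derive(Debug, Clone, Copy)]
pub struct CheckedId(BlockId);

impl CheckedId {
    /// Only this crate (i.e. the code that performs the check) can construct one.
    pub(crate) fn new(id: BlockId) -> Self {
        CheckedId(id)
    }

    /// Read-only access to the underlying id.
    pub fn block_id(&self) -> &BlockId {
        &self.0
    }
}

fn main() {
    let checked = CheckedId::new(BlockId(42));
    println!("checked block id: {:?}", checked.block_id());
}
```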
pub fn on_statement(&mut self, peer_id: PeerId, statement: message::Statement) {
	if let Some(ref mut peer) = self.peers.get_mut(&peer_id) {
		// TODO: validate signature?
also a TODO (but beyond this PR): handle only statements for the current parent hash
This will be addressed in an upcoming PR
pub fn on_bft_message(&mut self, peer_id: PeerId, message: message::BftMessage) {
	if self.peers.contains_key(&peer_id) {
		// TODO: validate signature?
aren't the signatures being checked? I thought you mentioned that to me
(it's done in `process_message` on the other side of this stream)
Yeah, but eventually there has to be a check here as well to avoid propagating invalid messages. Or it should be moved here.
This PR integrates the client, consensus, networking and transaction pool into a `Service` struct. Still TODO: