-
Notifications
You must be signed in to change notification settings - Fork 4.5k
Conversation
Agree. I am trying to create a test bed on GCP for overall network testing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great stuff! I offer lots of nits here, but please don't assume that implies I don't like the PR. I had an easy time following your code - thanks so much! Let's polish this one up and get it merged!
src/choose_gossip_peer_strategy.rs
Outdated
|
||
let result = weighted_strategy.calculate_weighted_remote_index(key1); | ||
|
||
// If nobody has seen a newer update then rever to default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revert
@@ -0,0 +1,306 @@ | |||
use crdt::ReplicatedData; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add module documentation?
src/choose_gossip_peer_strategy.rs
Outdated
pub const DEFAULT_WEIGHT: u32 = 1; | ||
|
||
pub trait ChooseGossipPeerStrategy { | ||
fn choose_peer(&self, options: Vec<&ReplicatedData>) -> Result<ReplicatedData>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you able to return a reference here to avoid the clone()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, @aeyakovenko, what do you think about the use of the term "peer" here? Should we consider replacing ReplicatedData
with Peer
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@garious, yeah I thought the reference would have been fine, but in the original gossip_request() code, there was a clone(), so I aligned with that. https://github.com/solana-labs/solana/blob/master/src/crdt.rs#L520. Looking at it more in depth now, it doesn't seem necessary.
src/choose_gossip_peer_strategy.rs
Outdated
fn choose_peer(&self, options: Vec<&ReplicatedData>) -> Result<ReplicatedData>; | ||
} | ||
|
||
pub struct ChooseRandomPeerStrategy<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment that summarizes this strategy and its limitations?
src/choose_gossip_peer_strategy.rs
Outdated
} | ||
} | ||
|
||
pub struct ChooseWeightedPeerStrategy<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment that summarizes this strategy and how it overcomes the limitations of ChooseRandomPeerStrategy?
tests/data_replicator.rs
Outdated
trace!("waiting to converge:"); | ||
let mut done = false; | ||
for _ in 0..30 { | ||
done = c1.read().unwrap().table.len() == 3 && c2.read().unwrap().table.len() == 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Old version of cargo fmt
- this will fail CI. If you rustup update stable
, it'll automatically update rustfmt along with the compiler.
tests/data_replicator.rs
Outdated
// Make sure liveness table entry contains correct result for c2 | ||
let c2_index_result_for_c4 = liveness_map.get(&c2_id); | ||
assert!(c2_index_result_for_c4.is_some()); | ||
assert!(*(c2_index_result_for_c4.unwrap()) == c2_index_for_c4); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert_eq!
tests/data_replicator.rs
Outdated
// Make sure liveness table entry contains correct result for c3 | ||
let c3_index_result_for_c4 = liveness_map.get(&c3_id); | ||
assert!(c3_index_result_for_c4.is_some()); | ||
assert!(*(c3_index_result_for_c4.unwrap()) == c3_index_for_c4); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert_eq!
tests/data_replicator.rs
Outdated
threads.extend(dr2.thread_hdls.into_iter()); | ||
threads.extend(dr3.thread_hdls.into_iter()); | ||
|
||
for t in threads.into_iter() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for that into_iter()
tests/data_replicator.rs
Outdated
threads.extend(dr1.thread_hdls.into_iter()); | ||
threads.extend(dr4.thread_hdls.into_iter()); | ||
|
||
for t in threads.into_iter() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for that into_iter()
I took a stab at updating the title and description. Feel free to update that again. A description that explains how the resulting system is somehow "better" would be helpful context for anyone else reviewing the PR. The ticket says "Improve convergence", but in what way? More secure? Faster? Perhaps this is a question for @aeyakovenko. |
@aeyakovenko, in that list of metrics, which would you assume would change as a result of this PR? |
in theory all 3, but we need some tests to experiment with this :). @carllin @pgarg66 how is your knowledge of http://man7.org/linux/man-pages/man8/tc-netem.8.html I think we can sim gossip on a large set of nodes on just a single big machine with lots of cpu cores. since all the threads should only be doing network IO. |
I've used netem in past. I was planning to use it to induce network errors. |
I haven't used netem before, but would sign up as a willing disciple/guinea pig of the wise one: @pgarg66 |
Thanks a bunch @carllin. Let's keep this going. Great code, great direction. Now the challenge is finding ways to measure all the value you're adding. |
…#438) * Add failing tests * No need to hold source RefMut in process_close_account and process_toggle_freeze_account
Instead of randomly choosing a peer to get gossip updates from, this PR adds an option to weight that choice by stake.
Fixes #302