[QuorumStore] Fix batch requester #7190
    }
} else {
    // retry_limit exceeds, fail to get batch from peers
    break;
Hm, so if we break here, the last set of peers gets only 1 second to receive RPC responses? Instead of breaking immediately, don't we want to wait until the futures are drained?
Good point! Resolved.
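To make the concern above concrete, here is a rough timing sketch and not the PR's actual code. It assumes one request per retry round, rounds sent at multiples of the retry interval, and hypothetical per-round response latencies; `Outcome` and `simulate_requester` are names invented for this illustration. With `drain = false` the requester gives up one retry interval after the last send; with `drain = true` it waits until the last request would hit its RPC timeout.

```rust
/// Outcome of a simulated batch request (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// A response arrived at this time, in ms since the first send.
    Received(u64),
    /// No response arrived before the requester gave up.
    GaveUp,
}

/// Simulates the retry loop on a timeline.
/// `latencies_ms[i]` is the assumed latency of the request sent in round `i`
/// (`None` means that peer never answers). Round `i` is sent at `i * retry_interval_ms`.
pub fn simulate_requester(
    latencies_ms: &[Option<u64>],
    retry_interval_ms: u64,
    rpc_timeout_ms: u64,
    drain: bool,
) -> Outcome {
    if latencies_ms.is_empty() {
        return Outcome::GaveUp;
    }
    let last_send = (latencies_ms.len() as u64 - 1) * retry_interval_ms;
    // Without draining, the loop breaks on the tick after the last send.
    // With draining, it waits until the last in-flight RPC has timed out.
    let deadline = if drain {
        last_send + rpc_timeout_ms
    } else {
        last_send + retry_interval_ms
    };
    let earliest = latencies_ms
        .iter()
        .enumerate()
        .filter_map(|(round, latency)| latency.map(|l| (round as u64, l)))
        // A response slower than the RPC timeout is never delivered.
        .filter(|(_, l)| *l <= rpc_timeout_ms)
        .map(|(round, l)| round * retry_interval_ms + l)
        .filter(|arrival| *arrival <= deadline)
        .min();
    match earliest {
        Some(t) => Outcome::Received(t),
        None => Outcome::GaveUp,
    }
}
```

For example, with a 1000 ms retry interval, a 5000 ms RPC timeout, and only the third peer answering after 2500 ms, breaking immediately loses the batch, while draining receives it at 4500 ms.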
…into daniel-qs-counter
    for peer in request_peers {
        futures.push(network_sender.request_batch(request.clone(), peer, rpc_timeout));
    }
}
Would this work?
else if (futures.len() == 0) {
break;
}
Having another num_futures just feels a bit brittle.
Cool! Thanks
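A minimal sketch of the suggested control flow, under loud assumptions: it replaces the async loop with a plain list of events, and uses a `VecDeque` as a stand-in for the in-flight futures collection. `Event` and `run_requester` are hypothetical names for this illustration. The point is that the exit condition is derived from the collection itself, so there is no separate counter to keep in sync.

```rust
use std::collections::VecDeque;

/// Events the requester loop reacts to (hypothetical type for this sketch).
#[derive(Clone, Copy, Debug)]
pub enum Event {
    /// The retry interval fired.
    Tick,
    /// One in-flight request completed; `ok` says whether it carried a valid batch.
    Response { ok: bool },
}

/// Returns (batch_received, rounds_sent).
pub fn run_requester(events: &[Event], retry_limit: usize, peers_per_round: usize) -> (bool, usize) {
    // Stand-in for the in-flight futures; each entry records its round number.
    let mut in_flight: VecDeque<usize> = VecDeque::new();
    let mut rounds_sent = 0;
    for event in events {
        match *event {
            Event::Tick => {
                if rounds_sent < retry_limit {
                    for _ in 0..peers_per_round {
                        in_flight.push_back(rounds_sent);
                    }
                    rounds_sent += 1;
                } else if in_flight.is_empty() {
                    // Retry limit reached and nothing left to wait for.
                    break;
                }
                // Otherwise: limit reached, but responses may still arrive.
            }
            Event::Response { ok } => {
                if in_flight.pop_front().is_some() && ok {
                    return (true, rounds_sent);
                }
            }
        }
    }
    (false, rounds_sent)
}
```

In the real loop the emptiness check would be made on the futures collection, as in the comment above; the simulation only mirrors the branching.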
let network_sender = self.network_sender.clone();
let request_num_peers = self.request_num_peers;
let my_peer_id = self.my_peer_id;
let epoch = self.epoch;
let timeout = Duration::from_millis(self.request_timeout_ms as u64);
let retry_timeout = Duration::from_millis(self.retry_timeout_ms as u64);
The name is confusing to me; we should probably call it "retry_interval" or something similar.
num_futures -= 1;
if let Ok(batch) = response {
    counters::RECEIVED_BATCH_RESPONSE_COUNT.inc();
    if batch.verify().is_ok() {
I think you need a rebase
if self.num_retries == 0 {
    let mut rng = rand::thread_rng();
    // make sure nodes request from the different set of nodes
    self.next_index = rng.gen::<usize>() % self.signers.len();
Are peers shuffled or sorted? If sorted, can it be a problem that we are using a consecutive range?
I think it's shuffled
proof.shuffled_signers(&self.validator_verifier),
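As a sketch of what "a consecutive range" means here, and not the PR's actual function: the helper below takes the next `num_peers` entries from the signer list, starting at a given index and wrapping around. The name `next_request_peers` and its signature are assumptions for this illustration. In the diff the start index comes from `rng.gen::<usize>() % self.signers.len()`; here it is passed in so the example needs only the standard library. The empty-list guard is also an addition, since a modulo by zero would panic.

```rust
/// Returns the next `num_peers` peers from `signers`, starting at `*next_index`
/// and wrapping around, then advances `*next_index` past them.
/// If `num_peers` exceeds `signers.len()`, peers repeat.
pub fn next_request_peers<T: Clone>(
    signers: &[T],
    next_index: &mut usize,
    num_peers: usize,
) -> Vec<T> {
    if signers.is_empty() {
        // Guard added for the sketch: `% 0` would panic.
        return Vec::new();
    }
    let start = *next_index % signers.len();
    let peers: Vec<T> = signers
        .iter()
        .cycle()
        .skip(start)
        .take(num_peers)
        .cloned()
        .collect();
    *next_index = (start + num_peers) % signers.len();
    peers
}
```

With signers `["a", "b", "c", "d", "e"]`, a start index of 3, and 3 peers per round, the first round targets `d, e, a` and the second `b, c, d`. Because the list is shuffled, a consecutive range of it is not a fixed set of neighbors, which is what the question above was checking.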
✅ Forge suite
* fix batch request counter, add retry limit to config
* new batch requester
Co-authored-by: danielx <[email protected]>
Description
Fix the batch requester so that it sends batch requests to peers once every retry_timeout interval (each as an RPC with a timeout of rpc_timeout), up to retry_limit times. Fix the related counters, and move the related parameters to the QS config file.
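A rough sketch of how the parameters in this description relate, assuming the first round is sent immediately and the requester waits for in-flight RPCs after the last round. The struct `BatchRequestConfig`, its field names, and the numbers in the example are hypothetical, not the actual QS config.

```rust
/// Hypothetical grouping of the batch-request parameters (not the real QS config).
pub struct BatchRequestConfig {
    /// Peers contacted in each retry round.
    pub request_num_peers: usize,
    /// Maximum number of rounds.
    pub retry_limit: usize,
    /// Time between rounds, in ms (called retry_timeout in the description).
    pub retry_interval_ms: u64,
    /// Timeout of each individual RPC, in ms.
    pub rpc_timeout_ms: u64,
}

impl BatchRequestConfig {
    /// Upper bound on the number of RPCs sent for one batch.
    pub fn max_requests(&self) -> usize {
        self.retry_limit * self.request_num_peers
    }

    /// Upper bound on the time spent on one batch, in ms: the last round goes
    /// out after (retry_limit - 1) intervals and may take a full RPC timeout.
    pub fn max_wait_ms(&self) -> u64 {
        if self.retry_limit == 0 {
            return 0;
        }
        (self.retry_limit as u64 - 1) * self.retry_interval_ms + self.rpc_timeout_ms
    }
}
```

With illustrative values of 3 peers per round, a retry limit of 10, a 500 ms interval, and a 5000 ms RPC timeout, this gives at most 30 requests and at most 9500 ms before giving up.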
Test Plan