
Loop over confirm_slot until no more entries or slot completed #28535

Closed
wants to merge 2 commits

Conversation

@buffalu (Contributor) commented Oct 21, 2022

Problem

  • while confirm_slot is running, more shreds may have been received and inserted into blockstore.
  • we should process those immediately

Summary of Changes

  • loop over confirm_slot until end of slot or no more shreds
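
For illustration, a minimal self-contained Rust sketch of the intended control flow; confirm_slot_once is a simplified, hypothetical stand-in for the real confirm_slot signature, not the actual replay_stage code:

// Stand-in for confirm_slot (hypothetical simplification): replays whatever
// entries are currently in blockstore for this slot and returns true only if
// it found entries and the slot is still incomplete.
fn confirm_slot_once() -> bool {
    false // pretend blockstore had nothing new / the slot is finished
}

fn replay_active_slot() {
    // more entries may have been received while replaying this slot; keep
    // calling confirm_slot until the slot is complete or blockstore has
    // nothing new, instead of returning to the outer replay-stage loop
    // between batches.
    let mut more_entries_to_process = true;
    while more_entries_to_process {
        more_entries_to_process = confirm_slot_once();
    }
}

fn main() {
    replay_active_slot();
}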

Risks

  • what if a leader slowly feeds shreds to validators and keeps them stuck in confirm_slot? there doesn't seem to be any concept of time in here, so perhaps we should consider adding a timeout for how long we're willing to run this method.

@mergify bot added the 'community' (Community contribution) label Oct 21, 2022
@mergify bot requested a review from a team Oct 21, 2022 15:32
prioritization_fee_cache,
)?;

let mut more_entries_to_process = true;
Contributor Author:

could do bank.is_complete() here, but a completed slot should've been picked up after the fact on the previous iteration and not marked for replay

@apfitzge (Contributor) commented Oct 21, 2022

Do you have any stats on how much this improves replay by in prod?

what if leader slowly feeds shreds to validators and keeps them stuck in confirm_slot?

We wouldn't loop here, but what is the current behavior if the validator is slowly fed shreds?

@buffalu (Contributor, Author) commented Oct 21, 2022

Do you have any stats on how much this improves replay by in prod?

will load onto one of our mainnet machines now and respond shortly

@offerm commented Oct 21, 2022

I believe this is highly related to #27729 and the suggested fix #28069.
I'm not sure the loop provides any value once #28069 is merged.

@buffalu (Contributor, Author) commented Oct 21, 2022

I believe this is highly related to #27729 and the suggested fix #28069. I'm not sure the loop provides any value once #28069 is merged.

it would definitely help to avoid a whole other iteration of replay stage logic (why run another pass through replay stage if you can stay here and continue processing to finish earlier?), but the change you made should help too. i think they're separate but related

@buffalu (Contributor, Author) commented Oct 22, 2022

two voting validators running on the same hardware.

replay slot complete timestamps (green = new algorithm)
[screenshot]

entry poh verification elapsed (green = new algorithm)
[screenshot]

replay compute time (blue = new algorithm)
[screenshot]

replay elapsed (red = new algorithm; this is where @offerm's pr would come in the most handy i think)
[screenshot]

tx verify time (light pink = new algorithm)
[screenshot]

fetch entry time (green = new algorithm)
[screenshot]

entry poh verification over a long time period (green = new algo); kinda surprised how much of a difference it makes here
[screenshot]

// more entries may have been received while replaying this slot.
// looping over this ensures that slots will be processed as fast as possible with the
// lowest latency.
while more_entries_to_process {
Member:

I think we might want some kind of exit out of here based on timing as a safeguard. Otherwise we are potentially not voting or not starting a leader slot for a long time, which wouldn't be great.

@carllin wdyt?

Contributor Author:

i agree that'd be the safest thing to do, could limit to 400-ish ms. worst case exec takes longer than expected and we just hit another iteration of replay before hitting this again

Contributor:

Sleep may not be necessary, worst case we just hit one more iteration of reading from blockstore before returning here right:

if slot_entries_load_result.0.is_empty() {
    return Ok(false);
}

Contributor Author:

i don't think sleep is the right word, more like:

let start = Instant::now();
while more_entries_to_process && start.elapsed() < Duration::from_millis(TIMEOUT) {
    confirm_slot(...);
}

Contributor:

hmmm yeah that works, should prevent a bad leader from DoSing us by continually streaming new shreds

Contributor:

Could even probably warn if it hits that timeout
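
Putting the timeout and the warning together, a minimal self-contained sketch of what that safeguard could look like; the 400 ms bound, confirm_slot_once, and the plain eprintln-based warning are stand-ins, not the actual replay_stage API:

use std::time::{Duration, Instant};

// Hypothetical upper bound on how long one pass is allowed to keep looping
// (roughly one slot, per the 400-ish ms suggestion above).
const CONFIRM_SLOT_LOOP_TIMEOUT: Duration = Duration::from_millis(400);

// Stand-in for confirm_slot: returns true if entries were found and replayed.
fn confirm_slot_once() -> bool {
    false
}

fn replay_active_slot_with_timeout() {
    let start = Instant::now();
    let mut more_entries_to_process = true;
    // Loop while there is work, but bail out (and warn) once the timeout
    // elapses so a leader trickling shreds can't keep us from voting or
    // starting our own leader slot.
    while more_entries_to_process {
        if start.elapsed() >= CONFIRM_SLOT_LOOP_TIMEOUT {
            eprintln!(
                "warn: confirm_slot loop exceeded {:?}, yielding back to replay stage",
                CONFIRM_SLOT_LOOP_TIMEOUT
            );
            break;
        }
        more_entries_to_process = confirm_slot_once();
    }
}

fn main() {
    replay_active_slot_with_timeout();
}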

@github-actions bot added the 'stale' label ([bot only] Added to stale content; results in auto-close after a week) Dec 29, 2022
@github-actions bot closed this Jan 6, 2023