Skip to content
This repository has been archived by the owner on Jan 22, 2025. It is now read-only.

Send messages to EntryNotifierService in Tpu #31889

Closed

Conversation

CriesofCarrots
Copy link
Contributor

Problem

#31290 adds an entry notification service, which is currently only fed from the Tvu pipeline. But leaders should also support geyser entry notifications -- this is especially useful for testing plugins on solana-test-validator, since the TestValidator is the leader for every slot.

Summary of Changes

Adds a little service to sit between PohRecorder and BroadcastStage in the Tpu pipeline. If geyser entry notifications are supported, the entry summary information is copied from the message and sent to the entry notification service. All entries are passed through unchanged to the BroadcastStage receiver.

I recommend reviewing by commit, because a change to the PohRecorder::set_bank() caused a lot of churn in tests (like in #25688)

@CriesofCarrots CriesofCarrots force-pushed the entry-notif-tpu branch 2 times, most recently from ebb1a45 to c32d28b Compare May 31, 2023 01:12
@CriesofCarrots CriesofCarrots added the v1.16 PRs that should be backported to v1.16 label May 31, 2023
@CriesofCarrots CriesofCarrots requested a review from carllin May 31, 2023 01:28
}
}

// TODO: in PohRecorder, we panic if the send to BroadcastStage fails. Should we do the same here?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on this question?
Also, would you want to see this send in a separate thread?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we panic in pohrecorder? PohService waits for an exit signal https://github.com/apfitzge/solana/blob/abbf96f17aca9647766559db5b5de714726da033/poh/src/poh_service.rs#L371, and poh_recorder doesn't panic on record either

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would you want to see this send in a separate thread?

This is already running in a separate TpuEntryNotifier thread right? Seems like this send is perfectly fine here since it's nonblocking

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we panic in pohrecorder? PohService waits for an exit signal https://github.com/apfitzge/solana/blob/abbf96f17aca9647766559db5b5de714726da033/poh/src/poh_service.rs#L371, and poh_recorder doesn't panic on record either

This is the panic I was referring to:

Err(e) => panic!("Poh recorder returned unexpected error: {e:?}"),

It's a bit of a chore to track all the results around, so I might be misreading. But I think that if the BroadcastStage send fails with an Err, this returns an Err, which gets returned to that method with the panic I linked above here. Is that right?

@codecov
Copy link

codecov bot commented May 31, 2023

Codecov Report

Merging #31889 (743b071) into master (40d3e8e) will decrease coverage by 0.1%.
The diff coverage is 66.8%.

@@            Coverage Diff            @@
##           master   #31889     +/-   ##
=========================================
- Coverage    81.9%    81.9%   -0.1%     
=========================================
  Files         760      761      +1     
  Lines      207397   207528    +131     
=========================================
+ Hits       169962   170006     +44     
- Misses      37435    37522     +87     

}
}

// TODO: in PohRecorder, we panic if the send to BroadcastStage fails. Should we do the same here?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we panic in pohrecorder? PohService waits for an exit signal https://github.com/apfitzge/solana/blob/abbf96f17aca9647766559db5b5de714726da033/poh/src/poh_service.rs#L371, and poh_recorder doesn't panic on record either

poh/src/poh_recorder.rs Outdated Show resolved Hide resolved
core/src/tpu.rs Outdated
@@ -229,10 +231,18 @@ impl Tpu {
prioritization_fee_cache,
);

let (tpu_entry_sender, tpu_entry_receiver) = unbounded();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would it be clearer to rename these to broadcast_entry_sender and broadcast_entry_receiver to match the names of the constructor arguments in TpuEntryNotifier

}
}

// TODO: in PohRecorder, we panic if the send to BroadcastStage fails. Should we do the same here?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would you want to see this send in a separate thread?

This is already running in a separate TpuEntryNotifier thread right? Seems like this send is perfectly fine here since it's nonblocking

Comment on lines 268 to 279
pub entry_index: Option<usize>,
}

impl WorkingBank {
fn increment_entry_index(&mut self) {
if let Some(index) = self.entry_index.as_mut() {
*index += 1
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like we could just make entry_index always a usize? It doesn't seem to be saving much work other than incrementing an integer.

This has the added benefit of removing the additional track_entry_indexes in set_bank()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also thinking maybe it would be better to just track all these indexes in tpu_entry_notifier? I.e. if you see a new entry for the same slot, then bump the index. This has a few benefits

  1. would allow us to leave poh_recorder unchanged and unaware of this additional logic
  2. if the track entry option is off, we can avoid spinning up the tpu_entry_notifier service altogether, and no additional work is incurred

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it. Thanks for the idea

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, there is an additional reason for tracking the entry indexes in poh_recorder: in order to be able to figure out which transactions fell in which entry, we are planning to add the entry index to the geyser transaction notification. The easiest way to do that is to include the entry index with the transaction-starting-index that's returned here

core/src/tpu.rs Outdated Show resolved Hide resolved
@CriesofCarrots CriesofCarrots force-pushed the entry-notif-tpu branch 2 times, most recently from b3775f4 to 173147b Compare June 3, 2023 02:08
@CriesofCarrots
Copy link
Contributor Author

Last 3 commits are new, based on review comments (I'm happy to clean up+squash things later). However, this persists tracking of entry_indexes in poh_recorder, as per this comment: #31889 (comment)
Thoughts?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
v1.16 PRs that should be backported to v1.16
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants