Cache block time in Blockstore #11955
let slot_duration = slot_duration_from_slots_per_year(bank.slots_per_year());
let epoch = bank.epoch_schedule().get_epoch(bank.slot());
let stakes = HashMap::new();
let stakes = bank.epoch_vote_accounts(epoch).unwrap_or(&stakes);

if let Err(e) = blockstore.cache_block_time(bank.slot(), slot_duration, stakes) {
    error!("cache_block_time failed: slot {:?} {:?}", bank.slot(), e);
}
How expensive is this? Maybe wrap a measure around it?
The pieces of this are measured in get_timestamp_slots() and cache_block_time(). You looking for a sum?
I was just more wondering if we run into issues of being able to keep up with the new slots as they come in. But if this operation takes 1-10ms or so then no worries
Empirical data suggests this operation will be in the 2-10ms range for current mainnet-beta throughput. However, it does depend on deserializing blocks to find vote transactions, and that deserialization definitely takes longer as TPS increases. (With about 10k TPS, I was seeing this take about 10x as long on my under-powered GCE instance.) One solution would be to index vote transactions/timestamps in blockstore to avoid the deserialization altogether; possibly as part of the transaction-status-service. I think that could be a follow-up optimization. Wdyt? @mvines
Ok cool, that seems fine for now. But how about this:
- Move the recv_timeout() out of cache_block_time()
- In the main thread loop, wrap a measure around cache_block_time(). Then if cache_block_time() takes longer than IDK, 100ms or so, emit a warn! or error! log.
Since this is an unbounded channel, if cache_block_time() ever does get backed up and roots start coming in faster than it can process, then we have a memory leak and will probably eventually OOM. It'd be nice to get yelled at from the log if this ever starts happening.
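A rough sketch of the loop shape suggested above, for illustration only. It is not the code from this PR: `run_cache_loop`, `SLOW_CACHE_WARN`, and the stand-in `cache_block_time()` below are hypothetical, it uses only the standard library, and `eprintln!` stands in for the `warn!` macro so the block has no dependencies.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Hypothetical threshold; the comment above suggests "100ms or so".
pub const SLOW_CACHE_WARN: Duration = Duration::from_millis(100);

/// Stand-in for the real `cache_block_time()`: it only records the slot.
pub fn cache_block_time(slot: u64, cached: &mut Vec<u64>) -> Result<(), String> {
    cached.push(slot);
    Ok(())
}

/// Drains rooted slots from the channel and times each caching call.
/// Returns how many calls took longer than `threshold`.
pub fn run_cache_loop(
    receiver: &Receiver<u64>,
    cached: &mut Vec<u64>,
    threshold: Duration,
) -> usize {
    let mut slow_calls = 0;
    loop {
        // The receive lives in the loop, not inside `cache_block_time()`.
        let slot = match receiver.recv_timeout(Duration::from_secs(1)) {
            Ok(slot) => slot,
            // A real service would check an exit flag here before retrying.
            Err(RecvTimeoutError::Timeout) => continue,
            Err(RecvTimeoutError::Disconnected) => break,
        };

        // Time only the caching work, so waiting on the channel is excluded.
        let start = Instant::now();
        if let Err(e) = cache_block_time(slot, cached) {
            eprintln!("cache_block_time failed: slot {:?} {:?}", slot, e);
        }
        let elapsed = start.elapsed();

        // Complain loudly if caching falls behind, since the channel is unbounded.
        if elapsed > threshold {
            slow_calls += 1;
            eprintln!("cache_block_time took {:?} for slot {}", elapsed, slot);
        }
    }
    slow_calls
}
```

Keeping the receive outside the timed region means the measurement reflects only the caching cost, which is the number that matters for deciding whether roots can pile up in the channel.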
Sounds like a plan
How does this look? d0a0f50
Codecov Report
@@ Coverage Diff @@
## master #11955 +/- ##
========================================
Coverage 82.0% 82.0%
========================================
Files 337 338 +1
Lines 79225 79332 +107
========================================
+ Hits 65011 65124 +113
+ Misses 14214 14208 -6
I rolled in adding block_time to ConfirmedBlock because it was a one-liner, which brought along block_time -> bigtable for free :)
@mvines Anything more you'd like to see here?
Post-merge comments welcome
* Add blockstore column to cache block times
* Add method to cache block time
* Add service to cache block time
* Update rpc getBlockTime to use new method, and refactor blockstore slightly
* Return block_time with confirmed block, if available
* Add measure and warning to cache-block-time

* Submit a vote timestamp every vote (#10630)
  * Submit a timestamp for every vote
  * Submit at most one vote timestamp per second
  * Submit a timestamp for every new vote
  Co-authored-by: Tyera Eulberg
* Timestamp first vote (#11856)
* Cache block time in Blockstore (#11955)
  * Add blockstore column to cache block times
  * Add method to cache block time
  * Add service to cache block time
  * Update rpc getBlockTime to use new method, and refactor blockstore slightly
  * Return block_time with confirmed block, if available
  * Add measure and warning to cache-block-time
  Co-authored-by: Michael Vines
Problem
The getBlockTime rpc endpoint can return null for a block that hasn't been pruned from Blockstore. This is because we only keep the last 5 epochs of stake info, and stake info is needed for calculating a block timestamp on demand.
To address this problem, and generally offer better block-time support, we've made two changes to the original design (https://docs.solana.com/implemented-proposals/validator-timestamp-oracle):
Summary of Changes
- getBlockTime rpc
- ConfirmedBlock when populated
Fixes #10089
Note: blocks rooted before this PR is released may still return null
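To make the motivation concrete, here is a minimal sketch of why caching helps. It assumes the block time is a stake-weighted mean of validator vote timestamps, each projected forward to the target slot by the slot duration, which is how I read the linked proposal. Everything in it is hypothetical: `BlockTimeCache`, `stake_weighted_timestamp`, the string validator ids, and the in-memory `HashMap` that stands in for the new Blockstore column.

```rust
use std::collections::HashMap;
use std::time::Duration;

pub type Slot = u64;
pub type UnixTimestamp = i64;

/// Stake-weighted mean of per-validator timestamp estimates for `slot`.
/// `votes` maps a validator id to its latest (vote slot, vote timestamp);
/// `stakes` maps the same id to its stake. Validators without stake are skipped.
pub fn stake_weighted_timestamp(
    slot: Slot,
    slot_duration: Duration,
    votes: &HashMap<&str, (Slot, UnixTimestamp)>,
    stakes: &HashMap<&str, u64>,
) -> Option<UnixTimestamp> {
    let mut weighted_sum: i128 = 0;
    let mut total_stake: i128 = 0;
    for (validator, (vote_slot, vote_time)) in votes {
        let stake = match stakes.get(validator) {
            Some(stake) => *stake as i128,
            None => continue,
        };
        // Project the vote timestamp forward to the target slot.
        let slots_elapsed = slot.saturating_sub(*vote_slot) as u32;
        let offset = slot_duration * slots_elapsed;
        let estimate = *vote_time + offset.as_secs() as i64;
        weighted_sum += estimate as i128 * stake;
        total_stake += stake;
    }
    if total_stake == 0 {
        None
    } else {
        Some((weighted_sum / total_stake) as UnixTimestamp)
    }
}

/// In-memory stand-in for a Blockstore column keyed by slot.
#[derive(Default)]
pub struct BlockTimeCache {
    times: HashMap<Slot, UnixTimestamp>,
}

impl BlockTimeCache {
    /// Computes the block time while stake info is still available and stores it.
    pub fn cache_block_time(
        &mut self,
        slot: Slot,
        slot_duration: Duration,
        votes: &HashMap<&str, (Slot, UnixTimestamp)>,
        stakes: &HashMap<&str, u64>,
    ) -> Option<UnixTimestamp> {
        let time = stake_weighted_timestamp(slot, slot_duration, votes, stakes)?;
        self.times.insert(slot, time);
        Some(time)
    }

    /// Later lookups need no stake info, so old slots keep returning a value.
    pub fn get_block_time(&self, slot: Slot) -> Option<UnixTimestamp> {
        self.times.get(&slot).copied()
    }
}
```

The point of the design is that the stake-dependent computation happens once, when the slot is rooted, so a later lookup no longer depends on how many epochs of stake info are retained. Slots that were never cached still yield `None`, which matches the note above about older blocks returning null.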