Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache block time in Blockstore #11955

Merged
merged 6 commits into from
Sep 9, 2020

Conversation

CriesofCarrots
Copy link
Contributor

@CriesofCarrots CriesofCarrots commented Aug 31, 2020

Problem

The getBlockTime rpc endpoint can return null for a block that hasn't been pruned from Blockstore. This is because we only keep the last 5 epochs of stake info, and stake info is needed for calculating a block timestamp on demand.

To address this problem, and generally offer better block-time support, we've made two changes to the original design (https://docs.solana.com/implemented-proposals/validator-timestamp-oracle):

  1. The initial design called for block times to be calculated on demand in order to demonstrate that they are deterministic from the ledger. Caching block times saves this unnecessary every-call work; the block times are still deterministic from the ledger.
  2. The initial design called for validators to vote every ~30min (4500 slots) in order to save space in Vote transactions. As of Submit a vote timestamp every vote #10630 , validators are now timestamp every Vote. As a result, there are plenty of timestamps available to calculate a stake-weighted timestamp as soon as a block is rooted in Blockstre.

Summary of Changes

  • Add Blockstore column family to cache blocktimes
  • Add CacheBlockTimeService, triggered from ReplayStage::handle_votable_bank, to calculate and cache a block's timestamp as soon as a block is rooted
  • Use cache in getBlockTime rpc
  • Add block_time to ConfirmedBlock when populated

Fixes #10089
Note: blocks before this PR is released may still return null

core/src/replay_stage.rs Outdated Show resolved Hide resolved
Comment on lines +51 to +58
let slot_duration = slot_duration_from_slots_per_year(bank.slots_per_year());
let epoch = bank.epoch_schedule().get_epoch(bank.slot());
let stakes = HashMap::new();
let stakes = bank.epoch_vote_accounts(epoch).unwrap_or(&stakes);

if let Err(e) = blockstore.cache_block_time(bank.slot(), slot_duration, stakes) {
error!("cache_block_time failed: slot {:?} {:?}", bank.slot(), e);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How expensive is this? Maybe wrap a measure around it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pieces of this are measured in get_timestamp_slots() and cache_block_time(). You looking for a sum?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just more wondering if we run into issues of being able to keep up with the new slots as they come in. But if this operation takes 1-10ms or so then no worries

Copy link
Contributor Author

@CriesofCarrots CriesofCarrots Sep 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empirical data suggests this operation will be in the 2-10ms range for current mainnet-beta throughput. However, it does depend on deserializing blocks to find vote transactions, and that deserialization definitely takes longer as TPS increases. (With about 10k TPS, I was seeing this take about 10x as long on my under-powered GCE instance.) One solution would be to index vote transactions/timestamps in blockstore to avoid the deserialization altogether; possibly as part of the transaction-status-service. I think that could be a follow-up optimization. Wdyt? @mvines

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok cool, that seems fine for now. But how about this:

  1. Move the recv_timeout() out of cache_block_time()
  2. In the main thread loop, wrap a measure around cache_block_time(). Then if cache_block_time() takes longer than IDK, 100ms or so, emit a warn! or error! log.

Since this is an unbounded channel, if cache_block_time() ever does get backed up and roots start coming in faster than it can process then we have a memory leak and will probably eventually OOM. It'd be nice to get yelled at from the log if this ever starts happening

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a plan

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this look? d0a0f50

@CriesofCarrots CriesofCarrots force-pushed the cache-block-time branch 2 times, most recently from ec4453f to a346ced Compare August 31, 2020 23:35
@CriesofCarrots CriesofCarrots added CI Pull Request is ready to enter CI and removed CI Pull Request is ready to enter CI labels Sep 1, 2020
@solana-grimes solana-grimes removed the CI Pull Request is ready to enter CI label Sep 1, 2020
@codecov
Copy link

codecov bot commented Sep 1, 2020

Codecov Report

Merging #11955 into master will increase coverage by 0.0%.
The diff coverage is 93.1%.

@@           Coverage Diff            @@
##           master   #11955    +/-   ##
========================================
  Coverage    82.0%    82.0%            
========================================
  Files         337      338     +1     
  Lines       79225    79332   +107     
========================================
+ Hits        65011    65124   +113     
+ Misses      14214    14208     -6     

@CriesofCarrots
Copy link
Contributor Author

I rolled in adding block_time to ConfirmedBlock because it was a one-liner, which brought along block_time -> bigtable for free :)
Holding off on deprecating getBlockTime for now: #10089 (comment)

@CriesofCarrots
Copy link
Contributor Author

@mvines Anything more you'd like to see here?

core/src/cache_block_time_service.rs Outdated Show resolved Hide resolved
Comment on lines +51 to +58
let slot_duration = slot_duration_from_slots_per_year(bank.slots_per_year());
let epoch = bank.epoch_schedule().get_epoch(bank.slot());
let stakes = HashMap::new();
let stakes = bank.epoch_vote_accounts(epoch).unwrap_or(&stakes);

if let Err(e) = blockstore.cache_block_time(bank.slot(), slot_duration, stakes) {
error!("cache_block_time failed: slot {:?} {:?}", bank.slot(), e);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just more wondering if we run into issues of being able to keep up with the new slots as they come in. But if this operation takes 1-10ms or so then no worries

@CriesofCarrots
Copy link
Contributor Author

Post-merge comments welcome

@CriesofCarrots CriesofCarrots merged commit 05db41f into solana-labs:master Sep 9, 2020
CriesofCarrots added a commit to CriesofCarrots/solana that referenced this pull request Sep 23, 2020
* Add blockstore column to cache block times

* Add method to cache block time

* Add service to cache block time

* Update rpc getBlockTime to use new method, and refactor blockstore slightly

* Return block_time with confirmed block, if available

* Add measure and warning to cache-block-time
CriesofCarrots added a commit that referenced this pull request Sep 23, 2020
* Submit a vote timestamp every vote (#10630)

* Submit a timestamp for every vote

* Submit at most one vote timestamp per second

* Submit a timestamp for every new vote

Co-authored-by: Tyera Eulberg <[email protected]>

* Timestamp first vote (#11856)

* Cache block time in Blockstore (#11955)

* Add blockstore column to cache block times

* Add method to cache block time

* Add service to cache block time

* Update rpc getBlockTime to use new method, and refactor blockstore slightly

* Return block_time with confirmed block, if available

* Add measure and warning to cache-block-time

Co-authored-by: Michael Vines <[email protected]>
@CriesofCarrots CriesofCarrots deleted the cache-block-time branch September 24, 2020 19:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RPC: getBlockTime returns null for slot that hasn't been purged
3 participants