Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-enable periodic compaction on several columns #32548

Merged
merged 2 commits into from
Jul 20, 2023

Conversation

steviez
Copy link
Contributor

@steviez steviez commented Jul 19, 2023

Problem

Periodic compaction was previously disabled on all columns in #27571 in favor of the delete_file_in_range() approach that #26651 introduced. However, several columns still rely on periodic compaction to reclaim storage. Namely, the TransactionStatus and AddressSignatures columns as these columns contain a slot in their key, but as a non-primary index.

The result of periodic compaction not running on these columns is that no storage space was being reclaimed from columns. This is obviously bad and would lead to a node eventually running of storage space and crashing.

Between the two PR's above and another one (#28409), we might have gotten a little overzealous in removing "dead" code and ended up deleted something that we still needed (I reviewed that PR and missed it 😢 ). This change affects master & v1.16, and probably ran undetected for so long since I imagine there are not as many nodes running with --enable-rpc-transaction-history on testnet.

Summary of Changes

This PR reintroduces periodic compaction, but only for the columns that need it. Again, note that this change only affects validators running with --enable-rpc-transaction-history, as the TransactionStatus and AddressSignatures are not populated without this flag.

A lot of the comment is adapted from content that ryoqun originally wrote, but that got ripped out with 27571; this PR adds back a rewrite, lots of things are similar but some things are different as well.

The below graph shows the blockstore_rocksdb_cfs.total_sst_files_size for the largest columns; this statistic reports how much space is occupied on disk by SST's (the rocksdb backing file):

  • Pink ==> data shreds
  • Purple ==> coding shreds
  • Orange ==> transaction status
  • Blue ==> address signatures
  • The reported value is in bytes, and the dividers on the vertical scale are 100B increments. Thus, each divider is 100 GB
image

A few notes to describe why the traces:

  • A: The node was running without --enable-rpc-transaction-history in this period. So, there is no space occupied by the extra columns and the amount of space occupied by data and shred columns stays relatively flat around around ~230 GB each.
  • B: The node was restarted with --enable-rpc-transaction-history; you can immediately see the space occupied by transaction status and address signatures columns begin to grow. More importantly, this growth is unbounded over the course of several days as it reaches almost 800 GB for address signatures and ~230 GB for transaction status.
  • C: The node was restarted with this PR on top of master. The node immediately gets to work on cleaning outdated files from transaction status and address signatures columns, and we can see the occupied space drops pretty quickly. Note that the node remained healthy while it was doing this cleaning

Periodic compaction was previously disabled on all columns in solana-labs#27571 in
favor of the delete_file_in_range() approach that solana-labs#26651 introduced.
However, several columns still rely on periodic compaction to reclaim
storage. Namely, the TransactionStatus and AddressSignatures columns as
these columns contain a slot in their key, but as the secondary index.

The result of periodic compaction not running on these columns is that
no storage space was being reclaimed from columns. This is obviously bad
and would lead to a node eventually running of storage space and
crashing.

This PR reintroduces periodic compaction, but only for the columns that
need it.
@steviez steviez marked this pull request as ready for review July 20, 2023 05:19
@codecov
Copy link

codecov bot commented Jul 20, 2023

Codecov Report

Merging #32548 (b95bff6) into master (3b8bcb4) will decrease coverage by 0.1%.
The diff coverage is 100.0%.

@@            Coverage Diff            @@
##           master   #32548     +/-   ##
=========================================
- Coverage    82.0%    82.0%   -0.1%     
=========================================
  Files         780      780             
  Lines      210719   210730     +11     
=========================================
- Hits       172884   172861     -23     
- Misses      37835    37869     +34     

@steviez steviez requested review from yhchiang-sol and ryoqun July 20, 2023 05:21
@steviez steviez added the v1.16 PRs that should be backported to v1.16 label Jul 20, 2023
@steviez steviez requested a review from jbiseda July 20, 2023 18:02
Copy link
Contributor

@jbiseda jbiseda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good, a few suggestions.

ledger/src/blockstore_db.rs Outdated Show resolved Hide resolved
ledger/src/blockstore_db.rs Outdated Show resolved Hide resolved
ledger/src/blockstore_db.rs Outdated Show resolved Hide resolved
ledger/src/blockstore_db.rs Outdated Show resolved Hide resolved

no_compaction_cfs.get(cf_name).is_some()
// Returns whether compactions should be enabled for the given column (name).
fn should_enable_cf_compaction(cf_name: &str) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reversing the logic and making this "enable" rather than "exclude"!

Copy link
Contributor

@jbiseda jbiseda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@steviez steviez merged commit d73fa1b into solana-labs:master Jul 20, 2023
@steviez steviez deleted the bstore_fix_compact branch July 20, 2023 21:34
mergify bot pushed a commit that referenced this pull request Jul 20, 2023
Periodic compaction was previously disabled on all columns in #27571 in
favor of the delete_file_in_range() approach that #26651 introduced.
However, several columns still rely on periodic compaction to reclaim
storage. Namely, the TransactionStatus and AddressSignatures columns, as
these columns contain a slot in their key, but as a non-primary index.

The result of periodic compaction not running on these columns is that
no storage space is being reclaimed from columns. This is obviously bad
and would lead to a node eventually running of storage space and
crashing.

This PR reintroduces periodic compaction, but only for the columns that
need it.

(cherry picked from commit d73fa1b)
Copy link
Contributor

@yhchiang-sol yhchiang-sol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for spotting and fixing the issue with detailed results.

The comment in the PR is correct about the effectiveness of delete_file_in_range() #26651, which works only when the keys are prefixed by Slot so it doesn't apply to TransactionStatus and AddressSignatures.

However, it surprises me a bit that the regular RocksDB compactions don't reclaim storage effectively on TransactionStatus and AddressSignatures, but the findings in this PR also indicate opportunities to further improve the compactions for these two columns.

The fix looks right and legit to me as it brings back the previous settings of these two column families. I will think about whether there're better ways to make RocksDB smarter to naturally reclaim storage on these two column families without issuing recompactions.

@steviez
Copy link
Contributor Author

steviez commented Jul 21, 2023

However, it surprises me a bit that the regular RocksDB compactions don't reclaim storage effectively on TransactionStatus and AddressSignatures, but the findings in this PR also indicate opportunities to further improve the compactions for these two columns.

We discussed some of this in DM's, but for the sake of leaving a paper trail, my current theory:

  • The default value for --limit-ledger-size is 200M shreds. With current mnb workload, this is about 200k slots (~1k shreds / slot)
  • With how LedgerCleanupService works on retiring slots in FIFO order, this means that a slot has about 24 hours of life in the blockstore. Ie, if slot S is inserted at 12:00 today, I expect it to be marked for clean at 12:00 tomorrow.
  • By the time a slot S is marked for clean, my guess is that a file has propagated down the tree to a higher level. At this point, it may not get picked up for another compaction. The key space for these columns are pubkeys and signatures; both of these should have a fairly even distribution (obviously Pubkey will have some "hot" keys). So, maybe not enough for Rocks to think to compact

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
v1.16 PRs that should be backported to v1.16
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants