Skip to content
This repository has been archived by the owner on Jan 22, 2025. It is now read-only.

purge duplicated bank prioritization fee from cache #33062

Conversation

tao-stones
Copy link
Contributor

@tao-stones tao-stones commented Aug 30, 2023

Problem

Duplicated banks can add inconsistent data to prioritization_fee_cache.

Summary of Changes

  • purge duplicated bank's priority_fee from cache during finalization
  • add test for the described scenario
  • add metrics counts to monitor such occurrences

Fixes #

@@ -296,6 +297,9 @@ impl OptimisticallyConfirmedBankTracker {
*highest_confirmed_slot = slot;
}
drop(w_optimistically_confirmed_bank);

// finalize block's minimum prioritization fee cache for this bank
prioritization_fee_cache.finalize_priority_fee(slot, bank_id);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

finalize prioritization_fee_cache with OC-ed bank_id, purge fee from other banks from cache.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, actually, i think .finalize_priority_fee() should be called just after Self::notify_slot_status(..., SlotNotification::OptimisticallyConfirmed)-ing inside OptimisticallyConfirmedBankTracker::notify_or_defer().

otherwise, we can't ensure bank is frozen (prioritization cache can potentially have updates until frozen) and all parent banks' prioritization fee caches are correctly finalized without skipping?

note that BankNotification::OptimisticallyConfirmed(slot) can have gaps and could be send before locally replaying..

so, #32692 had already bugs, it seems...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm, not familiar enough with this part of code. Let me understand how it flows first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, the gap between OptimisticallyConfirmed and bank frozen is real. Thanks for pointing to the correct solution. I opened a separate PR #33100 to address this bug.

@@ -356,6 +358,9 @@ impl OptimisticallyConfirmedBankTracker {
w_optimistically_confirmed_bank.bank = bank;
}
drop(w_optimistically_confirmed_bank);

// finalize block's minimum prioritization fee cache for this bank
prioritization_fee_cache.finalize_priority_fee(frozen_slot, frozen_bank_id);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do same when pending OC bank is frozen

@@ -121,20 +109,26 @@ impl PrioritizationFeeCacheMetrics {
enum CacheServiceUpdate {
TransactionUpdate {
slot: Slot,
bank_id: BankId,
Copy link
Contributor Author

@tao-stones tao-stones Aug 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add bank_id to CacheServiceUpdate enums

transaction_fee: u64,
writable_accounts: Arc<Vec<Pubkey>>,
},
BankFrozen {
slot: Slot,
bank_id: BankId,
},
Exit,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, how about adding new BankDumped { slot: Slot }, instead of introducing this BankId-based 2nd level caching? And the variant can be sent inside purge_unfirmed_duplicate_slot() after plumbing &PriotizationFeeCache there.

that alternative impl is simpler and should be safe as long as solPrFeeCacSvc is single threaded, which should be because of already existing linearizable requirement of BankFrozen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. But then I thought that if I had missed purge_unconfirmed_duplicated_slot() last time, what would make sure I won't miss something else, or new duplicated slot/bank logic being added without being aware of prioritization_fee_cache? That motivates to explore the approach of purge cache by OC event, which should be definitive and un-rollbackable. But I do agree it adds some complicity, and using bank_id as secondary key may not be ideal. Hence this is a draft. 😃 Keen to hear your thoughts on it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, i see your reasoning for bank_id approach clearly.

i spent some time on which is better. and i couldn't conclude it. so, it boils down to the author's preference.

as said bank_id takes more human effort than BankDumped approach. yet, i saw you added tests. seems you're well energized for the effort. :)

please continue. once this pr is out of draft, i'll review.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally agree that this approach is bit more involved, but not too bad :) I reasoned to myself that it is worth the extra effort to make p.fee cache bit more independent.

I added test thinking either approach we will need a test to capture the corner case scenario.

I'd like to add a metric counter to capture the occurrence of dup bank, then will move it out of draft.

Thanks for being patient ❤️

@tao-stones tao-stones force-pushed the prioritization_fee_cache_purge_duplicated_bank branch 5 times, most recently from 07b357c to 2d097ef Compare August 31, 2023 20:16
@tao-stones tao-stones force-pushed the prioritization_fee_cache_purge_duplicated_bank branch from 2d097ef to b135ef7 Compare September 1, 2023 16:00
@tao-stones tao-stones marked this pull request as ready for review September 1, 2023 16:05
@tao-stones tao-stones force-pushed the prioritization_fee_cache_purge_duplicated_bank branch from b135ef7 to df8b794 Compare September 1, 2023 16:41
@@ -121,20 +125,26 @@ impl PrioritizationFeeCacheMetrics {
enum CacheServiceUpdate {
TransactionUpdate {
slot: Slot,
bank_id: BankId,
transaction_fee: u64,
writable_accounts: Arc<Vec<Pubkey>>,
},
BankFrozen {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

come to think of it, this variant isn't about frozen. maybe, BankFinalized?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right! fixing it.

Copy link
Contributor

@ryoqun ryoqun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally looks good. i think #33100 needs to land first and rebase this pr then.

@tao-stones
Copy link
Contributor Author

generally looks good. i think #33100 needs to land first and rebase this pr then.

Yea, this sounds a plan

@tao-stones tao-stones force-pushed the prioritization_fee_cache_purge_duplicated_bank branch from d5a3069 to 456dcbb Compare September 6, 2023 15:02
@tao-stones
Copy link
Contributor Author

generally looks good. i think #33100 needs to land first and rebase this pr then.

Yea, this sounds a plan

#33100 is merged and rebased.

@codecov
Copy link

codecov bot commented Sep 6, 2023

Codecov Report

Merging #33062 (878a996) into master (90d9839) will increase coverage by 0.1%.
Report is 812 commits behind head on master.
The diff coverage is 90.5%.

@@            Coverage Diff            @@
##           master   #33062     +/-   ##
=========================================
+ Coverage    81.9%    82.1%   +0.1%     
=========================================
  Files         762      785     +23     
  Lines      207801   211280   +3479     
=========================================
+ Hits       170353   173509   +3156     
- Misses      37448    37771    +323     

Copy link
Contributor

@apfitzge apfitzge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't see any issues, but I think the bug was found by ryoqun, so please wait for his feedback before merging

@tao-stones tao-stones requested a review from ryoqun September 6, 2023 21:15
.filter_map(|(slot, slot_prioritization_fee)| {
slot_prioritization_fee
.iter()
.find_map(|prioritization_fee_read| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: how about renaming prioritization_fee_read => prioritization_fee as this isn't related to Mutex anymore.

measure!(Self::get_prioritization_fee(cache, slot), "cache_lock_time");

let (mut block_prioritization_fee, entry_lock_time) =
measure!(block_prioritization_fee.lock().unwrap(), "entry_lock_time");

// prune cache by evicting write account entry from prioritization fee if its fee is less
// or equal to block's minimum transaction fee, because they are irrelevant in calculating
// block minimum fee.
let (_, slot_finalize_time) = measure!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: seems the returned Result is silently discarded after assigning it to _. how about error!()-ing, at least?

side-note: i have hidden desire for --deny clippy::let_underscore_must_use tree-wide... ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏼 added error!(...) should Result is_err()

.unwrap()
.get_min_transaction_fee()
.get(&bank.bank_id())
.map(|block_fee| block_fee.get_min_transaction_fee())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

calling block_fee.get_min_transaction_fee() isn't meaningless... i think you meant to call .and_then()? or remove .map(...) altogether?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i mean, i think .map(...)'s return type is Option<Option<u64>>. so, we aren't checking .is_none() on the returned value of .get_min_transaction_fee().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Yea, should use and_then; then realized just checking if block_fee.get_min_transaction_fee() has value would make test flaky. In case a bank updates cache with multiple transactions, since each transaction applies to cache one at time, the sync_update() should make sure all transactions have applied (instead of checking if any transaction has applied). I moved metrics.accumulate_successful_transaction_update_count(1); and conveniently used it to sync updates. tbh, not too happy double-purposing metrics counter here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, glad that we fixed test flakiness. :)

tbh, not too happy double-purposing metrics counter here.

same here. but i think this is acceptable, considering this is just a tiny piece of test harnesses...

while !fee.lock().unwrap().is_finalized() {
while !fee
.get(&bank_id)
.map_or_else(|| false, |block_fee| block_fee.is_finalized())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: .map_or() should be preferred here, because it's fine to eager-evaluate false.

ryoqun
ryoqun previously approved these changes Sep 7, 2023
Copy link
Contributor

@ryoqun ryoqun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm with nits

@tao-stones tao-stones added v1.14 v1.16 PRs that should be backported to v1.16 and removed v1.14 v1.16 PRs that should be backported to v1.16 labels Sep 8, 2023
@tao-stones tao-stones requested a review from ryoqun September 8, 2023 19:26
Copy link
Contributor

@ryoqun ryoqun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. thanks for the extra polishes. :)

@tao-stones tao-stones merged commit 4f4ce69 into solana-labs:master Sep 11, 2023
@tao-stones
Copy link
Contributor Author

will added bp label when #33100 bps are merged.

@tao-stones tao-stones added v1.14 v1.16 PRs that should be backported to v1.16 labels Sep 14, 2023
mergify bot pushed a commit that referenced this pull request Sep 14, 2023
* purge duplicated bank prioritization fee from cache

* add test for purge dup bank

* Added metrics counts to monitor anomalies

* fix a flaky test

(cherry picked from commit 4f4ce69)

# Conflicts:
#	runtime/src/prioritization_fee.rs
#	runtime/src/prioritization_fee_cache.rs
mergify bot pushed a commit that referenced this pull request Sep 14, 2023
* purge duplicated bank prioritization fee from cache

* add test for purge dup bank

* Added metrics counts to monitor anomalies

* fix a flaky test

(cherry picked from commit 4f4ce69)

# Conflicts:
#	runtime/src/prioritization_fee_cache.rs
tao-stones added a commit that referenced this pull request Sep 22, 2023
#33062) (#33253)

* purge duplicated bank prioritization fee from cache (#33062)

* purge duplicated bank prioritization fee from cache

* add test for purge dup bank

* Added metrics counts to monitor anomalies

* fix a flaky test

(cherry picked from commit 4f4ce69)

# Conflicts:
#	runtime/src/prioritization_fee.rs
#	runtime/src/prioritization_fee_cache.rs

* manual merge

---------

Co-authored-by: Tao Zhu <[email protected]>
Co-authored-by: Tao Zhu <[email protected]>
tao-stones added a commit that referenced this pull request Sep 22, 2023
#33062) (#33254)

* purge duplicated bank prioritization fee from cache (#33062)

* purge duplicated bank prioritization fee from cache

* add test for purge dup bank

* Added metrics counts to monitor anomalies

* fix a flaky test

(cherry picked from commit 4f4ce69)

# Conflicts:
#	runtime/src/prioritization_fee_cache.rs

* maunal merge

---------

Co-authored-by: Tao Zhu <[email protected]>
Co-authored-by: Tao Zhu <[email protected]>
bw-solana pushed a commit to bw-solana/solana that referenced this pull request Jan 10, 2025
solana-labs#33062) (solana-labs#33253)

* purge duplicated bank prioritization fee from cache (solana-labs#33062)

* purge duplicated bank prioritization fee from cache

* add test for purge dup bank

* Added metrics counts to monitor anomalies

* fix a flaky test

(cherry picked from commit 4f4ce69)

# Conflicts:
#	runtime/src/prioritization_fee.rs
#	runtime/src/prioritization_fee_cache.rs

* manual merge

---------

Co-authored-by: Tao Zhu <[email protected]>
Co-authored-by: Tao Zhu <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
v1.16 PRs that should be backported to v1.16
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants