purge duplicated bank prioritization fee from cache #33062
Conversation
@@ -296,6 +297,9 @@ impl OptimisticallyConfirmedBankTracker {
                *highest_confirmed_slot = slot;
            }
            drop(w_optimistically_confirmed_bank);

            // finalize block's minimum prioritization fee cache for this bank
            prioritization_fee_cache.finalize_priority_fee(slot, bank_id);
finalize prioritization_fee_cache with the OC-ed bank_id; purge fees from other banks of the same slot from the cache.
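For illustration, a minimal sketch of the purge semantics described in this comment, assuming a plain two-level map keyed by Slot then BankId (the cache's real internal types and locking differ; all names below are hypothetical):

```rust
use std::collections::HashMap;

type Slot = u64;
type BankId = u64;

// Hypothetical stand-in for a block's minimum prioritization fee state.
struct BlockFee {
    min_transaction_fee: u64,
    finalized: bool,
}

#[derive(Default)]
struct Cache {
    // The BankId secondary key keeps entries from duplicated banks of the
    // same slot apart until one of them is optimistically confirmed.
    by_slot: HashMap<Slot, HashMap<BankId, BlockFee>>,
}

impl Cache {
    // Finalize the OC-ed bank's entry and purge entries written by any
    // duplicated bank of the same slot.
    fn finalize_priority_fee(&mut self, slot: Slot, bank_id: BankId) {
        if let Some(banks) = self.by_slot.get_mut(&slot) {
            banks.retain(|id, _| *id == bank_id);
            if let Some(fee) = banks.get_mut(&bank_id) {
                fee.finalized = true;
            }
        }
    }
}
```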
hmm, actually, i think .finalize_priority_fee() should be called just after Self::notify_slot_status(..., SlotNotification::OptimisticallyConfirmed)-ing inside OptimisticallyConfirmedBankTracker::notify_or_defer().
otherwise, we can't ensure the bank is frozen (the prioritization cache can potentially receive updates until it's frozen) and that all parent banks' prioritization fee caches are correctly finalized without skipping?
note that BankNotification::OptimisticallyConfirmed(slot) can have gaps and could be sent before local replay completes...
so, #32692 already had bugs, it seems...
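A rough sketch of the frozen-check being described here (this is not the actual #33100 fix; the guard helper is hypothetical, while the Bank and PrioritizationFeeCache method names follow this PR's diff):

```rust
use solana_runtime::{bank::Bank, prioritization_fee_cache::PrioritizationFeeCache};

// Hypothetical guard: only finalize once the bank is frozen, since the
// prioritization fee cache can still receive updates for an unfrozen bank.
fn finalize_if_frozen(cache: &PrioritizationFeeCache, bank: &Bank) {
    if bank.is_frozen() {
        cache.finalize_priority_fee(bank.slot(), bank.bank_id());
    }
}
```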
Mmm, I'm not familiar enough with this part of the code. Let me understand how it flows first.
yea, the gap between OptimisticallyConfirmed and bank frozen is real. Thanks for pointing to the correct solution. I opened a separate PR #33100 to address this bug.
@@ -356,6 +358,9 @@ impl OptimisticallyConfirmedBankTracker {
                w_optimistically_confirmed_bank.bank = bank;
            }
            drop(w_optimistically_confirmed_bank);

            // finalize block's minimum prioritization fee cache for this bank
            prioritization_fee_cache.finalize_priority_fee(frozen_slot, frozen_bank_id);
do the same when the pending OC bank is frozen
@@ -121,20 +109,26 @@ impl PrioritizationFeeCacheMetrics {
 enum CacheServiceUpdate {
     TransactionUpdate {
         slot: Slot,
         bank_id: BankId,
add bank_id to the CacheServiceUpdate enum variants
        transaction_fee: u64,
        writable_accounts: Arc<Vec<Pubkey>>,
    },
    BankFrozen {
        slot: Slot,
        bank_id: BankId,
    },
    Exit,
hmm, how about adding a new BankDumped { slot: Slot } variant, instead of introducing this BankId-based 2nd-level caching? The variant can be sent inside purge_unconfirmed_duplicate_slot() after plumbing &PrioritizationFeeCache there.
that alternative impl is simpler and should be safe as long as solPrFeeCacSvc is single-threaded, which it should be because of the already-existing linearizability requirement of BankFrozen.
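For comparison, a minimal sketch of the BankDumped alternative described above (hypothetical; this variant was not adopted in the PR):

```rust
use solana_sdk::{clock::Slot, pubkey::Pubkey};
use std::collections::HashMap;
use std::sync::Arc;

// Sketch of the alternative: keep the cache keyed by slot only, and have
// purge_unconfirmed_duplicate_slot() send BankDumped so the service drops
// the dumped slot's entry before a replacement bank feeds new updates.
enum CacheServiceUpdate {
    TransactionUpdate {
        slot: Slot,
        transaction_fee: u64,
        writable_accounts: Arc<Vec<Pubkey>>,
    },
    BankFrozen { slot: Slot },
    BankDumped { slot: Slot },
    Exit,
}

// Because solPrFeeCacSvc processes updates on a single thread, BankDumped
// is naturally ordered against TransactionUpdate and BankFrozen.
fn handle(update: CacheServiceUpdate, cache: &mut HashMap<Slot, u64>) {
    match update {
        CacheServiceUpdate::BankDumped { slot } => {
            // Drop everything the dumped bank wrote for this slot.
            cache.remove(&slot);
        }
        _ => { /* TransactionUpdate / BankFrozen / Exit handling elided */ }
    }
}
```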
Agreed. But then I thought: if I had missed purge_unconfirmed_duplicate_slot() last time, what would make sure I won't miss something else, or that new duplicated-slot/bank logic won't be added without awareness of prioritization_fee_cache? That motivated exploring the approach of purging the cache on the OC event, which should be definitive and un-rollbackable. But I do agree it adds some complexity, and using bank_id as a secondary key may not be ideal. Hence this is a draft. 😃 Keen to hear your thoughts on it.
yeah, i see your reasoning for the bank_id approach clearly.
i spent some time on which is better, and i couldn't conclude. so, it boils down to the author's preference.
as said, the bank_id approach takes more human effort than the BankDumped approach. yet, i saw you added tests. seems you're well energized for the effort. :)
please continue. once this pr is out of draft, i'll review.
Totally agree that this approach is a bit more involved, but not too bad :) I reasoned to myself that it is worth the extra effort to make the p.fee cache a bit more independent.
I added the test thinking that either approach will need one to capture the corner-case scenario.
I'd like to add a metric counter to capture the occurrence of dup banks, then will move it out of draft.
Thanks for being patient ❤️
@@ -121,20 +125,26 @@ impl PrioritizationFeeCacheMetrics {
 enum CacheServiceUpdate {
     TransactionUpdate {
         slot: Slot,
         bank_id: BankId,
         transaction_fee: u64,
         writable_accounts: Arc<Vec<Pubkey>>,
     },
     BankFrozen {
come to think of it, this variant isn't about frozen. maybe BankFinalized?
Oh, right! fixing it.
generally looks good. i think #33100 needs to land first, and then rebase this pr.
Yea, this sounds like a plan
Codecov Report

@@            Coverage Diff            @@
##           master   #33062    +/-   ##
=========================================
+ Coverage    81.9%    82.1%    +0.1%
=========================================
  Files         762      785      +23
  Lines      207801   211280    +3479
=========================================
+ Hits       170353   173509    +3156
- Misses      37448    37771     +323
Don't see any issues, but I think the bug was found by ryoqun, so please wait for his feedback before merging.
.filter_map(|(slot, slot_prioritization_fee)| {
    slot_prioritization_fee
        .iter()
        .find_map(|prioritization_fee_read| {
nit: how about renaming prioritization_fee_read => prioritization_fee, as this isn't related to Mutex anymore.
measure!(Self::get_prioritization_fee(cache, slot), "cache_lock_time");

let (mut block_prioritization_fee, entry_lock_time) =
    measure!(block_prioritization_fee.lock().unwrap(), "entry_lock_time");

// prune cache by evicting write account entry from prioritization fee if its fee is less
// or equal to block's minimum transaction fee, because they are irrelevant in calculating
// block minimum fee.
let (_, slot_finalize_time) = measure!(
nit: seems the returned Result is silently discarded after assigning it to _. how about error!()-ing, at least?
side-note: i have a hidden desire for --deny clippy::let_underscore_must_use tree-wide... ;)
👍🏼 added error!(...) when the Result is_err()
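Roughly what that change looks like, as a self-contained sketch (the closure stands in for the cache's finalize step; the real types and the measure! label are simplified):

```rust
use log::error;
use solana_measure::measure;

// Keep the timing measurement, but log the Result instead of binding it
// to `_` and silently discarding any error.
fn finalize_block(
    slot: u64,
    bank_id: u64,
    mark_block_completed: impl FnOnce() -> Result<(), String>,
) {
    let (result, _slot_finalize_time) = measure!(mark_block_completed(), "slot_finalize_time");
    if let Err(err) = result {
        error!("finalizing prioritization fee failed: slot {slot}, bank_id {bank_id}: {err}");
    }
}
```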
.unwrap()
.get_min_transaction_fee()
.get(&bank.bank_id())
.map(|block_fee| block_fee.get_min_transaction_fee())
calling block_fee.get_min_transaction_fee() here is meaningless... i think you meant to call .and_then()? or remove .map(...) altogether?
i mean, i think .map(...)'s return type is Option<Option<u64>>. so, we aren't checking .is_none() on the returned value of .get_min_transaction_fee().
Good catch! Yea, should use and_then; then I realized that just checking whether block_fee.get_min_transaction_fee() has a value would make the test flaky. In case a bank updates the cache with multiple transactions, since each transaction applies to the cache one at a time, sync_update() should make sure all transactions have been applied (instead of checking whether any transaction has applied). I moved metrics.accumulate_successful_transaction_update_count(1); and conveniently used it to sync updates. tbh, not too happy double-purposing a metrics counter here.
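The Option-flattening point above, as a standalone example (plain std types; the HashMap value models get_min_transaction_fee() returning Option<u64>):

```rust
use std::collections::HashMap;

fn main() {
    // Entry exists, but its minimum fee is not yet computed (inner None).
    let cache: HashMap<u64, Option<u64>> = [(42u64, None)].into_iter().collect();

    // .map(...) nests: Option<Option<u64>>. is_some() on the result never
    // inspects the inner Option, so the missing fee goes unnoticed.
    let nested: Option<Option<u64>> = cache.get(&42).map(|fee| *fee);
    assert!(nested.is_some());

    // .and_then(...) flattens to Option<u64>, so the inner None is visible.
    let flat: Option<u64> = cache.get(&42).and_then(|fee| *fee);
    assert!(flat.is_none());
}
```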
oh, glad that we fixed the test flakiness. :)

> tbh, not too happy double-purposing a metrics counter here.

same here. but i think this is acceptable, considering this is just a tiny piece of test harness...
while !fee.lock().unwrap().is_finalized() {
while !fee
    .get(&bank_id)
    .map_or_else(|| false, |block_fee| block_fee.is_finalized())
nit: .map_or() should be preferred here, because it's fine to eager-evaluate false.
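The nit as a standalone comparison (plain Option; the bool stands in for block_fee.is_finalized()):

```rust
fn main() {
    let fee: Option<bool> = None; // models cache.get(&bank_id) -> is_finalized()

    // map_or_else takes a closure for the default; worthwhile only when the
    // default is expensive to compute.
    let a = fee.map_or_else(|| false, |finalized| finalized);

    // map_or takes the default by value; preferred for a cheap constant
    // like `false`, where eager evaluation costs nothing.
    let b = fee.map_or(false, |finalized| finalized);

    assert_eq!(a, b);
}
```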
lgtm with nits
lgtm. thanks for the extra polish. :)
will add the bp label when the #33100 bps are merged.
* purge duplicated bank prioritization fee from cache
* add test for purge dup bank
* Added metrics counts to monitor anomalies
* fix a flaky test

(cherry picked from commit 4f4ce69)

# Conflicts:
#	runtime/src/prioritization_fee.rs
#	runtime/src/prioritization_fee_cache.rs

* purge duplicated bank prioritization fee from cache
* add test for purge dup bank
* Added metrics counts to monitor anomalies
* fix a flaky test

(cherry picked from commit 4f4ce69)

# Conflicts:
#	runtime/src/prioritization_fee_cache.rs

purge duplicated bank prioritization fee from cache (#33062) (#33253)

* purge duplicated bank prioritization fee from cache (#33062)
* add test for purge dup bank
* Added metrics counts to monitor anomalies
* fix a flaky test

(cherry picked from commit 4f4ce69)

# Conflicts:
#	runtime/src/prioritization_fee.rs
#	runtime/src/prioritization_fee_cache.rs

* manual merge

---------

Co-authored-by: Tao Zhu <[email protected]>
Co-authored-by: Tao Zhu <[email protected]>

purge duplicated bank prioritization fee from cache (#33062) (#33254)

* purge duplicated bank prioritization fee from cache (#33062)
* add test for purge dup bank
* Added metrics counts to monitor anomalies
* fix a flaky test

(cherry picked from commit 4f4ce69)

# Conflicts:
#	runtime/src/prioritization_fee_cache.rs

* manual merge

---------

Co-authored-by: Tao Zhu <[email protected]>
Co-authored-by: Tao Zhu <[email protected]>
Problem

Duplicated banks can add inconsistent data to the prioritization_fee_cache.

Summary of Changes

* Add bank_id as a secondary key to the prioritization fee cache, so entries from duplicated banks of the same slot are kept apart.
* When a bank is optimistically confirmed (or a pending OC bank is frozen), finalize the cache entry for the confirmed bank_id and purge entries from other banks of that slot.
* Add a test for the duplicated-bank scenario and metrics counts to monitor anomalies.

Fixes #