-
Notifications
You must be signed in to change notification settings - Fork 4.5k
solana-ledger-tool verify
now runs AccountsBackgroundService
#26349
Conversation
/// maximum drop bank signal queue length | ||
/// (consider de-duping this const and `solana-core::validator::MAX_DROP_BANK_SIGNAL_QUEUE_SIZE`) | ||
const MAX_DROP_BANK_SIGNAL_QUEUE_SIZE: usize = 10_000; | ||
|
||
// There should only be one bank, the root bank in BankForks. Thus all banks added to | ||
// BankForks from now on will be descended from the root bank and thus will inherit | ||
// the bank drop callback. | ||
assert_eq!(bank_forks.read().unwrap().banks().len(), 1); | ||
let (pruned_banks_sender, pruned_banks_receiver) = | ||
crossbeam_channel::bounded(MAX_DROP_BANK_SIGNAL_QUEUE_SIZE); | ||
{ | ||
let root_bank = bank_forks.read().unwrap().root_bank(); | ||
root_bank.set_callback(Some(Box::new( | ||
root_bank | ||
.rc | ||
.accounts | ||
.accounts_db | ||
.create_drop_bank_callback(pruned_banks_sender), | ||
))); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be moved into load_bank_forks()
, or create a new pub function with this code that calls load_bank_forks()
?
As you know this code duplicates what's in load_blockstore()
from validator.rs:
Lines 1461 to 1481 in 3fcdc45
// Before replay starts, set the callbacks in each of the banks in BankForks so that | |
// all dropped banks come through the `pruned_banks_receiver` channel. This way all bank | |
// drop behavior can be safely synchronized with any other ongoing accounts activity like | |
// cache flush, clean, shrink, as long as the same thread performing those activities also | |
// is processing the dropped banks from the `pruned_banks_receiver` channel. | |
// There should only be one bank, the root bank in BankForks. Thus all banks added to | |
// BankForks from now on will be descended from the root bank and thus will inherit | |
// the bank drop callback. | |
assert_eq!(bank_forks.read().unwrap().banks().len(), 1); | |
let (pruned_banks_sender, pruned_banks_receiver) = bounded(MAX_DROP_BANK_SIGNAL_QUEUE_SIZE); | |
{ | |
let root_bank = bank_forks.read().unwrap().root_bank(); | |
root_bank.set_callback(Some(Box::new( | |
root_bank | |
.rc | |
.accounts | |
.accounts_db | |
.create_drop_bank_callback(pruned_banks_sender), | |
))); | |
} |
If possible, could this duplication be refactored into a single place that both validator.rs::load_blockstore()
and bank_forks_utils.rs::load()
can call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that makes sense. I'd like to get feedback that this approach helps reduce the RAM usage he was seeing first
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran into same issue of OOM'ing when running solana-ledger-tool verify
with tip of master (e74ad90). Sadly, cherry-picking this PR on top didn't help
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In light of #26895 (comment), I'll try this PR on top of slightly newer tip of master (224550d) a shot
Update: no luck here
Codecov Report
@@ Coverage Diff @@
## master #26349 +/- ##
===========================================
+ Coverage 76.0% 82.0% +5.9%
===========================================
Files 40 632 +592
Lines 2366 175202 +172836
Branches 340 0 -340
===========================================
+ Hits 1799 143702 +141903
- Misses 449 31500 +31051
+ Partials 118 0 -118 |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
bank_forks.clone(), | ||
&exit, | ||
abs_request_handler, | ||
false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is what prevented this from working; this false
corresponds to accounts_db_caching_enabled
arg, so cleanup wasn't happening as expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took over this PR in #26914
Problem
Jeff's machine is OOMing
Summary of Changes
Try to do Jeff a solid