2x Warehouse Nodes Stopped Replaying and Uploading Blocks But Remained Online #29901

Closed
buffalu opened this issue Jan 25, 2023 · 18 comments · Fixed by #30203

@buffalu
Contributor

buffalu commented Jan 25, 2023

Problem

Our two warehouse nodes running v1.14.13-jito stopped processing slots last night. They were still online but weren't able to make progress. One stopped around 3AM CT and the other around 7:45AM CT.

They also stopped uploading to bigtable.

Identities:

  • 5oXsXDXM8kmcK3KjSpurxB8DsJUqWCrSiD2kcLWEniSy
  • ATb9AnWw12SsUGtG4LadUhHL276Tf51u1XTmt8kFY3iG

Proposed Solution

Determine why they stopped freezing slots and fix it.

@buffalu buffalu added the community Community contribution label Jan 25, 2023
@buffalu
Contributor Author

buffalu commented Jan 25, 2023

Screen Shot 2023-01-25 at 9 48 40 AM

@buffalu
Contributor Author

buffalu commented Jan 25, 2023

bigtable slot upload chart

Screen Shot 2023-01-25 at 9 55 51 AM

@buffalu buffalu changed the title 2x Warehouse Nodes Stopped Replaying and Uploading Blocks 2x Warehouse Nodes Stopped Replaying and Uploading Blocks But Remained Online Jan 25, 2023
@steviez
Contributor

steviez commented Jan 25, 2023

I believe bigtable will only upload rooted slots. It is possible that banks were getting frozen but not rooted (ie a consensus divergence); can you check your logs or metrics to see if your nodes were freezing banks?

@buffalu
Contributor Author

buffalu commented Jan 26, 2023

it happened again on 5oXsXDXM8kmcK3KjSpurxB8DsJUqWCrSiD2kcLWEniSy

logs: https://drive.google.com/file/d/1WVpB8pjVoUHg7WWlWdtd4uggE8SmE_fv/view?usp=share_link

bigtable upload (times are Central Time, Jan 25, 2023)
22:39:24 was the last block uploaded

Screen Shot 2023-01-25 at 11 01 29 PM

@buffalu
Contributor Author

buffalu commented Jan 26, 2023

last frozen bank:

Jan 26 04:39:35 ny-mainnet-warehouse-1 solana-validator[2643753]: [2023-01-26T04:39:35.044682103Z INFO  solana_runtime::bank] bank frozen: 174429976 hash: 2Fn27poQeRZUqbuyQgFPvzqjkReuYH2SDnAkupofpFuC accounts_delta: FUg7UnKSqy4K4W4TBCkHx8y6pPfamFWNU2LfTfdu3NWy signature_count: 2524 last_blockhash: 2avz5dUychyLytLUCpcpDHpYP16fMC6JEeHHYadRuwmw capitalization: 539312136320050691

@buffalu
Contributor Author

buffalu commented Jan 26, 2023

looks like a total deadlock? this is replay-loop-timing-stats, which seems like it should be emitted every iteration with a 1s rate limit

Screen Shot 2023-01-25 at 11 14 47 PM

@steviez
Contributor

steviez commented Jan 26, 2023

Taking a quick look at your log, I would agree that a deadlock seems like a possibility. I think getting some stack traces would be helpful to figure out where things are stuck. General process:

  1. Rebuild the binary with debug symbols:
RUSTFLAGS='-g -C force-frame-pointers=yes' cargo build --release --bin solana-validator
  2. Restart the process and wait for the hang to occur again
  3. Attach gdb and dump backtraces for all threads (the (gdb) lines are run within gdb):
$ gdb -p <PID>
(gdb) set logging file ~/gdb.log
(gdb) set logging on
(gdb) thread apply all bt full
(gdb) quit
$ cat ~/gdb.log

With the log, you should be able to inspect the backtraces and see which threads are stuck.

@segfaultdoc
Contributor

Taking a quick look at your log, I would agree that a deadlock seems like a possibility. [...]

Trying to get gdb installed on a package-manager-less system is a much more daunting task than I would have imagined. In the meantime I downgraded one of the nodes to 1.13.6 and am seeing the same symptoms of no bigtable uploads.

@kunal365roy

We've seen issues similar to this and have found that removing the accounts, ledger, and snapshots directories and restarting while fetching the snapshot will reestablish the bt connection + upload reliably. I was hoping this patch in 1.13.6 would stop the bt upload sync issue: #28728

@steviez
Contributor

steviez commented Jan 31, 2023

We've seen issues similar to this

Did your node completely lock up (ie stop replaying slots), or did your node just stop uploading to bigtable while continuing to keep up with the tip of the cluster? The issue you linked sounds like a session issue (ie the latter), whereas this issue is exploring a fundamental issue where the validator client grinds to a halt across the board.

removing the accounts, ledger and snapshots directory and restarting while fetching the snapshot will reestablish bt connection + upload reliably

This is a bandaid and should really be a last resort. Did you ever try stopping/restarting your nodes along the way, without wiping everything? If you're still encountering this, do you currently have any open issues and/or PRs?

Additionally, if you're uploading to bigtable, suppose:

  • Your nodes last uploads at slot X
  • You then wipe your local state and restart
  • Upon restart, your node fetches a snapshot at slot Y

You now have a hole in your history from (X, Y] ...

@steviez steviez self-assigned this Feb 1, 2023
@segfaultdoc
Contributor

Here's a gdb stacktrace of our locked up node: [gdb.log](url)

@kunal365roy

Did your node completely lock up (ie stop replaying slots), or did your node just stop uploading to bigtable while continuing to keep up with the tip of the cluster? The issue you linked sounds like a session issue (ie the latter), whereas this issue is exploring a fundamental issue where the validator client grinds to a halt across the board.

Ah, I read too fast, apologies; we also have this as a separate issue. I'm referring to the one where the validator continues to sync but loses the bt session, which is fixed by the latest release.

My validators have been very inconsistently syncing as of late

Additionally, if you're uploading to bigtable, suppose:
Your nodes last uploads at slot X
You then wipe your local state and restart
Upon restart, your node fetches a snapshot at slot Y
You now have a hole in your history from (X, Y] ...

Yeah, we have an alerter and an async backfill job that fills that gap with solana-ledger-tool

@segfaultdoc
Contributor

Adding to this thread that 1.13.5 ran into the issue, bumping down to 1.13.4

@steviez
Contributor

steviez commented Feb 6, 2023

Here's a gdb stacktrace of our locked up node: [gdb.log](url)

Hi @segfaultdoc - the log you posted appears to be truncated a bit, I don't see all threads; any chance you saved the output and have the full log available?

@segfaultdoc
Contributor

Here it is:
gdb.log

@steviez
Contributor

steviez commented Feb 7, 2023

Thanks for the updated gdb output. It looks like some of the threads are still missing, but this second capture has output for more threads, which helps shed some light. It looks like some threads are stuck on this lock:

pub lowest_cleanup_slot: RwLock<Slot>,

I see the solana-ledger-p (solana-ledger-purge) thread waiting on this lock:

*blockstore.lowest_cleanup_slot.write().unwrap() = lowest_cleanup_slot;

Thread 967 (Thread 0x7ea9d1d1a640 (LWP 1010644) "solana-ledger-p"):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
No locals.
#1  0x000055a9efa6a825 in std::sys::unix::futex::futex_wait () at library/std/src/sys/unix/futex.rs:62
No locals.
#2  0x000055a9edb68602 in std::sys::unix::locks::futex_rwlock::RwLock::write_contended () at library/std/src/sys/unix/locks/futex_rwlock.rs:220
No locals.
#3  0x000055a9ee0e15e3 in std::sys::unix::locks::futex_rwlock::RwLock::write (self=0x7f02562e5138) at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/sys/unix/locks/futex_rwlock.rs:156
No locals.
#4  std::sys_common::rwlock::MovableRwLock::write (self=0x7f02562e5138) at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/sys_common/rwlock.rs:102
No locals.
#5  std::sync::rwlock::RwLock<u64>::write<u64> (self=0x7f02562e5138) at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/sync/rwlock.rs:298
No locals.
#6  solana_core::ledger_cleanup_service::{impl#0}::cleanup_ledger::{closure#0} () at core/src/ledger_cleanup_service.rs:240

Several bigtable-upload threads are also waiting on the same lock (through a series of function calls):

let lowest_cleanup_slot = self.lowest_cleanup_slot.read().unwrap();

Thread 783 (Thread 0x7eac2d4d0640 (LWP 1010407) "bigtable-upload"):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
No locals.
#1  0x000055a9efa6a825 in std::sys::unix::futex::futex_wait () at library/std/src/sys/unix/futex.rs:62
No locals.
#2  0x000055a9edb684cc in std::sys::unix::locks::futex_rwlock::RwLock::read_contended () at library/std/src/sys/unix/locks/futex_rwlock.rs:139
No locals.
#3  0x000055a9ee85750f in std::sys::unix::locks::futex_rwlock::RwLock::read (self=0x7f02562e5138) at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/sys/unix/locks/futex_rwlock.rs:88
        state = <optimized out>
        state = <optimized out>
#4  std::sys_common::rwlock::MovableRwLock::read (self=0x7f02562e5138) at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/sys_common/rwlock.rs:86
No locals.
#5  std::sync::rwlock::RwLock<u64>::read<u64> (self=0x7f02562e5138) at /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/std/src/sync/rwlock.rs:210
No locals.
#6  solana_ledger::blockstore::Blockstore::check_lowest_cleanup_slot (self=0x7f02562e5010, slot=175822617) at ledger/src/blockstore.rs:2576
No locals.
#7  0x000055a9ee85d3ca in solana_ledger::blockstore::Blockstore::get_completed_ranges (self=0x89, slot=3221225478, start_index=139648018373181) at ledger/src/blockstore.rs:3193
No locals.
#8  0x000055a9ee85ce73 in solana_ledger::blockstore::Blockstore::get_slot_entries_with_shred_info (self=0x89, slot=175822617, start_index=0, allow_dead_slots=false) at ledger/src/blockstore.rs:3150
No locals.
#9  0x000055a9ee855db0 in solana_ledger::blockstore::Blockstore::get_slot_entries (self=0x7f02562e5010, shred_start_index=0, slot=<optimized out>) at ledger/src/blockstore.rs:3138
No locals.

We do .unwrap() when we attempt to grab the locks; these callsites didn't panic, so the lock seemingly is NOT poisoned. Also, I didn't see any mention of panics from y'all, but it might be a worthwhile sanity check real quick.

The full output could be helpful in narrowing down the search space, although there should only be a handful of threads that grab a writer on that lock, so we might be able to proceed without it as well.
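
As a quick aside on the poisoning point above, here is a minimal sketch (not project code, just std::sync::RwLock behavior) of why the absence of panics rules out poisoning: a lock only becomes poisoned when a thread panics while holding a guard, and every later lock call then returns a PoisonError, so the .unwrap()s at those callsites would have panicked:

use std::sync::RwLock;
use std::thread;

fn main() {
    let lock = RwLock::new(0u64);

    // Poison the lock by panicking while the write guard is held.
    let _ = thread::scope(|s| {
        s.spawn(|| {
            let _guard = lock.write().unwrap();
            panic!("poisons the lock");
        })
        .join() // join here so the panic is captured and scope() doesn't re-panic
    });

    // Every later lock attempt now yields Err(PoisonError); an .unwrap() here
    // would panic, which is why the lack of panics at the callsites implies
    // the lock is merely contended, not poisoned.
    assert!(lock.read().is_err());
}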

@steviez
Contributor

steviez commented Feb 8, 2023

Got back to this; it looks like we have a call-path that will attempt to obtain a read lock a second time while it is already held. The callpath is:

+ Blockstore::get_rooted_block()
+!!! Blockstore::check_lowest_cleanup_slot() (line 1926) to obtain read lock
+--+ Blockstore::get_complete_block() (line 1929)
   +--+ Blockstore::get_slot_entries() (line 1948)
      +--+ Blockstore::get_slot_entries_with_shred_info() (line 2778)
         +--+ Blockstore::get_completed_ranges() (line 2790)
            +!!! Blockstore::check_lowest_cleanup_slot() (line 2856) to obtain read lock

pub fn get_rooted_block(
    &self,
    slot: Slot,
    require_previous_blockhash: bool,
) -> Result<VersionedConfirmedBlock> {
    datapoint_info!("blockstore-rpc-api", ("method", "get_rooted_block", String));
    let _lock = self.check_lowest_cleanup_slot(slot)?;
    if self.is_root(slot) {
        return self.get_complete_block(slot, require_previous_blockhash);
    }
    Err(BlockstoreError::SlotNotRooted)
}

pub fn get_complete_block(
    &self,
    slot: Slot,
    require_previous_blockhash: bool,
) -> Result<VersionedConfirmedBlock> {
    let slot_meta_cf = self.db.column::<cf::SlotMeta>();
    let slot_meta = match slot_meta_cf.get(slot)? {
        Some(slot_meta) => slot_meta,
        None => {
            info!("SlotMeta not found for slot {}", slot);
            return Err(BlockstoreError::SlotUnavailable);
        }
    };
    if slot_meta.is_full() {
        let slot_entries = self.get_slot_entries(slot, 0)?;
        if !slot_entries.is_empty() {
            let blockhash = slot_entries
                .last()
                .map(|entry| entry.hash)
                .unwrap_or_else(|| panic!("Rooted slot {slot:?} must have blockhash"));
            let slot_transaction_iterator = slot_entries
                .into_iter()
                .flat_map(|entry| entry.transactions)
                .map(|transaction| {
                    if let Err(err) = transaction.sanitize(
                        // Don't enable additional sanitization checks until
                        // all clusters have activated the static program id
                        // feature gate so that bigtable upload isn't affected
                        false, // require_static_program_ids
                    ) {
                        warn!(
                            "Blockstore::get_block sanitize failed: {:?}, \
                            slot: {:?}, \
                            {:?}",
                            err, slot, transaction,
                        );
                    }
                    transaction
                });
            let parent_slot_entries = slot_meta
                .parent_slot
                .and_then(|parent_slot| {
                    self.get_slot_entries(parent_slot, /*shred_start_index:*/ 0)
                        .ok()
                })
                .unwrap_or_default();
            if parent_slot_entries.is_empty() && require_previous_blockhash {
                return Err(BlockstoreError::ParentEntriesUnavailable);
            }
            let previous_blockhash = if !parent_slot_entries.is_empty() {
                get_last_hash(parent_slot_entries.iter()).unwrap()
            } else {
                Hash::default()
            };
            let rewards = self
                .rewards_cf
                .get_protobuf_or_bincode::<StoredExtendedRewards>(slot)?
                .unwrap_or_default()
                .into();
            // The Blocktime and BlockHeight column families are updated asynchronously; they
            // may not be written by the time the complete slot entries are available. In this
            // case, these fields will be `None`.
            let block_time = self.blocktime_cf.get(slot)?;
            let block_height = self.block_height_cf.get(slot)?;
            let block = VersionedConfirmedBlock {
                previous_blockhash: previous_blockhash.to_string(),
                blockhash: blockhash.to_string(),
                // If the slot is full it should have parent_slot populated
                // from shreds received.
                parent_slot: slot_meta.parent_slot.unwrap(),
                transactions: self
                    .map_transactions_to_statuses(slot, slot_transaction_iterator)?,
                rewards,
                block_time,
                block_height,
            };
            return Ok(block);
        }
    }
    Err(BlockstoreError::SlotUnavailable)
}

pub fn get_slot_entries(&self, slot: Slot, shred_start_index: u64) -> Result<Vec<Entry>> {
    self.get_slot_entries_with_shred_info(slot, shred_start_index, false)
        .map(|x| x.0)
}

pub fn get_slot_entries_with_shred_info(
    &self,
    slot: Slot,
    start_index: u64,
    allow_dead_slots: bool,
) -> Result<(Vec<Entry>, u64, bool)> {
    let (completed_ranges, slot_meta) = self.get_completed_ranges(slot, start_index)?;
    // Check if the slot is dead *after* fetching completed ranges to avoid a race
    // where a slot is marked dead by another thread before the completed range query finishes.
    // This should be sufficient because full slots will never be marked dead from another thread,
    // this can only happen during entry processing during replay stage.
    if self.is_dead(slot) && !allow_dead_slots {
        return Err(BlockstoreError::DeadSlot);
    } else if completed_ranges.is_empty() {
        return Ok((vec![], 0, false));
    }
    let slot_meta = slot_meta.unwrap();
    let num_shreds = completed_ranges
        .last()
        .map(|(_, end_index)| u64::from(*end_index) - start_index + 1)
        .unwrap_or(0);
    let entries: Result<Vec<Vec<Entry>>> = if completed_ranges.len() <= 1 {
        completed_ranges
            .into_iter()
            .map(|(start_index, end_index)| {
                self.get_entries_in_data_block(slot, start_index, end_index, Some(&slot_meta))
            })
            .collect()
    } else {
        PAR_THREAD_POOL.install(|| {
            completed_ranges
                .into_par_iter()
                .map(|(start_index, end_index)| {
                    self.get_entries_in_data_block(
                        slot,
                        start_index,
                        end_index,
                        Some(&slot_meta),
                    )
                })
                .collect()
        })
    };
    let entries: Vec<Entry> = entries?.into_iter().flatten().collect();
    Ok((entries, num_shreds, slot_meta.is_full()))
}

fn get_completed_ranges(
    &self,
    slot: Slot,
    start_index: u64,
) -> Result<(CompletedRanges, Option<SlotMeta>)> {
    let _lock = self.check_lowest_cleanup_slot(slot)?;
    let slot_meta_cf = self.db.column::<cf::SlotMeta>();
    let slot_meta = slot_meta_cf.get(slot)?;
    if slot_meta.is_none() {
        return Ok((vec![], slot_meta));
    }
    let slot_meta = slot_meta.unwrap();
    // Find all the ranges for the completed data blocks
    let completed_ranges = Self::get_completed_data_ranges(
        start_index as u32,
        &slot_meta.completed_data_indexes,
        slot_meta.consumed as u32,
    );
    Ok((completed_ranges, Some(slot_meta)))
}

So, if LedgerCleanupService attempts to purge between the start and end of the above callpath, we could enter a deadlock: the queued writer blocks the second read attempt, while the outer read guard (held by the same thread) blocks the writer. I'll look at re-arranging stuff to avoid this and throw a PR up.
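
To make the interleaving concrete, below is a minimal, self-contained sketch of the same locking pattern. This is not the Blockstore code; the names and sleeps are illustrative, and it assumes (as std::sync::RwLock explicitly permits, and as the futex-based implementation in the backtraces above does) that new readers block while a writer is queued:

use std::sync::{Arc, RwLock};
use std::thread;
use std::time::Duration;

fn main() {
    let lowest_cleanup_slot = Arc::new(RwLock::new(0u64));

    // Stand-in for the bigtable-upload path: an outer read guard is held
    // (get_rooted_block) while a nested read is attempted (get_completed_ranges).
    let reader = {
        let lock = Arc::clone(&lowest_cleanup_slot);
        thread::spawn(move || {
            let _outer = lock.read().unwrap();
            thread::sleep(Duration::from_millis(100)); // give the writer time to queue
            let _inner = lock.read().unwrap(); // blocks forever once a writer is queued
        })
    };

    // Stand-in for the solana-ledger-purge path: a writer arrives between
    // the two reads and queues behind the outer read guard.
    let writer = {
        let lock = Arc::clone(&lowest_cleanup_slot);
        thread::spawn(move || {
            thread::sleep(Duration::from_millis(50));
            *lock.write().unwrap() = 42;
        })
    };

    reader.join().unwrap();
    writer.join().unwrap();
    println!("no deadlock on this platform's RwLock");
}

On such an implementation the program hangs: the writer queues behind the outer read guard, the nested read() waits for the queued writer, and the outer guard is never released because it lives on the same thread as the nested read.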

@steviez
Contributor

steviez commented Feb 9, 2023

@buffalu @segfaultdoc - The proposed fix is a little scary given that we're playing with locks, so we just want to be extra sure that there isn't anything else at play.

Could you please confirm whether this warehouse node is doing anything else (ie serving RPC requests)? Additionally, can you comment on CPU utilization before/during the observed hang?
