
Performance problem with get_slots_since in high forking #24878

Closed
sakridge opened this issue May 2, 2022 · 6 comments
Assignees
Labels
stale [bot only] Added to stale content; results in auto-close after a week.

Comments

@sakridge
Member

sakridge commented May 2, 2022

Problem

When there are a lot of non-rooted slots, get_slots_since starts to take longer (hundreds of ms):

(screenshot: get-slots-since timing metric)

Proposed Solution

Seems like we might be doing some extra work here, or could be more efficient by storing a delta: store what we captured last, then add new slots to a new set which we can query on the next call.
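A minimal sketch of that delta idea, using hypothetical names rather than actual Blockstore types: remember which children were reported on the previous call and hand back only what has been added since.

    use std::collections::{HashMap, HashSet};

    /// Hypothetical cache for the delta idea above; not an actual Blockstore type.
    #[derive(Default)]
    struct SlotsSinceCache {
        /// Children already reported for each parent slot.
        reported: HashMap<u64, HashSet<u64>>,
    }

    impl SlotsSinceCache {
        /// `latest` is the full parent -> children mapping read from the ledger.
        /// Returns only the children that are new since the previous call and
        /// records them so the next call can skip them.
        fn delta(&mut self, latest: &HashMap<u64, Vec<u64>>) -> HashMap<u64, Vec<u64>> {
            let mut new_children = HashMap::new();
            for (parent, children) in latest {
                let seen = self.reported.entry(*parent).or_default();
                let fresh: Vec<u64> = children
                    .iter()
                    .copied()
                    // insert() returns true only for children we have not seen before
                    .filter(|child| seen.insert(*child))
                    .collect();
                if !fresh.is_empty() {
                    new_children.insert(*parent, fresh);
                }
            }
            new_children
        }
    }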

@yhchiang-sol
Contributor

yhchiang-sol commented May 3, 2022

Thanks for raising the issue. I will take a deeper look.

I happen to have one node with RocksDB perf metrics enabled, which might provide some relevant information.

When get_slots_since gets slower, we spend more time reading table files (get_from_output_files_nanos) and creating new iterators (new_table_block_iter_nanos):
(screenshot: RocksDB perf metrics during the slowdown)

compared to the normal situation, where most of the time is spent getting data from the memtable (get_from_memtable_nanos) -- this also means that in the normal situation we are mostly reading recent data that is still in the in-memory buffer.

(screenshot: RocksDB perf metrics in the normal situation)

Column-family-wise, the index, slot-meta, and erasure-meta column families seem to contribute the most during the same period of time.
(screenshot: per-column-family read metrics over the same period)

@yhchiang-sol
Contributor

Looks like we are issuing multiple get operations sequentially in get_slots_since(). One possible way to improve it is to use RocksDB's batched MultiGet API, but RocksDB's C and Rust APIs only expose the unbatched version of MultiGet. Let me see if I can make the batched version available in the C and Rust APIs.
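For reference, a rough sketch of the two access patterns in rust-rocksdb; the "meta" column family name and big-endian slot keys are illustrative assumptions, not the actual Blockstore code.

    use rocksdb::{DB, Error};

    // One get() per slot, issued sequentially; each lookup that misses the
    // memtable can cost its own disk read, matching the perf metrics above.
    fn slot_metas_sequential(db: &DB, slots: &[u64]) -> Vec<Result<Option<Vec<u8>>, Error>> {
        let cf = db.cf_handle("meta").expect("missing column family");
        slots
            .iter()
            .map(|slot| db.get_cf(cf, slot.to_be_bytes()))
            .collect()
    }

    // The unbatched MultiGet that rust-rocksdb already exposes: a single call,
    // while the batched variant discussed below additionally lets RocksDB
    // process the whole key set together internally.
    fn slot_metas_multi_get(db: &DB, slots: &[u64]) -> Vec<Result<Option<Vec<u8>>, Error>> {
        let cf = db.cf_handle("meta").expect("missing column family");
        db.multi_get_cf(slots.iter().map(|slot| (cf, slot.to_be_bytes())))
    }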

@yhchiang-sol
Contributor

Created facebook/rocksdb#9952, which makes the batched MultiGet API available in RocksDB's C API.

@yhchiang-sol
Contributor

The PR for rust-rocksdb: rust-rocksdb/rust-rocksdb#633

@steviez
Contributor

steviez commented Sep 19, 2022

@yhchiang-sol - While the optimization to use multi_get() definitely showed improvements, I'm not sure it is exactly what Stephen had in mind.

In ReplayStage::generate_new_bank_forks(), the list of slots passed to Blockstore::get_slots_since() comes from the call below:

        let frozen_bank_slots: Vec<u64> = frozen_banks
            .keys()
            .cloned()
            .filter(|s| *s >= forks.root())
            .collect();

When the network/node is healthy, frozen banks are getting rooted and thus pruned from bank_forks at the same rate that new banks are getting frozen. So, the number of slots passed to the blockstore query is roughly constant.

When roots stop getting made, the pruning does not happen, and we begin to accumulate frozen banks in bank_forks. This has other repercussions as well (see #23061), but in the context of this issue, it means we are passing lots of repeated keys to Blockstore::get_slots_since().

So, the optimization of get_slots_since() is good, but we could potentially realize even larger savings by being smarter about avoiding repeated queries. That being said, after thinking about this for a minute, I think we would need a fairly drastic change to how we discover new children in order to avoid passing all frozen banks to the query on every execution.

For example, consider the following sequence where bankX refers to a bank at some slot X:
(a) bankX is not frozen, and as such, isn't yielded in the frozen_bank_slots list when generate_new_bank_forks() is called
(b) bankX is completed and frozen
(c) On the next iteration of generate_new_bank_forks(), X is included in frozen_bank_slots and as such, all of X's children will have banks created
(d) Some future slot Z is created whose parent is X
(e) generate_new_bank_forks() is called again which calls get_slots_since()

The initial idea was that since we had seen X's children in (c), we could potentially exclude X from the get_slots_since() call in (e). However, with our current discovery mechanism, we'd have to query slot X again to learn about Z.

We actually encounter this kind of scenario (which would break that exclusion) around epoch boundaries, where we might have several forks in the new epoch (we've seen one per leader for the first several leaders) all building off the last slot in the previous epoch.

Haven't given much thought to alternatives, but maybe it's also not worth spending additional cycles on this at the moment? Maybe we just take the gains we get with multi_get().

@yhchiang-sol
Contributor

So, the optimization of get_slots_since() is good, but we could potentially realize even larger savings by being smarter about avoiding repeated queries.

Looks like a good opportunity here! RocksDB's batched multi_get() also comes with an optional boolean indicating whether the input keys are already sorted. If so, then internally RocksDB will skip the sorting.

   virtual void MultiGet(const ReadOptions& options,
                         ColumnFamilyHandle* column_family,
                         const size_t num_keys, const Slice* keys,
                         PinnableSlice* values, Status* statuses,
                         const bool /*sorted_input*/ = false);

And, luckily, I did port this optional flag to rust-rocksdb, so we can use it without waiting for another release!

     /// Return the values associated with the given keys and the specified column family
     /// where internally the read requests are processed in batch if block-based table
     /// SST format is used.  It is a more optimized version of multi_get_cf.
     pub fn batched_multi_get_cf<K, I>(
         &self,
         cf: &impl AsColumnFamilyRef,
         keys: I,
         sorted_input: bool,
     ) -> Vec<Result<Option<DBPinnableSlice>, Error>>

If we also want to avoid repeated queries, then sorting (and de-duplicating) the keys before passing them to multi_get() makes a lot of sense. By doing this, we can solve two problems at once!
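A rough sketch of what that could look like with the batched_multi_get_cf signature above; the "meta" column family name and big-endian slot keys are illustrative assumptions, not the actual Blockstore code.

    use rocksdb::{DB, Error};
    use std::collections::BTreeSet;

    fn slot_metas_batched(db: &DB, slots: &[u64]) -> Vec<Result<Option<Vec<u8>>, Error>> {
        let cf = db.cf_handle("meta").expect("missing column family");
        // A BTreeSet both de-duplicates repeated slots and yields them in
        // ascending order. With big-endian keys and the default bytewise
        // comparator, ascending slot order is ascending key order, so we can
        // tell RocksDB the input is already sorted.
        let keys: Vec<[u8; 8]> = slots
            .iter()
            .copied()
            .collect::<BTreeSet<u64>>()
            .into_iter()
            .map(u64::to_be_bytes)
            .collect();
        db.batched_multi_get_cf(cf, &keys, /* sorted_input */ true)
            .into_iter()
            // Copy out of the pinned slices so the results outlive this call.
            .map(|result| result.map(|value| value.map(|slice| slice.to_vec())))
            .collect()
    }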

yhchiang-sol added a commit that referenced this issue Sep 20, 2022
…27686)

#### Problem
The current implementation of get_slots_since() invokes multiple rocksdb::get().
As a result, each get() operation may end up requiring one disk read.  This leads
to poor performance of get_slots_since described in #24878.

#### Summary of Changes
This PR makes get_slots_since() use the batched version of multi_get() instead,
which allows multiple get operations to be processed in batch so that they can
be answered with fewer disk reads.
@github-actions github-actions bot added the stale label Sep 20, 2023
@github-actions github-actions bot closed this as not planned (stale) Sep 27, 2023