update lru for read only cache if it has been long enough since last access #32560
Conversation
Codecov Report
```
@@            Coverage Diff            @@
##           master   #32560     +/-  ##
=========================================
- Coverage    82.0%    81.9%    -0.1%
=========================================
  Files         784      785       +1
  Lines      212542   211208    -1334
=========================================
- Hits       174343   173174    -1169
+ Misses      38199    38034     -165
```
(force-pushed from f2b72c8 to 0d4d38d)
@brooksprumo I added you during draft. I wanted to get your thoughts on the concept.
Does this help a validator on mnb? I could see it being beneficial for the very hot accounts that are read a lot. Won't there still be lock contention, just delayed? Maybe this is a win due to reduction in updates from the hot accounts?
Lastly, I imagine we may not update an account because it doesn't get re-read (e.g. read sooner than the skip amount, not updated, then evicted, then read just afterwards, when it would've originally been still in the cache). I'm hoping this is both infrequent and inconsequential.
My concern is that this is pretty heuristic, and the risk with a heuristic approach is that we don't know when it will perform poorly; e.g. when there is high load, or a certain transaction pattern, or when the set of active accounts is rotated, etc. That issue aside, shouldn't the update frequency be like every
Pick a k; say k=10.
For time-based at 100ms:
This cache is only there to improve the performance of loading common accounts. Behavior is correct whether the account is in the read accounts cache or not. Plenty of accounts are read only once and live briefly in the accounts cache (~50s).

It seems fine to use a heuristic approach that improves performance for all hits and misses and that sufficiently approximates theoretically ideal eviction behavior, when the possible penalty is decreased performance for loading accounts which are accessed many times close together, then not accessed for a long time (50s), and then accessed again. We get an expected, greatly increased performance in exchange for theoretically possible, occasional worse performance.

Of course, this is measured against mnb traffic today. Traffic could change. LRU is still a reasonable eviction scheme if you have to pick one, and this tweak to the LRU record keeping is still sufficiently close to ideal LRU.
Oh, you are tracking the time per entry.

```diff
diff --git a/runtime/src/read_only_accounts_cache.rs b/runtime/src/read_only_accounts_cache.rs
index 23a0af60f4..98334ad6ae 100644
--- a/runtime/src/read_only_accounts_cache.rs
+++ b/runtime/src/read_only_accounts_cache.rs
@@ -15,6 +15,7 @@ use {
     },
 };
+const LRU_UPDATE_CADENCE: u64 = 10;
 const CACHE_ENTRY_SIZE: usize =
     std::mem::size_of::<ReadOnlyAccountCacheEntry>() + 2 * std::mem::size_of::<ReadOnlyCacheKey>();
@@ -35,6 +36,7 @@ pub(crate) struct ReadOnlyAccountsCache {
     // always sorted in the order that they have last been accessed. When doing
     // LRU eviction, cache entries are evicted from the front of the queue.
     queue: Mutex<IndexList<ReadOnlyCacheKey>>,
+    counter: AtomicU64,
     max_data_size: usize,
     data_size: AtomicUsize,
     hits: AtomicU64,
@@ -48,6 +50,7 @@ impl ReadOnlyAccountsCache {
         Self {
             max_data_size,
             cache: DashMap::default(),
+            counter: AtomicU64::default(),
             queue: Mutex::<IndexList<ReadOnlyCacheKey>>::default(),
             data_size: AtomicUsize::default(),
             hits: AtomicU64::default(),
@@ -84,7 +87,7 @@ impl ReadOnlyAccountsCache {
         // Move the entry to the end of the queue.
         // self.queue is modified while holding a reference to the cache entry;
         // so that another thread cannot write to the same key.
-        {
+        if self.counter.fetch_add(1, Ordering::Relaxed) % LRU_UPDATE_CADENCE == 0 {
             let mut queue = self.queue.lock().unwrap();
             queue.remove(entry.index);
             entry.index = queue.insert_last(key);
```
Are you more comfortable with this 1/k sampling method?
Looks good to me.
Since caching is an optimization, and not required for correctness, this feels ok to me. We can tweak the parameters as well, as we test it out on mnb/rollout on testnet/on pop-net.
I offer some naming tweaks, but just suggestions that would help my brain. Not required.
Lastly, I would think the 1/k sampling would heavily weight the updates toward the hottest accounts. Is that right? So we may entirely miss updating lukewarm/cold-ish accounts, since a majority of the updates will happen for the vote program.
Lgtm
I don't really understand the above.
Here's how I thought about it. Details: 20,000 total account reads/s. For a hot account a, it is highly likely that we basically never update the lru due to any given read of account a.
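The weighting concern above can be made concrete with a back-of-envelope model (this is my sketch, not code from the PR; `p_at_least_one_update` is a name invented for illustration). Under global 1/k counter sampling, each individual read triggers an LRU update with probability roughly 1/k regardless of which account it is, so an account read n times during its lifetime in the cache gets its position refreshed at least once with probability 1 - (1 - 1/k)^n:

```rust
// Back-of-envelope model of 1/k LRU-update sampling (illustrative only).
// Each read is the sampled "k-th" read with probability ~1/k, so an account
// read `reads` times while cached is updated at least once with probability
// 1 - (1 - 1/k)^reads.
fn p_at_least_one_update(k: u64, reads: u32) -> f64 {
    1.0 - (1.0 - 1.0 / k as f64).powi(reads as i32)
}
```

With k=10, a hot account read thousands of times per eviction window is all but guaranteed a refresh (probability ~1.0), while an account read once has only a ~0.1 chance; that is the sense in which the updates concentrate on the hottest accounts.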
(force-pushed from c82d735 to 581cfc5)
Note that hit percent seems to track as perfectly as we could hope for:
This cuts out the 2 master machines. Top is #32721 (random eviction).
Some data I don't trust yet.
These results indicate periodic sampling (k=64) performed better than a 100ms per-account update. But it appears the miss rate was higher with k=64, so overall load would be slower for misses. I'm trying again, adding k=32 and sampling at a max of 400ms per account. We could probably bump 400ms much higher (like 10s) and greatly reduce the number of samples we take and thus the contention. But we can't bump k much higher, because I think we'd increase misses. Still digging in. The workload is never deterministic. I guess a long ledger-tool run could give us more reliable data, but there will still be a lot of noise.
We can eliminate several contenders:
Some observations:
There are still unclear effects from these 2 methodologies. The effects may be tied to the specific access patterns in effect at the moment on mnb. Both methods produce correct results.
@behzadnouri A 1/k sampling rate oversamples the high frequency accounts and then evicts and re-adds enough accounts (and large accounts) that the overall replay performance is worse. Metrics compared:
- read only cache load time per (hits + misses)
- replay stats, load_us
- read only accounts cache hit percent
- bytes of account data loaded as a result of misses (some are unavoidable)
(force-pushed from 581cfc5 to 670ad44)
(force-pushed from 670ad44 to ab78a5c)
lgtm - the numbers for the performance improvements look good.
We can always tune this more. We also own the full stack, so we can pick heuristics/implementations that best match our workloads. I think this PR shows how a workload-aware impl can net some big wins. If/when the workload changes, we can pick the best impl for it.
The downside I see is that this is too finely tuned to the current mainnet load and, being a heuristic approach, it might not work out well if the load characteristics change.
We have discussed and measured various implementations for quite some time. It sounds like we can agree that reducing how often we update the lru queue is a reasonable approach.
Problem
Working on improving account load time. The read cache spends a lot of time blocked on a global mutex while updating the eviction lru queue.
Summary of Changes
Update the lru for the read only cache only if it has been long enough since the last access. The idea is that entries in the read only accounts cache fall into 2 camps: read frequently or read infrequently. Frequently read items will be accessed again prior to eviction, and the lru will be updated (assume a 50s delay from insertion to eviction). Infrequently read items will not be accessed again prior to eviction.

By avoiding lru modification on every access, the common case of read cache hits is greatly improved. The only casualties are accounts which are read twice within the time limit (100ms atm) and then evicted after 50s instead of 50.1s. The cache will still operate correctly, and loads will still work correctly; we just lose some performance for those special items which are infrequently loaded in this type of pattern. System performance and access patterns on accounts are not deterministic, and the read cache will always have noise.

The fastest code is code that doesn't run at all. By eliminating almost all calls to the lru update code, loads are on average much faster.
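The time-gated update described above can be sketched as follows (a simplified illustration, not the actual agave code; the constant name, struct shape, and `should_update_lru` helper are all hypothetical). Each cache entry remembers when it last moved to the back of the LRU queue, and a read only takes the global queue lock again once the entry is older than the threshold; a compare-exchange ensures that among many concurrent readers of a hot entry, only one wins the right to update per window:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

// Hypothetical sketch of a time-gated LRU update (names are illustrative).
const LRU_UPDATE_THRESHOLD_MS: u64 = 100;

struct CacheEntry {
    // Timestamp (ms) of the last LRU queue update for this entry.
    last_update_ms: AtomicU64,
    // ... account data, queue index, etc. would live here in a real cache.
}

fn now_ms() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap()
        .as_millis() as u64
}

impl CacheEntry {
    /// Returns true if this read should refresh the entry's LRU position.
    /// The compare_exchange lets exactly one of many concurrent readers win,
    /// so the global queue mutex is taken at most once per threshold window
    /// per entry instead of on every read.
    fn should_update_lru(&self) -> bool {
        let last = self.last_update_ms.load(Ordering::Relaxed);
        let now = now_ms();
        now.saturating_sub(last) >= LRU_UPDATE_THRESHOLD_MS
            && self
                .last_update_ms
                .compare_exchange(last, now, Ordering::Relaxed, Ordering::Relaxed)
                .is_ok()
    }
}
```

Frequently read entries still refresh their queue position well within the ~50s insertion-to-eviction window, while the per-read fast path touches only one relaxed atomic load.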
Fixes #