-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
solana-validator leaks memory (but at very slow pace) #14366
Comments
status update: I've run another experiment by hourly executing |
I'm getting clue, it seems that these are outstanding mem leaks: ~230MB (47M => 272M) EDIT: well, this could be a false alarm; the peak memory doesn't tell this as a definitive leak. maybe only validator session B is killed while processing actively? Validator session A (run for 10499sec against mainnet-beta):
Validator session B (run for 17822 secs against mainnet-beta):
|
another very outstanding PinnedVec memory use (I dunno this is legitimate; but this is the top 2 consumer of validator heap memory, comprising roughly 50%!): Validator session A (out of total memory leaked: 6.76GB):
Validator session B (out of total memory leaked: 8.96GB):
|
another supposedly steady memory increase. (not conclusively leaks): 50MB leak per 2 hours if true From validator session A:
From validator session B:
|
another hashbrown (HashMap) suspicious one: 30MB leak per 2 hours if true A:
B:
|
status update: I'm testing now against testnet |
I looked both code, but I couldn't find anything smelly quickly. From newbie me in this code, it looks like it's pruning correctly older ClusterSlots and VoteTracker entries. Hmm, HashMap needs periodic @carllin does something ring? If |
I think that might be right if we are not ok with the excess capacity. when I looked both Vec and HashMap I don't think size-down capacity for |
Now I got tds version of heaptrack; also longest run of Continuing on #14366 (comment): it seems that PinnedVec or its Recycler continues to leak... I haven't looked the code yet.
@sakridge could this also be a legimate leak aside from the Vec HashMap capacity leak? I don't think we need 5GB / 3GB heap data at any given moment to run validator... |
@behzadnouri this pr (#14467) could be a fix to one of the above suspicious memory leaks, especially like this? #14366 (comment)? Or completely different one? (I'm not seeing crds in my backtraces). Maybe, did you find #14467 via metrics? |
I doubt that that is the case. You mention:
but the issue with #14467 does not go away with restart. If you restart a node it quickly syncs up to the previous table it had in memory. Also as you also mentioned, those stack traces do not show relevant
yes, it is |
@behzadnouri I see! thanks for clear explanations. :) |
The recyclers just allocate to cover the transitional load and don't size down generally. Potentially we can size down with some heuristic if we can detect there is too much being held onto. |
status update!:
I see. Yeah, it seems that commenting-out recycling code seems to reduce leaks in general. I'm now thinking about the heuristics.. Also, I'm seeing odd RSS leakage via recycled AppendVec, which shouldn't count toward RSS, but toward SHR. Like the PinnedVec recycler I'm just disabled and watching to see the leakage are gone. Lastly, I'm now tracking down to |
Whoa, an odd thing is happening here. It seems that Given this patch, on tds: @@ -37,14 +37,18 @@ impl ClusterSlots {
for epoch_slots in epoch_slots_list {
let slots = epoch_slots.to_slots(root);
for slot in &slots {
- if *slot <= root {
+ if *slot <= root || *slot > root + 10000 { // HACK
continue;
}
let unduplicated_pubkey = self.keys.get_or_insert(&epoch_slots.from);
self.insert_node_id(*slot, unduplicated_pubkey);
}
}
- self.cluster_slots.write().unwrap().retain(|x, _| *x > root);
+ if let Ok(mut w) = self.cluster_slots.write() {
+ let old_len = w.len();
+ w.retain(|x, _| *x > root);
+ info!("ClusterSlots: root: {}, len: {} => {} cpacity: {}", root, old_len, w.len(), w.capacity());
+ }
self.keys.purge();
*self.since.write().unwrap() = since;
} Without HACK:
With HACK (I assumed validator won't ever need to hold info of far-future slots (+10000)... lol)
And indeed, @carllin Do you have any idea? Maybe, possibly compressed |
@ryoqun, hmmm weird, is the node caught up with the cluster? I can only imagine those far-future slots if:
I think we can distinguish between the above by seeing how many nodes are in the For context, when thinking about whether we can do a blanket filter like
|
Verdict? it seems that this is the case... Maybe, mainnet-beta and tds are mixed up?
|
This smells like someone who was invited to MB from TdS and copypasta'd their way to victory 😔 Shred version should be keeping the networks apart though |
81965-leaks.txt.gz Some more heaptrack profiles; which seem very confusing. Some of the reported leaks are just plain rust vectors and I do not see how they can be leaked. Page fault events: page_faults.svg.gz |
…on (#17899) Inspecting TDS gossip table shows that crds values of nodes with different shred-versions are creeping in. Their epoch-slots are accumulated in ClusterSlots causing bogus slots very far from current root which are not purged and so cause ClusterSlots keep consuming more memory: #17789 #14366 (comment) #14366 (comment) This commit updates ClusterInfo::get_epoch_slots, and discards entries from nodes with unknown or different shred-version. Follow up commits will patch gossip not to waste bandwidth and memory over crds values of nodes with different shred-version. (cherry picked from commit 985280e) # Conflicts: # core/src/cluster_info.rs
…on (backport #17899) (#19551) * excludes epoch-slots from nodes with unknown or different shred version (#17899) Inspecting TDS gossip table shows that crds values of nodes with different shred-versions are creeping in. Their epoch-slots are accumulated in ClusterSlots causing bogus slots very far from current root which are not purged and so cause ClusterSlots keep consuming more memory: #17789 #14366 (comment) #14366 (comment) This commit updates ClusterInfo::get_epoch_slots, and discards entries from nodes with unknown or different shred-version. Follow up commits will patch gossip not to waste bandwidth and memory over crds values of nodes with different shred-version. (cherry picked from commit 985280e) # Conflicts: # core/src/cluster_info.rs * removes backport merge conflicts Co-authored-by: behzad nouri <[email protected]>
Updating the issue with what was discussed on discord: v1.6 is using jemalloc, but #16346 removed jemalloc, so 1.7 is using system allocator. Above is
Noticeably from the 3 chunks, 1.7.11 with jemalloc is using least amount of memory. So if:
then it shows that removing jemalloc was the culprit for 1.7 memory regression.
#20149 reverted back the allocator to jemalloc. |
Hi @behzadnouri, sorry for bothering, but it looks like |
thanks, for letting me know. I will add that |
Without this feature jemalloc is used only for Rust code but not for bundled C/C++ libraries (like rocksdb). #14366 (comment)
Without this feature jemalloc is used only for Rust code but not for bundled C/C++ libraries (like rocksdb). #14366 (comment) (cherry picked from commit 4bf6d0c)
…20325) Without this feature jemalloc is used only for Rust code but not for bundled C/C++ libraries (like rocksdb). #14366 (comment) (cherry picked from commit 4bf6d0c) Co-authored-by: behzad nouri <[email protected]>
…s#20317) Without this feature jemalloc is used only for Rust code but not for bundled C/C++ libraries (like rocksdb). solana-labs#14366 (comment)
btw, I'm looking into this again. things to try:
|
This was with 1.8.5 after about 16 days of uptime. RTX 2080ti, 11 GB memory. |
I think i've found the last remaining leak cause.... in rocksdb: here's the patch for the bus factor
I'll detail later without fix: with fix: |
hi, i haven't actively investigating the possible memory leak bug, which i thought i came up with. seems it's false alarm... also, i was testing v1.8.x line. |
We do seemingly have another memory leak/growth (master & v1.14) that is in the early stages of investigation at the moment. That being said, I'm in favor of closing this issue due to its' age. Releases are different and code is so different that I think new investigation would be worthy of a new issue (currently in Discord). We could always reference this issue as a "prior work" in a new issue. |
Problem
solana-validator
(tested v1.4.19) definitely leaks memory needing periodic restart of once per a week or so.The pace seems stable across nodes at the rate of 1-2G/day.
Proposed Solution
Debug.
we don't know this existed on the v1.3 line as well.
But this leak is observed from both RPC and non-RPC nodes.
All, the leak happening on RssAnon. This excludes AppendVec (mmap) as it's accounted under RssFile
So, remaining culprits: gossip, blockstore, runtime, rocksdb, etc.
For runtime, blockstore, I think we can just run loong
ledger-tool verify
session.CC: @carllin
The text was updated successfully, but these errors were encountered: