Reduce ~2 GBs mem by avoiding another overalloc. #14806

Merged (2 commits) on Jan 25, 2021

Conversation

ryoqun
Member

@ryoqun ryoqun commented Jan 23, 2021

Problem

Currently, ClusterInfoVoteListener over-allocates (unrecycled) Packets by a factor of 128. Ultimately, this results in roughly 2 GB of consistently unneeded memory use on both testnet and mainnet-beta, which is not easily swapped out because the Packets are created and destroyed quickly.

Details

First, about the overallocation:

ClusterInfoVoteListener calls to_packets_chunked, which in turn calls Packets::default(), which surprisingly creates a PinnedVec with a capacity of 128. (I have a draft cleanup PR for this.)

let msgs = packet::to_packets_chunked(&votes, 1);

pub fn to_packets_chunked<T: Serialize>(xs: &[T], chunks: usize) -> Vec<Packets> {
    let mut out = vec![];
    for x in xs.chunks(chunks) {
        let mut p = Packets::default();

fn default() -> Packets {
    let packets = PinnedVec::with_capacity(NUM_RCVMMSGS);

pub const NUM_RCVMMSGS: usize = 128;

Then, Packet's memory size is 1304 bytes:

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=adfaa40101d596d9e372f96b20f1a9df
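
To make the cost of the default capacity concrete, here is a minimal sketch using a plain Vec. FakePacket and reserved_bytes are hypothetical names for illustration only; FakePacket is just a 1304-byte stand-in, not the real Packet or PinnedVec types.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for a packet: an opaque 1304-byte blob.
/// This is NOT the real `Packet` type, only something of the same size.
pub struct FakePacket {
    pub bytes: [u8; 1304],
}

/// Bytes reserved on the heap by an empty vector created with `capacity`
/// slots. The reservation happens up front, regardless of how many
/// elements are pushed later.
pub fn reserved_bytes(capacity: usize) -> usize {
    let v: Vec<FakePacket> = Vec::with_capacity(capacity);
    v.capacity() * size_of::<FakePacket>()
}
```

Under these assumptions, reserved_bytes(128) is at least 128 * 1304 = 166,912 bytes even though the vector holds nothing, while reserved_bytes(1) only needs room for a single element.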

And finally, VerifiedVotePackets retains those Packets keyed by CrdsValueLabel::Vote(VoteIndex, Pubkey):

pub struct VerifiedVotePackets(HashMap<CrdsValueLabel, (u64, Packets)>);

    self.0.insert(label, (*last_update_version, packet));
}
while let Ok(vote_packets) = vote_packets_receiver.try_recv() {
    for (label, packet) in vote_packets {
        self.0.insert(label, (*last_update_version, packet));

Also, combined with the fact that there are 300~400 validators on both testnet and mainnet-beta, and 32 votes per validator, we allocate about 2 GB of memory for only about 15 MB worth of data (a 128x overhead):

[4] pry(main)> 1304 * 32 * 400 * 128 / (1024 * 1024)
=> 2037
[5] pry(main)> 1304 * 32 * 400 / (1024 * 1024)
=> 15
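
The same estimate can be written with named constants, which makes it easier to see where the factor of 128 enters. This is only a restatement of the arithmetic above; the constant and function names are made up for this sketch, and 400 validators is the upper end of the range quoted in the description.

```rust
// Figures taken from the description above; names are illustrative only.
const PACKET_SIZE_BYTES: u64 = 1304;
const VOTES_PER_VALIDATOR: u64 = 32;
const VALIDATORS: u64 = 400;
const MIB: u64 = 1024 * 1024;

/// Estimated retained memory in MiB (integer division, like the pry
/// session) when every retained Packets holds `capacity` packet slots.
pub fn retained_mib(capacity: u64) -> u64 {
    PACKET_SIZE_BYTES * VOTES_PER_VALIDATOR * VALIDATORS * capacity / MIB
}
```

retained_mib(128) gives 2037 and retained_mib(1) gives 15, matching the two pry results.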

This was found by analyzing the following heaptrack output:

1.99GB leaked over 259982 calls from
  alloc::alloc::alloc::h42bff26f59c33bd6
    at /home/ryoqun/.rustup/toolchains/nightly-2020-12-13-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:86
    in /home/sol/.local/share/solana/install/releases/1.4.23/solana-release/bin/solana-validator
  alloc::alloc::Global::alloc_impl::h6e8f952ef784c80f
    at /home/ryoqun/.rustup/toolchains/nightly-2020-12-13-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:166
  _$LT$alloc..alloc..Global$u20$as$u20$core..alloc..Allocator$GT$::allocate::hcd5aa98f8fdb82d8
    at /home/ryoqun/.rustup/toolchains/nightly-2020-12-13-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:226
  alloc::raw_vec::RawVec$LT$T$C$A$GT$::allocate_in::h35baa41fee755df6
    at /home/ryoqun/.rustup/toolchains/nightly-2020-12-13-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/raw_vec.rs:188
  alloc::raw_vec::RawVec$LT$T$C$A$GT$::with_capacity_in::hb6938eb2e8af4afe
    at /home/ryoqun/.rustup/toolchains/nightly-2020-12-13-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/raw_vec.rs:129
  alloc::vec::Vec$LT$T$C$A$GT$::with_capacity_in::h6fb314e1783cedc5
    at /home/ryoqun/.rustup/toolchains/nightly-2020-12-13-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/vec.rs:498
  alloc::vec::Vec$LT$T$GT$::with_capacity::hc736c0b130ffbed8
    at /home/ryoqun/.rustup/toolchains/nightly-2020-12-13-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/vec.rs:364
  solana_perf::cuda_runtime::PinnedVec$LT$T$GT$::with_capacity::h0e9b629a9ac237cd
    at perf/src/cuda_runtime.rs:235
  _$LT$solana_perf..packet..Packets$u20$as$u20$core..default..Default$GT$::default::h713f4e6119bb937f
    at perf/src/packet.rs:22
  solana_perf::packet::to_packets_chunked::hfc385ec0d9e6a87f
    at /home/ryoqun/work/solana/solana/perf/src/packet.rs:64
    in /home/sol/.local/share/solana/install/releases/1.4.23/solana-release/bin/solana-validator
  solana_core::cluster_info_vote_listener::ClusterInfoVoteListener::verify_votes::hb7280041ea55ff2d
    at core/src/cluster_info_vote_listener.rs:344
    in /home/sol/.local/share/solana/install/releases/1.4.23/solana-release/bin/solana-validator
  solana_core::cluster_info_vote_listener::ClusterInfoVoteListener::recv_loop::h89d6a22bbe2764cc
    at core/src/cluster_info_vote_listener.rs:331
  solana_core::cluster_info_vote_listener::ClusterInfoVoteListener::new::_$u7b$$u7b$closure$u7d$$u7d$::h29f58de0cb36359c
    at core/src/cluster_info_vote_listener.rs:262
    in /home/sol/.local/share/solana/install/releases/1.4.23/solana-release/bin/solana-validator
  std::sys_common::backtrace::__rust_begin_short_backtrace::h9b49012839beba4e
    at /home/ryoqun/.rustup/toolchains/nightly-2020-12-13-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:125
  std::thread::Builder::spawn_unchecked::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::h3041e870e79d8d0a
    at /home/ryoqun/.rustup/toolchains/nightly-2020-12-13-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:474
    in /home/sol/.local/share/solana/install/releases/1.4.23/solana-release/bin/solana-validator
  _$LT$std..panic..AssertUnwindSafe$LT$F$GT$$u20$as$u20$core..ops..function..FnOnce$LT$$LP$$RP$$GT$$GT$::call_once::h3acac363628da548
    at /home/ryoqun/.rustup/toolchains/nightly-2020-12-13-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:322
  std::panicking::try::do_call::h5442c435157cc750
    at /home/ryoqun/.rustup/toolchains/nightly-2020-12-13-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:379
  std::panicking::try::h0687aaf903e83c62
    at /home/ryoqun/.rustup/toolchains/nightly-2020-12-13-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:343
  std::panic::catch_unwind::h9eed9da9fa29a1cc
    at /home/ryoqun/.rustup/toolchains/nightly-2020-12-13-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:396
  std::thread::Builder::spawn_unchecked::_$u7b$$u7b$closure$u7d$$u7d$::hd72846f01979a93e
    at /home/ryoqun/.rustup/toolchains/nightly-2020-12-13-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:473
  core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h6aedf8cd756546ae
    at /home/ryoqun/.rustup/toolchains/nightly-2020-12-13-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227
  _$LT$alloc..boxed..Box$LT$F$C$A$GT$$u20$as$u20$core..ops..function..FnOnce$LT$Args$GT$$GT$::call_once::hea1090dbdcecbf5a
    at /rustc/7efc097c4fe6e97f54a44cee91c56189e9ddb41c/library/alloc/src/boxed.rs:1328
    in /home/sol/.local/share/solana/install/releases/1.4.23/solana-release/bin/solana-validator
  _$LT$alloc..boxed..Box$LT$F$C$A$GT$$u20$as$u20$core..ops..function..FnOnce$LT$Args$GT$$GT$::call_once::h8d5723d3912bd325
    at /rustc/7efc097c4fe6e97f54a44cee91c56189e9ddb41c/library/alloc/src/boxed.rs:1328
  std::sys::unix::thread::Thread::new::thread_start::hc17a425ca2995724
    at library/std/src/sys/unix/thread.rs:71
  start_thread
    in /lib/x86_64-linux-gnu/libpthread.so.0
  __clone
    in /lib/x86_64-linux-gnu/libc.so.6

Changes

Just don't overallocate. In this case, Packets with a capacity of 1 seems to be correct.

Also, since this is one such instance, I think Packets::default should be changed. I'll do that in a follow-up PR; I first want to land this fairly low-risk memory-saving PR so it can ship in both v1.4 and v1.5.

Found via: #14366
Related, in that it is also an overallocation bug: #14435
Possibly where the overallocation started: #9694

@codecov

codecov bot commented Jan 23, 2021

Codecov Report

Merging #14806 (fb8ed8b) into master (1d87091) will increase coverage by 0.0%.
The diff coverage is 88.9%.

@@           Coverage Diff            @@
##           master   #14806    +/-   ##
========================================
  Coverage    80.2%    80.3%            
========================================
  Files         403      403            
  Lines      102221   102476   +255     
========================================
+ Hits        82072    82300   +228     
- Misses      20149    20176    +27     

@ryoqun ryoqun changed the title from "Reduce few GBs mem by avoiding another overalloc." to "Reduce ~2 GBs mem by avoiding another overalloc." Jan 24, 2021
@ryoqun ryoqun marked this pull request as ready for review January 24, 2021 10:29
@ryoqun ryoqun requested a review from carllin January 24, 2021 10:31
@ryoqun
Member Author

ryoqun commented Jan 24, 2021

@sakridge, @carllin could you review this?

@ryoqun ryoqun requested a review from sakridge January 24, 2021 10:36
@ryoqun
Member Author

ryoqun commented Jan 24, 2021

I finally filled in the description of this largely undocumented draft PR and promoted it out of draft. :)

@@ -61,7 +65,7 @@ impl Packets {
 pub fn to_packets_chunked<T: Serialize>(xs: &[T], chunks: usize) -> Vec<Packets> {
     let mut out = vec![];
     for x in xs.chunks(chunks) {
-        let mut p = Packets::default();
+        let mut p = Packets::with_capacity(chunks);
Member

Shouldn't it be Packets::with_capacity(x.len())?

Member Author

Oh, yeah, if the length of xs isn't divisible by chunks, the last chunk can still be overallocated. :)

Member

Yea, probably not a big deal either way, but might as well size to fit :)
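
A small sketch of the size-to-fit idea, using plain Vec<T> as a stand-in for Packets. The function name to_chunks_sized is hypothetical and this is not the code from the PR, only an illustration of sizing each chunk by x.len() instead of the nominal chunk size.

```rust
/// Split `xs` into chunks of at most `chunk_size` elements, allocating
/// each output vector with exactly the capacity its chunk needs.
/// Note: `slice::chunks` panics if `chunk_size` is 0.
pub fn to_chunks_sized<T: Clone>(xs: &[T], chunk_size: usize) -> Vec<Vec<T>> {
    let mut out = Vec::new();
    for x in xs.chunks(chunk_size) {
        // The last chunk may be shorter than `chunk_size`, so request
        // `x.len()` slots rather than `chunk_size`.
        let mut v = Vec::with_capacity(x.len());
        v.extend_from_slice(x);
        out.push(v);
    }
    out
}
```

For five elements and a chunk size of 2, the chunks have lengths 2, 2, and 1, and the last vector is requested with a capacity of 1 rather than 2.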

@@ -61,7 +65,7 @@ impl Packets {
 pub fn to_packets_chunked<T: Serialize>(xs: &[T], chunks: usize) -> Vec<Packets> {
     let mut out = vec![];
     for x in xs.chunks(chunks) {
-        let mut p = Packets::default();
+        let mut p = Packets::with_capacity(chunks);
         p.packets.resize(x.len(), Packet::default());
Contributor

@carllin carllin Jan 25, 2021

Ah does the resize call here not remove the excess bloat (based on the Vec docs, looks like it might only truncate length, not the allocated capacity)?

Member Author

(based on the Vec docs, looks like it might only truncate length, not the allocated capacity)

That's correct. You need to call shrink_to_fit if you want that behavior. But it's better to avoid the overallocation to begin with. :)
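
The behavior discussed here can be checked with a plain Vec. This is a minimal sketch; the function name is made up for illustration, and it uses Vec<u8> rather than the PinnedVec from the PR.

```rust
/// Returns the vector's capacity right after `resize` and again after
/// `shrink_to_fit`, starting from an over-allocated vector.
pub fn capacities_after_resize_and_shrink() -> (usize, usize) {
    // Over-allocate: 128 slots reserved, length 0.
    let mut v: Vec<u8> = Vec::with_capacity(128);

    // `resize` only changes the length; the reserved capacity stays.
    v.resize(1, 0);
    let after_resize = v.capacity();

    // `shrink_to_fit` asks the allocator to release the excess capacity.
    v.shrink_to_fit();
    let after_shrink = v.capacity();

    (after_resize, after_shrink)
}
```

The first value stays at 128 or more, while the second drops toward the length of 1. The Vec documentation only promises that shrink_to_fit gets as close to the length as the allocator allows, so the exact value is not guaranteed.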

@ryoqun ryoqun added the automerge Merge this Pull Request automatically once CI passes label Jan 25, 2021
@mergify
Contributor

mergify bot commented Jan 25, 2021

automerge label removed due to a CI failure

@mergify mergify bot removed the automerge Merge this Pull Request automatically once CI passes label Jan 25, 2021
@ryoqun ryoqun added the automerge Merge this Pull Request automatically once CI passes label Jan 25, 2021
@mergify mergify bot merged commit 015058e into solana-labs:master Jan 25, 2021
mergify bot pushed a commit that referenced this pull request Jan 25, 2021
* Reduce few GBs mem by avoiding another overalloc.

* Use x.len() for the last item from chunks()

(cherry picked from commit 015058e)
mergify bot added a commit that referenced this pull request Jan 25, 2021
* Reduce few GBs mem by avoiding another overalloc.

* Use x.len() for the last item from chunks()

(cherry picked from commit 015058e)

Co-authored-by: Ryo Onodera