Profile Nargo's high memory usage during compilation and determine optimization opportunities #6550

TomAFrench opened this issue Nov 19, 2024 · 2 comments
@TomAFrench

We're currently seeing very high memory usage when compiling the protocol circuits in aztec-packages, so we should put in some work to determine where this memory is going.

There are a couple of potential tools which could help with this. I don't know which option is best here, so some exploration of how the various tools work and which fits best would be very useful.

Ideal outcomes:

  • Internal docs for how to install the chosen tool/tools.
  • Internal docs or scripts for usage of the chosen tool/tools.
  • Shortlist of locations where we can improve memory usage.
  • Memory reports which we can track in CI (nice to have)
@guipublic

I have investigated Heaptrack and Tracy:

Heaptrack

Heaptrack is straightforward to use.

You can install it with:
sudo apt-get install heaptrack
And check the installation with: heaptrack -v
Nargo can be profiled very easily: heaptrack nargo compile
This generates a dump, which must be processed like this:
heaptrack --analyze heaptrack.nargo.409749.gz
The analysis reports all the allocations done by nargo. In particular, the peaks are interesting.
For instance, compiling the blake3 test, which simply performs one call to a black-box function, shows several memory peaks around the elaborator. This makes sense because, in this case, most of the work is done in the frontend.

The summary contains the following information:
total runtime: 1.47s.
calls to allocation functions: 1156118 (789159/s)
temporary memory allocations: 272381 (185925/s)
peak heap memory consumption: 78.39M
peak RSS (including heaptrack overhead): 105.05M
total memory leaked: 1.38M

Heaptrack could be added to the CI in order to keep track of the peak heap memory consumption.
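
For instance, a CI job could run the compilation under Heaptrack and extract that single figure from the analysis summary. The following is only a sketch of such a step (we do not have one today); it relies on the summary line format shown above, which may differ between heaptrack versions:

heaptrack nargo compile
# the dump name contains the PID of the profiled run, e.g. heaptrack.nargo.409749.gz
heaptrack --analyze heaptrack.nargo.*.gz | grep "peak heap memory consumption"

The extracted value could then be published as a CI artifact or compared against a per-program threshold.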

Tracy

Tracy is more involved and cannot really be integrated with CI.
In fact, Tracy is used to profile Barretenberg, via scripts and code instrumentation, and those scripts are not part of the CI.

Although Tracy is primarily aimed at profiling C++ applications, there are several Rust adapters.
Tracy requires:

  • code instrumentation, i.e. modifying the source code to emit the custom trace data to collect
  • a client/server architecture, where the program being profiled emits the data and the Tracy server collects it
  • a GUI to show the collected data

Fortunately, each step is not too difficult to set up, and at the end you get a very powerful profiler.

Installation

On the mainframe, Tracy needs to be compiled from source:

git clone https://github.com/wolfpld/tracy
cd tracy
git checkout tags/v0.11.0
sudo apt-get install -y libdbus-1-dev libdbus-glib-1-dev libtbb-dev libfreetype-dev
cmake -B capture/build -S capture -DCMAKE_BUILD_TYPE=Release
cmake --build capture/build --config Release --parallel

Tracy is very sensitive to versioning: the client and server versions must be identical, which can be an issue due to the Rust wrapper used for the client. I found that version 0.11.0 works, but in order to get it to compile, I had to add the following line inside capture/CMakeLists.txt:
add_compile_options(-Wno-error=stringop-overflow)
The master version does not require it, but I could not find a compatible Rust client.
For reference, the libraries I had to install for the master version are:
pkg-config, libxkbcommon-dev, libxkbcommon-x11-dev, libwayland-dev, wayland-protocols, libglvnd-dev, libdbus-1-dev

The profiler must be installed locally. For instance, on macOS you'd simply do:
brew install tracy

Instrumentation

Nargo needs to be modified in order to emit trace data. For this I use tracy-client, which supports memory profiling.
This is done by adding tracy-client = { version = "0.17.1", default-features = false, features = ["enable"] } to noir/tooling/nargo_cli/Cargo.toml.
In main.rs, you can add the following to enable memory profiling:
use tracy_client::{Client, ProfiledAllocator};

// Wrap the system allocator so every allocation is reported to Tracy.
// The second argument is the call stack depth to capture for each allocation.
#[global_allocator]
static GLOBAL: ProfiledAllocator<std::alloc::System> =
    ProfiledAllocator::new(std::alloc::System, 100);

and then you start the Tracy client in main():
fn main() {
    let _client = Client::start();
    ...
Sections of the code can be profiled simply by adding a span over the scope you are interested in. For instance, I added one in compile_workspace_full(..):
let _span = span!("compile workspace");
You can additionally emit values or text.
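For instance, the span created above can be given extra data via its emit_value and emit_text methods. The snippet below is only a sketch: the "compile workspace" span comes from the example above, and workspace.members.len() is a hypothetical placeholder for whatever quantity you want to record.

use tracy_client::span;

// Hypothetical sketch: attach extra data to a span so it shows up in the Tracy UI.
let span = span!("compile workspace");
span.emit_text("compiling protocol circuits");
// e.g. record how many packages are being compiled
span.emit_value(workspace.members.len() as u64);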

Profiling

Once Nargo is instrumented, you can profile it by running:
capture/build/tracy-capture -o tracy_dump
This should tell you: Connecting to 127.0.0.1:8086...
...waiting for a client to connect. However, since Nargo is instrumented, the LSP might interfere, so if you use VS Code, you should disable the Noir Language Support extension.

You can now run Nargo as usual (from another terminal, but on the same machine) and you should see that it connects to Tracy.
When Nargo finishes, tracy-capture will have collected the data into the provided output file (-o tracy_dump).
You can visualise it locally by running tracy tracy_dump. Since this opens a GUI for visualisation, you need to run it on your own computer, not on the mainframe.

I profiled the poseidonsponge_x5_254 test case and it shows that memory usage peaks at 380 MB during the mem2reg pass.

Tracy could be used to help debug a specific memory issue.

@guipublic

I profiled the peak memory usage of the test programs, and they are all around 80 MB, except for the few below:

Program Peak Memory
bench_2_to_17 3.1G
regression_5252 3G
ram_blowup_regression 2.8G
eddsa 885M
poseidonsponge_x5_254 396M
regression_4709 387M
hashmap 280M
sha256_var_padding_regression 237M
sha256_regression 190M
regression_4449 185M
sha2_byte 150M
sha256_var_size_regression 130M
workspace 128M
poseidon_bn254_hash_width_3 128M
conditional_1 115M
uhashmap 114M
no_predicates_numeric_generic_poseidon 104M
fold_numeric_generic_poseidon 102M
sha256 93M
keccak256 90M
debug_logs 88M
array_dynamic_blackbox_input 87M
nested_array_dynamic 86M
sha256_var_witness_const_regression 84M
