Profile Nargo's high memory usage during compilation and determine optimization opportunities #6550

TomAFrench opened this issue Nov 19, 2024 · 2 comments
@TomAFrench

We're currently seeing very high memory usage when compiling the protocol circuits in aztec-packages, so we should put in some work to determine where this memory is going.

There are a couple of potential tools which could help with this. I don't know which option is best here, so some exploration of how the various tools work and which fits best would be very useful.

Ideal outcomes:

  • Internal docs for how to install the chosen tool/tools.
  • Internal docs or scripts for usage of the chosen tool/tools.
  • Shortlist of locations where we can improve memory usage.
  • Memory reports which we can track in CI (nice to have)
@guipublic

I have investigated Heaptrack and Tracy:

Heaptrack

Heaptrack is straightforward to use.

You can install it with:
sudo apt-get install heaptrack
And check the installation with: heaptrack -v
Nargo can be profiled very easily: heaptrack nargo compile
This generates a dump, which must be processed like this:
heaptrack --analyze heaptrack.nargo.409749.gz
The analysis reports all the allocations done by nargo. In particular, the peaks are interesting.
For instance, compiling the blake3 test, which simply performs one call to a black-box function, shows several memory peaks around the elaborator. This makes sense because, in this case, most of the work is done in the frontend.

The summary contains the following information:
total runtime: 1.47s.
calls to allocation functions: 1156118 (789159/s)
temporary memory allocations: 272381 (185925/s)
peak heap memory consumption: 78.39M
peak RSS (including heaptrack overhead): 105.05M
total memory leaked: 1.38M

Heaptrack could be added to the CI in order to keep track of the peak heap memory consumption.
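
For instance, a CI job could run the compilation under Heaptrack and extract that single figure from the analysis summary. The following is only a sketch of such a step (we do not have one today); it relies on the summary line format shown above, which may differ between heaptrack versions:

heaptrack nargo compile
# the dump name contains the PID of the profiled run, e.g. heaptrack.nargo.409749.gz
heaptrack --analyze heaptrack.nargo.*.gz | grep "peak heap memory consumption"

The extracted value could then be published as a CI artifact or compared against a per-program threshold.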

Tracy

Tracy is more involved and cannot really be integrated with CI.
In fact, Tracy is used to profile Barretenberg, via scripts and code instrumentation, and those scripts are not part of the CI.

Although Tracy is primarily aimed at profiling C++ applications, there are several Rust adapters.
Tracy requires:

  • code instrumentation, i.e. modifying the source code to emit the custom trace data to collect
  • a client/server architecture, where the program being profiled emits the data and the Tracy server collects it
  • a GUI to show the collected data

Fortunately, each step is not too difficult to set up, and at the end you get a very powerful profiler.

Installation

On the mainframe, Tracy needs to be compiled from source:

git clone https://github.com/wolfpld/tracy
cd tracy
git checkout tags/v0.11.0
sudo apt-get install -y libdbus-1-dev libdbus-glib-1-dev libtbb-dev libfreetype-dev
cmake -B capture/build -S capture -DCMAKE_BUILD_TYPE=Release
cmake --build capture/build --config Release --parallel

Tracy is very sensitive to versioning: the client and server versions must be identical, which can be an issue due to the Rust wrapper used for the client. I found that version 0.11.0 works, but in order to get it to compile, I had to add the following line inside capture/CMakeLists.txt:
add_compile_options(-Wno-error=stringop-overflow)
The master version does not require it, but I could not find a compatible Rust client.
For reference, the libraries I had to install for the master version are:
pkg-config, libxkbcommon-dev, libxkbcommon-x11-dev, libwayland-dev, wayland-protocols, libglvnd-dev, libdbus-1-dev

The profiler must be installed locally. For instance, on macOS you'd simply do:
brew install tracy

Instrumentation

Nargo needs to be modified in order to emit trace data. For this I use tracy-client, which supports memory profiling.
This is done by adding tracy-client = { version = "0.17.1", default-features = false, features = ["enable"] } to noir/tooling/nargo_cli/Cargo.toml.
In main.rs, you can add the following to enable memory profiling:
use tracy_client::{Client, ProfiledAllocator};

// Wrap the system allocator so every allocation is reported to Tracy.
// The second argument is the call stack depth to capture for each allocation.
#[global_allocator]
static GLOBAL: ProfiledAllocator<std::alloc::System> =
    ProfiledAllocator::new(std::alloc::System, 100);

and then you start the Tracy client in main():
fn main() {
    let _client = Client::start();
    ...
Sections of the code can be profiled simply by adding a span over the scope you are interested in. For instance, I added one in compile_workspace_full(..):
let _span = span!("compile workspace");
You can additionally emit values or text.
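For instance, the span created above can be given extra data via its emit_value and emit_text methods. The snippet below is only a sketch: the "compile workspace" span comes from the example above, and workspace.members.len() is a hypothetical placeholder for whatever quantity you want to record.

use tracy_client::span;

// Hypothetical sketch: attach extra data to a span so it shows up in the Tracy UI.
let span = span!("compile workspace");
span.emit_text("compiling protocol circuits");
// e.g. record how many packages are being compiled
span.emit_value(workspace.members.len() as u64);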

Profiling

Once Nargo is instrumented, you can profile it by running:
capture/build/tracy-capture -o tracy_dump
This should tell you: Connecting to 127.0.0.1:8086...
...waiting for a client to connect. However, since Nargo is instrumented, the LSP might interfere, so if you use VS Code, you should disable the Noir Language Support extension.

You can now run Nargo as usual (from another terminal, but on the same machine) and you should see that it connects to Tracy.
When Nargo finishes, tracy-capture will have collected the data into the provided output file (-o tracy_dump).
You can visualise it locally by running tracy tracy_dump. Since this opens a GUI for visualisation, you need to run it on your own computer, not on the mainframe.

I profiled the poseidonsponge_x5_254 test case and it shows that memory usage peaks at 380 MB during the mem2reg pass.

Tracy could be used to help debug a specific memory issue.

@guipublic

I profiled the peak memory usage of the test programs, and they are all around 80 MB, except for the few below:

Program Peak Memory
bench_2_to_17 3.1G
regression_5252 3G
ram_blowup_regression 2.8G
eddsa 885M
poseidonsponge_x5_254 396M
regression_4709 387M
hashmap 280M
sha256_var_padding_regression 237M
sha256_regression 190M
regression_4449 185M
sha2_byte 150M
sha256_var_size_regression 130M
workspace 128M
poseidon_bn254_hash_width_3 128M
conditional_1 115M
uhashmap 114M
no_predicates_numeric_generic_poseidon 104M
fold_numeric_generic_poseidon 102M
sha256 93M
keccak256 90M
debug_logs 88M
array_dynamic_blackbox_input 87M
nested_array_dynamic 86M
sha256_var_witness_const_regression 84M
