Long living engine consumes endless memory when modules are loaded from file #4377
Turns out the static global `FRAME_INFO` registry keeps growing. The following debug logs demonstrate this:

```diff
diff --git a/lib/compiler/src/engine/trap/frame_info.rs b/lib/compiler/src/engine/trap/frame_info.rs
index 4222acc1ad..7d714ea746 100644
--- a/lib/compiler/src/engine/trap/frame_info.rs
+++ b/lib/compiler/src/engine/trap/frame_info.rs
@@ -239,5 +239,15 @@ pub fn register(
         },
     );
     assert!(prev.is_none());
+
+    let frame_info_ranges = info.ranges.len();
+    let total_functions: usize = info.ranges.values().map(|mifi| mifi.functions.len()).sum();
+    let total_frame_info: usize = info
+        .ranges
+        .values()
+        .map(|mifi| mifi.frame_infos.len())
+        .sum();
+    println!("Elements of FRAME_INFO.ranges: {frame_info_ranges}, total functions: {total_functions}, total frame infos: {total_frame_info}");
+
     Some(GlobalFrameInfoRegistration { key: max })
 }
```

Running the example with this patch prints the output, which shows the counters growing with every module load.
This probably means the following RAII guard is never dropped:

```rust
/// An RAII structure used to unregister a module's frame information when the
/// module is destroyed.
pub struct GlobalFrameInfoRegistration {
```

Debugging the drop implementation of the artifact leads to this code, where the registration is handed over to the engine:

```rust
    .internal_register_frame_info()
    .map_err(|e| DeserializeError::CorruptedBinary(format!("{:?}", e)))?;
if let Some(frame_info) = artifact.internal_take_frame_info_registration() {
    engine_inner.register_frame_info(frame_info);
}
```

Now I wonder if this is a bug or the desired behaviour.
@webmaster128 thank you for reporting this issue. I will look into this.
Do you have any initial thoughts on whether and how this can be fixed? Do you think we need to adapt our codebase to use one engine per Module to work around the issue?

Optimally, I want to fix this in time for the next release. Tagging @theduke @Michael-F-Bryan to answer this; I don't have a full picture of the implications.
I'm tracing your steps here, @webmaster128 ;-) I'm running into the same issue during benchmarks with a long-lived runtime engine. Thanks for reporting this; the reproduction is spot on, too. By the way, it doesn't need to be deserialization from file — the same happens for modules kept in memory:

```rust
fn create_module(runtime_engine: &Engine) -> anyhow::Result<Module> {
    let compiling_engine = Engine::default();
    let wat = r#"
        (module
          (func (export "test"))
        )"#;
    let mut module = Module::new(&compiling_engine, wat.as_bytes())?;
    let serialized_module = module.serialize()?;
    for _ in 0..100 {
        module = unsafe { Module::deserialize(runtime_engine, serialized_module.clone())? };
    }
    Ok(module)
}

pub fn wasm_module(c: &mut Criterion) {
    let mut group = c.benchmark_group("wasm_module");
    let runtime_engine = Engine::headless();
    group.bench_function(BenchmarkId::new("wasm_module", "module"), |b| {
        b.iter(|| create_module(&runtime_engine))
    });
    group.finish()
}

criterion_group!(benches, wasm_module,);
criterion_main!(benches);
```
It seems to me that if you're deserializing modules that are then dropped, as in the above example, this is a memory leak: the engine holds on to artifacts that are no longer accessible.
The artifacts are dropped as expected at the right time (which I tested along the way, too). The problem is that for each deserialization, a bunch of memory is owned by the engine instead of the artifact, so it lives as long as the engine. Once the engine is dropped, everything is nice and clean.
Okay, then the memory should be freed when the artifact is dropped, shouldn't it? What else is it kept for? Or the memory should be owned by the artifact instead, so that it is dropped along with the artifact.
I can confirm that caching a (Module, Engine) pair is a workaround for this issue, because then the Engine does not live much longer than the artifact. We did this here: CosmWasm/cosmwasm#1982. Ideally we can revert that once this is solved in Wasmer.
I realize that this is not a workaround for my scenario, where I keep the module (and thus the engine) for the entire time the system is running. It wouldn't make a difference whether the engine is stored with the module or not, since the module is never dropped. On a positive note, in my use case deserialization is only done once at startup and there are fewer than 20 modules, so the memory usage shouldn't go through the roof.
Hi, we ran into the same issue in https://github.com/anoma/namada. In our use case we're using the singlepass compiler and we cache compiled modules. This is what we observed on v4.3.5 (and earlier v4.3.x versions) when we use the same long-lived engine.
We tried to work around this by instantiating a new `Engine`.
In the end, what worked for us was to keep a
Describe the bug
When using a single long running headless engine in combination with many loads from a file system cache, the memory usage increases constantly. If modules are just instantiated from in-memory Module instances, this does not happen. This was observed by CosmWasm users.
The issue can be reproduced using the following adapted Wasmer example: master...webmaster128:demo-memory-growth.
Steps to reproduce

Run

```shell
cargo run --example engine-headless --release --features "cranelift"
```

in the repo root.

Expected behavior

The memory usage of the example stays below a reasonable bound because the deserialized `Module`s are dropped.

Actual behavior
Running the above example leads to the following memory profile:
A few notes here:

```shell
xcrun xctrace record --template 'Allocations' --launch ./target/release/examples/engine-headless
```

A different memory profiling approach shows the route through which the majority of allocations are created:
Additional context
OS:
Wasmer version:
This problem started when we refactored our caching solution to use a single long-lived runtime engine instead of caching (Module, Engine) pairs. It might have been present in Wasmer for a long time.