1.3 -> 1.4 regression bug: Memory usage increases over time #1978
Comments
Wasmer issue and minimal reproducible example: wasmerio/wasmer#4377
Thanks for the report and for adding all the details. We will start taking a look.
@webmaster128 It seems the workaround would lead to higher memory usage, and it is more of a band-aid fix. Our module cache is larger than the RAM, so it won't really work. Is there a real fix for the root cause of this issue?
Just tried the workaround of setting the cache size to 2 GB, but it doesn't seem to help.
Sorry, to clarify: we are seeing a different memory leak issue than the bug reported here, so it might not be a successful repro.
(cherry picked from commit 1b110c6)
Fixes released as part of wasmvm 1.4.3 and 1.5.2
We got reports from multiple network operators that after an upgrade to CosmWasm 1.4 or 1.5 the memory usage increases a lot over time. This is clearly a bug in CosmWasm for which, at the time of writing, there is no fix. However, there are good mitigation strategies, which I'll elaborate on here.
What's happening
When you run a node with wasmvm 1.4 or 1.5, the memory usage of the process increases over time. The memory usage profile looks like this:
You might also experience consequences such as:
Why it is happening
Every time you load a contract from the file system cache, the memory usage increases (this is the bug). If contracts kick each other out of the in-memory cache, this happens often. If the cache is large enough to hold the majority of actively used contracts, this happens very rarely.
Workaround
To mitigate the problem, increase the config wasm.memory_cache_size in app.toml from 100 MiB to a much larger value depending on the network, e.g. 2000 MiB. This is a per-node configuration and needs to be done on every node.
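To check what a node is currently configured with, something like the following sketch can help. It assumes the setting is written as a memory_cache_size key under a [wasm] table in app.toml; the helper name wasm_cache_size and the file location in the usage comment are hypothetical and will differ per chain.

```shell
# Hypothetical helper: print the value of memory_cache_size from the
# [wasm] table of the given app.toml (the value is in MiB).
wasm_cache_size() {
  awk '
    # Track which TOML table we are in; only [wasm] is of interest.
    /^\[/ { in_wasm = ($0 ~ /^\[wasm\]/) }
    in_wasm && /^[[:space:]]*memory_cache_size[[:space:]]*=/ {
      sub(/#.*/, "")                  # drop a trailing comment, if any
      split($0, kv, "=")
      gsub(/[[:space:]]/, "", kv[2])  # strip whitespace around the value
      print kv[2]
    }
  ' "$1"
}

# Usage (the path is an assumption, adjust it to your node home):
#   wasm_cache_size ~/.myd/config/app.toml
```

The node needs a restart to pick up a changed value.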
How large should the cache be?
This depends on the usage patterns of the network and the size of the compiled modules. Being able to store all contracts in memory would be one extreme that might make sense for permissioned CosmWasm chains. Permissionless chains are likely to have contracts that are almost never used.
To get a rough idea of the order of magnitude, you can check the size of the modules using something like this:
du -hs ~/.myd/wasm/wasm/cache/modules/v6-*
du -hs ~/.myd/wasm/wasm/cache/modules/v7-*
du -hs ~/.myd/wasm/wasm/cache/modules/v8-*
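To compare the result directly against the MiB value in the config, the total can be printed as a plain number. This is a minimal sketch: the helper name modules_size_mib is hypothetical, and the on-disk size is only a rough proxy for what the modules occupy in memory.

```shell
# Hypothetical helper: print the total on-disk size of a directory in MiB.
modules_size_mib() {
  # du -sk prints "<size in KiB><TAB><path>"; convert to MiB, rounding up.
  du -sk "$1" | awk '{ printf "%d\n", int(($1 + 1023) / 1024) }'
}

# Usage (same example path as above):
#   modules_size_mib ~/.myd/wasm/wasm/cache/modules
```

If the number is far above the configured cache size, the cache cannot hold all modules, and the value should be chosen based on the actively used contracts instead.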
Complementary strategies
The above setting is the most important thing. But there is more you can do, like
Overall bear in mind I am not a node operator and I don't know the specifics of your blockchain or system. So I cannot make complete and final recommendations.
The bug
The bug can be reproduced locally in a pure-Rust example using heap profiling, as shown in #1955. The tool shows us that the memory usage increases over time but is almost zero when the process ends cleanly. This means this is not a memory leak but rather an undesired memory usage pattern.
This is where the allocations are made. At max memory usage time (t-gmax), 96% are coming through cosmwasm_vm::modules::file_system_cache::FileSystemCache::load. At this point it is not clear to me if this is a bug in Wasmer, rkyv or cosmwasm-vm.