Move function names out of `Module` #3789

alexcrichton · 2022-02-10T19:19:21Z

This commit moves function names in a module out of the
wasmtime_environ::Module type and into separate sections stored in the
final compiled artifact. Spurred on by #3787 to look at module load
times I noticed that a huge amount of time was spent in deserializing
this map. The spidermonkey.wasm file, for example, has a 3MB name
section which is a lot of unnecessary data to deserialize at module load
time.

The names of functions are now split out into their own dedicated
section of the compiled artifact and metadata about them is stored in a
more compact format at runtime by avoiding a BTreeMap and instead
using a sorted array. Overall this improves deserialize times by up to
80% for modules with large name sections since the name section is no
longer deserialized at load time and it's lazily paged in as names are
actually referenced.

This commit moves function names in a module out of the `wasmtime_environ::Module` type and into separate sections stored in the final compiled artifact. Spurred on by bytecodealliance#3787 to look at module load times I noticed that a huge amount of time was spent in deserializing this map. The `spidermonkey.wasm` file, for example, has a 3MB name section which is a lot of unnecessary data to deserialize at module load time. The names of functions are now split out into their own dedicated section of the compiled artifact and metadata about them is stored in a more compact format at runtime by avoiding a `BTreeMap` and instead using a sorted array. Overall this improves deserialize times by up to 80% for modules with large name sections since the name section is no longer deserialized at load time and it's lazily paged in as names are actually referenced.

alexcrichton · 2022-02-10T19:30:21Z

For reference, the 80% number comes from the benchmark in #3790 and looks like:

deserialize/deserialize/rustpython.wasm
                        time:   [666.37 us 666.56 us 666.75 us]
                        change: [-79.458% -79.453% -79.448%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low mild
  5 (5.00%) high mild

deserialize/deserialize/spidermonkey.wasm
                        time:   [1.4307 ms 1.4310 ms 1.4313 ms]
                        change: [-82.183% -82.176% -82.170%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
  3 (3.00%) high mild

fitzgen

Nice!

crates/jit/src/instantiate.rs

fitzgen · 2022-02-10T19:37:18Z

Another thing we could do to take this even further is have a header on this new section (or add another new section) that contains the function index -> (offset, length) pairs. This way the CompiledModuleInfo would be even a bit smaller than it is after this PR.

github-actions · 2022-02-10T19:48:06Z

Subscribe to Label Action

cc @peterhuene

This issue or pull request has been labeled: "wasmtime:api"

Thus the following users have been cc'd because of the following labels:

peterhuene: wasmtime:api

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

alexcrichton · 2022-02-10T19:48:58Z

I was considering doing that but ended up deciding against it because we dont have much other laziness in CompiledModule right now, although theoretically we should be able to have a fair amount more. I was originally tipped off to this by a unnaturally large .wasmtime.info section in the binary:

$ objdump -h spidermonkey.cwasm

spidermonkey.cwasm:     file format elf64-littleaarch64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00dc0000  0000000000000000  0000000000000000  00010000  2**16
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .eh_frame     001bf84c  0000000000000000  0000000000000000  00dd0000  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .wasmtime.addrmap 00bddf9c  0000000000000000  0000000000000000  00f8f84c  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .wasmtime.traps 002aefb7  0000000000000000  0000000000000000  01b6d7e8  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .rodata.wasm  00e5e488  0000000000000000  0000000000000000  01e1c79f  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .wasmtime.info 0042320c  0000000000000000  0000000000000000  02c7ac27  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA

where here it's on the order of the size of the text section which was massive. This PR actually doesn't help this too too much and instead yields:

$ objdump -h spidermonkey.cwasm
...
  5 .name.wasm    002e344f  0000000000000000  0000000000000000  02ce31df  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  6 .wasmtime.info 0013fdc4  0000000000000000  0000000000000000  02fc662e  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA

where .name.wasm + .wasmtime.info is roughly the size of the old section. This still seems abnormally large for .wasmtime.info though as it's still sort of on the order of the entire text section. I benchmarked after this PR and it looked like all the remaining decoding time was PrimaryMap<DefinedFuncIndex, FunctionInfo>,. I didn't immediately see Vec<FunctionName> in the profile so I wasn't too worried about its decoding time.

I'll leave optimizing Vec<FunctionInfo> for another day, and in the meantime I do agree that we've still got in our back pocket lazily decoding the Vec<FunctionName> entirely as well since we don't need to create that immediately either.

Need to not only sort afterwards but also first to ensure the data of the name section is consistent.

fitzgen · 2022-02-10T19:52:55Z

because we dont have much other laziness in CompiledModule right now,

Wouldn't even need to be lazy inside CompiledModuleInfo, just use the header section directly. If each entry is two words (offset and length) then we can lookup the ith function's name by just doing

#[repr(C)]
struct OffsetAndLength {
    offset: usize,
    length: usize,
}

let header_section_ptr: *const OffsetAndLengthPair = ...;
let header_section_len = ...;
let header_section = slice::from_raw_parts(...);
return header_section[i];

fitzgen · 2022-02-10T19:54:27Z

But yeah no need to hold up this PR or do that now, if we don't want to.

alexcrichton · 2022-02-10T20:02:29Z

Oh actually that's a great point. We already do that for the address map section where there's a special encoder followed by a specialized lookup routine for the data produced by that encoder. While we need to be somewhat careful about alignments and such it's not really the end of the world.

Arguably almost everything in wasmtime::Module should move to something like this because deserializing wasmtime::Module is not a trivial task and is otherwise quite wasteful (lots of allocations, none of which are ever mutated so could continue to use the source-of-truth)...

Anyway I think I'll probably go with this for now since it's relatively lightweight, but there's definitely still more fruit to pick on the accelerate-load-precompiled-modules tree!

* Move function names out of `Module` This commit moves function names in a module out of the `wasmtime_environ::Module` type and into separate sections stored in the final compiled artifact. Spurred on by bytecodealliance#3787 to look at module load times I noticed that a huge amount of time was spent in deserializing this map. The `spidermonkey.wasm` file, for example, has a 3MB name section which is a lot of unnecessary data to deserialize at module load time. The names of functions are now split out into their own dedicated section of the compiled artifact and metadata about them is stored in a more compact format at runtime by avoiding a `BTreeMap` and instead using a sorted array. Overall this improves deserialize times by up to 80% for modules with large name sections since the name section is no longer deserialized at load time and it's lazily paged in as names are actually referenced. * Fix a typo * Fix compiled module determinism Need to not only sort afterwards but also first to ensure the data of the name section is consistent.

fitzgen approved these changes Feb 10, 2022

View reviewed changes

crates/jit/src/instantiate.rs Outdated Show resolved Hide resolved

alexcrichton mentioned this pull request Feb 10, 2022

Use mmap'd *.cwasm as a source for memory initialization images #3787

Merged

github-actions bot added the wasmtime:api Related to the API of the `wasmtime` crate itself label Feb 10, 2022

alexcrichton added 2 commits February 10, 2022 11:50

Fix a typo

927066b

Fix compiled module determinism

86d761b

Need to not only sort afterwards but also first to ensure the data of the name section is consistent.

alexcrichton merged commit 520a7f2 into bytecodealliance:main Feb 10, 2022

alexcrichton deleted the no-names-in-module branch February 10, 2022 20:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Move function names out of `Module` #3789

Move function names out of `Module` #3789

alexcrichton commented Feb 10, 2022

alexcrichton commented Feb 10, 2022

fitzgen left a comment

fitzgen commented Feb 10, 2022

github-actions bot commented Feb 10, 2022

alexcrichton commented Feb 10, 2022

fitzgen commented Feb 10, 2022

fitzgen commented Feb 10, 2022

alexcrichton commented Feb 10, 2022

Move function names out of Module #3789

Move function names out of Module #3789

Conversation

alexcrichton commented Feb 10, 2022

alexcrichton commented Feb 10, 2022

fitzgen left a comment

Choose a reason for hiding this comment

fitzgen commented Feb 10, 2022

github-actions bot commented Feb 10, 2022

Subscribe to Label Action

alexcrichton commented Feb 10, 2022

fitzgen commented Feb 10, 2022

fitzgen commented Feb 10, 2022

alexcrichton commented Feb 10, 2022

Move function names out of `Module` #3789

Move function names out of `Module` #3789