Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move function names out of Module #3789

Merged
merged 3 commits into from
Feb 10, 2022

Conversation

alexcrichton
Copy link
Member

This commit moves function names in a module out of the
wasmtime_environ::Module type and into separate sections stored in the
final compiled artifact. Spurred on by #3787 to look at module load
times I noticed that a huge amount of time was spent in deserializing
this map. The spidermonkey.wasm file, for example, has a 3MB name
section which is a lot of unnecessary data to deserialize at module load
time.

The names of functions are now split out into their own dedicated
section of the compiled artifact and metadata about them is stored in a
more compact format at runtime by avoiding a BTreeMap and instead
using a sorted array. Overall this improves deserialize times by up to
80% for modules with large name sections since the name section is no
longer deserialized at load time and it's lazily paged in as names are
actually referenced.

This commit moves function names in a module out of the
`wasmtime_environ::Module` type and into separate sections stored in the
final compiled artifact. Spurred on by bytecodealliance#3787 to look at module load
times I noticed that a huge amount of time was spent in deserializing
this map. The `spidermonkey.wasm` file, for example, has a 3MB name
section which is a lot of unnecessary data to deserialize at module load
time.

The names of functions are now split out into their own dedicated
section of the compiled artifact and metadata about them is stored in a
more compact format at runtime by avoiding a `BTreeMap` and instead
using a sorted array. Overall this improves deserialize times by up to
80% for modules with large name sections since the name section is no
longer deserialized at load time and it's lazily paged in as names are
actually referenced.
@alexcrichton
Copy link
Member Author

For reference, the 80% number comes from the benchmark in #3790 and looks like:

deserialize/deserialize/rustpython.wasm
                        time:   [666.37 us 666.56 us 666.75 us]
                        change: [-79.458% -79.453% -79.448%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low mild
  5 (5.00%) high mild

deserialize/deserialize/spidermonkey.wasm
                        time:   [1.4307 ms 1.4310 ms 1.4313 ms]
                        change: [-82.183% -82.176% -82.170%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
  3 (3.00%) high mild

Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

crates/jit/src/instantiate.rs Outdated Show resolved Hide resolved
@fitzgen
Copy link
Member

fitzgen commented Feb 10, 2022

Another thing we could do to take this even further is have a header on this new section (or add another new section) that contains the function index -> (offset, length) pairs. This way the CompiledModuleInfo would be even a bit smaller than it is after this PR.

@github-actions github-actions bot added the wasmtime:api Related to the API of the `wasmtime` crate itself label Feb 10, 2022
@github-actions
Copy link

Subscribe to Label Action

cc @peterhuene

This issue or pull request has been labeled: "wasmtime:api"

Thus the following users have been cc'd because of the following labels:

  • peterhuene: wasmtime:api

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

@alexcrichton
Copy link
Member Author

I was considering doing that but ended up deciding against it because we dont have much other laziness in CompiledModule right now, although theoretically we should be able to have a fair amount more. I was originally tipped off to this by a unnaturally large .wasmtime.info section in the binary:

$ objdump -h spidermonkey.cwasm

spidermonkey.cwasm:     file format elf64-littleaarch64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00dc0000  0000000000000000  0000000000000000  00010000  2**16
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .eh_frame     001bf84c  0000000000000000  0000000000000000  00dd0000  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .wasmtime.addrmap 00bddf9c  0000000000000000  0000000000000000  00f8f84c  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .wasmtime.traps 002aefb7  0000000000000000  0000000000000000  01b6d7e8  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .rodata.wasm  00e5e488  0000000000000000  0000000000000000  01e1c79f  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .wasmtime.info 0042320c  0000000000000000  0000000000000000  02c7ac27  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA

where here it's on the order of the size of the text section which was massive. This PR actually doesn't help this too too much and instead yields:

$ objdump -h spidermonkey.cwasm
...
  5 .name.wasm    002e344f  0000000000000000  0000000000000000  02ce31df  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  6 .wasmtime.info 0013fdc4  0000000000000000  0000000000000000  02fc662e  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA

where .name.wasm + .wasmtime.info is roughly the size of the old section. This still seems abnormally large for .wasmtime.info though as it's still sort of on the order of the entire text section. I benchmarked after this PR and it looked like all the remaining decoding time was PrimaryMap<DefinedFuncIndex, FunctionInfo>,. I didn't immediately see Vec<FunctionName> in the profile so I wasn't too worried about its decoding time.

I'll leave optimizing Vec<FunctionInfo> for another day, and in the meantime I do agree that we've still got in our back pocket lazily decoding the Vec<FunctionName> entirely as well since we don't need to create that immediately either.

Need to not only sort afterwards but also first to ensure the data of
the name section is consistent.
@fitzgen
Copy link
Member

fitzgen commented Feb 10, 2022

because we dont have much other laziness in CompiledModule right now,

Wouldn't even need to be lazy inside CompiledModuleInfo, just use the header section directly. If each entry is two words (offset and length) then we can lookup the ith function's name by just doing

#[repr(C)]
struct OffsetAndLength {
    offset: usize,
    length: usize,
}

let header_section_ptr: *const OffsetAndLengthPair = ...;
let header_section_len = ...;
let header_section = slice::from_raw_parts(...);
return header_section[i];

@fitzgen
Copy link
Member

fitzgen commented Feb 10, 2022

But yeah no need to hold up this PR or do that now, if we don't want to.

@alexcrichton
Copy link
Member Author

Oh actually that's a great point. We already do that for the address map section where there's a special encoder followed by a specialized lookup routine for the data produced by that encoder. While we need to be somewhat careful about alignments and such it's not really the end of the world.

Arguably almost everything in wasmtime::Module should move to something like this because deserializing wasmtime::Module is not a trivial task and is otherwise quite wasteful (lots of allocations, none of which are ever mutated so could continue to use the source-of-truth)...

Anyway I think I'll probably go with this for now since it's relatively lightweight, but there's definitely still more fruit to pick on the accelerate-load-precompiled-modules tree!

@alexcrichton alexcrichton merged commit 520a7f2 into bytecodealliance:main Feb 10, 2022
@alexcrichton alexcrichton deleted the no-names-in-module branch February 10, 2022 20:34
mpardesh pushed a commit to avanhatt/wasmtime that referenced this pull request Mar 17, 2022
* Move function names out of `Module`

This commit moves function names in a module out of the
`wasmtime_environ::Module` type and into separate sections stored in the
final compiled artifact. Spurred on by bytecodealliance#3787 to look at module load
times I noticed that a huge amount of time was spent in deserializing
this map. The `spidermonkey.wasm` file, for example, has a 3MB name
section which is a lot of unnecessary data to deserialize at module load
time.

The names of functions are now split out into their own dedicated
section of the compiled artifact and metadata about them is stored in a
more compact format at runtime by avoiding a `BTreeMap` and instead
using a sorted array. Overall this improves deserialize times by up to
80% for modules with large name sections since the name section is no
longer deserialized at load time and it's lazily paged in as names are
actually referenced.

* Fix a typo

* Fix compiled module determinism

Need to not only sort afterwards but also first to ensure the data of
the name section is consistent.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
wasmtime:api Related to the API of the `wasmtime` crate itself
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants