Wasmtime: `cache_accounts_for_opt_level` failing on `main` #4600

fitzgen · 2022-08-03T20:41:19Z

$ cargo test -p 'wasmtime' engine::tests::cache_accounts_for_opt_level
   Compiling wasmtime-environ v0.40.0 (/home/nick/wasmtime/crates/environ)
   Compiling wasmtime-runtime v0.40.0 (/home/nick/wasmtime/crates/runtime)
   Compiling wasmtime-cranelift v0.40.0 (/home/nick/wasmtime/crates/cranelift)
   Compiling wasmtime-jit v0.40.0 (/home/nick/wasmtime/crates/jit)
   Compiling wasmtime v0.40.0 (/home/nick/wasmtime/crates/wasmtime)
   Compiling wiggle v0.40.0 (/home/nick/wasmtime/crates/wiggle)
   Compiling wasi-common v0.40.0 (/home/nick/wasmtime/crates/wasi-common)
   Compiling wasi-cap-std-sync v0.40.0 (/home/nick/wasmtime/crates/wasi-common/cap-std-sync)
   Compiling wasmtime-wasi v0.40.0 (/home/nick/wasmtime/crates/wasi)
    Finished test [unoptimized + debuginfo] target(s) in 33.10s
     Running unittests src/lib.rs (target/debug/deps/wasmtime-c7705c65c4e36c46)

running 1 test
test engine::tests::cache_accounts_for_opt_level ... FAILED

failures:

---- engine::tests::cache_accounts_for_opt_level stdout ----
thread 'engine::tests::cache_accounts_for_opt_level' panicked at 'called `Result::unwrap()` on an `Err` value: ()', crates/jit/src/debug.rs:147:58
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    engine::tests::cache_accounts_for_opt_level

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 10 filtered out; finished in 0.03s

error: test failed, to rerun pass '-p wasmtime --lib'

The text was updated successfully, but these errors were encountered:

fitzgen · 2022-08-03T20:42:02Z

This is my HEAD for main:

commit 301be7438e424a1bcaa5354ebe6d1b73b7f880ac (HEAD -> main, bytecodealliance/main)
Author: Trevor Elliott <[email protected]>
Date:   Wed Aug 3 13:28:52 2022 -0700

    x64: Begin migrating branch instructions to ISLE (#4587)
    
    https://github.com/bytecodealliance/wasmtime/pull/4587

akirilov-arm · 2022-08-04T14:32:33Z

I enabled this test on AArch64 in #3606, so after seeing this issue I thought that I had caused a regression, but I just checked, and the test passed when I ran it via ci/run-tests.sh; after that I changed one of the test assertions to use a bogus value to verify that the test was actually running. Curiously, it fails when executed directly, as specified here.

alexcrichton · 2022-08-04T14:43:11Z

Oh that explains why my bisection run on an AArch64 machine points to #3606 as the cause! A bisection on an x86_64 machine points to #4571 and while that is entirely unrelated I think I see the problem so I'll post a fix soon.

This commit fixes bytecodealliance#4600 in a somewhat roundabout fashion. Currently the `main` branch of Wasmtime exhibits unusual behavior: * If `./ci/run-tests.sh` is run then the `cache_accounts_for_opt_level` test does not fail. * If `cargo test -p wasmtime --lib` is run, however, then the test fails. This test is indeed being run as part of `./ci/run-tests.sh` and it's also passing in CI. The exact failure is that part of the debuginfo support we have takes an existing ELF image, copies it, and then appends some information to inform profilers/gdb about the image. This code is all quite old at this point and not 100% optimal, but that's at least where we're at. The problem is that the appended `ProgramHeader64` is not aligned correctly during `cargo test -p wasmtime --lib`, which is the panic that happens causing the test to fail. The reason, however, that this test passes with `./ci/run-tests.sh` is that the alignment of `ProgramHeader64` is 1 instead of 8. The reason for that is that the `object` crate has an `unaligned` feature which forcibly unaligns all primitives to 1 byte instead of their natural alignment. During `cargo test -p wasmtime --lib` this feature is not enabled but during `./ci/run-tests.sh` this feature is enabled. The feature is currently enabled through inclusion of the `backtrace` crate which only happens for some tests in some crates. The alignment issue explains why the test fails on a single crate test but fails on the whole workspace tests. The next issue I investigated was if this test ever passed. It turns out that on v0.39.0 this test passed, and the regression to main was introduced during bytecodealliance#4571. That PR, however, has nothing to do with any of this! The reason that this showed up as causing a "regression" however is because it changed cranelift settings which changed the size of serialized metadata at the end of a Wasmtime cache object. Wasmtime compiled artifacts are ELF images with Wasmtime-specific metadata appended after them. This appended metadata was making its way all the way through to the gdbjit image itself which mean that while the end of the ELF file itself was properly aligned the space after the Wasmtime metadata was not aligned. This metadata changes in size over time as Cranelift settings change which explains why bytecodealliance#4571 was the "source" of the regression. The fix in this commit is to discard the extra Wasmtime metadata when creating an `MmapVec` representing the underlying ELF image. This is already supported with `MmapVec::drain` so it was relatively easy to insert that. This means that the gdbjit image starts with just the ELF file itself which is always aligned at the end, which gets the test passing with/without the `unaligned` feature in the `object` crate.

This commit fixes #4600 in a somewhat roundabout fashion. Currently the `main` branch of Wasmtime exhibits unusual behavior: * If `./ci/run-tests.sh` is run then the `cache_accounts_for_opt_level` test does not fail. * If `cargo test -p wasmtime --lib` is run, however, then the test fails. This test is indeed being run as part of `./ci/run-tests.sh` and it's also passing in CI. The exact failure is that part of the debuginfo support we have takes an existing ELF image, copies it, and then appends some information to inform profilers/gdb about the image. This code is all quite old at this point and not 100% optimal, but that's at least where we're at. The problem is that the appended `ProgramHeader64` is not aligned correctly during `cargo test -p wasmtime --lib`, which is the panic that happens causing the test to fail. The reason, however, that this test passes with `./ci/run-tests.sh` is that the alignment of `ProgramHeader64` is 1 instead of 8. The reason for that is that the `object` crate has an `unaligned` feature which forcibly unaligns all primitives to 1 byte instead of their natural alignment. During `cargo test -p wasmtime --lib` this feature is not enabled but during `./ci/run-tests.sh` this feature is enabled. The feature is currently enabled through inclusion of the `backtrace` crate which only happens for some tests in some crates. The alignment issue explains why the test fails on a single crate test but fails on the whole workspace tests. The next issue I investigated was if this test ever passed. It turns out that on v0.39.0 this test passed, and the regression to main was introduced during #4571. That PR, however, has nothing to do with any of this! The reason that this showed up as causing a "regression" however is because it changed cranelift settings which changed the size of serialized metadata at the end of a Wasmtime cache object. Wasmtime compiled artifacts are ELF images with Wasmtime-specific metadata appended after them. This appended metadata was making its way all the way through to the gdbjit image itself which mean that while the end of the ELF file itself was properly aligned the space after the Wasmtime metadata was not aligned. This metadata changes in size over time as Cranelift settings change which explains why #4571 was the "source" of the regression. The fix in this commit is to discard the extra Wasmtime metadata when creating an `MmapVec` representing the underlying ELF image. This is already supported with `MmapVec::drain` so it was relatively easy to insert that. This means that the gdbjit image starts with just the ELF file itself which is always aligned at the end, which gets the test passing with/without the `unaligned` feature in the `object` crate.

fitzgen added the bug Incorrect behavior in the current implementation that needs fixing label Aug 3, 2022

alexcrichton mentioned this issue Aug 4, 2022

Shield compiled modules from their appended metadata #4609

Merged

alexcrichton closed this as completed in #4609 Aug 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wasmtime: `cache_accounts_for_opt_level` failing on `main` #4600

Wasmtime: `cache_accounts_for_opt_level` failing on `main` #4600

fitzgen commented Aug 3, 2022

fitzgen commented Aug 3, 2022

akirilov-arm commented Aug 4, 2022

alexcrichton commented Aug 4, 2022

Wasmtime: cache_accounts_for_opt_level failing on main #4600

Wasmtime: cache_accounts_for_opt_level failing on main #4600

Comments

fitzgen commented Aug 3, 2022

fitzgen commented Aug 3, 2022

akirilov-arm commented Aug 4, 2022

alexcrichton commented Aug 4, 2022

Wasmtime: `cache_accounts_for_opt_level` failing on `main` #4600

Wasmtime: `cache_accounts_for_opt_level` failing on `main` #4600