
Rust tests fail in dev container locally #3741

Closed
zprobinson opened this issue Feb 7, 2024 · 10 comments · Fixed by #3747

Comments

@zprobinson

Description

Creating a tracking issue here for some issues I encountered while working on #3738.

After cloning the repository and opening the dev container, I added some new bindings for the Result module to the Rust compiler. To test my changes, I added tests to tests/Rust/tests/src/ResultTests.fs and then ran ./build.sh test rust from the workspace root.

The F# tests compile and pass. Afterwards, the Rust step (compiling the tests to Rust and running them with cargo) failed with a weird error:

malloc(): unaligned tcache chunk detected
error: test failed, to rerun pass `--test src`

Caused by:
  process didn't exit successfully: `/workspaces/Fable/temp/tests/Rust/target/debug/deps/src-7c4fdba163c9ac55` (signal: 5, SIGTRAP: trace/breakpoint trap)
Compilation failed
Unhandled exception. SimpleExec.ExitCodeException: The command exited with code 1.
   at SimpleExec.Command.Run(ProcessStartInfo startInfo, Boolean noEcho, String echoPrefix, Func`2 handleExitCode, CancellationToken cancellationToken) in /_/SimpleExec/Command.cs:line 121
   at SimpleExec.Command.Run(String name, String args, String workingDirectory, Boolean noEcho, String echoPrefix, Action`1 configureEnvironment, Boolean createNoWindow, Func`2 handleExitCode, CancellationToken cancellationToken) in /_/SimpleExec/Command.cs:line 54
   at SimpleExec.Command.Fable.Static(CmdLine args, FSharpOption`1 workingDirectory, FSharpOption`1 noEcho, FSharpOption`1 echoPrefix) in /workspaces/Fable/src/Fable.Build/SimpleExec.Extensions.fs:line 26
   at Build.Test.Rust.handle(FSharpList`1 args) in /workspaces/Fable/src/Fable.Build/Test/Rust.fs:line 104
   at Build.Main.main(String[] argv) in /workspaces/Fable/src/Fable.Build/Main.fs:line 127

The weird part is that the tests all compile and pass on the remote CI. Once the merge had been completed, I fetched and pulled down main, and I can now run all tests successfully.

Related information

  • Operating system: macOS Sonoma 14.2.1
  • Chip: Apple M2
  • Rust version (from dev container): rustc 1.75.0 (82e1608df 2023-12-21)
@ncave
Collaborator

ncave commented Feb 7, 2024

@zprobinson Thanks for reporting.

  • Can the issue be reproduced reliably? Can you describe the steps to reproduce?
  • Are you using an emulator in your dev container? What os/architecture is running in the container?

@zprobinson
Author

zprobinson commented Feb 8, 2024

It seems that I can reproduce it. Here are the steps I took to reproduce:

  1. Open dev container in VSCode
  2. Run ./build.sh test rust and ensure that it passes
  3. Open Rust Result module, delete one function
  4. Open Rust Result Test file, delete the test
  5. Run ./build.sh test rust (it passed here, happy news)
  6. Open Rust Result module, add back in function that was removed
  7. Open Rust Result Test file, add back in test that was removed
  8. Run ./build.sh test rust
  9. See the same error as described above and then cry (crying optional, but encouraged)
malloc(): unaligned tcache chunk detected
error: test failed, to rerun pass `--test src`

Caused by:
  process didn't exit successfully: `/workspaces/Fable/temp/tests/Rust/target/debug/deps/src-7c4fdba163c9ac55` (signal: 6, SIGABRT: process abort signal)
Compilation failed
Unhandled exception. SimpleExec.ExitCodeException: The command exited with code 1.
   at SimpleExec.Command.Run(ProcessStartInfo startInfo, Boolean noEcho, String echoPrefix, Func`2 handleExitCode, CancellationToken cancellationToken) in /_/SimpleExec/Command.cs:line 121
   at SimpleExec.Command.Run(String name, String args, String workingDirectory, Boolean noEcho, String echoPrefix, Action`1 configureEnvironment, Boolean createNoWindow, Func`2 handleExitCode, CancellationToken cancellationToken) in /_/SimpleExec/Command.cs:line 54
   at SimpleExec.Command.Fable.Static(CmdLine args, FSharpOption`1 workingDirectory, FSharpOption`1 noEcho, FSharpOption`1 echoPrefix) in /workspaces/Fable/src/Fable.Build/SimpleExec.Extensions.fs:line 26
   at Build.Test.Rust.handle(FSharpList`1 args) in /workspaces/Fable/src/Fable.Build/Test/Rust.fs:line 104
   at Build.Main.main(String[] argv) in /workspaces/Fable/src/Fable.Build/Main.fs:line 127

For clarity, I am able to get past this line successfully:

Test run for /workspaces/Fable/tests/Rust/bin/Release/net8.0/Fable.Tests.Rust.dll (.NETCoreApp,Version=v8.0)
Microsoft (R) Test Execution Command Line Tool Version 17.8.0 (arm64)
Copyright (c) Microsoft Corporation.  All rights reserved.

Starting test execution, please wait...
A total of 1 test files matched the specified pattern.

Passed!  - Failed:     0, Passed:  1966, Skipped:     0, Total:  1966, Duration: 713 ms - Fable.Tests.Rust.dll (net8.0)

Then it starts the Fable compilation and seemingly succeeds:

Fable plugins, skipping this assembly

Started Fable compilation...
Compiled 51/51: ../../../tests/Rust/tests/src/UnionTests.fs
Fable compilation finished in 9213ms

./../../../tests/Rust/tests/src/ApplicativeTests.fs(1730,81): (1730,88) warning FSHARP: This construct causes code to be less generic than indicated by the type annotations. The type variable 'Functor has been constrained to be type 'Functor'. (code 64)
./../../../tests/Rust/tests/src/ArithmeticTests.fs(867,5): (867,9) info FSHARP: The use of 'incr' from the F# library is deprecated. See https://aka.ms/fsharp-refcell-ops. For example, please change 'incr cell' to 'cell.Value <- cell.Value + 1'. (code 3370)
./../../../tests/Rust/tests/src/ArithmeticTests.fs(868,7): (868,9) info FSHARP: The use of ':=' from the F# library is deprecated. See https://aka.ms/fsharp-refc
...

Then it begins running a bunch of tests in Rust:

.> cargo test
   Compiling fable_library_rust v0.1.0 (/workspaces/Fable/temp/fable-library-rust)
   Compiling fable_tests_rust v0.1.0 (/workspaces/Fable/temp/tests/Rust)
    Finished test [unoptimized + debuginfo] target(s) in 21.18s
     Running tests/src/main.rs (target/debug/deps/src-7c4fdba163c9ac55)

running 2004 tests
test module_112eed2::Fable::Tests::InteropTests::test_emitRustExpr_works_without_parameters ... ok
test module_112eed2::Fable::Tests::InteropTests::simple_float_op_sin_works ... ok
test module_112eed2::Fable::Tests::InteropTests::test_emitRustExpr_works_with_parameters ... ok
test module_112eed2::Fable::Tests::InteropTests::simple_mul_sub_works ... ok
test module_112eed2::Fable::Tests::InteropTests::simple_add_sub_works ... ok
...

Then it fails somewhere down the line on a test run:

test module_19f4eaa5::Fable::Tests::DateTimeOffsetTests::Unspecified ... ok
test module_19f4eaa5::Fable::Tests::DateTimeOffsetTests::UTC_003a_Convert_to_8_hours_behind_UTC ... ok
test module_19f4eaa5::Fable::Tests::DateTimeOffsetTests::UTC_003a_Convert_to_3_hours_ahead_of_UTC ... ok
test module_19f4eaa5::Fable::Tests::DateTimeOffsetTests::UTC_003a_Convert_to_3_hours_and_30_min_ahead_of_UTC ... ok
test module_19f4eaa5::Fable::Tests::DateTimeOffsetTests::DateTimeOffset_Ticks_does_not_care_about_offset ... ok
malloc(): unaligned tcache chunk detected
error: test failed, to rerun pass `--test src`
# Zach - this is the beginning of the error above 

It seems it is able to successfully begin running the tests for the following modules (in order top to bottom):

  • module_112eed2::Fable::Tests::InteropTests
  • module_144a6264::Fable::Tests::RecordTests
  • module_17a60d49::Fable::Tests::DateOnlyTests
  • module_183c9fe2::Fable::Tests::ResizeArrayTests
  • module_185ed530::Fable::Tests::ComparisonTests
  • module_19f4eaa5::Fable::Tests::DateTimeOffsetTests

I'm not doing anything other than opening the container in VSCode. Here is what I get when inspecting the image architecture and /etc/os-release from inside the container. I'm still learning Docker, so if there are other commands I can run to get more information, let me know and I can provide that information as well.

~/Code/Fable
☹  docker image inspect 915284ca04b4 --format '{{ .Os }}/{{ .Architecture }}'
linux/arm64
$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

@zprobinson
Author

I also noticed that nojaf has added diagnostics to the compilation process. I'm not sure yet how to enable them, but I can dig in and try to use them to get more information on what's going on.

@ncave
Collaborator

ncave commented Feb 8, 2024

@zprobinson

  • What does uname -m print inside the container?
  • After you get to step 9 (right before the crying part), does it fail every time you run the tests from then on, or is it intermittent?
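As a complement to `uname -m`, a tiny Rust program can confirm which architecture the Rust toolchain in the container actually compiles for. This is only a sketch and is not part of the Fable repo. `std::env::consts::ARCH` is the target architecture baked into the compiled binary, whereas `uname -m` reports the machine the kernel runs on. On a native arm64 setup the two should agree.

```rust
// Sketch: print the target that the rustc in this environment compiled for.
// Save as arch.rs and run `rustc arch.rs && ./arch` inside the container.
fn main() {
    // ARCH is a compile-time constant, e.g. "aarch64" or "x86_64".
    println!("arch: {}", std::env::consts::ARCH);
    // OS is also a compile-time constant, e.g. "linux" inside the dev container.
    println!("os:   {}", std::env::consts::OS);
}
```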

@zprobinson
Author

zprobinson commented Feb 8, 2024

$ uname -m
aarch64

It's hard to see through the tears, but it seems like once I make it fail, the tests pass on the second attempt and on every attempt after that. Once I had a passing test run, I made some additional changes and the tests kept passing (as I would hope). Then performing a git restore to bring it back to the current state made the tests fail. Running them again let the tests pass. It seems intermittent, but there also seems to be a pattern where the second run fixes something?

I ran it around a dozen times, with about an 80% pass rate.

@ncave
Collaborator

ncave commented Feb 8, 2024

@zprobinson Right, that makes sense, it might not show up every time if it's a memory issue.
I've tried to reproduce it on the arm64 hardware I have (Raspberry Pi 4), but no dice, it works just fine. I wonder if it only happens inside of a container.

@zprobinson
Author

zprobinson commented Feb 8, 2024

Did you mention that this issue occasionally pops up in the CI pipelines running on the GitHub repo? The latest pipeline failure seems to have thrown an exception that looks very similar. Would it be correct to assume that CI runs in a container as well?

All else failing, how much for your ironclad Raspberry Pi 4? Sounds like it's made out of magic :)

@ncave
Collaborator

ncave commented Feb 8, 2024

@zprobinson

> Did you mention that this issue occasionally pops up in the CI pipelines

Yes, happens all the time (and not just with the Rust jobs). Rerunning the failed jobs usually helps fix it.

Don't have a Mac unfortunately, so I'm gonna try running it inside a dev container on the Pi this time. Perhaps some virtualized environments are catching more memory errors? Just shooting in the dark at this point.

Update: Never mind, I just found that it can fail when running directly on the Pi (arm64) too, not in a container. If I run tests repeatedly, it works just fine. But if I run tests in release mode and then not in release mode, it fails. Not sure what it means, but some progress nonetheless. I guess the memory error just shows up randomly if you run the tests repeatedly (no changes needed).

cd temp/tests/Rust
cargo test --release      # <-- some reference equality tests will fail, but that's a different issue
cargo test                # <-- it will fail now
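Because the crash is intermittent (roughly an 80% pass rate was reported above), it can help to run the sequence many times and tally the results. The following is a hypothetical Rust helper sketch, not something in the Fable repo. It assumes it is run from `temp/tests/Rust` with `cargo` on the `PATH`. It mirrors the steps above: one release run, then repeated debug runs, counting how many debug runs exit successfully.

```rust
use std::process::Command;

// Hypothetical helper: run `program args...` `runs` times in the current
// directory and return how many runs exited with a success status.
fn count_passes(program: &str, args: &[&str], runs: u32) -> std::io::Result<u32> {
    let mut passes = 0;
    for i in 1..=runs {
        let status = Command::new(program).args(args).status()?;
        println!("run {i}: {}", if status.success() { "PASS" } else { "FAIL" });
        if status.success() {
            passes += 1;
        }
    }
    Ok(passes)
}

fn main() -> std::io::Result<()> {
    // One release-mode run first; its own failures are ignored here because
    // the reference equality failures in release mode are a separate issue.
    let _ = Command::new("cargo").args(["test", "--release"]).status()?;

    // Then repeated debug-mode runs, which is where the malloc() abort shows up.
    let runs = 12;
    let passes = count_passes("cargo", &["test"], runs)?;
    println!(
        "{passes}/{runs} debug runs passed ({:.0}%)",
        100.0 * passes as f64 / runs as f64
    );
    Ok(())
}
```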

@ncave
Collaborator

ncave commented Feb 8, 2024

@zprobinson Doesn't look entirely random. It seems to break most often on a few particular tests, which all have in common that they use F# module let bindings, which are implemented in a particular way for Rust, using static MutCell. Perhaps we need a sounder MutCell.get_or_init implementation that also holds up on arm64, not just on amd64.

I'll take a look when I have more time.
Thanks for reporting the issue.
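The hypothesis above is that F# module-level `let` bindings, lowered to a `static` `MutCell` with lazy `get_or_init`, may not be initialized soundly on arm64. Fable's actual `MutCell` code is not shown in this thread. The following is therefore only a sketch of the general technique, using the standard library's `std::sync::OnceLock`, which is stable since Rust 1.70 and so available with the container's rustc 1.75. The `TABLE` static and the `table()` accessor are hypothetical stand-ins for a compiled F# binding.

One plausible reason a hand-rolled lazy static could misbehave only on arm64 is that arm64 has a weaker memory model than x86-64. An unsynchronized check-then-initialize that happens to work on amd64 could let another thread observe partially initialized data there. That is an assumption, not a confirmed diagnosis.

```rust
use std::sync::OnceLock;
use std::thread;

// Hypothetical stand-in for a compiled F# module-level binding such as
// `let table = [| 1; 2; 3 |]`. This is NOT Fable's MutCell; it only shows
// one data-race-free way to lazily initialize a static in Rust.
static TABLE: OnceLock<Vec<i32>> = OnceLock::new();

fn table() -> &'static Vec<i32> {
    // `get_or_init` runs the closure at most once, even when several threads
    // race to read the binding for the first time. It publishes the value with
    // acquire/release synchronization, so readers on weakly ordered CPUs such
    // as arm64 never see a partially initialized Vec.
    TABLE.get_or_init(|| vec![1, 2, 3])
}

fn main() {
    // `cargo test` runs tests on multiple threads by default, so exercise the
    // accessor from several threads at once.
    let handles: Vec<_> = (0..8)
        .map(|_| thread::spawn(|| table().iter().sum::<i32>()))
        .collect();
    for h in handles {
        assert_eq!(h.join().unwrap(), 6);
    }
    println!("all threads saw the same initialized value");
}
```

With this pattern, every thread gets a reference to the same `Vec`, and the initializer cannot run twice.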

@zprobinson
Author

Thanks for taking a look into it! Your intuition is invaluable. I appreciate the time you spent troubleshooting.


2 participants