tool.drcacheoff.simple fails on some x86-32-ubuntu22 VMs #6416

Closed
abhinav92003 opened this issue Nov 7, 2023 · 2 comments · Fixed by #6462

@abhinav92003

The tool.drcacheoff.simple test is flaky on the x86-32-ubuntu22 workflow. I was able to reproduce it in an Ubuntu 22.04 VM, but not on every run. I first encountered this on #6408. None of the other recent PRs seem to have this failure, but I was able to reproduce it on the master branch in a VM.

Details of the VM where I was able to reproduce it:

abhinav92003@instance-1:~/dr/build/dr1$ uname -a
Linux instance-1 6.2.0-1018-gcp #20~22.04.1-Ubuntu SMP Mon Oct 23 12:29:43 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
abhinav92003@instance-1:~/dr/build/dr1$ ldd --version
ldd (Ubuntu GLIBC 2.35-0ubuntu3.4) 2.35

Even on the master branch it seems to fail:

test 302
    Start 302: code_api|tool.drcacheoff.simple

302: Test command: /usr/bin/cmake "-D" "precmd=foreach@/usr/bin/cmake@-E@[email protected].*.dir" "-D" "cmd=/home/abhinav92003/dr/build/master_build/bin32/drrun@-s@90@-quiet@-debug@-killpg@-stderr_mask@0xC@-dumpcore_mask@0@-code_api@-t@drcachesim@-offline@-subdir_prefix@tool.drcacheoff.simple@--@/home/abhinav92003/dr/build/master_build/suite/tests/bin/simple_app" "-D" "postcmd=firstglob@/home/abhinav92003/dr/build/master_build/clients/bin32/drcachesim@-indir@tool.drcacheoff.simple.*.dir@-chunk_instr_count@10K" "-D" "postcmd2=" "-D" "postcmd3=" "-D" "failok=" "-D" "cmp=/home/abhinav92003/dr/build/master_build/suite/tests/offline-simple.expect" "-D" "code=" "-D" "capture=" "-D" "ignore_matching_lines=" "-P" "/home/abhinav92003/dr/src/dr1/suite/tests/runmulti.cmake"
302: Test timeout computed to be: 90
302: Running cmd |/home/abhinav92003/dr/build/master_build/bin32/drrun;-s;90;-quiet;-debug;-killpg;-stderr_mask;0xC;-dumpcore_mask;0;-code_api;-t;drcachesim;-offline;-subdir_prefix;tool.drcacheoff.simple;--;/home/abhinav92003/dr/build/master_build/suite/tests/bin/simple_app|
302: Running postcmd |/home/abhinav92003/dr/build/master_build/clients/bin32/drcachesim;-indir;/home/abhinav92003/dr/build/master_build/suite/tests/tool.drcacheoff.simple.simple_app.62961.6195.dir;-chunk_instr_count;10K|
302: CMake Error at /home/abhinav92003/dr/src/dr1/suite/tests/runmulti.cmake:111 (message):
302:   *** postcmd failed (1): ERROR: failed to initialize analyzer: raw2trace
302:   failed: Failed to process file for thread 62961: invalid cti
302: 
302:   ***
302: 
302: Call Stack (most recent call first):
302:   /home/abhinav92003/dr/src/dr1/suite/tests/runmulti.cmake:123 (process_cmdline)
302: 
302: 
1/1 Test #302: code_api|tool.drcacheoff.simple ...***Failed    1.41 sec
@derekbruening

@brettcoon reproduced on an AMD machine and got the log:

  0xf7f2c585  0f 05                syscall  -> %ecx
[drmemtrace]: Appended encoding entry for 0x5796f7b5 sz=2 0x0000050f...
[drmemtrace]: Appended instr fetch for original 0xf7f2c585
[drmemtrace]: Chunk instr count is now 0
[drmemtrace]: Thread 321721 timestamp 0x002f68c0aab7fd88
[drmemtrace]: Chunk instr count is now 0
[drmemtrace]: Appended marker type 3 value 0x11
[drmemtrace]: Chunk instr count is now 0
[drmemtrace]: Appended marker type 25 value 0xbf
[drmemtrace]: Chunk instr count is now 0
[drmemtrace]: Thread 321721 timestamp 0x002f68c0aab7fd8d
[drmemtrace]: Chunk instr count is now 0
[drmemtrace]: Appended marker type 3 value 0x11
[drmemtrace]: Chunk instr count is now 0
[drmemtrace]: Appending 4 instrs in bb 0x5796f7b9 in mod 16 +0x589 = [vdso]
  0xf7f2c589  e9 b4 24 64 54       jmp    $0x4c56ea42
[drmemtrace]: Worker 0 hit error Failed to process file for thread 321721: invalid cti on trace thread 0
ERROR: Conversion failed: Failed to process file for thread 321721: invalid cti

My response:

Oh, it's the vsyscall, which DR has to hook on 32-bit AMD.

That jmp is the trampoline.
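
As a sanity check on the log, the jmp target can be recomputed from the raw bytes. This is a minimal sketch that relies only on the standard x86 `jmp rel32` encoding (opcode 0xE9 followed by a little-endian signed displacement relative to the next instruction); the helper name `decode_jmp_rel32` is made up for illustration and is not a DynamoRIO function.

```python
def decode_jmp_rel32(pc: int, raw: bytes) -> int:
    """Return the 32-bit target of an x86 `jmp rel32` (opcode 0xE9) located at pc."""
    if len(raw) != 5 or raw[0] != 0xE9:
        raise ValueError("not a 5-byte jmp rel32 encoding")
    # The 4 bytes after the opcode are a little-endian signed displacement.
    rel = int.from_bytes(raw[1:], "little", signed=True)
    # The displacement is relative to the address of the next instruction;
    # wrap to 32 bits because this is 32-bit code.
    return (pc + len(raw) + rel) & 0xFFFFFFFF


# Address and bytes copied from the log above.
VDSO_JMP_PC = 0xF7F2C589
VDSO_JMP_BYTES = bytes.fromhex("e9b4246454")

target = decode_jmp_rel32(VDSO_JMP_PC, VDSO_JMP_BYTES)
print(hex(target))  # 0x4c56ea42, the same target the log prints
```

The computed target is nowhere near the 0xf7f2c5xx addresses of the surrounding vdso code, which fits a jump out to a hook rather than ordinary vdso control flow.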

Xref #6417 where the kill(SIGSEGV) failures in all the signal and other tests are also from the AMD 32-bit vsyscall trampoline.

That jmp encoding comes from the stored vdso bytes, which preserved the hook (the actual tracing did not see the trampoline because DR hides it). The question is: if we throw away the special vdso storage in the modules log and replace it with per-block encodings (treating the vdso like JIT code), we'll get the DR-provided view that doesn't see the trampoline, right?

I remember thinking we might want to do that anyway; there is probably a TODO or an issue mentioning it. Xref #2062.
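
To make the difference between the two storage schemes concrete, here is a toy model. It is not DynamoRIO code: the class and function names are hypothetical, and the 0x90 (nop) bytes are placeholders for the real vdso contents, which this thread does not show. Only the 5-byte jmp is taken from the log.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HookedRegion:
    """Toy model of a code region in which a hook overwrote some bytes."""
    base: int
    original: bytes     # code as it was before the hook was installed
    hook_offset: int    # offset of the hook within the region
    hook_bytes: bytes   # trampoline jmp written over the original code

    def raw_memory(self) -> bytes:
        """Bytes physically in memory: the hook replaces the original code."""
        end = self.hook_offset + len(self.hook_bytes)
        return self.original[:self.hook_offset] + self.hook_bytes + self.original[end:]

    def application_view(self, addr: int, size: int) -> bytes:
        """Bytes as seen while tracing: the hook is hidden."""
        off = addr - self.base
        return self.original[off:off + size]


def snapshot_module(region: HookedRegion) -> bytes:
    """Old scheme in this model: store the raw bytes once for the whole module."""
    return region.raw_memory()


def record_block_encodings(region: HookedRegion, blocks):
    """New scheme in this model: store the traced bytes per block, as for JIT code."""
    return {addr: region.application_view(addr, size) for addr, size in blocks}


region = HookedRegion(
    base=0x1000,
    original=bytes([0x90]) * 16,               # placeholder code
    hook_offset=9,
    hook_bytes=bytes.fromhex("e9b4246454"),    # jmp bytes from the log
)
snapshot = snapshot_module(region)
encodings = record_block_encodings(region, [(0x1000, 9), (0x1009, 7)])

block = 0x1009
off = block - region.base
from_snapshot = snapshot[off:off + 5]
from_encodings = encodings[block][:5]
print(from_snapshot.hex())   # e9b4246454: the trampoline leaks into the stored bytes
print(from_encodings.hex())  # 9090909090: matches what the tracer saw
```

In this model the post-processor decoding from the snapshot meets a jmp that the trace never recorded, while decoding from the per-block encodings stays consistent with the traced instructions.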

@derekbruening

OK, I implemented the encodings for vdso and removed the raw bytes.
For this app there are just 2 blocks:

record_instr_encodings: new block id 0 for 0xf7f88580
record_instr_encodings: Recorded 27 bytes for id 0 @ 0xf7f88580
record_instr_encodings: new block id 1 for 0xf7f88589
record_instr_encodings: Recorded 24 bytes for id 1 @ 0xf7f88589

And the module file entry for vdso has no more binary data:

 15,  12, 0xf7f93000, 0xf7f96000, 0xf7f7c120, 0000000000030bc0, 0x00031000, v#1,0, /usr/lib/i386-linux-gnu/ld-linux.so.2
 16,  16, 0xf7f9b000, 0xf7f9d000, 0xf7f9b580, 0000000000000000, 0x00000000, v#1,0, [vdso]
 17,  17, 0xf7000000, 0xf7022000, 0xf7023a00, 0000000000000000, 0x00000000, v#1,0, /usr/lib/i386-linux-gnu/libc.so.6
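
For anyone checking their own module file, the entries above can be picked apart with a few lines of Python. This is a rough sketch under stated assumptions: the column meanings are inferred from the lines shown here, not taken from the format definition. In particular it assumes the third and fourth columns are the module's address range and that the number after `v#1` is the size of the attached binary data, so 0 means no data is stored. The function name `parse_module_line` is hypothetical.

```python
def parse_module_line(line: str) -> dict:
    """Split one module-file entry into the fields this sketch interprets."""
    # Split on the first 9 commas only, so a path containing commas stays intact.
    parts = [p.strip() for p in line.split(",", 9)]
    if len(parts) != 10:
        raise ValueError(f"unexpected module line: {line!r}")
    mod_id, containing_id, start, end, _, _, _, version, size, path = parts
    return {
        "id": int(mod_id),
        "containing_id": int(containing_id),
        "start": int(start, 16),      # assumed: start of the address range
        "end": int(end, 16),          # assumed: end of the address range
        "custom_version": version,    # e.g. "v#1"
        "custom_size": int(size),     # assumed: size of the attached binary data
        "path": path,
    }


# The vdso entry copied from the listing above.
VDSO_LINE = " 16,  16, 0xf7f9b000, 0xf7f9d000, 0xf7f9b580, 0000000000000000, 0x00000000, v#1,0, [vdso]"

vdso = parse_module_line(VDSO_LINE)
print(vdso["path"], vdso["custom_size"])
```

With the fix applied, the vdso entry should report a size of 0 under these assumptions.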

derekbruening added a commit that referenced this issue Nov 17, 2023
Removes the vdso raw bytes we were storing in the module file for
offline drmemtraces.  Switches to using per-block encodings instead.
This avoids problems with hooked vsysenter on 32-bit AMD.

Tested on tool.drcacheoff.simple on 32-bit AMD on a machine where that
test failed every time before this fix.

Removes the unused offline_instru_t::get_modoffs() rather than
updating it for the vdso change.

Fixes #6416