Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ASSERT type_is_instr in multiple drcachesim tests (tool.reuse_distance,tool.drcachesim.warmup-zeros,tool.drcachesim.missfile) on appveyor #3320

Open
derekbruening opened this issue Dec 21, 2018 · 44 comments · Fixed by #3859

Comments

@derekbruening
Copy link
Contributor

The 32-bit version failed once:
https://ci.appveyor.com/project/DynamoRIO/dynamorio/builds/21162519

[00:09:36] test 213
[00:09:36]         Start 213: code_api|tool.reuse_distance
[00:09:36] 
[00:09:36] 213: Test command: C:\projects\dynamorio\build\build_debug-internal-32\bin32\drrun.exe "-s" "90" "-quiet" "-debug" "-use_dll" "C:/projects/dynamorio/build/build_debug-internal-32/lib32/debug/dynamorio.dll" "-exit0" "-stderr_mask" "0xC" "-msgbox_mask" "0" "-dumpcore_mask" "0x7d" "-staged" "-code_api" "-t" "drcachesim" "-ipc_name" "drtestpipe_reuse_distance" "-simulator_type" "reuse_distance" "-reuse_distance_threshold" "256" "--" "C:/projects/dynamorio/build/build_debug-internal-32/suite/tests/bin/simple_app.exe"
[00:09:36] 213: Test timeout computed to be: 600
[00:09:36] 213: Hello, world!
[00:09:36] 213: Assertion failed: type_is_instr(cur_ref.instr.type), file ..\..\clients\drcachesim\reader\reader.cpp, line 132
@derekbruening
Copy link
Contributor Author

Happened again:
https://ci.appveyor.com/project/DynamoRIO/dynamorio/builds/22313260

00:06:34] test 213
[00:06:34]         Start 213: code_api|tool.reuse_distance
[00:06:34] 
[00:06:34] 213: Test command: C:\projects\dynamorio\build\build_debug-internal-32\bin32\drrun.exe "-s" "90" "-quiet" "-debug" "-use_dll" "C:/projects/dynamorio/build/build_debug-internal-32/lib32/debug/dynamorio.dll" "-exit0" "-stderr_mask" "0xC" "-msgbox_mask" "0" "-dumpcore_mask" "0x7d" "-staged" "-code_api" "-t" "drcachesim" "-ipc_name" "drtestpipe_reuse_distance" "-simulator_type" "reuse_distance" "-reuse_distance_threshold" "256" "--" "C:/projects/dynamorio/build/build_debug-internal-32/suite/tests/bin/simple_app.exe"
[00:06:34] 213: Test timeout computed to be: 600
[00:06:34] 213: Hello, world!
[00:06:34] 213: Assertion failed: type_is_instr(cur_ref.instr.type), file ..\..\clients\drcachesim\reader\reader.cpp, line 132

@derekbruening
Copy link
Contributor Author

@derekbruening
Copy link
Contributor Author

This same assert just happened in code_api|tool.drcachesim.warmup-zeros
https://ci.appveyor.com/project/DynamoRIO/dynamorio/builds/23205784

00:20:18] Hello, world!
[00:20:18] Assertion failed: type_is_instr(cur_ref.instr.type), file ..\..\clients\drcachesim\reader\reader.cpp, line 132

@derekbruening
Copy link
Contributor Author

warmup-zeros failed again with that assert: https://ci.appveyor.com/project/DynamoRIO/dynamorio/builds/23571201

@derekbruening derekbruening changed the title tool.reuse_distance test failed once on appveyor ASSERT type_is_instr in multiple drcachesim tests (tool.reuse_distance,tool.drcachesim.warmup-zeros) on appveyor Apr 3, 2019
@derekbruening derekbruening changed the title ASSERT type_is_instr in multiple drcachesim tests (tool.reuse_distance,tool.drcachesim.warmup-zeros) on appveyor ASSERT type_is_instr in multiple drcachesim tests (tool.reuse_distance,tool.drcachesim.warmup-zeros,tool.drcachesim.missfile) on appveyor Apr 6, 2019
@derekbruening
Copy link
Contributor Author

Happened on the missfile test:
https://ci.appveyor.com/project/DynamoRIO/dynamorio/builds/23634836

@derekbruening
Copy link
Contributor Author

@derekbruening
Copy link
Contributor Author

@derekbruening
Copy link
Contributor Author

@hgreving2304
Copy link

@derekbruening
Copy link
Contributor Author

@derekbruening
Copy link
Contributor Author

Happened in tool.drcachesim.coherence with a timeout being reported: #3803.

@derekbruening
Copy link
Contributor Author

Could it be a problem with the queue on the reader side? I put a printout there and
no data is ever read from the queue here. Maybe that is not surprising: it's only
used with skipping.

To see if the buffers are messed up at writing time I added a check in
atomic_pipe_write() for a bundle type after a non-instr, non-bundle record: it loops
over the buffer and checks. No hit when the problem is reproduced: so it seems that
the buffer manipulation has no error and it's somewhere between the write to the
named pipe and the read that messes up. The ipc_reader_t + reader_t layers are
pretty thin: I ruled out the queue already, which doesn't leave much besides the
kernel as the culprit.

Yet, would the original Windows instances be the same underlying culprit if
this were some kernel issue? That seems unlikely.

I tried to repro on an x86 machine w/ a different kernel: it reproduces there too.
Yet we've never seen it in the x86 GA CI VM's (have we?)

@derekbruening
Copy link
Contributor Author

derekbruening commented Apr 1, 2024

The error on a marker with the 80K-atomic setting where we don't split the buffers looks like this:

ZZ Invalid trace entry type marker (28 tid=2654310 vs cur tid=2654310) before a bundle
ZZ #0: 17 2 0x91734aa800000404
ZZ #1: 10 4 0xffff91734b1c
ZZ #2: 1 8 0xffff8de18640
ZZ #3: 10 4 0xffff91734b20
ZZ #4: 0 4 0xffff90f23998
ZZ #5: 10 4 0xffff91734b24
ZZ #6: 10 4 0xffff91734b28
ZZ #7: 1 4 0xffff90f23998
ZZ #8: 10 4 0xffff91734b2c
ZZ #9: 0 4 0xffff90f23998
ZZ #10: 10 4 0xffff91734b30
ZZ #11: 0 4 0xffff90f2399c
ZZ #12: 10 4 0xffff91734b34
ZZ #13: 13 4 0xffff91734b38
ZZ #14: 28 2 0x2f73a4aeda72b6
ZZ #15: 28 3 0x9
ZZ next: 17 1 0x91734b4c00000004    91734b4c is the tag for this bundle
ZZ next: 10 4 0xffff91734b9c
ZZ next: 0 8 0xffff8de10d98
ZZ next: 10 4 0xffff91734ba0
ZZ next: 17 2 0x91734b4c00000404    91734b4c again so further in same block
ZZ next: 17 2 0x91734b4c00000404    2nd part of bundle
ZZ next: 10 4 0xffff91734bac
ZZ next: 0 8 0xffff91746038
ZZ next: 10 4 0xffff91734bb0
ZZ next: 0 4 0xffff9072299c
ZZ next: 10 4 0xffff91734bb4
ZZ next: 17 2 0x91734b4c00000404    91734b4c again so yet further in same block
ZZ next: 17 2 0x91734b4c00000404    2nd part of bundle
ZZ next: 10 4 0xffff91734bc0
ZZ next: 1 8 0xffff8de18638
ZZ next: 10 4 0xffff91734bc4
ZZ next: 0 4 0xffff90722998
ZZ next: 10 4 0xffff91734bc8
ZZ next: 10 4 0xffff91734bcc
ZZ next: 1 4 0xffff90722998
ZZ next: 10 4 0xffff91734bd0
ZZ next: 0 4 0xffff90722998
ZZ next: 10 4 0xffff91734bd4
ZZ next: 0 4 0xffff9072296c
ZZ next: 10 4 0xffff91734bd8
ZZ next: 13 4 0xffff91734bdc        cond branch, presmably backward
ZZ next: 10 4 0xffff91734b4c        the real 91734b4c block start
ZZ next: 17 1 0x91734b4c00000004    the real 91734b4c bundle
ZZ next: 10 4 0xffff91734b54
ZZ next: 0 8 0xffff91746038
ZZ next: 10 4 0xffff91734b58        presumably straight-line to the 4b9c up above
ZZ next: 0 4 0xffff9072299c
ZZ pipe_.read() => 196608           hit end of per-record print; this is later; so
ZZ pipe_.read() => 196608             records above were already in the buffer

This has the atomic_pipe_write() check in place and it didn't write anything out
complaining about a misplaced bundle when writing to the pipe: so this was not the
actual record sequence written.

So the "bad" bundle seems to be part of a sequence that is somehow misplaced: the data is in the wrong place (which matches the thread exit case), or data in between is missing. The large size without splits might have races but these tests that fail are often single-threaded.

@derekbruening derekbruening removed their assignment Apr 4, 2024
@derekbruening
Copy link
Contributor Author

derekbruening commented Oct 3, 2024

Happened on riscv reuse-distance: Invalid trace entry type thread_exit (23) before a bundle https://github.com/DynamoRIO/dynamorio/actions/runs/11171065544/job/31055108423?pr=7018

@xdje42
Copy link
Contributor

xdje42 commented Oct 4, 2024

Data point: Another failure, in master:

====> FAILURE in release-external-64 <====
release-external-64: 116 tests passed, **** 1 tests failed, but ignoring 0 for i#5365: ****
	code_api|tool.drcachesim.TLB-threads

https://github.com/DynamoRIO/dynamorio/actions/runs/11185116691/job/31097308404

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants