-
Notifications
You must be signed in to change notification settings - Fork 565
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CRASH: tool pc-rel instru mangled by DR can be mistaken as clean call instru and mis-translated #5786
Labels
Comments
derekbruening
added a commit
that referenced
this issue
Dec 15, 2022
Adds new labels delimiting clean call sequences. Converts into a translation record flag when storing translations. Uses the new labels and flag to precisely identify clean call mangling, replacing the previous scheme which incorrectly thought mangled tool pc-relative was a clean call, resulting in incorrect translations and crashes. Adds a test case to api.detach_state by adding a client (by converting it to use static DR) which inserts a pc-relative load. This reproduces the crash on detach, and is fixed with this fix. The added instrumentation caused periodic detach failures which were solved by setting the translation and adding a restore-state event: i#4232 covers trying to improve the situation. Issue: #5786, #4232 Fixes #5786
derekbruening
added a commit
that referenced
this issue
Jan 14, 2023
Adds new labels delimiting clean call sequences. Converts into a translation record flag when storing translations. Uses the new labels and flag to precisely identify clean call mangling, replacing the previous scheme which incorrectly thought mangled tool pc-relative was a clean call, resulting in incorrect translations and crashes. Adds a test case to api.detach_state by adding a client (by converting it to use static DR) which inserts a pc-relative load. This reproduces the crash on detach, and is fixed with this fix. The added instrumentation caused periodic detach failures which were solved by setting the translation and adding a restore-state event: i#4232 covers trying to improve the situation. Adds a new instr_t.offset field. Stops using instr_t.note to hold encoding offsets for pc-releative operands. Adds a new field instr_t.offset which is used for this purpose. This leaves note values in place across encodings, which is needed for new clean call marking labels and also simplifies rseq handling code. This instr_t field is a compatibility break and we bump the version and OLDEST_COMPATIBLE_VERSION here to 990. Updates dr_get_note docs. Augments logging of xl8 info with new flag info. Reduces DR_NOTE_FIRST_RESERVED to give DR more reserved labels. This is another compatibility break, while at it. Fixes several issues hit in tests that happened to trigger on the heap bucket size and other changes: + Fixes a rank order violation at loglevel 5: xref #1649 + Writes real xstate_bv into signal frame when setting the xstate context to avoid lazy AVX restore problems. + Tweaks the thread_churn test to work around non-linearities. Issue: #5786, #4232 Fixes #5786
dolanzhao
pushed a commit
that referenced
this issue
Jan 30, 2023
Adds new labels delimiting clean call sequences. Converts into a translation record flag when storing translations. Uses the new labels and flag to precisely identify clean call mangling, replacing the previous scheme which incorrectly thought mangled tool pc-relative was a clean call, resulting in incorrect translations and crashes. Adds a test case to api.detach_state by adding a client (by converting it to use static DR) which inserts a pc-relative load. This reproduces the crash on detach, and is fixed with this fix. The added instrumentation caused periodic detach failures which were solved by setting the translation and adding a restore-state event: i#4232 covers trying to improve the situation. Adds a new instr_t.offset field. Stops using instr_t.note to hold encoding offsets for pc-releative operands. Adds a new field instr_t.offset which is used for this purpose. This leaves note values in place across encodings, which is needed for new clean call marking labels and also simplifies rseq handling code. This instr_t field is a compatibility break and we bump the version and OLDEST_COMPATIBLE_VERSION here to 990. Updates dr_get_note docs. Augments logging of xl8 info with new flag info. Reduces DR_NOTE_FIRST_RESERVED to give DR more reserved labels. This is another compatibility break, while at it. Fixes several issues hit in tests that happened to trigger on the heap bucket size and other changes: + Fixes a rank order violation at loglevel 5: xref #1649 + Writes real xstate_bv into signal frame when setting the xstate context to avoid lazy AVX restore problems. + Tweaks the thread_churn test to work around non-linearities. Issue: #5786, #4232 Fixes #5786
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
We are seeing crashes when using drbbdup on x86 on detach. The drbbdup state restore function seems perfectly correct: it is DR that is messing things up. We finally tracked it down to DR thinking that drbbdup's case load instructions are in fact a clean call setup. DR then tries to restore the context from the dstack and gets incorrect values.
This is with drmemtrace where the drbbdup case operand is a pc-relative load. DR has a feature where it will mangle tool-inserted pc-relative references into two-part sequences if they won't reach in their code cache destination. Thus we end up with something like this (this is with no tool instrumentation aside from drbbdup, so we just have 2 copies of this simple cmp;jnz app block) where diagnostics are marking 3 instrs considered "our_mangling":
The first 2 of those are the 2-part case load.
The current translation code has no precise way to identify clean calls and instead it identifies everything else and says that anything in "our_mangling" with no translation must then be a clean call. But DR mangling tool code breaks this assumption.
We do not want to not mark DR-mangled tool code as not our_mangling as there can be spills and restores in there and we want DR to properly restore those.
We need to either precisely identify mangle_rel_addr() sequences, or precisely identify clean call sequences. Both are complex. Since the pc-rel doesn't need any action beyond regular spill tracking, it seems better to identify clean calls. There are plenty of bits in translation_info_t, but not in instr_t: so I'm thinking of using labels for the instrlist case, and flags for the stored-info case.
The text was updated successfully, but these errors were encountered: