translate the FPU instruction pointer #698

Open
derekbruening opened this issue Nov 28, 2014 · 6 comments

@derekbruening

From [email protected] on March 13, 2012 00:01:59

The FPU state saved by fnstenv and related instructions includes an instruction pointer selector and offset. DR today does not translate that pointer from the code cache to the corresponding app address.

sample code from Jun Koi:

#include <stdio.h>

int main()
{
int edx, ecx;

   __asm__ __volatile__ (
                   "x: fldz;" \
                   "movl $x, %%ecx;" \
                   "push %%edx;" \
                   "fnstenv -0xc(%%esp);" \
                   "pop %%edx;" : "=d" (edx), "=c" (ecx) : : "memory");

   if (ecx == edx)
           printf("FNSTENV is correctly handled\n");
   else
           printf("FNSTENV is incorrectly handled: edx = %x, ecx = %x\n", edx, ecx);

   return 0;

}

Original issue: http://code.google.com/p/dynamorio/issues/detail?id=698
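
For reference, here is a minimal sketch of where the pointer sits in the saved image, assuming the 28-byte layout that fnstenv writes in 32-bit protected mode. The struct and helper names are hypothetical and are not DR code. Because the sample stores the environment at -0xc(%esp), the instruction pointer offset at byte 12 of the image lands at (%esp), which is the word the pop then reads into edx.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative layout of the 28-byte image written by fnstenv in 32-bit
 * protected mode. All names here are hypothetical, not taken from DR. */
typedef struct {
    uint32_t control_word;   /* offset  0: low 16 bits used */
    uint32_t status_word;    /* offset  4: low 16 bits used */
    uint32_t tag_word;       /* offset  8: low 16 bits used */
    uint32_t fip_offset;     /* offset 12: FPU instruction pointer offset */
    uint32_t fcs_and_opcode; /* offset 16: selector in low 16 bits, opcode above */
    uint32_t fdp_offset;     /* offset 20: FPU data (operand) pointer offset */
    uint32_t fds_selector;   /* offset 24: low 16 bits used */
} fnstenv32_image_t;

_Static_assert(sizeof(fnstenv32_image_t) == 28, "fnstenv image is 28 bytes");
_Static_assert(offsetof(fnstenv32_image_t, fip_offset) == 12, "FIP at byte 12");

/* Read the saved instruction pointer offset out of a raw fnstenv buffer. */
static uint32_t
read_saved_fip(const uint8_t *image)
{
    uint32_t fip;
    assert(image != NULL);
    memcpy(&fip, image + offsetof(fnstenv32_image_t, fip_offset), sizeof(fip));
    return fip;
}
```

Under DR without translation, the value read this way is a code cache address rather than the address of the app's fldz.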

@derekbruening

From [email protected] on March 13, 2012 07:07:06

Owner: [email protected]

@derekbruening

From [email protected] on March 13, 2012 08:16:05

Note that neither Valgrind (3.7.0) nor Pin (pin-2.10-41150-gcc.3.4.6-ia32_intel64-linux) does the right thing here.

There are two possible solutions:

  1. On the final non-control FPU instr in each bb (or on any one that
    precedes the FPU save instr), store the app pc into a TLS slot. On
    fnstenv/fnsave/fxsave/xsave/xsaveopt, replace the cache pc in the FPU
    instruction pointer slot in the memory save area with the stored app pc
    from the TLS slot.

    Since there are many non-control FPU instrs, the overhead here does not
    seem worthwhile.

    There are also implications for persistence here: we don't want
    absolute addresses all over the place. We should store the module
    offset instead, and not persist the save instrs: as fine-grained we'll
    put the module base in at use time.

  2. On every fnstenv/fnsave/fxsave/xsave/xsaveopt, do a full translation.
    This requires a clean call or exiting the cache so we'd end bbs at these
    instrs.
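
A rough sketch of solution 1, written as ordinary C functions rather than as inlined instrumentation. The slot, the helper names, and the fixed 12-byte offset (the 32-bit fnstenv layout) are assumptions made for illustration; DR would emit equivalent inline code and use its own TLS slots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed offset of the instruction pointer in a 32-bit fnstenv image. */
#define FNSTENV32_FIP_OFFSET 12

/* Hypothetical per-thread slot: app pc of the latest non-control FPU instr. */
static _Thread_local uint32_t last_fp_app_pc;

/* Stands in for the store emitted next to a non-control FPU instr. */
static void
record_fp_instr(uint32_t app_pc)
{
    last_fp_app_pc = app_pc;
}

/* Stands in for the code emitted right after the save instr: overwrite the
 * code cache pc in the save area with the recorded app pc. */
static void
patch_saved_fip(uint8_t *save_area)
{
    assert(save_area != NULL);
    memcpy(save_area + FNSTENV32_FIP_OFFSET, &last_fp_app_pc,
           sizeof(last_fp_app_pc));
}
```

The cost concern raised above is visible here: record_fp_instr() would run for every instrumented non-control FPU instr, while patch_saved_fip() runs only at the comparatively rare save instrs.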

We also need to translate for proc_save_fpstate(), where a client may well
examine the app's FPU state. For clean calls that preserve fpstate,
perhaps we can argue that we aren't exposing the spot where we saved the
state and so we don't really need to translate and can leave the code as it
is (preserving the value across the clean call).

(If we did want to give clients access to the stored fpstate on clean
calls, we would then have all clean calls that save fpstate call a
wrapper routine in DR instead that does the translation and then calls
the client target.)

What about dr_insert_save_fpstate()? For solution #2, we'll need to do a
clean call or something similar to do the translation. Ugh.

Proposal: implement #1 and measure (on SPECFP especially). If overhead is
significant, or even just noticeable, put this under a runtime option off
by default.

@derekbruening

From [email protected] on July 10, 2013 07:23:49

Decided to go with solution #2 and bail on translation for proc_save_fpstate() and dr_insert_save_fpstate().

Forthcoming commit log has a good summary:

Fixes issue #698
Fixes issue #1199

Update saved floating-point PC values to point at original addresses rather
than the code cache.

Adds fnstenv, fnsave, fxsave, xsave, and xsaveopt to decode_cti().

Adds instr_saves_float_pc() and ends a bb at such an instr.  The instr is
mangled in one of two ways: if a non-control float instr is found earlier
in the bb (and the target mem is not rip-rel, and the state save is not
xsave* as that has an optional FPU component), inlined instrumentation updates the
pc field in the saved state.  Else, we go back to dispatch (and the block
cannot be linked or be added to a trace: should be rare though) where we
translate the pc field.
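
The choice between the two mangling paths described above can be sketched as a small predicate. The type and function names are hypothetical and this is not the code from the commit; it only restates the three conditions listed in the log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    FLOAT_PC_INLINE_UPDATE, /* inlined code rewrites the pc field in place */
    FLOAT_PC_VIA_DISPATCH,  /* exit to dispatch, which translates the pc field */
} float_pc_strategy_t;

/* Hypothetical summary of what is known about one fp-state save instr. */
typedef struct {
    bool prior_fp_instr_in_bb; /* a non-control float instr appears earlier in the bb */
    bool target_is_rip_rel;    /* the save's memory operand is rip-relative */
    bool is_xsave_variant;     /* xsave or xsaveopt (FPU component is optional) */
} fp_save_info_t;

static float_pc_strategy_t
choose_float_pc_strategy(const fp_save_info_t *info)
{
    assert(info != NULL);
    if (info->prior_fp_instr_in_bb && !info->target_is_rip_rel &&
        !info->is_xsave_variant)
        return FLOAT_PC_INLINE_UPDATE;
    /* Everything else takes the slow path: unlinked block, no traces. */
    return FLOAT_PC_VIA_DISPATCH;
}
```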

Such blocks are marked as fine-grained, as any inlined pc update cannot be
persisted.

Adds a new client trace restriction: a client cannot remove the prior fp
instr in a trace with an fp save, as that will ruin the inlined update
strategy used here.

Does not translate the fp pc for proc_save_fpstate() or
dr_insert_save_fpstate().  Documents this.  We can try to solve this later
if a client ever needs it, whether for its own analysis or because it hands
the state to the app in some manner.

In instr_is_floating_ex(), moves OP_fnclex and OP_fninit to
DR_FPSTATE and adds OP_fwait to the same.

Adds a test in suite/tests/common/floatpc.c.

@derekbruening

From [email protected] on July 11, 2013 07:28:18

This issue was closed by revision r2165.

Status: Fixed

@derekbruening

From [email protected] on July 16, 2013 15:25:33

Re-opening as we have hit a serious problem. On some distros, libm.so uses fnstenv/fldenv pairs in cos() and likely other routines. We see huge performance hits because each fnstenv acts as a trace barrier, exits the cache, and translates on the critical path of our own tests that call cos() in a loop, such as linux.signal0000.

We may want to re-consider solution #1.

For a hacky solution: can we somehow recognize the pattern of fnstenv used in libm and
distinguish it from the pattern in selfmod apps? Something like: if we see fldenv,
assume that any fnstenv in that same module is part of a save+restore pair and
doesn't look at the pc. That assumes selfmod code doesn't
statically link libm. Or, disable translation in libm. What about Fortran?
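
A minimal sketch of that per-module heuristic, assuming the module's instructions have already been decoded into a simple kind array. The enum and function names are hypothetical; nothing like this is claimed to exist in DR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced instruction classification. */
typedef enum { INSTR_OTHER, INSTR_FNSTENV, INSTR_FLDENV } instr_kind_t;

/* Returns true if the module contains any fldenv, in which case the
 * heuristic treats every fnstenv in that module as half of a save+restore
 * pair whose saved pc is never inspected, so translation is skipped. */
static bool
module_skips_fp_pc_translation(const instr_kind_t *instrs, size_t count)
{
    assert(instrs != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (instrs[i] == INSTR_FLDENV)
            return true;
    }
    return false;
}
```

The weakness is the one already noted: a self-modifying app that statically links libm would have fldenv in the same module as its own pc-reading fnstenv and would be misclassified.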

For the 4.1 release we may disable the implemented solution #2 under an off-by-default option (leaving the inlined-bb solution, which may well allow many selfmod apps to keep working).

Status: Accepted
Labels: Performance

@derekbruening

From [email protected] on July 16, 2013 15:44:59

Looking at libm asm code, a few patterns:

00007580 <feclearexcept>:
7580: 83 ec 24 sub $0x24,%esp
7583: 8b 44 24 28 mov 0x28(%esp),%eax
7587: e8 3a 00 00 00 call 75c6 <feclearexcept+0x46>
758c: 81 c1 74 aa 03 00 add $0x3aa74,%ecx
7592: d9 74 24 04 fnstenv 0x4(%esp)
7596: 83 e0 3d and $0x3d,%eax
7599: 89 c2 mov %eax,%edx
759b: 83 f2 3d xor $0x3d,%edx
759e: 66 21 54 24 08 and %dx,0x8(%esp)
75a3: d9 64 24 04 fldenv 0x4(%esp)

00007620 :
...
7656: d9 74 24 04 fnstenv 0x4(%esp)
765a: 66 83 4c 24 08 10 orw $0x10,0x8(%esp)
7660: d9 64 24 04 fldenv 0x4(%esp)

0000e150 : (similar in , )
e150: 56 push %esi
e151: 53 push %ebx
e152: 81 ec 94 00 00 00 sub $0x94,%esp
e158: dd 84 24 a0 00 00 00 fldl 0xa0(%esp)
e15f: dd 5c 24 28 fstpl 0x28(%esp)
e163: e8 38 66 ff ff call 47a0 <ABS@plt+0x250>
e168: 81 c3 98 3e 03 00 add $0x33e98,%ebx
e16e: d9 74 24 74 fnstenv 0x74(%esp)
e172: db e2 fnclex
e174: 0f b7 44 24 74 movzwl 0x74(%esp),%eax

000077c0 :
77c0: 8b 44 24 04 mov 0x4(%esp),%eax
77c4: d9 30 fnstenv (%eax)
77c6: d9 20 fldenv (%eax)

00007820 :
7820: 83 ec 20 sub $0x20,%esp
7823: 8b 44 24 24 mov 0x24(%esp),%eax
7827: d9 74 24 04 fnstenv 0x4(%esp)
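
In all of the listings above, fnstenv and fldenv share the 0xd9 opcode byte and differ only in the reg field of the ModRM byte (6 for fnstenv, 4 for fldenv, always with a memory operand). A minimal classifier along those lines is sketched below. It assumes the bytes start at the opcode with no prefixes, and the names are hypothetical; a real tool would use a full decoder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { FP_ENV_NONE, FP_ENV_FNSTENV, FP_ENV_FLDENV } fp_env_op_t;

/* Classify the instruction starting at bytes[0], assuming no prefix bytes. */
static fp_env_op_t
classify_fp_env_op(const uint8_t *bytes, size_t len)
{
    assert(bytes != NULL || len == 0);
    if (len < 2 || bytes[0] != 0xD9)
        return FP_ENV_NONE;
    uint8_t mod = bytes[1] >> 6;         /* ModRM bits 7..6 */
    uint8_t reg = (bytes[1] >> 3) & 0x7; /* ModRM bits 5..3 */
    if (mod == 3) /* register forms of 0xd9 are other x87 instrs */
        return FP_ENV_NONE;
    if (reg == 6)
        return FP_ENV_FNSTENV;
    if (reg == 4)
        return FP_ENV_FLDENV;
    return FP_ENV_NONE;
}
```

For example, the bytes d9 74 24 04 from the first listing classify as fnstenv, and d9 64 24 04 as fldenv.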
