
drreg: register spilling mediation framework #511

Open
derekbruening opened this issue Nov 28, 2014 · 15 comments

@derekbruening
Contributor

From [email protected] on July 10, 2011 16:52:53

We plan to build a "drreg" extension that provides register liveness
analysis and stealing. One thing we didn't plan to do, but may want to,
is add a feature to permanently steal a register. When we only need access
to a few fields, directly-addressable TLS should perform better,
but when we have many fields, a stolen register could be more efficient.

Original issue: http://code.google.com/p/dynamorio/issues/detail?id=511

@derekbruening
Contributor Author

From [email protected] on July 10, 2011 13:53:03

Summary: drreg: provide permanently-stolen reg support?

@derekbruening
Contributor Author

From [email protected] on March 09, 2012 06:51:05

xref other extensions from the master extension discussion: drutil ( issue #295 ), drwrap ( issue #296 ), drmgr ( issue #402 ), drsyscall ( https://code.google.com/p/drmemory/issues/detail?id=822 ), drcallstack ( https://code.google.com/p/drmemory/issues/detail?id=823 ), drmalloc ( https://code.google.com/p/drmemory/issues/detail?id=824 ), umbra ( https://code.google.com/p/drmemory/issues/detail?id=825 )

@derekbruening
Contributor Author

From [email protected] on May 11, 2012 11:44:53

*** INFO drmgr and drreg discussion :meeting_DrM_2012_1_25:

xref issue #164/PR 494720: add a post-instru pass that checks for errors in register preservation

drreg and drmgr discussion highlights:

  • drmgr may be complete enough but we need more examples of usage.
    integrating into drmemory with each module of drmem being a
    separate drmgr component will help.
  • maybe later make fancy virtual registers and map them to physical regs in a later pass;
    for now, use virtual labels instead, but map them to physical regs up front
  • whole-bb regs for the min # used
    (pattern mode would need 0)
  • local save+restore for extras
  • have drreg just be a reg picker and help track where
    app values are: whether the app value is in the real reg, a spill slot, or an xchg'd reg
  • support steal reg across whole app?

*** TODO drreg framework :meeting_DrM_2012_5_11:

How does the client tell us the lifetime of values in scratch registers, so that drreg
knows whether to let an app instr clobber a scratch reg or must preserve the
scratch reg value?
Proposal, simple for now: either a whole-bb reg is preserved across the whole bb,
or its lifetime is just between app instrs. For locally-requested spills,
the lifetime ends at the next app instr.

API to request whole-bb reg during analysis phase:

  • uint flags: liveness, and whether the client will use the "mark used" flag
  • bitvector of which regs are acceptable
  • if called during instru, gives you a local reg
  • eflags are treated like an extra whole-bb reg:
    updated in spill slot around

API to request app value to be restored (for OP_lea for shadow map, e.g.)

But we want more than just picking the best regs and keeping app values in spill slots
for proper restoring; we also want:

  • track who's using what scratch registers so different functions and
    modules of client don't conflict: but really this is an artifact
    of using whole-bb for regs used locally. if could instead have such uses
    use a local API and have the spills/restores and regs used be later
    optimized across whole bb, that would eliminate the conflict problem:
    but that's a lot more complicated for drreg to do post-passes and to
    restore state on a fault.
    for this tracking we want a state kept for each reg (avail, holds a value that
    needs to be preserved, etc.), and then the liveness can be of smaller
    granularity than the whole bb
  • allow sharing values in scratch regs among client components:
    so the state can hold an identifier or sthg?
    but can you search for which reg has an identifier, or just use it for an assert?
  • lazy spilling: an API routine to mark when a reg is used,
    or can drreg analyze and infer which ones were used?
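One way to model the per-reg state and component identifier floated above is sketched below. It is hypothetical: `reg_state_t` and its helpers are illustrative names, not drreg code. A reg is either available or in use by one owner. A second owner is refused rather than silently sharing, and the app value is spilled only when it is live.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    REG_AVAIL,  /* no client value: the app value is in the real reg */
    REG_IN_USE, /* holds a client value that must be preserved */
} reg_use_t;

typedef struct {
    reg_use_t use;
    int owner;        /* identifier of the client component using the reg */
    bool app_spilled; /* the app value currently lives in a spill slot */
} reg_state_t;

/* Returns false if a different component already holds the reg. */
static bool
reg_state_acquire(reg_state_t *st, int owner, bool app_value_live)
{
    if (st->use == REG_IN_USE) {
        /* Sharing is allowed only when the same identifier asks again. */
        return st->owner == owner;
    }
    st->use = REG_IN_USE;
    st->owner = owner;
    /* A dead app value needs no spill. */
    st->app_spilled = app_value_live;
    return true;
}

static void
reg_state_release(reg_state_t *st, int owner)
{
    assert(st->use == REG_IN_USE && st->owner == owner);
    /* The caller emits the restore of any spilled app value here; this
     * simple model has no lazy restore. */
    st->use = REG_AVAIL;
    st->owner = -1;
    st->app_spilled = false;
}
```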

have -conservative option for whether to use dead regs (can turn on if
worried about fault or debugger examining)

for using Extensions:

  • for API calls into an extension to insert instru, it seems the caller should always pass
    in which regs to use
  • for an extension doing a drmgr pass on its own, it seems better for the extension
    to get a reg from drreg directly and use it: but if multiple callers ask
    drreg for a scratch reg, can drreg safely give them all the same reg?
  • provide permanently-stolen reg support?

Summary: drreg: register spilling mediation framework
Labels: -Priority-Low Priority-Medium

@derekbruening
Contributor Author

From [email protected] on June 14, 2012 10:31:15

xref issue #52 , issue #53

@derekbruening
Contributor Author

From [email protected] on July 15, 2013 13:52:20

pasting more notes:

** TODO later thoughts

virtual regs would be by far the easiest to use. however, they require
a reg allocation pass that may need to build CFG => extra overhead that not
every tool will want to live with. so should there be 2 different interfaces?
one that uses virtual regs and one that doesn't?

drmem's instrumentation is not linear: for check_ignore_unaddr mem2mem, it jmps down below and
then can jump back up to check_ignore_resume, so a reg allocator would have
to build a CFG and not just do a linear scan. But that should all go away w/
-replace_malloc, if we want to assume that will be the long-term soln.
Update: actually it's not clear it can go away.

@derekbruening
Contributor Author

** TODO revised interface: no up-front whole-bb, just simple reserve+unreserve interface

Originally I had these in drreg_options_t:

    /**
     * The number of scratch registers that need to be reserved across
     * more than one application instruction.  drreg turns these into
     * "whole-basic-block" scratch registers.
     */
    uint num_whole_bb;
    /**
     * Whether to spill the arithmetic flags across each basic block,
     * to minimize per-instruction spills and restores.
     */
    bool aflags_whole_bb;

The only reason to hardcode the number of whole-bb regs up front, and to
spill them at the very top of the bb and restore at the very bottom, is to
simplify state xl8:

/* Our state restoration model: Only whole-bb scratch regs and aflags
 * need to be restored (i.e., all local scratch regs are restored
 * before any app instr).  For each such reg or aflags, we guarantee
 * that either the app value is in TLS at each app instr (where fault
 * might happen) or the app value is dead and it's ok to have garbage
 * in TLS b/c the app will write it before reading (this is all modulo
 * the app's own fault handler going off on a different path (xref
 * DRi#400): so we're slightly risky here).
 */

From a pure user interface point of view, drreg would be much simpler if it
had no concept of global/whole-bb vs local, and instead for whichever regs
are still reserved in crossing an app instr (e.g., drmem's shadow xl8 reg),
drreg restores or updates if the app reads or writes.

Xref discussion above about the original interface of drreg just picking
whole-bb regs and the client having to parcel out who is using which when:
this simpler interface here is much nicer to use.

If too many regs are reserved across an app instr, we may run out of
places to store the values.

To handle fault xl8, we can decode the in-cache bb and walk forward
looking at spills + restores like DR xl8 does. We'll add a
dr_is_tls_access() query to DR (which looks for both raw and DR TLS).
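The forward walk can be sketched as a hypothetical model. `cache_ins_t`, `find_spilled_app_values()`, and `restore_state()` are illustrative names, and a real implementation would decode machine instrs instead. The walk tracks, for each reg, which TLS slot holds its app value at the faulting instr, and then copies those values back. This simple model treats every restore as ending the spill. A reg that is restored temporarily while its spill stays live would need extra tracking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical decoded instr: a spill, a restore, or anything else.  A real
 * walk would recognize TLS accesses with a query like the proposed
 * dr_is_tls_access(). */
typedef enum { INS_OTHER, INS_SPILL, INS_RESTORE } ins_kind_t;

typedef struct {
    ins_kind_t kind;
    int reg;  /* register spilled or restored */
    int slot; /* TLS slot used */
} cache_ins_t;

#define NUM_REGS 8

/* Walks from the block start up to (not including) the faulting instr.
 * slot_of[r] ends up as the slot holding reg r's app value, or -1 if the
 * real register already holds it. */
static void
find_spilled_app_values(const cache_ins_t *ins, size_t fault_idx, int slot_of[NUM_REGS])
{
    for (int r = 0; r < NUM_REGS; r++)
        slot_of[r] = -1;
    for (size_t i = 0; i < fault_idx; i++) {
        if (ins[i].kind == INS_OTHER)
            continue;
        assert(ins[i].reg >= 0 && ins[i].reg < NUM_REGS);
        slot_of[ins[i].reg] = (ins[i].kind == INS_SPILL) ? ins[i].slot : -1;
    }
}

/* Copies app values from TLS back into the faulting register state. */
static void
restore_state(long regs[NUM_REGS], const long tls[], const int slot_of[NUM_REGS])
{
    for (int r = 0; r < NUM_REGS; r++) {
        if (slot_of[r] >= 0)
            regs[r] = tls[slot_of[r]];
    }
}
```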

Might we still want "local" hints so drreg saves less-used-in-bb regs
for non-local requests? Plus, if we exceed the raw slots and move into DR slots,
which are not supposed to be used across app instrs, should drreg complain about non-"local" requests for those?
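One possible answer to that question is for the allocator to refuse DR slots for requests that must survive an app instr. The sketch below shows this under loudly stated assumptions. The slot counts, `slot_pool_t`, `alloc_slot()`, and `free_slot()` are hypothetical and are not drreg's or DR's actual numbers or API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical layout: raw TLS slots usable across app instrs come first,
 * followed by DR spill slots that are only usable locally. */
#define NUM_RAW_SLOTS 3
#define NUM_DR_SLOTS 2

typedef struct {
    bool used[NUM_RAW_SLOTS + NUM_DR_SLOTS];
} slot_pool_t;

/* Returns a slot index, or -1 if none fits.  A non-"local" request (one
 * that crosses an app instr) is never given a DR slot. */
static int
alloc_slot(slot_pool_t *pool, bool crosses_app_instr)
{
    int limit = crosses_app_instr ? NUM_RAW_SLOTS : NUM_RAW_SLOTS + NUM_DR_SLOTS;
    for (int s = 0; s < limit; s++) {
        if (!pool->used[s]) {
            pool->used[s] = true;
            return s;
        }
    }
    return -1;
}

static void
free_slot(slot_pool_t *pool, int s)
{
    assert(s >= 0 && s < NUM_RAW_SLOTS + NUM_DR_SLOTS && pool->used[s]);
    pool->used[s] = false;
}
```

Returning -1 instead of silently using a DR slot is one design choice. An alternative is to fall back and emit a diagnostic.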

Would any unreserved regs warrant staying spilled across app instrs?
What is the perf difference between a lazy restore at the next app use
vs keeping the reg spilled and updating the spill location at the next app use?
Identical for an app read, I guess; for an app write it is restore + real unreserve
vs re-spill to the TLS slot. So it seems the answer is no: a fixed # of whole-bb
regs is a perf cost.
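The lazy-restore side of that comparison can be sketched as a per-reg decision. It applies at each app instr once a reg has been unreserved but its app value is still in a spill slot. This is a hypothetical model: `lazy_action()` and `app_use_t` are illustrative names. The partial-write rule is an assumption about what a careful implementation would need, not something stated in these notes.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    ACT_NONE,           /* emit nothing; keep deferring the restore */
    ACT_RESTORE_BEFORE, /* reload the app value from its slot first */
    ACT_FORGET,         /* app fully overwrites the reg: spill is stale */
} lazy_action_t;

typedef struct {
    bool reads;         /* the app instr reads the reg */
    bool writes;        /* the app instr writes the reg */
    bool partial_write; /* the write covers only part of it (e.g. %al) */
    bool ends_block;    /* last instr: all app values must be in place */
} app_use_t;

static lazy_action_t
lazy_action(bool restore_pending, app_use_t u)
{
    assert(!u.partial_write || u.writes);
    if (!restore_pending)
        return ACT_NONE;
    /* A read, or a write of only part of the reg, needs the full app value. */
    if (u.reads || (u.writes && u.partial_write))
        return ACT_RESTORE_BEFORE;
    /* A full overwrite kills the app value: no restore instrs needed. */
    if (u.writes)
        return ACT_FORGET;
    if (u.ends_block)
        return ACT_RESTORE_BEFORE;
    return ACT_NONE;
}
```

In this model a read costs one restore, which matches the "identical for app read" observation. A full overwrite just drops the stale spill, whereas keeping the reg spilled would need a re-spill after the write.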

@derekbruening
Contributor Author

Xref #1771

@derekbruening
Contributor Author

** TODO add lazy restore of aflags

Consecutive drx_insert_counter_update() calls (after #1771 adds a drxmgr test using drreg in the counter routine):

pre:

> bin32/drrun -loglevel 4 -c suite/tests/bin/libclient.drxmgr-test.dll.so -- suite/tests/bin/common.eflags
> head -2000 `ls -1td logs/*0|head -1`/l* | grep -A 20 ', tag'
Fragment 1, tag 0xf7774a70, flags 0x1000030, shared, size 70:
  0x49765004  9f                   lahf    -> %ah
  0x49765005  0f 90 c0             seto    -> %al
  0x49765008  64 a3 4c 00 00 00    mov    %eax -> %fs:0x4c[4byte]
  0x4976500e  81 05 08 20 77 f7 01 add    $0x00000001 0xf7772008[4byte] -> 0xf7772008[4byte]
              00 00 00
  0x49765018  64 a1 4c 00 00 00    mov    %fs:0x4c[4byte] -> %eax
  0x4976501e  04 7f                add    $0x7f %al -> %al
  0x49765020  9e                   sahf   %ah
  0x49765021  9f                   lahf    -> %ah
  0x49765022  0f 90 c0             seto    -> %al
  0x49765025  64 a3 4c 00 00 00    mov    %eax -> %fs:0x4c[4byte]
  0x4976502b  81 05 0c 20 77 f7 03 add    $0x00000003 0xf777200c[4byte] -> 0xf777200c[4byte]
              00 00 00
  0x49765035  64 a1 4c 00 00 00    mov    %fs:0x4c[4byte] -> %eax
  0x4976503b  04 7f                add    $0x7f %al -> %al
  0x4976503d  9e                   sahf   %ah
  0x4976503e  89 e0                mov    %esp -> %eax
  0x49765040  68 77 4a 77 f7       push   $0xf7774a77 %esp -> %esp 0xfffffffc(%esp)[4byte]
  0x49765045  e9 d2 ff 01 00       jmp    $0x4978501c

post:

> head -2000 `ls -1td logs/*0|head -1`/l* | grep -A 20 ', tag'
Fragment 1, tag 0xf7786a70, flags 0x1000030, shared, size 63:
  0xef7c1005  9f                   lahf    -> %ah
  0xef7c1006  0f 90 c0             seto    -> %al
  0xef7c1009  64 a3 4c 00 00 00    mov    %eax -> %fs:0x4c[4byte]
  0xef7c100f  81 05 08 40 78 f7 01 add    $0x00000001 0xf7784008[4byte] -> 0xf7784008[4byte]
              00 00 00
  0xef7c1019  81 05 0c 40 78 f7 03 add    $0x00000003 0xf778400c[4byte] -> 0xf778400c[4byte]
              00 00 00
  0xef7c1023  89 e0                mov    %esp -> %eax
  0xef7c1025  64 a3 50 00 00 00    mov    %eax -> %fs:0x50[4byte]
  0xef7c102b  64 a1 4c 00 00 00    mov    %fs:0x4c[4byte] -> %eax
  0xef7c1031  04 7f                add    $0x7f %al -> %al
  0xef7c1033  9e                   sahf   %ah
  0xef7c1034  64 a1 50 00 00 00    mov    %fs:0x50[4byte] -> %eax
  0xef7c103a  68 77 6a 78 f7       push   $0xf7786a77 %esp -> %esp 0xfffffffc(%esp)[4byte]
  0xef7c103f  e9 d8 ff 01 00       jmp    $0xef7e101c
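The listings above save the arithmetic flags in two steps. lahf copies SF, ZF, AF, PF, and CF into %ah but not OF, and seto writes OF into %al as 0 or 1. On restore, `add $0x7f, %al` regenerates OF before sahf reloads the other five flags. sahf does not touch OF, so the add's OF survives. A small C model of why that add reproduces the saved OF:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Models the OF result of x86 8-bit add: signed overflow occurs when both
 * operands have the same sign and the result's sign differs. */
static bool
add8_sets_of(uint8_t a, uint8_t b)
{
    uint8_t sum = (uint8_t)(a + b);
    return ((~(a ^ b) & (a ^ sum)) & 0x80) != 0;
}

/* seto leaves 1 in %al if OF was set and 0 otherwise.  Adding 0x7f back
 * reproduces OF because only 0x01 + 0x7f crosses the signed 8-bit limit
 * (0x80 is -128), while 0x00 + 0x7f stays at 127. */
static bool
restored_of(bool saved_of)
{
    uint8_t al = saved_of ? 1 : 0; /* value written by seto */
    return add8_sets_of(al, 0x7f);
}
```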

Adding to the drreg test:

interp: start_pc = 0x08048f0b
  0x08048f0b  ba f4 f1 00 00       mov    $0x0000f1f4 -> %edx
  0x08048f10  ba f4 f1 00 00       mov    $0x0000f1f4 -> %edx
  0x08048f15  0f 95 c0             setnz   -> %al
        reads flag before writing it!
  0x08048f18  39 e2                cmp    %edx %esp
        wrote overflow flag before reading it!
  0x08048f1a  eb 00                jmp    $0x08048f1c
interp: direct jump at 0x08048f1a
end_pc = 0x08048f1c

instrument_basic_block ******************

before instrumentation:
TAG  0x08048f0b
 +0    L3              ba f4 f1 00 00       mov    $0x0000f1f4 -> %edx
 +5    L3              ba f4 f1 00 00       mov    $0x0000f1f4 -> %edx
 +10   L3              0f 95 c0             setnz   -> %al
 +13   L3              39 e2                cmp    %edx %esp
 +15   L3              eb 00                jmp    $0x08048f1c
END 0x08048f0b

drreg test #4
drreg test #4
drreg test #4
drreg_reserve_aflags @0x00000000: spilling aflags
drreg test #4
drreg_event_bb_insert_late @0x08048f15 aflags=0x8: lazily restoring aflags
drreg test #4
drreg_event_bb_insert_late @0x08048f18: re-spilling aflags after app write
drreg test #4
drreg_event_bb_insert_late @0x08048f1a aflags=0x11f: lazily restoring aflags

after instrumentation:
TAG  0x08048f0b
 +0    L3              ba f4 f1 00 00       mov    $0x0000f1f4 -> %edx
 +5    L3              ba f4 f1 00 00       mov    $0x0000f1f4 -> %edx
 +10   m4 @0xef85d31c  64 a3 50 00 00 00    mov    %eax -> %fs:0x00000050[4byte]
 +16   m4 @0xef7ca0a4  9f                   lahf    -> %ah
 +17   m4 @0xef85ed1c  64 a3 4c 00 00 00    mov    %eax -> %fs:0x0000004c[4byte]
 +23   m4 @0xef7c9604  64 a1 50 00 00 00    mov    %fs:0x00000050[4byte] -> %eax
 +29   m4 @0xef85df04  3d 00 00 00 00       cmp    %eax $0x00000000
 +34   m4 @0xef7cb9e4                       <label>
 +34   m4 @0xef85d4a4  64 a3 50 00 00 00    mov    %eax -> %fs:0x00000050[4byte]
 +40   m4 @0xef7cddf4  64 a1 4c 00 00 00    mov    %fs:0x0000004c[4byte] -> %eax
 +46   m4 @0xef85d0bc  9e                   sahf   %ah
 +47   m4 @0xef7c9538  64 a1 50 00 00 00    mov    %fs:0x00000050[4byte] -> %eax
 +53   L3              0f 95 c0             setnz   -> %al
 +56   m4 @0xef7ceb4c  64 a3 50 00 00 00    mov    %eax -> %fs:0x00000050[4byte]
 +62   m4 @0xef7c9fd8  9f                   lahf    -> %ah
 +63   m4 @0xef7cb37c  64 a3 4c 00 00 00    mov    %eax -> %fs:0x0000004c[4byte]
 +69   m4 @0xef7cd05c  64 a1 50 00 00 00    mov    %fs:0x00000050[4byte] -> %eax
 +75   L3              39 e2                cmp    %edx %esp
 +77   m4 @0xef7cc8f0  64 a3 50 00 00 00    mov    %eax -> %fs:0x00000050[4byte]
 +83   m4 @0xef7cce48  64 a1 4c 00 00 00    mov    %fs:0x0000004c[4byte] -> %eax
 +89   m4 @0xef85d5d4  04 7f                add    $0x7f %al -> %al
 +91   m4 @0xef7ce710  9e                   sahf   %ah
 +92   m4 @0xef85fd04  64 a1 50 00 00 00    mov    %fs:0x00000050[4byte] -> %eax
 +98   L3              eb 00                jmp    $0x08048f1c
END 0x08048f0b

@derekbruening
Contributor Author

** TODO support use in shared gencode outside of bb event, and in instru2instru phase

DrMem's pattern mode wants to save aflags w/ liveness analysis in the
instru2instru phase (in pattern_instrument_repstr()).

Maybe for this pattern case we can add flags saving in insert phase and in
instru2instru just move flags code around? But that could mess up the
liveness assumptions.

Xref the non-drmgr API discussion about a parallel set of routines: drregi_*.
But can we avoid whole separate routines?
We can tell whether we're in the drmgr insert phase w/o an extra arg that breaks compat,
but where is the liveness info stored?

We could keep the same API signatures if we store liveness in pt, just like
we're doing w/ drmgr. We'd just add sthg like drreg_analyze_instrlist()?
But what about the current index? Store instr ptrs with live info and
search for the instr? OTOH, should we avoid assuming that the instrlist stays static
after the analysis in instru2instru?

Or, we add nothing extra in the API, and liveness is computed on the spot
in each routine, if called not during the drmgr insert phase? We could at
least start w/ that, and add explicit storage via new API routine later as
an optimization w/o breaking compatibility.

Gencode support seems a subset: the user probably wants to spill while
appending, and thus there are no further instrs, and thus everything is
live.
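Computing liveness "on the spot" amounts to a single backward scan over the instrlist. Every reg is assumed live at the end, which also covers the gencode case where everything is live. This is a hypothetical sketch: `instr_effect_t` and `compute_liveness()` are illustrative, and regs are bits in a mask. It treats every write as a full, unconditional write. Real x86 analysis must not kill a reg on a partial or conditional write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-instr register effects (bit r = reg r). */
typedef struct {
    uint32_t reads;
    uint32_t writes;
} instr_effect_t;

/* Fills live_before[i] for each instr.  A reg is live before instr i if i
 * reads it, or if it is live after i and i does not write it. */
static void
compute_liveness(const instr_effect_t *ins, size_t n, uint32_t *live_before)
{
    assert(n == 0 || (ins != NULL && live_before != NULL));
    uint32_t live = UINT32_MAX; /* conservatively, everything live at exit */
    for (size_t i = n; i-- > 0;) {
        live = (live & ~ins[i].writes) | ins[i].reads;
        live_before[i] = live;
    }
}
```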

@derekbruening
Contributor Author

drreg is complete enough that we should be able to close this soon. With 00d39c7 in place, it now matches the efficiency of the hand-coded spilling in the samples and as part of #1273 I was able to convert all the remaining samples to use drreg. I'm going to wait until the Dr. Memory port to drreg is complete as a few issues remain there that might require additions or possibly changes to the interface.

d749f93 i#511 drreg: initial framework, liveness analysis, and aflags implementation
97393af i#511 drreg: register reservation
2883a1f i#511 drreg: register reservation: implement drreg_get_app_value
942fd5c i#511 drreg: support reserved regs across app instrs
4b16998 i#511 drreg: lazy restore of GPR regs
58e8582 i#511 drreg: export instr_is_reg_spill_or_restore()
bf99438 i#511 drreg: add fault handling
7e39d3d i#511 drreg: handle labels added by other components
998070a i#511 drreg: mark as experimental
ee3ad18 i#511 drreg: add vector convenience routines
513bcdb i#511 drreg: lazily restore aflags
1f4378c i#511 drreg: restore lazy aflags on a fault
1367094 i#511 drreg: add drreg_reservation_info()
83833a9 i#511 drreg: advance priorities outside of Dr. Memory ranges
7dd5b31 i#511 drreg: add error handling callback
71bbc5b i#511 drreg: add support for aflags preservation outside insert phase
8ee8d81 i#511 drreg: add support for register preservation outside insert phase
23f2abe i#511 drreg: add drreg_is_register_dead()
2469668 i#511 drreg: add drreg_reserve_dead_register()
ac8d95e i#1273 drmgr, i#511 drreg: convert countcalls to use drmgr + drreg
b2c9aea i#511 drreg: convert drcachesim to use drreg
00d39c7 i#511 drreg: keep aflags in eax where possible

@derekbruening
Contributor Author

Adding a note here to remember to remove the "work in progress...interface may be in flux" from drreg.dox when closing this.

Another note: the drreg barrier needed when invoking things like dr_insert_mbr_instrumentation() needs to be better documented.

One annoying thing hit when converting the samples was that a sample not using drreg but using drx_insert_counter_increment and drmgr was forced to init drreg on its own. Perhaps we can solve this by having drx_init call drreg_init but in a "weak" way so that any pre-existing or later call overrides it?
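Here is a hypothetical sketch of the "weak" init idea. An extension's init call is recorded as weak, and a client's strong call takes precedence whether it comes earlier or later. `opts_t`, `model_init()`, and `model_exit()` are illustrative names. The commit listed next describes the eventual change as combining multiple inits, which this sketch does not attempt to reproduce.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    int num_spill_slots;
    bool conservative;
} opts_t;

static opts_t g_opts;
static bool g_inited, g_strong;
static int g_refcount;

/* weak == true for an extension (e.g. a drx-like library) initializing on
 * the client's behalf; weak == false for the client's own call. */
static void
model_init(const opts_t *ops, bool weak)
{
    g_refcount++;
    if (!g_inited || (!weak && !g_strong)) {
        /* First init of any kind, or first strong one: adopt these options. */
        g_opts = *ops;
        g_inited = true;
        g_strong = !weak;
    }
    /* Otherwise keep the existing options: weak calls never override, and
     * in this model a second strong call is ignored too. */
}

static void
model_exit(void)
{
    assert(g_refcount > 0);
    if (--g_refcount == 0) {
        g_inited = false;
        g_strong = false;
    }
}
```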

@derekbruening
Contributor Author

359f8ee i#511 drreg: document lazy restore "barriers"
583b2fc i#511 drreg: simplify usage by combining multiple inits

@derekbruening
Contributor Author

Xref #1963

derekbruening added a commit that referenced this issue Apr 7, 2018
Adds handling for several conflicts between aflags and xax on x86: failing
to reserve xax due to lazy aflags still residing in xax; failing to reserve
aflags if xax is taken; and failing to get the app aflags value if xax is
taken.  For the first one, we throw away the lazy aflags.  For the other
two, we reserve a temporary scratch register, xchg it with xax, and restore
it afterward.  We place aflags in TLS and do not try to keep it in a
register.

Issue: #511
derekbruening added a commit that referenced this issue Apr 15, 2018
Adds two drreg features to help support separate control flow paths such as
a slowpath and a fastpath: drreg_reservation_info_ex(), which provides
information on registers which have been unreserved but not yet lazily
restored, and drreg_statelessly_restore_app_value() which restores app
state without changing drreg state, to retain parity with a separate path.

Adds some tests of the new features.

Issue: #511
derekbruening added a commit that referenced this issue Apr 16, 2018
Adds a query routine to identify whether an instruction is a spill or
restore generated by drreg.

Adds a sanity check test.

Issue: #511
derekbruening added a commit that referenced this issue Apr 17, 2018
Adds a query routine to identify whether an instruction is a spill or
restore generated by drreg.

Removes over-zealous asserts about accessing another thread's DR spill slots.

Fixes a failure to restore spills to DR spill slots on a fault, but ignores the 3rd DR spill
slot when restoring to avoid problems whose full solution is left to #2933.

Adds a sanity check test.

Issue: #511
hgreving2304 pushed a commit that referenced this issue Dec 8, 2018
…h register slot in drreg (#3301)

Fix drreg to properly recognize and ignore the 3rd DR slot if a spill is detected but no restore is found. This does not fix #2933 as it is still possible to break drreg with a client that is using DR spill slots.

Add test for above.

Issue: #2941, #2933, #511
@johnfxgalea
Contributor

johnfxgalea commented Mar 29, 2019

Are there any more available notes on the support of virtual registers by any chance? In particular, I am wondering whether the allocation of physical registers would be done in the final phase related to Instrumentation-to-instrumentation transformations? Moreover, how would DynamoRIO reason over virtual registers in previous stages? I assume there needs to be some new class of special registers that the IR may handle?

@derekbruening
Contributor Author

A virtual register feature was never implemented and never got far enough to have a detailed design. If you were interested in adding such a feature I would suggest filing a separate issue and writing some kind of design proposal.
