
perlbench full mode diffmail run has 3x perf regression #1186

Closed
derekbruening opened this issue Nov 28, 2014 · 4 comments

Comments

@derekbruening
Contributor

From [email protected] on April 15, 2013 16:03:10

after some replace_malloc optimizations, perlbench looked good on my home
machine. but on my work machine its diffmail run was still 3x slower than
it should be, leading to 36x overall, and this happens w/o -replace_malloc as
well.

2nd run better than it was (see issue #1183) but still 3x slower than expected!
Workload elapsed time (0:1) = 4494.370856 seconds
Workload elapsed time (0:2) = 6564.70585 seconds
Workload elapsed time (0:3) = 2203.749973 seconds
Reported: 13262 826799000 13262.826799
(gdb) p 13262/386.
$1 = 34.357512953367873

need to profile it. why so different from ancalagon?

-prof_pcs:
ITIMER distribution (658769):
1.3% of time in INTERPRETER (8539)
0.1% of time in DISPATCH (838)
0.0% of time in SYSCALL HANDLER (10)
1.9% of time in INDIRECT BRANCH LOOKUP (12310)
16.5% of time in FRAGMENT CACHE (108429)
80.2% of time in UNKNOWN (528643)
RES-pcsamples.0.7528.html
1938 shadow_set_range
2629 replace_alloc_common
2739 replace_free_common
3019 safe_read
4433 packed_callstack_record
5543 ??
6088 is_retaddr.part.5
15813 __i686.get_pc_thunk.bx
18620 rb_in_node
54468 is_dword_defined
100657 shadow_get_dword
112720 find_next_fp

differences vs glaurung:
gcc 4.6.3 vs 4.7.2
spec 1.2 vs 1.1
"-O2 -fno-strict-aliasing -m32" vs "-m32 -O2"
I checked perlbench sources: identical for 1.1 vs 1.2

maybe more FPO (frame pointer omission) or something?

but why doesn't this show up in -light?

no suppressions used, -delay_frees_stack is off: must just be malloc
callstacks, which should be there for -light, right?

ancalagon debug run:
2483.38user 6.75system 41:38.86elapsed 99%CPU (0avgtext+0avgdata 429992maxresident)k
app mallocs: 278012393, frees: 277917410, large mallocs: 10026
unique malloc stacks: 11,273,296
callstack fp scans: 277,754,787
callstack is_retaddr: 704881223, backdecode: 1389036002, unreadable: 0

goobuntu debug run:
11462.56user 9.37system 3:11:47elapsed 99%CPU (0avgtext+0avgdata 1658896maxresident)k
app mallocs: 278011932, frees: 277917207, large mallocs: 10026
unique malloc stacks: 11,273,318
callstack fp scans: 2,499,318,434
callstack is_retaddr: 3868333682, backdecode: 3511997706, unreadable: 0

running the goobuntu perlbench exe on ancalagon, it is slow there too.
snapshot:
(gdb) bt
#0 0x7387bbb9 in shadow_get_dword (addr=0xffdd64d8 "") at /work/drmemory/git/src/drmemory/shadow.c:429
#1 0x738f4f54 in is_dword_defined (addr=0xffdd64d8 "") at /work/drmemory/git/src/drmemory/report.c:1183
#2 0x738d1c46 in find_next_fp (pt=0x4a6f5e7c, fp=0xffdd6180 "\020\245\004\b\t", top_frame=0 '\000', retaddr=0x0)
at /work/drmemory/git/src/common/callstack.c:981
#3 0x738d38b1 in print_callstack (buf=0x0, bufsz=0, sofar=0x0, mc=0xffdd5d5c, print_fps=0 '\000', pcs=0x4a688e00,
num_frames_printed=1, for_log=0 '\000') at /work/drmemory/git/src/common/callstack.c:1204
#4 0x738d550c in packed_callstack_record (pcs_out=0xffdd5be0, mc=0xffdd5d5c, loc=0xffdd5bd0)
at /work/drmemory/git/src/common/callstack.c:1384
#5 0x738de165 in get_shared_callstack (existing_data=0x0, mc=0xffdd5d5c, post_call=0x738c141b "U\211\345VS\201\354\220\001")
at /work/drmemory/git/src/drmemory/alloc_drmem.c:389
#6 0x738de9ef in client_add_malloc_pre (start=0xbcc0a78 "", end=0xbcc0a7a "", real_end=0xbcc0a88 "L%\353J\360\017",
existing_data=0x0, mc=0xffdd5d5c, post_call=0x738c141b "U\211\345VS\201\354\220\001")
at /work/drmemory/git/src/drmemory/alloc_drmem.c:435
#7 0x738b647b in notify_client_alloc (call_handle=1 '\001', drcontext=0x4a6a8100, ptr=0xbcc0a78 "", head=0xbcc0a68, mc=0xffdd5d5c,
zeroed=0 '\000', realloc=0 '\000', caller=0x738c141b "U\211\345VS\201\354\220\001")
at /work/drmemory/git/src/common/alloc_replace.c:465
#8 0x738be8ef in replace_alloc_common (arena=0x954d000, request_size=2, synch=1 '\001', zeroed=0 '\000', realloc=0 '\000',
invoke_client=1 '\001', drcontext=0x4a6a8100, mc=0xffdd5d5c, caller=0x738c141b "U\211\345VS\201\354\220\001", alloc_type=4)
at /work/drmemory/git/src/common/alloc_replace.c:1319
#9 0x738c15cf in replace_malloc (size=2) at /work/drmemory/git/src/common/alloc_replace.c:1784
#10 0x080f7731 in Perl_safesysmalloc ()
#11 0x080ce68e in Perl_sv_grow ()
#12 0x080dad5c in Perl_sv_setsv_flags ()
#13 0x080db5ab in Perl_newSVsv ()
#14 0x080a1428 in Perl_pp_aassign ()
#15 0x080cb4dd in Perl_runops_standard ()
#16 0x0807da46 in perl_run ()
#17 0x0804a5b4 in main ()

(gdb) info local
tos = 0xffdd6180 "\020\245\004\b\t"
slot1 = 0x0
match = 0 '\000'
fp_defined = 0 '\000'
sp = 0xffdd64d8 ""
match_next_frame = 0 '\000'
buf_pg = 0xffdd6000 "(\335\377"
slot0 = 0x0
ret_offs = 0
stop = 0xffdd6980 ""
page_buf = 0x4a7f05f4 "(\335\377"
(gdb) up 1
(gdb) info local
drcontext = 0x4a6a8100
pt = 0x4a6f5e7c
num = 9
len = 0
pc = 0xffdd6178
prev_sofar = 0
appdata = {
next_fp = 0x0,
retaddr = 0x804a5e5 "\364\220\220\220\220\220\220\220\220\220\220U\211\345S\203\354\004\200=d\275\024\b"
}
custom_retaddr = 0x0
lowest_frame = 0xffdd6178 ""
first_iter = 0 '\000'
have_appdata = 0 '\000'
scanned = 1 '\001'
last_frame = 0 '\000'
(gdb) p *pt
$6 = {
errbuf = 0x4a7ecb0c "",
errbufsz = 15074,
page_buf = 0x4a7f05f4 "(`\335\377",
stack_lowest_frame = 0xffdd72fb "diffmail.pl"
}
(gdb) p *pcs
$7 = {
refcount = 1,
num_frames = 9,
is_packed = -1 '\377',
first_is_retaddr = 0 '\000',
first_is_syscall = 0 '\000',
frames = {
packed = 0x4ae70e74,
full = 0x4ae70e74
}
}
(gdb) p pcs->frames.packed[6].loc.addr
$15 = (app_pc) 0x80cb4dd "\205\300\243\204\310\024\bu\352\306\005\f\303\024\b"
(gdb) p pcs->frames.packed[7].loc.addr
$16 = (app_pc) 0x43043635 "\211\004$\350\363\206\001"
(gdb) p pcs->frames.packed[8].loc.addr
$17 = (app_pc) 0x804a5e5 "\364\220\220\220\220\220\220\220\220\220\220U\211\345S\203\354\004\200=d\275\024\b"
(gdb) x/2i 0x43043635 -2
0x43043633 <__libc_start_main+243>: call *%edx
(gdb) x/2i 0x0804a5e5-5
0x804a5e0 <_start+28>: call 0x8049700 <__libc_start_main@plt>

(gdb) p /x mc->xsp
$4 = 0xffdd5d28
(gdb) p /x mc->xbp
$5 = 0xffdd5eb8
(gdb) x/400wx mc->xsp
0xffdd5d28: 0x00000001 0x00000000 0x00000000 0x00000001
0xffdd5d38: 0x4a6a8100 0xffdd5d5c 0x738c141b 0x00000004
0xffdd5d48: 0x4a7b4918 0xffdd5f5c 0xffde0000 0x739b5c00
0xffdd5d58: 0x00000000 0x00000148 0x00000003 0x00000002
0xffdd5d68: 0x080cfc60 0xffdd5eb8 0xffdd5d28 0x09ce98b0
0xffdd5d78: 0xffdd5d98 0x738e28b8 0xffdd5f58 0xffdd5f5c
0xffdd5d88: 0x080cfc60 0xffdd5f5c 0xffdd5e00 0x739b5c00
0xffdd5d98: 0xffdd5dd8 0x00000000 0x00000000 0xffdd5f5c
0xffdd5da8: 0xffdd5d01 0x00000000 0x00dd5f58 0xffdd5f5c
0xffdd5db8: 0x00000001 0x00000000 0x00000000 0xfffffc18
0xffdd5dc8: 0xffdd5f5c 0xffdd5f5c 0x00000000 0x739b5c00
0xffdd5dd8: 0xffdd5f58 0x738c1e92 0x4a6a8100 0x00000000
0xffdd5de8: 0x00000001 0x00000001 0x4a6a8100 0xffdd5e00
0xffdd5df8: 0x738c1cdb 0x00000004 0x00000148 0x00000003
0xffdd5e08: 0x09782417 0x080c8418 0xffdd5f58 0xffdd5de8
0xffdd5e18: 0x00000007 0x0995115b 0x09951166 0x00000000
0xffdd5e28: 0x00000001 0x080c7929 0x00000056 0xf29da2bc
0xffdd5e38: 0x09acfbf8 0x95007f00 0x0971db74 0x0971db30
0xffdd5e48: 0x00000050 0xf70c44c5 0xf70a94f4 0x098a3ff6
0xffdd5e58: 0xf70a94e6 0xf71619d8 0xf70f98bb 0x4a6a8100
0xffdd5e68: 0x00000000 0x080c6f07 0xf70f6284 0xf70f6284
0xffdd5e78: 0x739b5c00 0x739b5c00 0x73929300 0x4a6a8100
0xffdd5e88: 0x00000000 0x7392b006 0xf70f6284 0xf70f6284
0xffdd5e98: 0x7392b006 0x00000000 0x4a6a8100 0x4a6a8100
0xffdd5ea8: 0x0954d000 0x4a6a8100 0x09b85314 0x00000000
0xffdd5eb8: 0x00000001 0x080f7731 0x00000002 0x09951166 <===
0xffdd5ec8: 0x097965c4 0x00000000 0x00000000 0x09b85314
0xffdd5ed8: 0x00000000 0x080ce68e 0x00000002 0x00000057 <===
0xffdd5ee8: 0x09796cd0 0x09b85314 0x09b85314 0x09b50474
0xffdd5ef8: 0x04040804 0x080dad5c 0x09b85314 0x00000002 <===
0xffdd5f08: 0x00000000 0x09bba0a4 0xf70f6284 0xf70f6284
0xffdd5f18: 0x739b5c00 0x09b699ec 0x09951166 0x09796c70
0xffdd5f28: 0x09951110 0x00000000 0x09b699ec 0x00000012
0xffdd5f38: 0x00000001 0x09951165 0x09951166 0x09796c70
0xffdd5f48: 0x09951165 0x09b50474 0x09b85314 0x09b6956c
0xffdd5f58: 0x0000000d 0x080db5ab 0x09b85314 0x09b50474 <===
0xffdd5f68: 0x000000...

Original issue: http://code.google.com/p/drmemory/issues/detail?id=1186

@derekbruening
Contributor Author

From [email protected] on April 16, 2013 08:03:01

decoding forward from _start to find the bottom retaddr is an improvement, cutting
fp scans from 2.5B to 1.66B, but that is still way too high due to scans for
intermediate frames (tackling that next):

3359.52user 5.47system 56:14.44elapsed 99%CPU (0avgtext+0avgdata 425808maxresident)k
app mallocs: 278012393, frees: 277917410, large mallocs: 10026
unique malloc stacks: 11,273,564
callstack fp scans: 1,664,186,682
callstack is_retaddr: 1264594632, backdecode: 3357420230, unreadable: 0

**** TODO why is stack_lowest_frame in argv/envp/auxv area in the first place?

set lowest frame to 0xffdd1308
set lowest frame to 0xffdd22fb
(gdb) p /x mc->xbp
$5 = 0xffdd1118
(gdb) p /x mc->xsp
$6 = 0xffdd0f88
(gdb) x/400wx mc->xsp
0xffdd0f88: 0x00000001 0x00000000 0x00000000 0x00000001
0xffdd0f98: 0x471a0100 0xffdd0fbc 0x738c1197 0x00000004
0xffdd0fa8: 0x739b5c00 0x739b5c00 0xffdd10f8 0xffdd0f88
0xffdd0fb8: 0x00000000 0x00000148 0x00000003 0x00000000
0xffdd0fc8: 0x00000000 0xffdd1118 0xffdd0f88 0x471a0100
0xffdd0fd8: 0x00000000 0x00000000 0x0811ac03 0x01000000
0xffdd0fe8: 0x095f2e30 0x080f7731 0x00000001 0x431d7ff4
0xffdd0ff8: 0xffdd1010 0x4310d262 0x00000000 0x095e530c
0xffdd1008: 0x00000001 0x080ce68e 0x00730001 0xffdd0fe8
0xffdd1018: 0x095f2dd8 0xf70c44c5 0xf70a94f4 0x0812020d
0xffdd1028: 0xf70a94e6 0xf71619d8 0xf70f98bb 0x471a0100
0xffdd1038: 0x00000000 0x0804ac1c 0xf70f6284 0xf70f6284
0xffdd1048: 0x739b5c00 0x739b5c00 0x73929300 0x471a0100
0xffdd1058: 0x00000000 0x7392b006 0x095e530c 0x0812020d
0xffdd1068: 0x7392b006 0x00000000 0x471a0100 0x471a0100
0xffdd1078: 0x095e530c 0x00000001 0x7392affc 0x7392afd8
0xffdd1088: 0x095f2e90 0x080f7731 0x00000002 0x095e5324
0xffdd1098: 0xf70a94e6 0x00000000 0x00000000 0x095e530c
0xffdd10a8: 0x00000000 0x080ce68e 0x00000002 0xf70f6284
0xffdd10b8: 0x739b5c00 0x00000000 0x095e51c8 0x095e530c
0xffdd10c8: 0x00000001 0x0804b52a 0x095e51c8 0x00000002
0xffdd10d8: 0x095e530c 0x00000000 0x471a0100 0x471a0100
0xffdd10e8: 0x095e530c 0x00000000 0x00000000 0x095e530c
0xffdd10f8: 0x00000001 0x08077dbd 0x095e51c8 0x095e530c
0xffdd1108: 0x095e5000 0x471a0100 0xffdd22fb 0xffdd1338
0xffdd1118: 0xffdd22fb 0x080f7731 0x0000000c 0x095e530c
0xffdd1128: 0x00000000 0x00000000 0xffdd1338 0xffdd22fb
0xffdd1138: 0xffdd1338 0x080f87a1 0x0000000c 0x471a0100
0xffdd1148: 0x095f2d48 0xffdd22fb 0xffdd1338 0x00000007
0xffdd1158: 0xffdd22fb 0x0807d3c5 0xffdd22fb 0x00000000
0xffdd1168: 0x00000000 0x00000001 0xf70a94f4 0xf70f6284
0xffdd1178: 0xf70a94e6 0xf70c44c5 0x00000007 0xffdd22b9
0xffdd1188: 0x00000000 0xffdd133c 0x00000001 0x00000000
0xffdd1198: 0x095e5174 0x739b5c00 0xf70f6284 0x00000009
0xffdd11a8: 0xffdd1334 0xffdd2fbd 0x00000000 0xfe5ed13f
0xffdd11b8: 0x4b03cad0 0x00000000 0x471a0100 0x471a0100
0xffdd11c8: 0x7392b006 0x00000000 0x471a0100 0x471a0100
0xffdd11d8: 0x00000ff0 0x095e9210 0x7392affc 0x7392afd8
0xffdd11e8: 0x095ed948 0x080f7731 0x00000ff0 0x0814c778
0xffdd11f8: 0x00000000 0x00000000 0x00000000 0x00000000
0xffdd1208: 0x00000000 0x0805e238 0x095ed948 0x00000000
0xffdd1218: 0x00000000 0x095e5168 0x00000000 0x095e5168
0xffdd1228: 0x00000000 0x0805ea51 0x095e5168 0x0000000b
0xffdd1238: 0x00000000 0x431d7ff4 0x0814c520 0x00000000
0xffdd1248: 0x431d7f00 0x000789bd 0x095e515c 0x00000200
0xffdd1258: 0x00000000 0x431d7ff4 0x00000000 0x00000000
0xffdd1268: 0xffdd1298 0x0804a57f 0x095e50d8 0x080832e0
0xffdd1278: 0x00000009 0xffdd1334 0x00000000 0x00001000
0xffdd1288: 0x081193a9 0x431d7ff4 0x081193a0 0x431d7ff4
0xffdd1298: 0x00000000 0x43043635 0x00000009 0xffdd1334
0xffdd12a8: 0xffdd135c 0xf77c2398 0x00000001 0x00000001
0xffdd12b8: 0x00000000 0x00000001 0x00000002 0x431d7ff4
0xffdd12c8: 0x00000000 0x00000000 0x00000000 0xfe59513f
0xffdd12d8: 0x4c17f446 0x00000000 0x00000000 0x00000000
0xffdd12e8: 0x00000009 0x0804a5c4 0x00000000 0x4301bd50
0xffdd12f8: 0x43043549 0x43026fc4 0x00000009 0x0804a5c4
0xffdd1308: 0x00000000 0x0804a5e5 0x0804a510 0x00000009
0xffdd1318: 0xffdd1334 0x081193a0 0x08119410 0x43016a90
0xffdd1328: 0xffdd132c 0x430278f8 0x00000009 0xffdd22b9
0xffdd1338: 0xffdd22f3 0xffdd22fb 0xffdd2307 0xffdd2309

(gdb) bt
#0 0xf70edcb2 in syscall_0args () from /work/dr/git/exports/lib32/release/libdynamorio.so
#1 0x095f2eb8 in ?? ()
#2 0x738d4b4d in print_callstack (buf=0x0, bufsz=0, sofar=0x0, mc=0xffdd0fbc, print_fps=0 '\000', pcs=0x47301540,
num_frames_printed=1, for_log=0 '\000') at /work/drmemory/git/src/common/callstack.c:1328
#3 0x738d5d7e in packed_callstack_record (pcs_out=0xffdd0e30, mc=0xffdd0fbc, loc=0xffdd0e20)
at /work/drmemory/git/src/common/callstack.c:1453
#4 0x738de9dd in get_shared_callstack (existing_data=0x0, mc=0xffdd0fbc, post_call=0x738c1197 "U\211\345VS\201\354\220\001")
at /work/drmemory/git/src/drmemory/alloc_drmem.c:390
#5 0x738df267 in client_add_malloc_pre (start=0x95f2ea8 "", end=0x95f2eb4 "", real_end=0x95f2eb8 "", existing_data=0x0,
mc=0xffdd0fbc, post_call=0x738c1197 "U\211\345VS\201\354\220\001") at /work/drmemory/git/src/drmemory/alloc_drmem.c:436
#6 0x738b5fed in notify_client_alloc (call_handle=1 '\001', drcontext=0x471a0100, ptr=0x95f2ea8 "", head=0x95f2e98, mc=0xffdd0fbc,
zeroed=0 '\000', realloc=0 '\000', caller=0x738c1197 "U\211\345VS\201\354\220\001")
at /work/drmemory/git/src/common/alloc_replace.c:480
#7 0x738be5be in replace_alloc_common (arena=0x95e5000, request_size=12, synch=1 '\001', zeroed=0 '\000', realloc=0 '\000',
invoke_client=1 '\001', drcontext=0x471a0100, mc=0xffdd0fbc, caller=0x738c1197 "U\211\345VS\201\354\220\001", alloc_type=4)
at /work/drmemory/git/src/common/alloc_replace.c:1371
#8 0x738c134b in replace_malloc (size=12) at /work/drmemory/git/src/common/alloc_replace.c:1857
#9 0x080f7731 in Perl_safesysmalloc ()
#10 0x080f87a1 in Perl_savepv ()
#11 0x0807d3c5 in perl_parse ()
#12 0x0804a57f in main ()
(gdb) p *pcs
$1 = {
refcount = 1,
num_frames = 2,
is_packed = -1 '\377',
first_is_retaddr = 0 '\000',
first_is_syscall = 0 '\000',
frames = {
packed = 0x472fcf80,
full = 0x472fcf80
}
}
(gdb) p pcs->frames.packed[0]
$2 = {
loc = {
addr = 0x738c1197 "U\211\345VS\201\354\220\001",
sysloc = 0x738c1197
},
modoffs = 790935,
modname_idx = 6
}
(gdb) p pcs->frames.packed[1]
$3 = {
loc = {
addr = 0x80f7731 "\205\300\211\303t\021\211؋t$\030\213$\024\203\304\034Ív",
sysloc = 0x80f7731
},
modoffs = 718641,
modname_idx = 1
}
(gdb) info line *0x738c1197
Line 1850 of "/work/drmemory/git/src/common/alloc_replace.c" starts at address 0x738c1197 <replace_malloc>
and ends at 0x738c11ad <replace_malloc+22>.
(gdb) x/2i 0x80f7731
0x80f7731 <Perl_safesysmalloc+33>: test %eax,%eax
0x80f7733 <Perl_safesysmalloc+35>: mov %eax,%ebx

the problem is that the Perl_safesysmalloc retaddr has what looks like a
legitimate fp next to it, though it actually points into the argv strings (it is argv[2], "diffmail.pl"):
0xffdd1118: 0xffdd22fb 0x080f7731 0x0000000c 0x095e530c

(gdb) x/3s 0xffdd22f3
0xffdd22f3: "-I./lib"
0xffdd22fb: "diffmail.pl"
0xffdd2307: "4"

w/o knowing the initial sp set by the kernel, it's hard to avoid this. so
perhaps my additional _init retaddr check is the right solution.

@derekbruening
Contributor Author

From [email protected] on April 16, 2013 13:45:16

actually we can avoid the too-high stack_lowest_frame: we just need to revert it on a frame that failed. but I'm going to keep the stack_lowest_retaddr (the _start call) as well.

perlbench has a lot of FPO, so in some cases there are scans between every frame. I added a cache of the last N scan results (each verified by reading the retaddr, so it should be safe with -zero_retaddr).

after all 3 of those fixes the regression is gone:

release build:
1833.59user 5.72system 30:45.28elapsed 99%CPU (0avgtext+0avgdata 429768maxresident)k
ITIMER distribution (182407):
0.0% of time in APPLICATION (1)
3.2% of time in INTERPRETER (5785)
0.3% of time in DISPATCH (628)
0.1% of time in SYSCALL HANDLER (103)
3.0% of time in INDIRECT BRANCH LOOKUP (5421)
43.8% of time in FRAGMENT CACHE (79891)
49.7% of time in UNKNOWN (90578)
RES-pcsamples.0.7276.html
897 get_shadow_table
1011 add_to_delay_list
1109 bitmapx2_set
1262 find_free_list_entry
1307 packed_callstack_hash
1438 safe_read
1662 module_lookup
2334 shadow_set_range
2458 address_to_frame
2999 rb_in_node
3925 print_callstack
5895 find_next_fp

so find_next_fp is still the top hit, but it's better than it was.

@derekbruening
Contributor Author

From [email protected] on April 16, 2013 16:47:10

This issue was closed by revision r1275.

Status: Fixed

@derekbruening
Contributor Author

From [email protected] on April 17, 2013 08:49:23

these made huge improvements on some spec2k6 benchmarks, reversing most
regressions vs the CGO paper and in fact improving beyond those numbers:

spec2k6cmp result/CINT2006.101.ref.txt namedres/x86.drmem.full_replace/CINT2006.098.ref.txt
400.perlbench 0.61 ( 8103 / 13263)
456.hmmer 1.00 ( 8728 / 8750)
471.omnetpp 0.68 ( 5200 / 7639)
spec2k6cmp result/CFP2006.101.ref.txt namedres/x86.drmem.full_replace/CFP2006.098.ref.txt
447.dealII 0.87 ( 9101 / 10414)
465.tonto 0.22 ( 9777 / 44318)
470.lbm 0.97 ( 2291 / 2353)

after issue #1186, perlbench, omnetpp, and tonto are now better than the paper:

spec2k6cmp result/CINT2006.101.ref.txt namedres/x86.native/CINT2006.088.ref.txt
400.perlbench 20.99 ( 8103 / 386)
456.hmmer 15.07 ( 8728 / 579)
471.omnetpp 17.39 ( 5200 / 299)
spec2k6cmp result/CFP2006.101.ref.txt namedres/x86.native/CFP2006.088.ref.txt
447.dealII 17.99 ( 9101 / 506)
465.tonto 14.36 ( 9777 / 681)
470.lbm 6.33 ( 2291 / 362)

but hmmer, dealII, and lbm are still worse than the paper
