-
Notifications
You must be signed in to change notification settings - Fork 263
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perblench full mode diffmail run has 3x perf regression #1186
Comments
From [email protected] on April 16, 2013 08:03:01 decoding forward from _start to find bottom retaddr is an improvement, from 3359.52user 5.47system 56:14.44elapsed 99%CPU (0avgtext+0avgdata 425808maxresident)k **** TODO why is stack_lowest_frame in argv/envp/auxv area in the first place? set lowest frame to 0xffdd1308 (gdb) bt the problem is that the Perl_safesysmalloc retaddr has what looks like a (gdb) x/3s 0xffdd22f3 w/o knowing the initial sp set by the kernel, it's hard to avoid this. so |
From [email protected] on April 16, 2013 13:45:16 actually we can avoid the too-high stack_lowest_frame: just need to revert on a frame that failed. but I'm going to keep the stack_lowest_retaddr (_start call). perlbench has a lot of FPO and scans between every frame in some cases. I added a cache of the last N results (verified by reading retaddr: so should be safe with -zero_retaddr). after all 3 of those fixes the regression is gone: release build: so find_next_fp is still the top hit, but it's better than it was. |
From [email protected] on April 16, 2013 16:47:10 This issue was closed by revision r1275 . Status: Fixed |
From [email protected] on April 17, 2013 08:49:23 these made huge improvements on some spec2k6 benchmarks, reversing most
after issue #1186, perlbench, omnetpp, and tonto are now better than the paper:
but hmmer, dealII, and lbm are still worse than the paper |
From [email protected] on April 15, 2013 16:03:10
after some replace_malloc optimizations, perlbench looked good on my home
machine. but on my work machine its diffmail run was still 3x slower than
it should be, leading 36x overall: and this happens w/o -replace_malloc as
well.
2nd run better than it was (see issue #1183) but still 3x slower than expected!
Workload elapsed time (0:1) = 4494.370856 seconds
Workload elapsed time (0:2) = 6564.70585 seconds
Workload elapsed time (0:3) = 2203.749973 seconds
Reported: 13262 826799000 13262.826799
(gdb) p 13262/386.
$1 = 34.357512953367873
need to profile it. why so different from ancalagon?
-prof_pcs:
ITIMER distribution (658769):
1.3% of time in INTERPRETER (8539)
0.1% of time in DISPATCH (838)
0.0% of time in SYSCALL HANDLER (10)
1.9% of time in INDIRECT BRANCH LOOKUP (12310)
16.5% of time in FRAGMENT CACHE (108429)
80.2% of time in UNKNOWN (528643)
RES-pcsamples.0.7528.html
1938 shadow_set_range
2629 replace_alloc_common
2739 replace_free_common
3019 safe_read
4433 packed_callstack_record
5543 ??
6088 is_retaddr.part.5
15813 __i686.get_pc_thunk.bx
18620 rb_in_node
54468 is_dword_defined
100657 shadow_get_dword
112720 find_next_fp
differences vs glaurung:
gcc 4.6.3 vs 4.7.2
spec 1.2 vs 1.1
"-O2 -fno-strict-aliasing -m32" vs "-m32 -O2"
I checked perlbench sources: identical for 1.1 vs 1.2
maybe more fpo or sthg?
but why doesn't this show up in -light?
no suppressions used, -delay_frees_stack is off: must just be malloc
callstacks, which should be there for -light, right?
ancalagon debug run:
2483.38user 6.75system 41:38.86elapsed 99%CPU (0avgtext+0avgdata 429992maxresident)k
app mallocs: 278012393, frees: 277917410, large mallocs: 10026
unique malloc stacks: 11,273,296
callstack fp scans: 277,754,787
callstack is_retaddr: 704881223, backdecode: 1389036002, unreadable: 0
goobuntu debug run:
11462.56user 9.37system 3:11:47elapsed 99%CPU (0avgtext+0avgdata 1658896maxresident)k
app mallocs: 278011932, frees: 277917207, large mallocs: 10026
unique malloc stacks: 11,273,318
callstack fp scans: 2,499,318,434
callstack is_retaddr: 3868333682, backdecode: 3511997706, unreadable: 0
running the goobuntu perlbench exe on ancalagon it is slow there too.
snapshot:
(gdb) bt
#0 0x7387bbb9 in shadow_get_dword (addr=0xffdd64d8 "") at /work/drmemory/git/src/drmemory/shadow.c:429
#1 0x738f4f54 in is_dword_defined (addr=0xffdd64d8 "") at /work/drmemory/git/src/drmemory/report.c:1183
#2 0x738d1c46 in find_next_fp (pt=0x4a6f5e7c, fp=0xffdd6180 "\020\245\004\b\t", top_frame=0 '\000', retaddr=0x0)
at /work/drmemory/git/src/common/callstack.c:981
#3 0x738d38b1 in print_callstack (buf=0x0, bufsz=0, sofar=0x0, mc=0xffdd5d5c, print_fps=0 '\000', pcs=0x4a688e00,
num_frames_printed=1, for_log=0 '\000') at /work/drmemory/git/src/common/callstack.c:1204
#4 0x738d550c in packed_callstack_record (pcs_out=0xffdd5be0, mc=0xffdd5d5c, loc=0xffdd5bd0)
at /work/drmemory/git/src/common/callstack.c:1384
#5 0x738de165 in get_shared_callstack (existing_data=0x0, mc=0xffdd5d5c, post_call=0x738c141b "U\211\345VS\201\354\220\001")
at /work/drmemory/git/src/drmemory/alloc_drmem.c:389
#6 0x738de9ef in client_add_malloc_pre (start=0xbcc0a78 "", end=0xbcc0a7a "", real_end=0xbcc0a88 "L%\353J\360\017",
existing_data=0x0, mc=0xffdd5d5c, post_call=0x738c141b "U\211\345VS\201\354\220\001")
at /work/drmemory/git/src/drmemory/alloc_drmem.c:435
#7 0x738b647b in notify_client_alloc (call_handle=1 '\001', drcontext=0x4a6a8100, ptr=0xbcc0a78 "", head=0xbcc0a68, mc=0xffdd5d5c,
zeroed=0 '\000', realloc=0 '\000', caller=0x738c141b "U\211\345VS\201\354\220\001")
at /work/drmemory/git/src/common/alloc_replace.c:465
#8 0x738be8ef in replace_alloc_common (arena=0x954d000, request_size=2, synch=1 '\001', zeroed=0 '\000', realloc=0 '\000',
invoke_client=1 '\001', drcontext=0x4a6a8100, mc=0xffdd5d5c, caller=0x738c141b "U\211\345VS\201\354\220\001", alloc_type=4)
at /work/drmemory/git/src/common/alloc_replace.c:1319
#9 0x738c15cf in replace_malloc (size=2) at /work/drmemory/git/src/common/alloc_replace.c:1784
#10 0x080f7731 in Perl_safesysmalloc ()
#11 0x080ce68e in Perl_sv_grow ()
#12 0x080dad5c in Perl_sv_setsv_flags ()
#13 0x080db5ab in Perl_newSVsv ()
#14 0x080a1428 in Perl_pp_aassign ()
#15 0x080cb4dd in Perl_runops_standard ()
#16 0x0807da46 in perl_run ()
#17 0x0804a5b4 in main ()
(gdb) info local
$16 = (app_pc) 0x43043635 "\211\004$ \350\363\206\001"
tos = 0xffdd6180 "\020\245\004\b\t"
slot1 = 0x0
match = 0 '\000'
fp_defined = 0 '\000'
sp = 0xffdd64d8 ""
match_next_frame = 0 '\000'
buf_pg = 0xffdd6000 "(
\335\377" slot0 = 0x0 ret_offs = 0 stop = 0xffdd6980 "" page_buf = 0x4a7f05f4 "(
\335\377"(gdb) up 1
(gdb) info local
drcontext = 0x4a6a8100
pt = 0x4a6f5e7c
num = 9
len = 0
pc = 0xffdd6178
prev_sofar = 0
appdata = {
next_fp = 0x0,
retaddr = 0x804a5e5 "\364\220\220\220\220\220\220\220\220\220\220U\211\345S\203\354\004\200=d\275\024\b"
}
custom_retaddr = 0x0
lowest_frame = 0xffdd6178 ""
first_iter = 0 '\000'
have_appdata = 0 '\000'
scanned = 1 '\001'
last_frame = 0 '\000'
(gdb) p *pt
$6 = {
errbuf = 0x4a7ecb0c "",
errbufsz = 15074,
page_buf = 0x4a7f05f4 "(`\335\377",
stack_lowest_frame = 0xffdd72fb "diffmail.pl"
}
(gdb) p *pcs
$7 = {
refcount = 1,
num_frames = 9,
is_packed = -1 '\377',
first_is_retaddr = 0 '\000',
first_is_syscall = 0 '\000',
frames = {
packed = 0x4ae70e74,
full = 0x4ae70e74
}
}
(gdb) p pcs->frames.packed[6].loc.addr
$15 = (app_pc) 0x80cb4dd "\205\300\243\204\310\024\bu\352\306\005\f\303\024\b"
(gdb) p pcs->frames.packed[7].loc.addr
(gdb) p pcs->frames.packed[8].loc.addr
$17 = (app_pc) 0x804a5e5 "\364\220\220\220\220\220\220\220\220\220\220U\211\345S\203\354\004\200=d\275\024\b"
(gdb) x/2i 0x43043635 -2
0x43043633 <__libc_start_main+243>: call *%edx
(gdb) x/2i 0x0804a5e5-5
0x804a5e0 <_start+28>: call 0x8049700 __libc_start_main@plt
(gdb) p /x mc->xsp
$4 = 0xffdd5d28
(gdb) p /x mc->xbp
$5 = 0xffdd5eb8
(gdb) x/400wx mc->xsp
0xffdd5d28: 0x00000001 0x00000000 0x00000000 0x00000001
0xffdd5d38: 0x4a6a8100 0xffdd5d5c 0x738c141b 0x00000004
0xffdd5d48: 0x4a7b4918 0xffdd5f5c 0xffde0000 0x739b5c00
0xffdd5d58: 0x00000000 0x00000148 0x00000003 0x00000002
0xffdd5d68: 0x080cfc60 0xffdd5eb8 0xffdd5d28 0x09ce98b0
0xffdd5d78: 0xffdd5d98 0x738e28b8 0xffdd5f58 0xffdd5f5c
0xffdd5d88: 0x080cfc60 0xffdd5f5c 0xffdd5e00 0x739b5c00
0xffdd5d98: 0xffdd5dd8 0x00000000 0x00000000 0xffdd5f5c
0xffdd5da8: 0xffdd5d01 0x00000000 0x00dd5f58 0xffdd5f5c
0xffdd5db8: 0x00000001 0x00000000 0x00000000 0xfffffc18
0xffdd5dc8: 0xffdd5f5c 0xffdd5f5c 0x00000000 0x739b5c00
0xffdd5dd8: 0xffdd5f58 0x738c1e92 0x4a6a8100 0x00000000
0xffdd5de8: 0x00000001 0x00000001 0x4a6a8100 0xffdd5e00
0xffdd5df8: 0x738c1cdb 0x00000004 0x00000148 0x00000003
0xffdd5e08: 0x09782417 0x080c8418 0xffdd5f58 0xffdd5de8
0xffdd5e18: 0x00000007 0x0995115b 0x09951166 0x00000000
0xffdd5e28: 0x00000001 0x080c7929 0x00000056 0xf29da2bc
0xffdd5e38: 0x09acfbf8 0x95007f00 0x0971db74 0x0971db30
0xffdd5e48: 0x00000050 0xf70c44c5 0xf70a94f4 0x098a3ff6
0xffdd5e58: 0xf70a94e6 0xf71619d8 0xf70f98bb 0x4a6a8100
0xffdd5e68: 0x00000000 0x080c6f07 0xf70f6284 0xf70f6284
0xffdd5e78: 0x739b5c00 0x739b5c00 0x73929300 0x4a6a8100
0xffdd5e88: 0x00000000 0x7392b006 0xf70f6284 0xf70f6284
0xffdd5e98: 0x7392b006 0x00000000 0x4a6a8100 0x4a6a8100
0xffdd5ea8: 0x0954d000 0x4a6a8100 0x09b85314 0x00000000
0xffdd5eb8: 0x00000001 0x080f7731 0x00000002 0x09951166 <===
0xffdd5ec8: 0x097965c4 0x00000000 0x00000000 0x09b85314
0xffdd5ed8: 0x00000000 0x080ce68e 0x00000002 0x00000057 <===
0xffdd5ee8: 0x09796cd0 0x09b85314 0x09b85314 0x09b50474
0xffdd5ef8: 0x04040804 0x080dad5c 0x09b85314 0x00000002 <===
0xffdd5f08: 0x00000000 0x09bba0a4 0xf70f6284 0xf70f6284
0xffdd5f18: 0x739b5c00 0x09b699ec 0x09951166 0x09796c70
0xffdd5f28: 0x09951110 0x00000000 0x09b699ec 0x00000012
0xffdd5f38: 0x00000001 0x09951165 0x09951166 0x09796c70
0xffdd5f48: 0x09951165 0x09b50474 0x09b85314 0x09b6956c
0xffdd5f58: 0x0000000d 0x080db5ab 0x09b85314 0x09b50474 <===
0xffdd5f68: 0x000000...
Original issue: http://code.google.com/p/drmemory/issues/detail?id=1186
The text was updated successfully, but these errors were encountered: