officially support post-process w/ USE_DRSYMS to redo callstacks with better symbols #446

Open
derekbruening opened this issue Nov 28, 2014 · 20 comments

Comments

@derekbruening
Contributor

From [email protected] on June 09, 2011 08:35:35

xref issue #143
xref issue #290
xref issue #388

On Linux there is already built-in support for re-processing callstacks. This issue is about Windows.

I want to avoid the complexities of having drmem download pdb files online
(DRi#450) or even in its front-end at startup.

xref Memcheck or Tsan: they require the user to get symbols on their own.

Proposed model of acquiring symbols for users:

Original issue: http://code.google.com/p/drmemory/issues/detail?id=446

@derekbruening
Contributor Author

From [email protected] on August 15, 2011 12:52:29

xref issue #475 and issue #504 where for the first time a default suppression has to
use mod+offs (for rsaenh.dll)

@derekbruening
Contributor Author

From [email protected] on September 30, 2011 07:29:51

this will become a cross-platform feature once Linux and Cygwin are unified with Windows. we'll then have to re-do the post-process feature that's already present in the perl scripts.

xref issue #192 .

Summary: officially support post-process w/ USE_DRSYMS to redo callstacks with better symbols

@derekbruening
Contributor Author

From [email protected] on September 30, 2011 07:30:00

Labels: -OpSys-Windows

@derekbruening
Contributor Author

From [email protected] on November 29, 2011 07:24:18

note that I am now thinking of post-processing symbols as satisfying 4
goals:
A) issue #192: too-large 32-bit apps (having scrapped the sideline plan)
B) issue #614: a general feature to aggregate together reports from multiple
processes that constitute a process tree or a group of related tests
C) to allow re-symbolizing callstacks after a run that didn't have full
symbols available w/o having to re-execute
D) to re-run w/ extra suppressions to test new suppressions

@derekbruening
Contributor Author

From [email protected] on December 01, 2011 07:15:49

re: extra suppressions
<Ideally,>
This should also be available with a standalone cross-platform script [like a python script we have for Valgrind, TSan and HeapChecker in Chromium]
to allow cross-platform suppression duties [e.g. you're on Linux, working on a report and suppression from a Windows bot].

This should not need running Linux binary or Windows .exe file.

Short-term, we can of course achieve this for Chromium with a script in the Chromium codebase.
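
To make the standalone-script idea concrete, here is a minimal sketch of matching a suppression against a callstack in pure Python, with no Windows binary involved. The matching rule is a simplification chosen for illustration (glob wildcards per frame, `...` for zero or more frames, prefix match); it is not Dr. Memory's actual suppression semantics, and `frames_match` is a hypothetical name.

```python
from fnmatch import fnmatchcase


def frames_match(pattern, stack):
    """Prefix-match suppression frames against callstack frames.

    Each pattern frame is a glob ("*" and "?" wildcards); a frame of
    "..." matches zero or more callstack frames. This is a simplified
    rule for illustration only.
    """
    if not pattern:
        return True  # every pattern frame consumed: prefix match
    if pattern[0] == "...":
        # Try letting the ellipsis consume 0, 1, 2, ... callstack frames.
        return any(frames_match(pattern[1:], stack[i:])
                   for i in range(len(stack) + 1))
    if not stack:
        return False
    return (fnmatchcase(stack[0], pattern[0])
            and frames_match(pattern[1:], stack[1:]))
```

A script like this only works if the report already carries symbolized or mod+offs frames, since it cannot load dbghelp on a non-Windows machine.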

@derekbruening
Contributor Author

From [email protected] on December 01, 2011 07:28:18

hmmm, not sure about this one: that would require duplicating code and maintaining two sets of the same code. the whole point of having the post-analysis built-in to the tool is to share the already-present code in the tool. for the general user, if you're using the Windows tool, you have access to Windows, so it doesn't seem too onerous to require running the Windows tool in order to analyze reports from the Windows tool, does it?

@derekbruening
Contributor Author

From [email protected] on January 10, 2012 12:42:30

having to download syms gets messy. it's not just an install-time thing
b/c user's machine will get updated. so when do you check? every run?
that's too slow. rely on user to check whenever he gets a weird error?
can't easily do online symserver (i#xxx), so set a flag and frontend on next run asks users whether to download new syms?

ideally we can keep adding suppression control so we don't need syms for
system libs: xref issue #741 . so far I've managed to not need system libs for
anything else including syscalls (xref issue #388 ). RtlpHeapFailureInfo ( issue #292 )
I guess is still symbol-dependent.

@derekbruening
Contributor Author

From [email protected] on March 09, 2012 08:41:25

Blocking: 827

@derekbruening
Contributor Author

From [email protected] on March 09, 2012 08:42:10

Blocking: 828

@derekbruening
Contributor Author

From [email protected] on March 14, 2012 11:44:04

Blocking: 792

@derekbruening
Contributor Author

From [email protected] on March 14, 2012 12:09:54

This was on my OKRs, so I'll take this.

I propose adding two options:

-log_for_postprocess: To support post-processing, we need to log all reports, whether suppressed or not, with mod+offs info.

Right now, if we use DRSYMS, we apply suppressions online to prevent log spew from suppressed errors. The error reports that we do emit use the user's -callstack_style, which may not have mod+offs.

I'd like to turn this on by default or always, so that users can quickly re-symbolize their logs after downloading symbols on their first run. However, I need to measure the cost and extra log size.

-(no_)use_online_syms: Avoids ever initializing dbghelp. Turns off results_to_stderr, turns on -log_for_postprocess, and makes the frontend symbolize reports after exit.

For symbol lookup, we can do one of two things: rely on the symcache, or unload the module from dbghelp after looking up symbols. Priming the symcache is pretty easy for many apps: just run it with -help or --gtest_filter=DoesNotExist, and all the "statically" linked dlls will get loaded and cached. I think the second option is better because it just works and shouldn't give us memory problems. It may cause a spike, but unloading modules seems to clean up all the allocated resources.

Is there any reason to process the logs in a sideline fashion à la postprocess.pl? The only reason I can think of is that if there are 1000s of suppressed errors, the sideline model allows you to symbolize and match them in parallel. At first I'll try to do it synchronously on shutdown, though.
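
As a rough illustration of why the log needs mod+offs info: a post-processor can only re-symbolize a frame if the module name and offset survive in the report. The sketch below assumes a hypothetical frame syntax `<module+0xOFFSET>` and takes the symbol table as a plain dict; it does not reflect the real log format and does not call dbghelp.

```python
import re

# Hypothetical frame syntax for illustration: "#0 <module.dll+0x1a2b>".
FRAME_RE = re.compile(r"<(?P<mod>[^+<>]+)\+0x(?P<offs>[0-9a-fA-F]+)>")


def parse_frame(line):
    """Return (module, offset) for a mod+offs frame, or None if absent."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return m.group("mod"), int(m.group("offs"), 16)


def resymbolize(lines, symbols):
    """Annotate frames using `symbols`, a dict {(module, offset): "func"}.

    Frames without a known symbol are left in mod+offs form so that a
    later pass with better symbols can still resolve them.
    """
    out = []
    for line in lines:
        key = parse_frame(line)
        if key is not None and key in symbols:
            line = f"{line.rstrip()} {key[0]}!{symbols[key]}"
        out.append(line)
    return out
```

Keeping the mod+offs text in the output, rather than replacing it, is what lets the same log be re-symbolized again later.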

Owner: [email protected]

@derekbruening
Contributor Author

From [email protected] on March 15, 2012 08:16:54

Linux already supports turning off regular symbolization via the option "-skip_results", and then symbolizing via "-results" (talked about in issue #192 among other places)

for too-large symbol files (which is issue #192 , not this issue) you can't pre-load the symcache by running the app b/c it may not fit with DR's reservation and DrMem's up-front usage: something like symquery is needed that uses as little memory as possible and tries to load just that one pdb

we had discussions in the past on the reasons I have linux using sideline and how just having frontend do symbolization misses all the child processes, though maybe that's ok if it's really rare (i.e., only for the rare app w/ giant pdb) and if we have issue #614 to combine them all (and an easy way to isolate one tree, or tell user to clear out logs dir or pass -logdir)

also don't forget about comment 5: though that may not be doable since we have to load dbghelp.dll and would need WINE or something. really for cross-platform we should shoot for issue #614 which if symbols are already present would simply combine, de-dup, and re-suppress

re: -results_to_stderr: see issue #192 comment 4
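
The "combine, de-dup, and re-suppress" step mentioned for issue #614 could be sketched as follows. The report representation (an `(error_type, frames)` tuple) and the `is_suppressed` callback are assumptions made for this example, not the tool's data types.

```python
def aggregate(reports_per_process, is_suppressed=lambda report: False):
    """Merge reports from several processes, dropping duplicates.

    A report is an (error_type, frames) pair, with frames a tuple of
    strings so that the report is hashable. Returns (report, count)
    pairs in first-seen order.
    """
    counts = {}  # dicts preserve insertion order in Python 3.7+
    for reports in reports_per_process:
        for report in reports:
            if is_suppressed(report):
                continue  # re-suppress: skip reports matching a suppression
            counts[report] = counts.get(report, 0) + 1
    return list(counts.items())
```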

@derekbruening
Contributor Author

From [email protected] on March 15, 2012 08:20:55

also note that the plans for this feature included having it go and acquire the latest system library symbols

@derekbruening
Contributor Author

From [email protected] on March 15, 2012 09:31:27

Right, -skip_results is like -no_use_online_syms, except it requires a second run with -results, where I was thinking it would be better to have -no_use_online_syms launch a sideline postprocess or do a complete postprocess after the app exits. Otherwise we could collapse those.

All that said, you're right, all that discussion is for too-large symbol files, issue #192 . I put aside the work I started on that and am just focusing on -log_for_postprocess now.

Once the logs look like they can be parsed, I'll send a patch for that just to make sure we're on the same page, and continue on to implement -postprocess and -aggregate in the frontend.

re: comment 5, I was planning on not supporting re-symbolizing with better symbols if you're post-processing on a different platform. Cross-platform post-processing would only be for tweaking new suppressions, which is most of what sheriffs do anyway.

For tracking child processes, if we're not doing sideline processing, I don't think it would be hard to track down the tree: while parsing the log, we look for child creation events, and add that process's log dir to our work queue of logs to parse.

If we do want sideline processing, we'd have to do something like f_fork, right?

My plan is to do non-sideline first, since it's easier, and we can reuse the code for sideline if we want it.
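
The non-sideline child tracking described above amounts to a work queue over log directories. A minimal sketch follows, assuming the parent's log contains a line that names the child's log directory; the `child logdir: ` marker is hypothetical, and the real log may not record this at all.

```python
from collections import deque

# Hypothetical marker; the real log format may not record this at all.
CHILD_PREFIX = "child logdir: "


def collect_process_tree(root_logdir, read_log):
    """Return log dirs reachable from the root via child-creation events.

    `read_log(logdir)` is a caller-supplied function returning the log's
    lines, which keeps this sketch independent of any file layout.
    """
    queue = deque([root_logdir])
    seen = {root_logdir}
    order = []
    while queue:
        logdir = queue.popleft()
        order.append(logdir)
        for line in read_log(logdir):
            if line.startswith(CHILD_PREFIX):
                child = line[len(CHILD_PREFIX):].strip()
                if child not in seen:  # guard against duplicates
                    seen.add(child)
                    queue.append(child)
    return order
```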

@derekbruening
Contributor Author

From [email protected] on March 15, 2012 10:24:46

re: comment 5: sounds like you're agreeing with my comment about pushing that to issue #614 .

for sideline, the parent doesn't know the logdir name of the child (though perhaps in some cases we could change drmem to create the logdir in the parent but likely only on windows so it doesn't solve the problem), so you have problems from old logdirs as well as logdirs from simultaneous process trees. unless you use a unique log base dir for every invocation of the front-end there are no guarantees.

@derekbruening
Contributor Author

From [email protected] on March 27, 2012 07:08:53

*** TODO design of postprocess to share code :meeting_DrM_2012_1_30:

For postprocess ( issue #446 , issue #192 ), re-symbolize ( issue #446 , issue #448 ), aggregate ( issue #614 ):

My proposal is to always start from un-symbolized for all postprocess
scenarios (re-symbolized, re-suppress, aggregate). The only downside is
performance (see below). Then we have the same parser (always reading
global log and never results.txt) and same data type (packed callstack).
(Parsing the controlled-by-us standardized global.log format is easier than
the controlled-by-user results.txt where we'd have to support any
formatting the user might pick). Also, we can't rely on having symbols in
global.log: large module (issue #192), or re-symbolized => results.txt.

For performance, I would keep symbols from global.log in a data structure
(addr to string table or something). Always operate on regular packed
callstacks, always re-symbolize on output to user, but use the cache from
parsing global.log so we don't have to go to dbghelp most of the time, and
to support cross-platform re-suppressing.

Scenario: re-symbolize on Windows (say automated on bots) and then want to
re-suppress from linux: better to have re-symbolize write a new global.log
rather than parse results.txt.
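
A minimal sketch of the addr-to-string cache idea, under stated assumptions: `SymbolCache` and its methods are hypothetical names, and `slow_lookup` merely stands in for a real symbolizer such as dbghelp. When no symbolizer is available (for example when re-suppressing on another platform), unknown frames fall back to mod+offs form.

```python
class SymbolCache:
    """Address-to-string table primed from symbols already in the log."""

    def __init__(self, slow_lookup=None):
        # slow_lookup stands in for a real symbolizer (e.g. dbghelp on
        # Windows); it is None when post-processing on another platform.
        self._table = {}
        self._slow_lookup = slow_lookup
        self.slow_calls = 0

    def prime(self, module, offset, symbol):
        """Record a symbol seen while parsing the log."""
        self._table[(module, offset)] = symbol

    def lookup(self, module, offset):
        """Return a symbol string, falling back to mod+offs form."""
        key = (module, offset)
        if key in self._table:
            return self._table[key]
        if self._slow_lookup is not None:
            self.slow_calls += 1
            symbol = self._slow_lookup(module, offset)
            if symbol is not None:
                self._table[key] = symbol  # avoid repeating the slow path
                return symbol
        return f"<{module}+0x{offset:x}>"
```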

@derekbruening
Contributor Author

From [email protected] on April 07, 2012 08:26:24

Blocking: -792

@derekbruening
Contributor Author

From [email protected] on August 03, 2012 07:38:08

Blocking: drmemory:827

@derekbruening
Contributor Author

From [email protected] on February 17, 2013 09:27:37

Owner: ---
Labels: GoodContrib

@derekbruening
Contributor Author

Xref symbolization needed for Dr. Heapstat: #1447. We should share as much of that work as possible, possibly via #823.
