
DrMemory full mode uses too much memory, causing app alloc failures on Chrome unit_tests #792

Open
derekbruening opened this issue Nov 28, 2014 · 11 comments

Comments

@derekbruening
Contributor

From [email protected] on February 23, 2012 10:51:40

Example build: http://build.chromium.org/p/chromium.fyi/builders/Windows%20Tests%20%28DrMemory%20full%29/builds/936

See unit_tests shard 1 of 3 in particular: http://build.chromium.org/p/chromium.fyi/builders/Windows%20Tests%20%28DrMemory%20full%29/builds/936/steps/memory%20test%3A%20unit_1/logs/stdio

We run for a while and eventually get warnings about failed heap allocations:
[ RUN ] ExtensionServiceTest.InstallTheme
Dr.M
Dr.M Error #133: WARNING: heap allocation failed
Dr.M # 0 (0x22b80048)
Dr.M # 1 Pickle::Resize [base\pickle.cc:405]
Dr.M # 2 Pickle::BeginWrite [base\pickle.cc:383]
Dr.M # 3 Pickle::WriteBytes [base\pickle.cc:330]
Dr.M # 4 Pickle::WriteData [base\pickle.cc:324]
Dr.M # 5 IPC::ParamTraits::Write [content\public\common\common_param_traits.cc:538]
Dr.M # 6 IPC::WriteParam [ipc\ipc_message_utils.h:169]
Dr.M # 7 IPC::ParamTraits<Tuple2<SkBitmap,FilePath> >::Write [ipc\ipc_message_utils.h:897]
Dr.M # 8 IPC::WriteParam<Tuple2<SkBitmap,FilePath> > [ipc\ipc_message_utils.h:169]
Dr.M # 9 IPC::ParamTraits<std::vector<Tuple2<SkBitmap,FilePath>,std::allocator<Tuple2<SkBitmap,FilePath> > > >::Write [ipc\ipc_message_utils.h:541]
Dr.M #10 IPC::WriteParam<std::vector<Tuple2<SkBitmap,FilePath>,std::allocator<Tuple2<SkBitmap,FilePath> > > > [ipc\ipc_message_utils.h:169]
Dr.M #11 ExtensionUnpacker::DumpImagesToFile [chrome\common\extensions\extension_unpacker.cc:218]
Dr.M #12 SandboxedExtensionUnpacker::Start [chrome\browser\extensions\sandboxed_extension_unpacker.cc:275]
Dr.M #13 base::internal::RunnableAdapter<void (__thiscall SandboxedExtensionUnpacker::*)(void)>::Run [base\bind_internal.h:132]
Dr.M Note: @0:36:37.300 in thread 3652

The app crashes and does a self-backtrace afterwards.

We should repro this locally and gather statistics about what is taking the most memory. My best guesses are that we're generating and suppressing lots of reports, so we may be holding too many packed callstacks. Alternatively, maybe the delayed freelist is too long. Or dbghelp is using lots of memory.
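As context for the delayed-freelist guess: Dr. Memory quarantines freed blocks to catch use-after-free, and if that quarantine is unbounded (or its cap is too high) it can pin a lot of memory. A minimal Python sketch of a byte-capped quarantine — the names and structure are illustrative, not Dr. Memory's actual implementation:

```python
from collections import deque

class DelayedFreeQuarantine:
    """Hold freed blocks for a while to catch use-after-free, but cap the
    total bytes quarantined so the list cannot grow without bound."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.cur_bytes = 0
        self.queue = deque()  # FIFO of (address, size) pairs

    def delay_free(self, addr, size):
        """Quarantine a freed block; once the cap is exceeded, really
        release the oldest blocks. Returns the addresses released."""
        self.queue.append((addr, size))
        self.cur_bytes += size
        released = []
        while self.cur_bytes > self.max_bytes:
            old_addr, old_size = self.queue.popleft()
            self.cur_bytes -= old_size
            released.append(old_addr)
        return released
```

With a 100-byte cap, quarantining two 60-byte blocks forces the first one out, keeping the quarantine's footprint bounded regardless of how much the app frees.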

For the moment I'm going to try splitting unit_tests into 6 shards to see if that lets us get further.

Original issue: http://code.google.com/p/drmemory/issues/detail?id=792

@derekbruening
Contributor Author

From [email protected] on February 29, 2012 07:46:46

So, I sharded unit_tests 6 ways (very inconvenient to have 6 separate logs), and the problem is still there. My current belief is that dbghelp is using all of this memory. Only the first unit_test shard is failing, so I'm guessing that it starts with an empty symcache and populates it for the rest of the shards. It has to query drsyms about all of the system dlls loaded by the app, and the pdbs just sit around after that, eating lots of memory. We need some of them to do address to symbol lookup for symbolized stack traces, but most of them we can probably drop. Adding some cache management to drsyms would probably fix this issue.

I'll measure the usage for just the "statically" linked (is there a better term for that?) system dlls by looking at the process in process explorer after reaching main and report back.

Owner: [email protected]

@derekbruening
Contributor Author

From [email protected] on February 29, 2012 07:48:14

On second thought, I'm not sure this is the case: the bots don't have symbols installed, so they shouldn't be loading large pdbs other than unit_tests.pdb.

@derekbruening
Contributor Author

From [email protected] on February 29, 2012 07:56:06

If it is drsyms, this issue covers pdb management: https://code.google.com/p/dynamorio/issues/detail?id=449

@derekbruening
Contributor Author

From [email protected] on February 29, 2012 15:06:25

I ran unit_tests.exe locally for a while. This was the heap usage from DrMemory's log:

Heap usage:
shadow: count= 2652, cur= 45084 KB, max= 45084 KB
perbb: count= 847627, cur= 9926 KB, max= 9926 KB
callstack: count= 175180, cur= 9234 KB, max= 9262 KB
hashtable: count= 1997146, cur= 41837 KB, max= 45061 KB
gencode: count= 2, cur= 52 KB, max= 52 KB
rbtree: count= 4887, cur= 152 KB, max= 152 KB
suppress: count= 55879, cur= 1652 KB, max= 1678 KB
misc: count= 263, cur= 468 KB, max= 744 KB

Altogether the overhead is less than 115 MB, and the delay max free size is ~20 MB, so drmemory's data structures don't seem to be the source of the problem.
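As a quick arithmetic check on the "less than 115 MB" figure, summing the `cur` column of the table above:

```python
# "cur" KB values copied from the Dr. Memory heap-usage table above
cur_kb = {
    "shadow": 45084, "perbb": 9926, "callstack": 9234,
    "hashtable": 41837, "gencode": 52, "rbtree": 152,
    "suppress": 1652, "misc": 468,
}
total_kb = sum(cur_kb.values())
print(total_kb, round(total_kb / 1024.0, 1))  # 108405 KB, ~105.9 MB
```

That total is consistent with the claim: roughly 106 MB of tracked data structures, well under the 2 GB address-space limit being hit.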

I used vmmap (from the same guys as Process Explorer) to get a visualization of memory usage by type, and got the attached screenshot some time through execution. Of the 1.7 GB working set, 1.3 GB is "private data" and only 36 MB is heap. All of the app's heap allocations should be detected as living in the heap, so the app is not using too much heap. DR's heap should be in the "private data" category; however, we know that DrMemory's data structures only account for a small percentage of that. A large portion of the private data is RWX, which should only be DR's code cache. I can't get a breakdown by permission to try to estimate the code cache size, though.

I am going to run again with DR -loglevel 1 to get some heap usage stats from there, and see if some privately loaded library is allocating behind our back or if DR itself is letting the code cache grow too large.

Attachment: unit_tests_heap_usage.png

@derekbruening
Contributor Author

From [email protected] on March 12, 2012 12:52:32

I ran drmemory on all of unit_tests.exe over the weekend with logging, but I forgot that I had disabled heap accounting temporarily. =/ I'll run again with that soon.

I attached the final stats dump.

The highlights are:

  • We're hitting the 2 GB of address space usage.
  • 203 MB is used for fcache, almost all shared bbs (as expected).
  • 122 MB is heap usage.

That isn't enough to really hit the 2 GB limit, so that points to dbghelp's memory.

Attachment: unit_tests_stats.txt

@derekbruening
Contributor Author

From [email protected] on March 13, 2012 10:32:14

Yup, it's definitely dbghelp, but it's not a file mapping. Here's DR's heap breakdown:

Updated-at-end Process (max is total of maxes) heap breakdown:
BB Fragments: cur= 0K, max= 354K, #= 6176, 1= 68, new= 354K, re= 0K
Coarse Links: cur= 0K, max= 0K, #= 0, 1= 0, new= 0K, re= 0K
Future Frag: cur= 0K, max= 17K, #= 5506, 1= 12, new= 17K, re= 47K
Frag Tables: cur= 0K, max= 2234K, #= 1890, 1=1947K, new= 2250K, re= 24K
IBL Tables: cur= 0K, max=133140K, #= 11985, 1=74707K, new=150891K, re=51275K
Traces: cur= 0K, max= 234K, #= 939, 1= 161K, new= 234K, re= 0K
FC Empties: cur= 0K, max= 0K, #= 0, 1= 0, new= 0K, re= 0K
Vm Multis: cur= 0K, max= 0K, #= 6176, 1= 24, new= 0K, re= 144K
IR: cur= 0K, max= 8883K, #=29752639, 1=62848, new= 9066K, re=1030321K
RCT Tables: cur= 0K, max= 0K, #= 0, 1= 0, new= 0K, re= 0K
VM Areas: cur= 260K, max= 347K, #= 744895, 1=91200, new= 2049K, re= 9475K
Symbols: cur= 165K, max= 165K, #= 100, 1=16192, new= 119K, re= 67K
TH Counter: cur= 0K, max= 1881K, #= 939, 1=1878K, new= 1881K, re= 0K
Tombstone: cur= 0K, max= 0K, #= 0, 1= 0, new= 0K, re= 0K
Hot Patching: cur= 0K, max= 0K, #= 0, 1= 0, new= 0K, re= 0K
Thread Mgt: cur= 2K, max= 2K, #= 11197, 1= 2048, new= 4K, re= 172K
Memory Mgt: cur= 60K, max= 273K, #= 2881, 1= 3884, new= 17K, re= 8955K
Stats: cur= 0K, max= 0K, #= 0, 1= 0, new= 0K, re= 0K
SpecialHeap: cur= 0K, max= 70K, #= 4817, 1= 15, new= 70K, re= 0K
Client: cur=102753K, max=112946K, #=132858616, 1=16329K, new=154649K, re=1578373K
Lib Dup: cur=864537K, max=864552K, #=21083632, 1=21609K, new=964685K, re=3563380K
Clean Call: cur= 0K, max= 0K, #= 7, 1= 68, new= 0K, re= 0K
Other: cur= 178K, max= 612K, #= 7172, 1= 126K, new= 537K, re=11951K
Total cur usage: 967958 KB

100 MB from the client is consistent with what DrMemory thinks it is using. The 860 MB of heap allocs from dbghelp is what's really clogging things up.

I grepped for just the Lib Dup line in the logs, and you can see occasional 40 MB spikes in usage. Unfortunately, they don't seem to correspond with module load events, because there are none between the spikes in the logs. This suggests to me that dbghelp (or at least our version) has some overgrown cache data structure, or it is leaking memory when we ask it to symbolize.
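The grep-for-spikes step can be sketched as follows, assuming the "Lib Dup: cur=...K" line format shown in the breakdown above (the sample data and function name here are hypothetical):

```python
import re

# Each periodic stats dump contains a line like the breakdown above:
#   Lib Dup: cur=864537K, max=864552K, ...
SAMPLE = """
Lib Dup: cur=100000K, max=100000K
Lib Dup: cur=105000K, max=105000K
Lib Dup: cur=150000K, max=150000K
Lib Dup: cur=152000K, max=152000K
"""

def lib_dup_spikes(text, threshold_kb=40 * 1024):
    """Return (prev_kb, cur_kb) pairs where Lib Dup usage jumped by at
    least `threshold_kb` between consecutive stats dumps."""
    samples = [int(m) for m in re.findall(r"Lib Dup:\s*cur=\s*(\d+)K", text)]
    return [(a, b) for a, b in zip(samples, samples[1:])
            if b - a >= threshold_kb]

print(lib_dup_spikes(SAMPLE))  # [(105000, 150000)] -- one ~44 MB spike
```

Correlating each reported spike's position in the log against nearby module-load events is then a manual step; here, the absence of such events between spikes is what implicates dbghelp's internal caching.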

@derekbruening
Contributor Author

From [email protected] on March 13, 2012 15:05:56

I added an LRU cache to drsyms_windows.c for testing purposes. It seems that unloading symbols for modules is enough to prevent runaway memory usage, but it still uses quite a bit.
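The LRU scheme can be sketched like this; `load_fn`/`unload_fn` stand in for the dbghelp load/unload wrappers (e.g. around SymLoadModule64/SymUnloadModule64), and all names here are illustrative rather than the actual drsyms_windows.c code:

```python
from collections import OrderedDict

class ModuleSymbolCache:
    """Keep symbol handles for at most `capacity` modules; unload the
    least-recently-used module when loading a new one would exceed that."""

    def __init__(self, capacity, load_fn, unload_fn):
        self.capacity = capacity
        self.load_fn = load_fn      # stand-in for SymLoadModule64 wrapper
        self.unload_fn = unload_fn  # stand-in for SymUnloadModule64 wrapper
        self.loaded = OrderedDict() # module path -> handle, LRU order

    def lookup(self, module):
        if module in self.loaded:
            self.loaded.move_to_end(module)      # mark most recently used
        else:
            if len(self.loaded) >= self.capacity:
                victim, handle = self.loaded.popitem(last=False)
                self.unload_fn(victim, handle)   # frees dbghelp's memory
            self.loaded[module] = self.load_fn(module)
        return self.loaded[module]
```

With capacity 2, touching a third module evicts whichever of the first two was used least recently, which is exactly the periodic unload behavior that produces the big drops in the graph.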

I attached a graph of "lib dup" heap size over "time" with a cache size of 5 modules. It's not really time since it's just whenever the heap stats got dumped, which I think is controlled by # of fragments translated.

I spent a bit of time investigating and the major decreases in the memory usage correlate pretty closely to unloading syms for unit_tests.exe.

This run was also done with a primed symcache, so pdbs are not needed when loading a module, they are only needed when symbolizing a report to be suppressed. The run hasn't finished yet, so I can't say how many stacks have been symbolized.

I also don't know what impact this has on runtime, because this run has logging enabled. Before submitting I'd like to measure the impact on a release build on a small shard of unit_tests.exe, since that binary tends to load many different dlls and generate many reports to suppress.

Attachment: chart_1.png

@derekbruening
Contributor Author

From [email protected] on March 14, 2012 11:44:00

I left unit_tests running over night, and I got the updated attached memory usage graph.

My conclusion is that:

  • dbghelp allocates memory when we look up addresses (some kind of cache?); these allocations correlate with the spikes (late in the run I got a bunch of reports).
  • dbghelp does not allocate most of its memory on SymLoadModule64.
  • We can get it back by unloading the relevant module, hence the occasional major drops.

The simple LRU cache I wrote isn't really the right way to manage this, because memory associated with a module can grow over time. We were able to run with less usage over time because the LRU cache would unload each module every so often and prevent any one module from growing too huge. It's all coincidental, though, so I don't think it's a solid solution.

I wanted to dig further into this by logging all the drsyms queries and replaying them with a separate exe that we can apply standard heap profiling tools to, but I realized that that's basically implementing issue #446 , callstack post-processing.

I'm also just generally concerned about this excessive usage, and implementing issue #446 will allow us to run larger tests where we can't use dbghelp at all.

Blockedon: 446

Attachment: chart_1 (1).png

@derekbruening
Contributor Author

From [email protected] on March 20, 2012 11:15:41

In our meeting we discussed alternative approaches to reducing memory usage for unit_tests, since light mode doesn't have this problem. Our theory is that for light mode, we hit many fewer unaddressables, so we don't need to do as much symbolization. One way to reduce our memory usage is to generate fewer reports and do less symbolization. Here are the suppressions we're currently using on one shard of the full bot:

Suppressions used:
count name
1 https://code.google.com/p/drmemory/issues/detail?id=476 c
1 https://code.google.com/p/drmemory/issues/detail?id=476 e
1 https://code.google.com/p/drmemory/issues/detail?id=748
1 https://code.google.com/p/drmemory/issues/detail?id=815
1 http://crbug.com/115296
1 http://crbug.com/117427 a
1 http://crbug.com/117538
1 http://crbug.com/78784
1 http://crbug.com/TODO o
1 issue #757: RPC binding leaks in sspicli.dll
2 https://code.google.com/p/drmemory/issues/detail?id=17 (2)
2 http://crbug.com/115326
3 https://code.google.com/p/drmemory/issues/detail?id=14 (1)
3 https://code.google.com/p/drmemory/issues/detail?id=17 (3)
3 https://code.google.com/p/drmemory/issues/detail?id=476 f
3 https://code.google.com/p/drmemory/issues/detail?id=512 b
3 https://code.google.com/p/drmemory/issues/detail?id=68 a
3 https://code.google.com/p/drmemory/issues/detail?id=778 a
4 deliberate LazyInstance leak
4 https://code.google.com/p/drmemory/issues/detail?id=810
5 https://code.google.com/p/drmemory/issues/detail?id=12 (8)
11 https://code.google.com/p/drmemory/issues/detail?id=60
12 https://code.google.com/p/drmemory/issues/detail?id=513 a
12 http://crbug.com/TODO p
21 https://code.google.com/p/drmemory/issues/detail?id=513 c
25 https://code.google.com/p/drmemory/issues/detail?id=12 (2)
29 https://code.google.com/p/drmemory/issues/detail?id=18 d
40 https://code.google.com/p/drmemory/issues/detail?id=40 UNADDR
40 https://code.google.com/p/drmemory/issues/detail?id=493
60 https://code.google.com/p/drmemory/issues/detail?id=18 b
159 https://code.google.com/p/drmemory/issues/detail?id=412 f
369 deliberate histogram leak
427 https://code.google.com/p/drmemory/issues/detail?id=12 UNADDR
448 https://code.google.com/p/drmemory/issues/detail?id=511 b
519 http://crbug.com/79933 (2)
646 https://code.google.com/p/drmemory/issues/detail?id=113 rpcrt4.dll wildcard
1024 sqlite3_randomness UNINIT
1577 http://crbug.com/TODO g
3013 http://crbug.com/79933 (1)
18120 https://code.google.com/p/drmemory/issues/detail?id=412 h
27071 https://code.google.com/p/drmemory/issues/detail?id=513 b

The big culprits here are issue #412 and issue #513 .

The suppression we're using for issue #412 covers all uninits in rsaenh.dll. I spent a week trying to figure that out and didn't get anywhere, so I suppressed it and moved on. We could try to fix it properly, but for now it would help just to avoid taking those stack traces and symbolizing them. If we detect a module level suppression for a given error type, we could short-circuit our reporting before we take the stack trace. This should be good enough, and would be easier than trying to change our instru to propagate definedness bits without generating uninit reports. Filed as issue #839 .
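The short-circuit proposed above (filed as issue #839) can be sketched like this; the table contents, function names, and callbacks are hypothetical stand-ins, not DrMemory's actual reporting code:

```python
# Hypothetical module-wide suppressions: error type -> module names for
# which every error of that type is suppressed (e.g. issue #412).
MODULE_SUPPRESSIONS = {
    "UNINIT": {"rsaenh.dll"},
}

def report_error(error_type, pc_module, take_callstack, symbolize):
    """Short-circuit before the expensive work: if every error of this
    type in the faulting module is suppressed, skip both the callstack
    walk and the dbghelp symbolization. Returns the symbolized stack,
    or None when the error was suppressed up front."""
    if pc_module.lower() in MODULE_SUPPRESSIONS.get(error_type, set()):
        return None              # suppressed: no stack walk, no symbols
    frames = take_callstack()    # expensive: walk and pack the stack
    return symbolize(frames)     # expensive: dbghelp address queries
```

The key point is that the module containing the faulting PC is known before any unwinding happens, so the 18,000+ rsaenh.dll reports never touch dbghelp at all.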

I haven't investigated issue #513 personally, but I'll give that a shot. Having the source to V8 should make debugging easier.

@derekbruening
Contributor Author

From [email protected] on March 20, 2012 11:18:58

One idea that is a targeted fix for unit_tests is to build unit_tests.exe with /LARGEADDRESSAWARE.

@derekbruening
Contributor Author

From [email protected] on April 07, 2012 08:26:15

Between implementing issue #849 (the bit-level heuristic to mark the whole byte as defined) and issue #839 (whole module suppression), the situation is much better. I don't think we need to depend on postprocessing in order to bring down our memory usage.

However, we're still hitting this "failed to create thread" error fairly frequently: 3 of the last 4 builds hit it. For now, I'm assuming the failure is caused by a lack of address space, but I need to repro locally and look at memory usage in order to confirm this.

http://build.chromium.org/p/chromium.fyi/builders/Windows%20Tests%20%28DrMemory%20full%29/builds/1277
http://build.chromium.org/p/chromium.fyi/builders/Windows%20Tests%20%28DrMemory%20full%29/builds/1276
http://build.chromium.org/p/chromium.fyi/builders/Windows%20Tests%20%28DrMemory%20full%29/builds/1275
http://build.chromium.org/p/chromium.fyi/builders/Windows%20Tests%20%28DrMemory%20full%29/builds/1274

Blockedon: -446
