# doc: document how to examine --trace_gc output #372

87 changes: 86 additions & 1 deletion documentation/memory/step3/using_gc_traces.md
@@ -20,4 +20,89 @@

```
v8.setFlagsFromString('--trace_gc');
setTimeout(() => { v8.setFlagsFromString('--notrace_gc'); }, 60e3);
```
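
A trace can then be produced by any allocation-heavy script run with the flag enabled. A minimal sketch (the file name and sizes are arbitrary):

```
// Run as: node --trace_gc allocate.js
// Keeps roughly 50 MB of arrays live so that both scavenges and
// mark-sweep collections show up in the trace.
const retained = [];
for (let i = 0; i < 500; i++) {
  retained.push(new Array(128 * 1024).fill(i)); // ~1 MB of numbers
  if (retained.length > 50) retained.shift();
}
console.log(`retained ${retained.length} arrays`);
```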

## Examining a trace

Traces of garbage collection obtained this way look like the following lines.

```
[19278:0x5408db0] 44 ms: Scavenge 2.3 (3.0) -> 1.9 (4.0) MB, 1.2 / 0.0 ms (average mu = 1.000, current mu = 1.000) allocation failure

[23521:0x10268b000] 120 ms: Mark-sweep 100.7 (122.7) -> 100.6 (122.7) MB, 0.15 / 0.0 ms (average mu = 0.132, current mu = 0.137) deserialize GC in old space requested
```

This is how to interpret the trace data (for the second line):

<table>
<tr>
<th>Token value</th>
<th>Interpretation</th>
</tr>
<tr>
<td>23521</td>
<td>PID of the running process</td>
</tr>
<tr>
<td>0x10268b000</td>
<td>Isolate (JS heap instance)</td>
</tr>
<tr>
<td>120</td>
<td>Time since the process start in ms</td>
</tr>
<tr>
<td>Mark-sweep</td>
<td>Type / Phase of GC</td>
</tr>
<tr>
<td>100.7</td>
<td>Heap used before GC in MB</td>
</tr>
<tr>
<td>122.7</td>
<td>Total heap before GC in MB</td>
</tr>
<tr>
<td>100.6</td>
<td>Heap used after GC in MB</td>
</tr>
<tr>
<td>122.7</td>
<td>Total heap after GC in MB</td>
</tr>
<tr>
<td>0.15 / 0.0 <br/>
(average mu = 0.132, current mu = 0.137)</td>
<td>Time spent in GC in ms ("mu" is mutator utilization, the fraction of time spent running application code rather than GC)</td>
</tr>
<tr>
<td>deserialize GC in old space requested</td>
<td>Reason for GC</td>
</tr>
</table>
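
If traces need to be processed programmatically, the fields can be extracted with a regular expression. A rough sketch matched against the second example line above; the pattern is a best-effort assumption based on these samples, not an official grammar:

```
// Rough parser for --trace_gc lines; the regex below is an assumption
// derived from the samples above, not a documented format.
const line =
  '[23521:0x10268b000] 120 ms: Mark-sweep 100.7 (122.7) -> ' +
  '100.6 (122.7) MB, 0.15 / 0.0 ms (average mu = 0.132, ' +
  'current mu = 0.137) deserialize GC in old space requested';

const re = /\[(\d+):(0x[0-9a-f]+)\]\s+(\d+) ms: (\S+) ([\d.]+) \(([\d.]+)\) -> ([\d.]+) \(([\d.]+)\) MB, ([\d.]+) \/ ([\d.]+) ms/;
const m = line.match(re);
if (m) {
  const [, pid, isolate, timeMs, type,
         usedBefore, totalBefore, usedAfter, totalAfter, gcMs] = m;
  console.log({ pid, isolate, timeMs, type,
                usedBefore, totalBefore, usedAfter, totalAfter, gcMs });
}
```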

## Examples of diagnosing memory issues with the trace option

A. How to get the context of bad allocations using `--trace-gc`
1. Suppose we observe that the old space is continuously increasing.
2. Due to heavy GC activity, the heap limit is not hit, but the process is slow.
3. Review the trace data and figure out how much the total heap is before and after each GC.
4. Reduce `--max-old-space-size` so that the total heap is closer to the limit.
5. Allow the program to run until it hits the out-of-memory error.
6. The produced log shows the failing context (see the sketch below).
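
As a concrete illustration of steps 4 to 6, consider a deliberately leaky sketch; every name here (`app.js`, the sizes, the interval) is hypothetical:

```
// app.js (hypothetical): each tick retains its payload in a
// module-level array that is never cleared (the "bad allocation").
const retained = [];
setInterval(() => {
  retained.push(new Array(1e5).fill('x'));
}, 10);

// Run with a tight heap so the OOM reproduces quickly:
//   node --trace_gc --max-old-space-size=50 app.js
// The last trace lines before the crash show how full the heap was
// and which GC cycles immediately preceded the failure.
```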

> **mhdawson (Member):** We might add another example which is similar, except we reduce the old space size in order to ensure we actually hit an OOM. If we see the heap continually increasing but don't OOM for a long time, reducing the old space size can help confirm it is an actual leak, versus the heap just growing because there is lots of space and no need to GC (or to GC aggressively enough).

> **Member Author:** @mhdawson, added another example. PTAL.

B. How to assert whether there is a memory leak when heap growth is observed
1. Suppose we observe that the old space is continuously increasing (the sketch after this list shows one way to watch for this).
2. Due to heavy GC activity, the heap limit is not hit, but the process is slow.
3. Review the trace data and figure out how much the total heap is before and after each GC.
4. Reduce `--max-old-space-size` so that the total heap is closer to the limit.
5. Allow the program to run and see whether it hits the out-of-memory error.
6. If it hits OOM, increase the heap size by ~10% or so and repeat a few times. If the same pattern is observed, it is indicative of a memory leak.
7. If there is no OOM, freeze the heap size at that value: a packed heap reduces memory footprint and compaction latency.
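
One way to watch the growth from step 1 without reading trace lines is to sample heap statistics directly; a minimal sketch using `v8.getHeapStatistics()`:

```
const v8 = require('v8');
const MB = 1024 * 1024;

// Sample every 5 seconds; a used_heap_size that climbs steadily
// across many GC cycles is the growth pattern this example asks
// you to confirm.
setInterval(() => {
  const s = v8.getHeapStatistics();
  console.log(
    `used ${(s.used_heap_size / MB).toFixed(1)} MB of ` +
    `${(s.total_heap_size / MB).toFixed(1)} MB ` +
    `(limit ${(s.heap_size_limit / MB).toFixed(1)} MB)`);
}, 5000).unref();
```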

C. How to assert whether too many GCs are happening or GC is causing an overhead
1. Review the trace data, specifically the time between consecutive GCs.
2. Review the trace data, specifically the time spent in GC.
3. If the time between two GCs is less than the time spent in GC, the application is severely starved (the sketch after this list measures both quantities).
4. If the time between GCs and the time spent in GC are both very high, the application can probably use a smaller heap.
5. If the time between GCs is much greater than the time spent in GC, the application is relatively healthy.
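
Both quantities can be measured in-process with `perf_hooks`, which emits one performance entry per GC; a minimal sketch:

```
const { PerformanceObserver } = require('perf_hooks');

// Compare the gap between consecutive GCs with the pause itself:
// gap < pause suggests starvation, gap >> pause suggests health.
let lastEnd = null;
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    const pause = entry.duration;
    if (lastEnd !== null) {
      const gap = entry.startTime - lastEnd;
      console.log(`pause ${pause.toFixed(2)} ms, ` +
                  `${gap.toFixed(2)} ms since previous GC`);
    }
    lastEnd = entry.startTime + entry.duration;
  }
}).observe({ entryTypes: ['gc'] });
```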