doc: document how to examine --trace_gc output #372

documentation/memory/step3/using_gc_traces.md

Obtained traces of garbage collection look like the following line:

```
[23521:0x10268db0]     120 ms: Mark-sweep 100.7 (122.7) -> 100.6 (122.7) MB, 0.15 / 0.0 ms  (average mu = 0.132, current mu = 0.137) deserialize GC in old space requested
```

This is how to interpret the trace data:

```
[<PID>:<isolate>] <time since process start in ms>: <type/phase of GC> <heap used before GC in MB> (<allocated heap before GC in MB>) -> <heap used after GC in MB> (<allocated heap after GC in MB>) <time spent in GC in ms> [<reason for GC>]
```
<table>
<tr>
<th>Token value</th>
<th>Interpretation</th>
</tr>
<tr>
<td>23521</td>
<td>PID of the running process</td>
</tr>
<tr>
<td>0x10268db0</td>
<td>Isolate (JS heap instance)</td>
</tr>
<tr>
<td>120</td>
<td>Time since the process start in ms</td>
</tr>
<tr>
<td>Mark-sweep</td>
<td>Type / Phase of GC</td>
</tr>
<tr>
<td>100.7</td>
<td>Heap used before GC in MB</td>
</tr>
<tr>
<td>122.7</td>
<td>Total heap before GC in MB</td>
</tr>
<tr>
<td>100.6</td>
<td>Heap used after GC in MB</td>
</tr>
<tr>
<td>122.7</td>
<td>Total heap after GC in MB</td>
</tr>
<tr>
<td>0.15 / 0.0 </td>
<td>Time spent in GC in ms</td>
</tr>
<tr>
<td>(average mu = 0.132, current mu = 0.137) deserialize GC in old space requested</td>
<td>Reason for GC</td>
</tr>
</table>
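To generate traces like these yourself, run a script that steadily retains memory under `node --trace-gc`. A minimal sketch (the file name and allocation sizes below are illustrative, not prescriptive):

```javascript
// allocate.js -- run with: node --trace-gc allocate.js
// Each iteration retains a buffer, so the heap steadily grows and V8
// emits Scavenge/Mark-sweep trace lines like the ones described above.
const retained = [];

function allocateChunk(sizeInKiB) {
  // Buffer.alloc gives a predictable allocation size.
  return Buffer.alloc(sizeInKiB * 1024);
}

for (let i = 0; i < 100; i++) {
  retained.push(allocateChunk(256)); // retain 256 KiB per iteration
}

console.log(`retained ${retained.length} chunks, ~${(retained.length * 256) / 1024} MiB`);
```

The trace lines are written to stderr, so they can be captured separately from the program's own output with `node --trace-gc allocate.js 2> gc.log`.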

## Examples of diagnosing memory issues with the trace option:

A. How to get context of bad allocations using `--trace-gc`
1. Suppose we observe that the old space is continuously increasing.

2. However, due to heavy gc activity, the heap limit is not hit, but the process is slow.
3. Review the trace data and figure out how much the total heap is before and after each gc.
4. Reduce `--max-old-space-size` such that the total heap is closer to the limit.
5. Allow the program to run until it hits the out-of-memory error.
6. The produced log shows the failing context.


B. How to assert whether too many gcs are happening or gcs are causing too much overhead

1. Review the trace data, specifically around time between consecutive gcs.
2. Review the trace data, specifically around time spent in gc.
3. If the time between two consecutive gcs is less than the time spent in gc, the application is severely starving.

4. If both the time between two consecutive gcs and the time spent in gc are very high, the application can probably use a smaller heap.

5. If the time between two consecutive gcs is much greater than the time spent in gc, the application is relatively healthy.

6. The actual numbers for these metrics vary from workload to workload, so compare against a baseline measured while the application behaves normally rather than against a universal threshold.
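The comparisons in steps 1–5 can be automated by parsing the trace lines. A sketch, where the regex is an assumption based on the format shown earlier and may need adjusting across V8 versions:

```javascript
// Compare the gap between consecutive gcs with the pause spent in each gc.
// The regex matches trace lines of the form documented above; adjust it if
// your V8 version prints a slightly different layout.
const TRACE_RE =
  /^\[\d+:0x[0-9a-f]+\]\s+(\d+)\s+ms:\s+\S+\s+[\d.]+\s+\([\d.]+\)\s+->\s+[\d.]+\s+\([\d.]+\)\s+MB,\s+([\d.]+)/;

function analyze(lines) {
  const events = lines
    .map((line) => TRACE_RE.exec(line))
    .filter(Boolean)
    .map((m) => ({ atMs: Number(m[1]), pauseMs: Number(m[2]) }));

  const results = [];
  for (let i = 1; i < events.length; i++) {
    const gapMs = events[i].atMs - events[i - 1].atMs;
    results.push({
      gapMs,
      pauseMs: events[i].pauseMs,
      // A gap shorter than the pause itself means the process spends more
      // time collecting garbage than running: severe starvation (step 3).
      starving: gapMs < events[i].pauseMs,
    });
  }
  return results;
}
```

Feed it the stderr lines captured from a `node --trace-gc` run; entries with `starving: true` correspond to the condition in step 3.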