diff --git a/docs/img/perfview_0_welcome.png b/docs/img/perfview_0_welcome.png new file mode 100644 index 00000000000..41712f886d5 Binary files /dev/null and b/docs/img/perfview_0_welcome.png differ diff --git a/docs/img/perfview_10_no_grouping_applied.png b/docs/img/perfview_10_no_grouping_applied.png new file mode 100644 index 00000000000..e964386fe24 Binary files /dev/null and b/docs/img/perfview_10_no_grouping_applied.png differ diff --git a/docs/img/perfview_11_unsolved_symbols.png b/docs/img/perfview_11_unsolved_symbols.png new file mode 100644 index 00000000000..3a6d9d3cbc5 Binary files /dev/null and b/docs/img/perfview_11_unsolved_symbols.png differ diff --git a/docs/img/perfview_12_lookup_symbols.png b/docs/img/perfview_12_lookup_symbols.png new file mode 100644 index 00000000000..9d43e2a4eb4 Binary files /dev/null and b/docs/img/perfview_12_lookup_symbols.png differ diff --git a/docs/img/perfview_13_set_time_range.png b/docs/img/perfview_13_set_time_range.png new file mode 100644 index 00000000000..43f1cd96bf9 Binary files /dev/null and b/docs/img/perfview_13_set_time_range.png differ diff --git a/docs/img/perfview_14_set_time_range_before_vs_after.png b/docs/img/perfview_14_set_time_range_before_vs_after.png new file mode 100644 index 00000000000..78cc86a7d43 Binary files /dev/null and b/docs/img/perfview_14_set_time_range_before_vs_after.png differ diff --git a/docs/img/perfview_15_by_name.png b/docs/img/perfview_15_by_name.png new file mode 100644 index 00000000000..57fe6207fdd Binary files /dev/null and b/docs/img/perfview_15_by_name.png differ diff --git a/docs/img/perfview_16_go_to_callers.png b/docs/img/perfview_16_go_to_callers.png new file mode 100644 index 00000000000..71beccc2638 Binary files /dev/null and b/docs/img/perfview_16_go_to_callers.png differ diff --git a/docs/img/perfview_17_callers.png b/docs/img/perfview_17_callers.png new file mode 100644 index 00000000000..da1461e6a58 Binary files /dev/null and b/docs/img/perfview_17_callers.png differ diff --git a/docs/img/perfview_18_call_tree.png b/docs/img/perfview_18_call_tree.png new file mode 100644 index 00000000000..fd8274051f9 Binary files /dev/null and b/docs/img/perfview_18_call_tree.png differ diff --git a/docs/img/perfview_19_flame_graph.png b/docs/img/perfview_19_flame_graph.png new file mode 100644 index 00000000000..f672f2d8179 Binary files /dev/null and b/docs/img/perfview_19_flame_graph.png differ diff --git a/docs/img/perfview_1_collect.png b/docs/img/perfview_1_collect.png new file mode 100644 index 00000000000..bf4b10d6be8 Binary files /dev/null and b/docs/img/perfview_1_collect.png differ diff --git a/docs/img/perfview_20_flame_graph_exclusive_time.png b/docs/img/perfview_20_flame_graph_exclusive_time.png new file mode 100644 index 00000000000..1f6b3feeccb Binary files /dev/null and b/docs/img/perfview_20_flame_graph_exclusive_time.png differ diff --git a/docs/img/perfview_21_goto_source.png b/docs/img/perfview_21_goto_source.png new file mode 100644 index 00000000000..25a72bc68c5 Binary files /dev/null and b/docs/img/perfview_21_goto_source.png differ diff --git a/docs/img/perfview_22_source_code.png b/docs/img/perfview_22_source_code.png new file mode 100644 index 00000000000..dc455986f3d Binary files /dev/null and b/docs/img/perfview_22_source_code.png differ diff --git a/docs/img/perfview_23_regression.png b/docs/img/perfview_23_regression.png new file mode 100644 index 00000000000..74296048e94 Binary files /dev/null and b/docs/img/perfview_23_regression.png differ diff --git a/docs/img/perfview_24_regression_report.png b/docs/img/perfview_24_regression_report.png new file mode 100644 index 00000000000..150879aae0f Binary files /dev/null and b/docs/img/perfview_24_regression_report.png differ diff --git a/docs/img/perfview_2_run_corerun.png b/docs/img/perfview_2_run_corerun.png new file mode 100644 index 00000000000..3eceeb61034 Binary files /dev/null and b/docs/img/perfview_2_run_corerun.png differ diff --git a/docs/img/perfview_3_log_button.png b/docs/img/perfview_3_log_button.png new file mode 100644 index 00000000000..90ba9ab6897 Binary files /dev/null and b/docs/img/perfview_3_log_button.png differ diff --git a/docs/img/perfview_4_cpu_stacks.png b/docs/img/perfview_4_cpu_stacks.png new file mode 100644 index 00000000000..0378a7b51d8 Binary files /dev/null and b/docs/img/perfview_4_cpu_stacks.png differ diff --git a/docs/img/perfview_5_select_process.png b/docs/img/perfview_5_select_process.png new file mode 100644 index 00000000000..e8f0e90f57b Binary files /dev/null and b/docs/img/perfview_5_select_process.png differ diff --git a/docs/img/perfview_6_cpu_stacks_not_filtered.png b/docs/img/perfview_6_cpu_stacks_not_filtered.png new file mode 100644 index 00000000000..d6f14059875 Binary files /dev/null and b/docs/img/perfview_6_cpu_stacks_not_filtered.png differ diff --git a/docs/img/perfview_7_cpu_metric.png b/docs/img/perfview_7_cpu_metric.png new file mode 100644 index 00000000000..829ac36c780 Binary files /dev/null and b/docs/img/perfview_7_cpu_metric.png differ diff --git a/docs/img/perfview_8_grouppats_on.png b/docs/img/perfview_8_grouppats_on.png new file mode 100644 index 00000000000..4a1c78ef6ea Binary files /dev/null and b/docs/img/perfview_8_grouppats_on.png differ diff --git a/docs/img/perfview_9_grouppats_off.png b/docs/img/perfview_9_grouppats_off.png new file mode 100644 index 00000000000..32d0d1be7f6 Binary files /dev/null and b/docs/img/perfview_9_grouppats_off.png differ diff --git a/docs/profiling-workflow-corefx.md b/docs/profiling-workflow-corefx.md index 3b791991412..2ccb538e36b 100644 --- a/docs/profiling-workflow-corefx.md +++ b/docs/profiling-workflow-corefx.md @@ -12,6 +12,12 @@ - [CPU](#CPU) - [CoreRun](#CoreRun) - [Allocation Tracking](#Allocation-Tracking) + - [PerfView](#PerfView) + - [CPU Investigation](#CPU-Investigation) + - [Filtering](#Filtering) + - [Analyzing the Results](#Analyzing-the-Results) + - [Viewing Source Code](#Viewing-Source-Code) + - [Identifying Regressions](#Identifying-Regressions) ## Prerequisites @@ -251,3 +257,135 @@ Again, if you have configured everything properly you are able to right click on ![Go to Source File](img/vs_profiler_14_memory_goto_source.png) ![Actual Source File](img/vs_profiler_15_memory_source_file.png) + +### PerfView + +PerfView is the ultimate .NET Profiler and if you are new to PerfView it's recommended to read it's tutorial or watch the tutorial [videos](https://channel9.msdn.com/Series/PerfView-Tutorial). + +![Welcome Screen](img/perfview_0_welcome.png) + +#### CPU Investigation + +We can **Collect** profile data by either **Run**ning a standalone executable (or command) or **Collect**ing the data machine wide with explicit start and stop. + +![Collect Menu Items](img/perfview_1_collect.png) + +When profiling local CoreFX build it's recommended to Run a standalone repro app using CoreRun. + +![Run CoreRun](img/perfview_2_run_corerun.png) + +If for some reason PerfView fails to run the command you need to click the `Log` button in left bottom corner and read the logs to find out what happened. + +![Log button](img/perfview_3_log_button.png) + +Once PerfView finishes the run you need to open the `CPU Stacks` + +![CPU Stacks](img/perfview_4_cpu_stacks.png) + +By default PerfView collects the profile data machine wide. Once you click on the `CPU Stacks` it shows you a window with the list of processes active during profile session. It sorts the processes descending by CPU time so the most active processes are at the top. You just need to select `CoreRun` and click `OK`. + +![Select Process](img/perfview_5_select_process.png) + +Once you do that it opens the `CPU Stacks` window with the default filtering applied. + +![Non filtered CPU Stacks](img/perfview_6_cpu_stacks_not_filtered.png) + +The first thing you need to do is to make sure that you have at least a few 1000 samples (preferably over 5000) and that the process is actually CPU bound. + +The `Metric/Interval` is a quick measurement of how CPU bound the trace is as a whole. A value of 1 indicates a program that on average consumes all the CPU from a single processor. Unless that is high, your problem is not CPU (it can be some blocking operation like network/disk read). + +![CPU Metric](img/perfview_7_cpu_metric.png) + +##### Filtering + +Fundamentally, what is collected by the PerfView profiler is a sequence of stacks. A stack is collected every millisecond for each hardware processor on the machine. This is very detailed information and hence by default PerfView groups the stacks. This is very useful when you are profiling a real-world application in a production environment, but when you work on the .NET Team and you profile some simple repro app you care about all details and you don't want the results to be grouped by modules. + +![Group pats](img/perfview_8_grouppats_on.png) + +This is why you should disable the grouping by selecting `[no grouping]` item from `GroupPats` combo box. If you want to learn more about `GroupPats` just click on it (it looks like a web link and it opens help). + +![No grouping](img/perfview_9_grouppats_off.png) + +Now you can see much more details: + +![No grouping applied](img/perfview_10_no_grouping_applied.png) + +But some of the results contain `!?` which means unsolved symbols. + +![Unsolved symbols](img/perfview_11_unsolved_symbols.png) + +To solve the symbols you just need to select given names, right click and choose `Lookup symbols`. You can also just press `Ctrl+A` (select all) and `Alt+S` (Lookup symbols). + +![Lookup symbols](img/perfview_12_lookup_symbols.png) + +PerfView does an excellent job at solving symbols. It's capable of solving not only the app code symbols but also .NET Runtime and Windows OS symbols. It's one of the moments when it shines compared to other profilers that very often struggle with that. + +When we are using the `Collect -> Run` option PerfView profiles entire process lifetime. It's very useful when you are interested in things like .NET Runtime startup performance. But typically you just care about specific method. So the next thing you need to do is to set the Time Range. You can do that by selecting the interesting method, right clicking and choosing `Set Time Range`. Or just pressing `Alt+R`. + +![Set Time Range](img/perfview_13_set_time_range.png) + +As you can see, all the methods that were executed before the first and after last call to `DateTime.UtcNow` got filtered out: + +![Time Range Applied](img/perfview_14_set_time_range_before_vs_after.png) + +`When` - This is a visualization of how the INCLUSIVE samples collected for that node vary over time. The total range (from the Start and End text boxes), is divided into 32 time 'TimeBuckets' and the inclusive samples for that node are accumulated into those 32 buckets. Each bucket is then represented as a digit that represents a scaled value. + +This simple text representation of histogram can be very useful when profiling more complex scenarios, but in this case it just shows us that `DateTime.UtcNow` was executed all the time. But this is exactly what we wanted! + +##### Analyzing the Results + +Once we get the data filtered we can start the analysis. + +The `By Name` tab contains the list of all methods captured during profile time. This list is by default sorted in descending order by **Exc**lusive CPU time. It means that the most expensive methods which perform actual CPU computation are at the top of the table. These are the methods you care about. + +![By name tab](img/perfview_15_by_name.png) + +To find out who is calling the most time consuming method you can right click on it and select `Goto -> Goto Item in Callers` or just press `F10`. + +![Go to callers](img/perfview_16_go_to_callers.png) + +![Callers](img/perfview_17_callers.png) + +If you wish you can see the entire `Call Tree` by clicking on the `Call Tree` tab: + +![Call Tree](img/perfview_18_call_tree.png) + +`Flame Graph` offers you a different form of visualization:: + +![Flame Graph](img/perfview_19_flame_graph.png) + +The graph starts at the bottom. Each box represents a method in the stack (inclusive CPU time). Every parent is the caller, children are the callees. The wider the box, the more time it was on-CPU. + +For the leaf nodes the inclusive time == exclusive time. The difference between the parent and children box width (marked with red on the image below) is the exclusive parent (caller) time. + +```cs +parent.InclusiveTime - children.InclusiveTime = parent.ExclusiveTime +``` + +![Flame Graph Exclusive time](img/perfview_20_flame_graph_exclusive_time.png) + +##### Viewing Source Code + +If you want to view the Source Code of the given method you need to right-click on it and select `Goto Source (Def)` menu item. Or just press `Alt+D`. + +![Goto Source](img/perfview_21_goto_source.png) + +![Source Code](img/perfview_22_source_code.png) + +If PerfView fails to show you the source code you should read the `Log` output. If you are unable to make it work and you really care about it you should switch to Visual Studio Profiler. + +**Note:** As of today, PerfView keeps the `.pdb` files [opened](https://github.com/microsoft/perfview/pull/979) after showing the source code. It means that if you keep the trace file opened in PerfView and try to rebuild CoreFX/CLR the build is going to fail. You might need to close PerfView to rebuild CoreFX/CLR. + +##### Identifying Regressions + +PerfView has a built-in support for identifying regressions. To use it you need to: + +* profile two different versions of .NET Core running exact same C# code +* open both trace files in PerfView and apply exactly the same filters +* choose `Regression -> With Baseline` from the top Menu + +![Two trace files](img/perfview_23_regression.png) + +![Regression Report](img/perfview_24_regression_report.png) + +It's recommended to use it instead of trying to eyeball complex Flame Graphs.