Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HistoryDictionary: Replace manually sorted list with SortedDictionary #298

Merged
merged 1 commit into from
Jul 12, 2017

Conversation

ilabutin
Copy link
Contributor

This is the attempt to fix issue #297.

I made an attempt to replace manually sorted list which spends a lot of time in linear search inside Add with tree-based SortedDictionary.

After the fix the same file is opened within half a minute as opposed to 4 minutes in the original case:

159 distinct processes.
Totals
  2 856 343 events.
  1 771 483 events with stack traces.
   141 728 events with code addresses in them.
  62 855 364 total code address instances. (stacks or other)
   168 745 unique code addresses. 
  1 699 553 unique stacks.
     2 971 unique managed methods parsed.
   101 257 CLR method event records.
[Conversion complete 2 856 343 events.  Conversion took 25 sec.]
A total of 168745 symbolic addresses were looked up.
Addresses outside any module: 2091 out of 168745 (1,2%)
Done with symbolic lookup.
ETL Size 1 035,141 MB ETLX Size 531,210 MB
Completed: Opening PerfViewData-mosaictest12.etl.zip   (Elapsed Time: 32,745 sec)

I didn't experience any issues using PerfView with these changes for a few weeks.

The only 'business logic' change I made is that only single entry is stored for any given timestamp as opposed to the old version. But the lookup algorithm from the old version would have always found the latest added entry and not others, so I made a conclusion that my code should work as well.

If more performance comparison are required - let me know about particular scenarios to test.

@msftclas
Copy link

@ilabutin,
Thanks for having already signed the Contribution License Agreement. Your agreement was validated by Microsoft. We will now review your pull request.
Thanks,
Microsoft Pull Request Bot

@codecov-io
Copy link

codecov-io commented Jun 30, 2017

Codecov Report

Merging #298 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #298      +/-   ##
==========================================
+ Coverage   24.35%   24.35%   +<.01%     
==========================================
  Files         199      199              
  Lines      112479   112483       +4     
  Branches    11228    11229       +1     
==========================================
+ Hits        27397    27399       +2     
- Misses      85055    85057       +2     
  Partials       27       27
Flag Coverage Δ
#2015 24.35% <100%> (ø) ⬆️
#2017 24.35% <100%> (ø) ⬆️
#Debug 15.81% <100%> (ø) ⬆️
#Release 14.89% <100%> (ø) ⬆️
Impacted Files Coverage Δ
src/TraceEvent/TraceUtilities/HistoryDictionary.cs 73.78% <100%> (-0.97%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7a2884d...008773d. Read the comment docs.

@vancem
Copy link
Contributor

vancem commented Jul 5, 2017

Thanks for looking into this. Changing the underlying data structure of HistoryDictionary is a big enough change that I need to think about it first. The original idea is of a history dictionary is that a particular ID would have few changes over time (it is a Dictionary first). I think this is true for most uses, but breaks down for OS handles that are heavily reused (which seems to happen more now that it did before). Thus there are options to weigh here.

It will probably take a couple of days for me to get time to do this, but I will have a response for you by next Monday.

@ilabutin
Copy link
Contributor Author

ilabutin commented Jul 5, 2017

I had a thought that maybe more clever approach is necessary. If number of entries for given ID is less than, e.g. 5 or 10, then keep them in simple list like it is now. But if number of entries per ID grows then switch to more heavy data structure like SortedDictionary or even something custom.

@vancem
Copy link
Contributor

vancem commented Jul 5, 2017

Yes, that is one of the options. The fact that there are very different ways of fixing the problem is why I want to think about it a bit before committing.

@sharwell
Copy link
Member

sharwell commented Jul 5, 2017

Some people on a few VS teams have observed performance issues related to HistoryDictionary as well. There are two things delaying us (the group I referred to) from evaluating this change in that context:

  1. We're at one of those points in time where some high-priority items are taking us off things like this change for a bit (not our fault)
  2. We did a poor job of keeping track of exactly which scenarios were being impacted by this specifically (definitely our fault)

Vance (@vancem) is outside/above this group and has the most knowledge of PerfView scenarios, so he may beat us in getting this change verified.

@ilabutin
Copy link
Contributor Author

ilabutin commented Jul 5, 2017

I have more input now which may affect the decision of what solution to use.

I was thinking that it was related to running a trace with File I/O enabled, but after more careful trace analysis it turned out that it happens for ClrTraceEventParserState._typeIDToName dictionary.

See the PerfView view attached with the hot callstack.
HistoryDictonaryCPUConsumingView.perfView.xml.zip

I've ran a debug session again with some quickly crafted tracing and indeed I got information that for _typeIDToName dictionary the Add method was called 198439 times, while total count of unique items was 165. And all IDs had exactly one item associated, not more. This looks logical as I assume typeID is not changed as long as CLR assigns it to particular type.

Most constributing type names are:
System.Windows.Forms.Internal.DeviceContext - 133498 times
System.Windows.Forms.Internal.WindowsFont - 44427 times
ThreadMethodEntry - 14822 times

So, in fact it is not heavy File I/O, it's weird implementation of UI that caused that amount of data to be pushed to HistoryDictionary.

Given the fact that at the end we get HistoryDictionary where real history is only single entry per ID, I think that SortedDictionary is an overkill and it would be just enough to maintain uniqueness of entries (i.e. timestamps) per each ID (now they are added anyway, but only the latest one is used later, so we can just discard previous ones already during Add). This will keep number of entries per ID close to 1 and thus maintain acceptable lookup speed during subsequent Add.

I can prepare modified pull request with this simple fix and measure how performance is changed in my scenario.

PS
The _fileIDToName dictionary (which I suppose is used to track file handles) has 13396 different IDs with no more than 4 entries per ID. And there were no any collisions of timestamps for file IDs. Given that information I don't have feeling that we need to care too much about file handling (and handle reuse).

PPS
I wonder, though, why so many identical timestamps for given typeID are pushed into the dictionary. And maybe even if proposed fix is valid by itself there is something else to look at for better fix (not to push that many identical information into the dictionary).

@ilabutin
Copy link
Contributor Author

I have finally reverted my SortedDictionary approach and only adjusted original PerfView behavior to avoid duplicate entries with same timestamp in each individual list.

Now conversion takes even less - 15 seconds as opposed to 23 seconds with SortedDictionary.

I hope this change is less massive.

entry = firstEntry.skipAhead;

HistoryValue last = null;
for (; ; )
Copy link
Member

@sharwell sharwell Jul 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 while (true) is standard for C#. Relevant because different versions of the C# formatter will do different things with for (;;), but all of them handle while (true) consistently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest version does not have this change (I just kept the original code). Should I change it to while (true)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, no. Only if you were changing already.

@ilabutin ilabutin force-pushed the HistoryDictionaryCpuUsage branch from da91db8 to 008773d Compare July 11, 2017 23:34
@ilabutin
Copy link
Contributor Author

ilabutin commented Jul 11, 2017

Squashed all commits into one to keep the simplest change only.

@vancem
Copy link
Contributor

vancem commented Jul 11, 2017

This looks fine (very safe), I will merge it when checks finish.

@sharwell sharwell merged commit 61359bf into microsoft:master Jul 12, 2017
vancem pushed a commit to vancem/perfview that referenced this pull request Dec 20, 2017
HistoryDictionary: Replace manually sorted list with SortedDictionary
@ilabutin ilabutin deleted the HistoryDictionaryCpuUsage branch March 5, 2018 23:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants