Improve performance with better CommandInfo cache locking making Invoke-ScriptAnalyzer nearly twice as fast #1162

bergmeister · 2019-03-04T08:01:02Z

PR Summary

This cuts execution time by roughly 40%, making it about 1.6-1.7x faster (measured using PS 5.1 and 6.2). For example, recursively analyzing the test folder of the PowerShell repo now takes 100 seconds instead of 170 seconds. Analyzing the build.psm1 module of the PowerShell repo in a clean, fresh shell now takes 7 seconds instead of 11 seconds.

Background: The bottleneck in PSSA's performance is the CommandInfo lookups, which are essentially calls to Get-Command. The lookups are expensive, but that is only part of the problem. Even when results were cached, one heavy lock was held around the following three operations. All PSSA rules execute in parallel threads and access the singleton Helper class, so they all contended for that lock:

1. Check if the CommandInfo is in the cache.
2. If the CommandInfo was not cached, execute Get-Command to get details. This is the most expensive part.
3. Add the retrieved CommandInfo to the cache.
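The following is a minimal C# sketch of that coarse-grained pattern, to make the bottleneck concrete. It is illustrative only and is not PSSA's actual Helper code. `GetCommandStandIn` and `GetCommandInfo` are hypothetical names: `GetCommandStandIn` stands in for Get-Command by sleeping and counting its calls. CommandInfo is modelled as a plain string.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the expensive Get-Command call; it counts how often it runs.
int lookups = 0;
string GetCommandStandIn(string name)
{
    Interlocked.Increment(ref lookups);
    Thread.Sleep(50); // simulate an expensive lookup
    return $"CommandInfo({name})";
}

// PowerShell command names are case-insensitive, so the cache key is too.
var cache = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
var cacheLock = new object();

string GetCommandInfo(string name)
{
    // One lock around all three steps: a thread that only needs a cache hit
    // still waits while another thread is inside the slow lookup.
    lock (cacheLock)
    {
        if (cache.TryGetValue(name, out var cached)) // 1. check the cache
            return cached;
        var info = GetCommandStandIn(name);          // 2. expensive lookup
        cache[name] = info;                          // 3. add to the cache
        return info;
    }
}

// Simulate rules running in parallel and asking for overlapping commands.
var names = new[] { "Get-Item", "Get-ChildItem", "get-item", "Set-Item" };
var results = new string[names.Length];
Parallel.For(0, names.Length, i => results[i] = GetCommandInfo(names[i]));
Console.WriteLine($"{lookups} lookups for {names.Length} requests"); // prints "3 lookups for 4 requests"
```

Because the lock is held across the slow lookup, all callers are serialized. This includes callers whose command is already cached.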

This PR improves it by:

Using a ConcurrentDictionary for the cache instead, to avoid the heavy lock. As a result, step 1 no longer waits behind other threads, because ConcurrentDictionary reads are lock-free. Step 3 only takes a short write lock.
Because threads can now reach step 2 without being blocked, two threads may issue the same Get-Command request. This causes unnecessary CPU churn and can slow down execution. To counteract this, step 2 now submits each request as a Task to a second ConcurrentDictionary. A thread can then check whether a request for a particular command is already running, get that task and wait for its result.
The unused outputWriter field on the Helper class is removed, and some other members are made readonly.
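The following is a rough C# sketch of the two-dictionary scheme described above. It is a minimal illustration under assumptions and is not the PR's code. `GetCommandStandIn`, `GetCommandInfo` and `inFlight` are hypothetical names, CommandInfo is modelled as a string, and the exact handling of the in-flight dictionary is a guess. The sketch publishes an unstarted `Task<string>` with `GetOrAdd`. Only the thread whose task was stored runs it, and that thread removes the entry once the result is cached.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the expensive Get-Command call; it counts how often it runs.
int lookups = 0;
string GetCommandStandIn(string name)
{
    Interlocked.Increment(ref lookups);
    Thread.Sleep(50); // simulate an expensive lookup
    return $"CommandInfo({name})";
}

// Finished results, plus lookups that are still running (keyed case-insensitively).
var cache = new ConcurrentDictionary<string, string>(StringComparer.OrdinalIgnoreCase);
var inFlight = new ConcurrentDictionary<string, Task<string>>(StringComparer.OrdinalIgnoreCase);

string GetCommandInfo(string name)
{
    // Step 1: lock-free read, so cache hits never wait behind a slow lookup.
    if (cache.TryGetValue(name, out var cached))
        return cached;

    // Step 2: publish an unstarted task. GetOrAdd(key, value) never replaces an
    // existing entry, so every concurrent caller for this name gets the same task.
    // The body re-checks the cache in case an earlier lookup finished meanwhile.
    var candidate = new Task<string>(() =>
        cache.TryGetValue(name, out var hit) ? hit : GetCommandStandIn(name));
    var task = inFlight.GetOrAdd(name, candidate);
    if (!ReferenceEquals(task, candidate))
        return task.Result; // another thread owns this lookup; wait for its result

    try
    {
        candidate.RunSynchronously(); // the owning thread runs the lookup itself
        var info = candidate.Result;
        cache.TryAdd(name, info);     // Step 3: cache the result...
        return info;
    }
    finally
    {
        inFlight.TryRemove(name, out _); // ...then release the finished task
    }
}

// Simulate rules running in parallel and asking for overlapping commands.
var names = new[] { "Get-Item", "get-item", "Get-ChildItem", "Get-Item" };
var results = new string[names.Length];
Parallel.For(0, names.Length, i => results[i] = GetCommandInfo(names[i]));
Console.WriteLine($"{lookups} lookups for {names.Length} requests"); // prints "2 lookups for 4 requests"
```

The task body re-checks the results cache to cover a timing window. A thread can miss the cache just before another thread caches its result and removes its in-flight entry. Without the re-check, the first thread would start a second lookup for the same command. Removing finished tasks keeps the in-flight dictionary small, while the results dictionary holds only the plain results.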

In the future we might enhance it further by:

Going async. This requires more work, because async has to propagate all the way up to the top-level caller before we can receive values from it.
Pre-initializing the cache with all commands via a background task. This would speed it up by a factor of 10, but it is currently not possible because of PowerShell#8910 ("Get-Command: Returned CommandInfo object does not populate ScriptBlock property when -Name parameter is not used"), and some rules need those additional properties. Applying it only to a subset of rules would not give a speedup either. The slowest rule is the bottleneck of PSSA's execution, because for each file PSSA waits until the threads for all rules have finished.

PR Checklist

PR has a meaningful title
- Use the present tense and imperative mood when describing your changes
Summarized changes
Change is not breaking
Make sure all .cs, .ps1 and .psm1 files have the correct copyright header
Make sure you've added a new test if existing tests do not effectively test the code changed and/or updated documentation
This PR is ready to merge and is not Work in Progress.
- If the PR is work in progress, please add the prefix WIP: to the beginning of the title and remove the prefix when the PR is ready.

rjmholt · 2019-03-04T17:51:00Z

The principle at work here seems similar to the one I used for the profile cache, which had to be optimised for concurrent callers wanting maximum parallel execution:

PSScriptAnalyzer/PSCompatibilityAnalyzer/Microsoft.PowerShell.CrossCompatibility/Utility/CompatibilityProfileLoader.cs

Lines 108 to 130 in dff9522

    
    /// <summary>
    /// Load a profile from a path.
    /// Caches profiles based on path, so that repeated calls do not require JSON deserialization.
    /// </summary>
    /// <param name="path">The path to load a profile from.</param>
    /// <returns>A query object around the loaded profile.</returns>
    private Lazy<Task<CompatibilityProfileCacheEntry>> GetProfileFromPath(string path)
    {
        if (path == null)
        {
            throw new ArgumentNullException(nameof(path));
        }
        return _profileCache.GetOrAdd(path, new Lazy<Task<CompatibilityProfileCacheEntry>>(() => Task.Run(() => {
            CompatibilityProfileDataMut compatibilityProfileMut = _jsonSerializer.DeserializeFromFile(path);
            var compatibilityProfile = new CompatibilityProfileData(compatibilityProfileMut);
            return new CompatibilityProfileCacheEntry(
                compatibilityProfileMut,
                compatibilityProfile);
        })));
    }

I think we could easily avoid the duplicated work in step 2 with lazy task entries. It would probably also reduce the amount of logic you've had to write here.
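The lazy-entry idea could be applied to the command cache as in the following sketch. This is an assumption for illustration and does not show how it was actually implemented. It uses the same hypothetical `GetCommandStandIn` stand-in as a substitute for Get-Command. The sketch uses `Lazy<string>` rather than `Lazy<Task<...>>`. The profile loader wraps a `Task` so that callers can await it, but a caller that blocks for the result straight away does not need a `Task`. A single dictionary replaces both the results dictionary and the in-flight dictionary.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the expensive Get-Command call; it counts how often it runs.
int lookups = 0;
string GetCommandStandIn(string name)
{
    Interlocked.Increment(ref lookups);
    Thread.Sleep(50); // simulate an expensive lookup
    return $"CommandInfo({name})";
}

// One dictionary of lazy entries replaces the results and in-flight dictionaries.
var cache = new ConcurrentDictionary<string, Lazy<string>>(StringComparer.OrdinalIgnoreCase);

string GetCommandInfo(string name)
{
    // Racing callers may each build a Lazy, but GetOrAdd stores and returns only one;
    // the discarded ones are never evaluated, so their lookups never run.
    Lazy<string> entry = cache.GetOrAdd(name, key => new Lazy<string>(() => GetCommandStandIn(key)));

    // Lazy<T>'s default mode (ExecutionAndPublication) runs the factory at most once;
    // other threads reading .Value block until the result is ready.
    return entry.Value;
}

// Simulate rules running in parallel and asking for overlapping commands.
var names = new[] { "Get-Item", "get-item", "Get-ChildItem", "Get-Item" };
var results = new string[names.Length];
Parallel.For(0, names.Length, i => results[i] = GetCommandInfo(names[i]));
Console.WriteLine($"{lookups} lookups for {names.Length} requests"); // prints "2 lookups for 4 requests"
```

This approach has two trade-offs. First, in the default `ExecutionAndPublication` mode, `Lazy<T>` caches an exception thrown by its factory, so a failed lookup stays cached unless its entry is removed. Second, each cached result keeps its small `Lazy` wrapper alive for the lifetime of the cache.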

If you'd prefer, I could also give this a shot myself.

bergmeister · 2019-03-04T18:51:35Z

Hmm, I think our two scenarios are still a bit different. For example, I have two concurrent dictionaries: one for the results, and one for managing and retrieving currently running requests. I use two because the result dictionary will potentially hold thousands of objects. Keeping running requests separate keeps memory usage minimal and releases finished tasks. I don't see how Lazy would help in my case. Also, I am not async yet (that would be another PR using runspaces).
Feel free to give it a go, but unifying the code looks like quite a bit of effort to me. Or maybe I am not understanding correctly how you'd apply your approach to my scenario. I am happy to be convinced otherwise :)
Or were you thinking of applying your principle of using a dictionary of lazy Tasks only to my second dictionary?

rjmholt · 2019-03-04T19:23:30Z

I'll look into it and see what I come up with

rjmholt · 2019-03-05T01:36:31Z

I had a look and talked about it a bit with @daxian-dbw, who made the point that if we want the result immediately, there may not be a good reason to use a Task.

I just wrote a simple implementation based on more granular locks in #1166, which seems to get the same performance. I compared both implementations against the current development branch on both modules, and both yours and mine gave about a 17% speedup.
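"More granular locks" could mean one lock per command name instead of one lock for the whole cache. The following sketch shows that approach. It is hypothetical and is not the code from #1166. `GetCommandStandIn`, `GetCommandInfo` and `keyLocks` are illustrative names. Cache hits take no lock. Only threads asking for the same command wait for each other, and each thread checks the cache again after acquiring the lock.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the expensive Get-Command call; it counts how often it runs.
int lookups = 0;
string GetCommandStandIn(string name)
{
    Interlocked.Increment(ref lookups);
    Thread.Sleep(50); // simulate an expensive lookup
    return $"CommandInfo({name})";
}

var cache = new ConcurrentDictionary<string, string>(StringComparer.OrdinalIgnoreCase);
// Hypothetical per-name lock objects; GetOrAdd hands every caller the same object for a name.
var keyLocks = new ConcurrentDictionary<string, object>(StringComparer.OrdinalIgnoreCase);

string GetCommandInfo(string name)
{
    // Fast path: cache hits take no lock at all.
    if (cache.TryGetValue(name, out var cached))
        return cached;

    // Only threads asking for the same command contend for this lock.
    lock (keyLocks.GetOrAdd(name, _ => new object()))
    {
        // Double-check: another thread may have cached it while we waited.
        if (cache.TryGetValue(name, out var filled))
            return filled;

        var info = GetCommandStandIn(name);
        cache[name] = info;
        return info;
    }
}

// Simulate rules running in parallel and asking for overlapping commands.
var names = new[] { "Get-Item", "get-item", "Get-ChildItem", "Get-Item" };
var results = new string[names.Length];
Parallel.For(0, names.Length, i => results[i] = GetCommandInfo(names[i]));
Console.WriteLine($"{lookups} lookups for {names.Length} requests"); // prints "2 lookups for 4 requests"
```

No `Task` is involved in this version. The thread holding a command's lock does the lookup itself, and threads waiting on the same lock pick up the cached result. The lock objects are never removed, which costs one small object per distinct command name.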

rjmholt · 2019-03-05T01:37:54Z

I also experimented with async, since the PowerShell type has BeginInvoke() and EndInvoke() methods, but I didn't see much reason to keep that code since we can't quite use it yet.

JamesWTruher

Very nice. I wonder if there's a way we can target tests at this (I couldn't immediately think of anything). It also makes me think that we really need some performance benchmarking (a much longer-term thing). Should we have some sort of xUnit tests, as we have in PowerShell?
@rjmholt, do we have a clear favorite between this and #1166?

bergmeister · 2019-03-08T05:32:15Z

Rob's version is a bit better performance-wise. However, before it gets merged, we'd have to revert one small change to preserve legacy behaviour, as per my comment.

bergmeister added 7 commits March 2, 2019 21:36

- Use ReaderWriterLockSlim for faster performance (d560c09)
- cleanup (5cf4249)
- re-order for cleaner diff and make lock readonly (0e12540)
- remove redundant outputWriter and optimise using job management (6977c56)
- remove unused variable (e50361f)
- remove redundant comment (66da684)
- remove items from processing list (3555f3a)

bergmeister added the Area - Engine label Mar 4, 2019

bergmeister requested review from rjmholt and JamesWTruher March 4, 2019 08:06

bergmeister added the (Re-)Review Needed label Mar 4, 2019

rjmholt mentioned this pull request Mar 5, 2019: Increase lock granularity for CommandInfo cache #1166 (merged)

bergmeister removed the (Re-)Review Needed label Mar 5, 2019

JamesWTruher approved these changes Mar 7, 2019


JamesWTruher closed this in #1166 Mar 13, 2019
