Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance with better CommandInfo cache locking making Invoke-ScriptAnalyzer nearly twice as fast #1162

Closed

Conversation

bergmeister
Copy link
Collaborator

PR Summary

This nearly halves the execution time (measured using PS 5.1 and 6.2). For example when recursively analyzing the test folder of the PowerShell repo, the time goes down from 170 seconds to 100 seconds. When analysing the build.psm1 module of the PowerShell repo (clean, fresh, shell), then the time went down from 11 seconds to 7 seconds.

Background: The bottleneck of PSSA's performance are the CommandInfo lookups (where basically Get-Command gets called), not just because they are expensive but even when results are cached, there was was 1 heavy lock (all PSSA rules are executing in parallel threads and access the Singleton Helper class) around the 3 following operations:

  1. Check if CommandInfo is in Cache
  2. If CommandInfo was not cached, execute Get-Command to get details. This is the most expensive part
  3. Add retrieved CommandInfo to Cache

This PR improves it by:

  • Using a ConcurrentDictionary for the Cache instead to avoid the heavy lock. This means effectively that step 1 is only under a read lock and step 3 under a write lock.
  • Because threads can now proceed to step 2 without being blocked, it can happen that a thread issues the same Get-Command request, therefore resulting in unnecessary CPU churn and potentially slowing down execution. To counteract this, step 2 now submits its requests as a Task to another ConcurrentDictionary so that one can check if there is already a request for a particular command, get that task and wait for its result.
  • The outputwriter field on the Helper class is not used and therefore removed. Some other members could be made readonly

In the future we might enhance it further to

  • Go Async, but this requires more work since we need to go up to the top to be able to receive value from it.
  • Pre-initialize the cache with all commands via a background task. This would speed it up by a factor of 10 but is currently not possible to do due to Get-Command: Returned CommandInfo object does not populate ScriptBlock property when -Name parameter is not used PowerShell#8910 because some rules need those additional properties, applying this only to a subset of rules wouldn't lead to a speedup since the slowest rule is the bottleneck of PSSA's execution (because foreach file it waits until each thread for each rule has finished)

PR Checklist

@bergmeister bergmeister added the (Re-)Review Needed Feedback has been addressed during PR stage or is required in the first place. label Mar 4, 2019
@rjmholt
Copy link
Contributor

rjmholt commented Mar 4, 2019

The principle at work here seems not unlike that I wrote for the profile cache (which had to optimise for concurrent callers wanting optimum parallel execution):

/// <summary>
/// Load a profile from a path.
/// Caches profiles based on path, so that repeated calls do not require JSON deserialization.
/// </summary>
/// <param name="path">The path to load a profile from.</param>
/// <returns>A query object around the loaded profile.</returns>
private Lazy<Task<CompatibilityProfileCacheEntry>> GetProfileFromPath(string path)
{
if (path == null)
{
throw new ArgumentNullException(nameof(path));
}
return _profileCache.GetOrAdd(path, new Lazy<Task<CompatibilityProfileCacheEntry>>(() => Task.Run(() => {
CompatibilityProfileDataMut compatibilityProfileMut = _jsonSerializer.DeserializeFromFile(path);
var compatibilityProfile = new CompatibilityProfileData(compatibilityProfileMut);
return new CompatibilityProfileCacheEntry(
compatibilityProfileMut,
compatibilityProfile);
})));
}

I think we could easily avoid the duplicated work in (2) with lazy task entries -- it would probably also decrease the amount of logic you've had to write here.

If you'd prefer, I could also give this a shot myself.

@bergmeister
Copy link
Collaborator Author

bergmeister commented Mar 4, 2019

Hmm, I think our 2 scenarios are still a bit different. I for example have 2 concurrent dictionaries, one is for the results and one is a way of managing and retrieving currently running requests. This is to keep the memory usage minimal and release allocations of finished tasks because my result dictionary will potentially hold thousands of objects. I don't see how Lazy would be useful in my case. Also, I am not async yet (this would be another PR using runspaces).
Feel free to give it a go but to me this looks like quite a bit of effort to unify the code. Or maybe I am not understanding correctly how you'd apply your approach for my scenario. I am happy to be convinced otherwise :)
Or were you thinking of applying your principle of using a Dictionary of lazy Tasks only to my 2nd dictionary?

@rjmholt
Copy link
Contributor

rjmholt commented Mar 4, 2019

I'll look into it and see what I come up with

@rjmholt
Copy link
Contributor

rjmholt commented Mar 5, 2019

I had a look and talked about it a bit with @daxian-dbw, who made the point that if we want the result immediately, there may not be a good reason to use a Task.

I just wrote a simple implementation around more granular locks in #1166, which seems to get the same performance (compared both modules with current development and got about a 17% speed up in both your and my implementations).

@rjmholt
Copy link
Contributor

rjmholt commented Mar 5, 2019

I also did some experimentation with async, since the PowerShell type has BeginInvoke() and EndInvoke() methods, but didn't see much reason to keep them in the code since we can't quite use them yet

@bergmeister bergmeister removed the (Re-)Review Needed Feedback has been addressed during PR stage or is required in the first place. label Mar 5, 2019
Copy link
Contributor

@JamesWTruher JamesWTruher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice - I wonder if there's a way that we can target tests at this (I couldn't immediately think of anything)? It also makes me think that we really need some performance benchmarking (a much longer-term thing). Should we have some sort of xunit tests as we have in PowerShell?
do we have a clear favorite between this and #1166 @rjmholt?

@bergmeister
Copy link
Collaborator Author

Rob's version is a bit better performance wise but before his version gets merged we'd have to revert 1 small change as per my comment to preserve legacy behaviour

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants