Performance: Eliminate Regex overhead in AvoidTrailingWhitespace -> Speedup of 5% (PowerShell 5.1) or 2.5% (PowerShell 7.1-preview.2) #1465
If we use regexes anywhere else in the codebase, we could probably save some performance by making the regex static and constructing it with RegexOptions.Compiled.
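For regexes that remain elsewhere in the codebase, the suggestion might look like the following C# sketch (top-level statements, C# 9+). SplitLines is a hypothetical helper, and in a real rule class the Regex would be a private static readonly field rather than a local variable. The point is that the pattern is constructed once with RegexOptions.Compiled and then reused. Compilation adds a one-off startup cost, so whether it pays off depends on how often the pattern runs and would need measuring.

```csharp
using System;
using System.Text.RegularExpressions;

// Constructed once and reused for every call. In a rule class this would be a
// `private static readonly Regex` field rather than a local variable.
// RegexOptions.Compiled trades a one-off compilation cost for faster matching.
var lineBreak = new Regex(@"\r?\n", RegexOptions.Compiled);

// Hypothetical helper: split script text into lines using the shared instance.
string[] SplitLines(string text) => lineBreak.Split(text);

string[] lines = SplitLines("a = 1  \r\nb = 2\nc = 3");
Console.WriteLine(lines.Length); // 3
```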
Rules/AvoidTrailingWhitespace.cs
));
continue;
}
if (line[line.Length - 1] != ' ' &&
Would this be better as char.IsWhiteSpace(line[line.Length - 1])?
@rjmholt Because of:
- readability?
- performance?
- covering the variety of Unicode whitespace characters? From the docs, it would probably be good, but what about characters in UnicodeCategory.LineSeparator? I don't have enough Unicode experience to judge whether this list includes too much or not, to be honest.
My thinking here is simply that PowerShell uses that API to detect whitespace.
Given how we already split the string, relying on Unicode whitespace might be risky, or it might not be...
I suspect this won't make much difference in practice; I can't imagine non-ASCII whitespace left at the ends of lines being an issue for anyone.
OK, so that sounds like you lean towards using IsWhiteSpace? I'd be OK with that. You're right that the impact is probably quite low, especially since this rule is not enabled by default for VS Code users.
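To make the trade-off in this thread concrete, here is a small C# sketch (top-level statements) comparing an ASCII-only test with char.IsWhiteSpace. The original condition is cut off in the diff context, so IsAsciiBlank is a hypothetical stand-in that assumes only ' ' and '\t' were checked. char.IsWhiteSpace also accepts the no-break space (U+00A0), the line separator (U+2028) and other Unicode space separators such as the ideographic space (U+3000).

```csharp
using System;

// Hypothetical stand-in for the ASCII-only check: assumes only space and tab
// were treated as trailing whitespace.
bool IsAsciiBlank(char c) => c == ' ' || c == '\t';

// Space, tab, no-break space, line separator, ideographic space, and a letter.
char[] samples = { ' ', '\t', '\u00A0', '\u2028', '\u3000', 'x' };

foreach (char c in samples)
{
    // Print the code point and how each check classifies it.
    Console.WriteLine(
        $"U+{(int)c:X4}  ascii={IsAsciiBlank(c),-5}  IsWhiteSpace={char.IsWhiteSpace(c)}");
}
```

Because the script text is split only on "\r" and "\n", a U+2028 at the end of a line survives the split; the IsWhiteSpace variant would report it and the ASCII-only check would not. That is the kind of edge case the thread agrees is unlikely to matter in practice.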
@@ -36,52 +36,67 @@ public IEnumerable<DiagnosticRecord> AnalyzeScript(Ast ast, string fileName)

var diagnosticRecords = new List<DiagnosticRecord>();

- string[] lines = Regex.Split(ast.Extent.Text, @"\r?\n");
+ string[] lines = ast.Extent.Text.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.None);
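The two split calls in this hunk are not exactly equivalent, which this C# sketch (top-level statements) illustrates with a made-up input. Listing "\r\n" first in the separator array matters: String.Split takes the first separator in the array that matches at a given position, so a CRLF pair becomes a single line break rather than "\r" followed by an empty line. The new call also treats a lone "\r" as a line break, which the old \r?\n pattern did not.

```csharp
using System;
using System.Text.RegularExpressions;

// Made-up input: CRLF after "a ", a lone CR after "b", LF at the end.
string text = "a \r\nb\rc\n";

// Old approach: only "\n" or "\r\n" end a line, so "b\rc" stays together.
string[] viaRegex = Regex.Split(text, @"\r?\n");

// New approach from the diff: "\r\n" is listed first so a CRLF pair is one
// separator; a lone "\r" now also ends a line.
string[] viaSplit = text.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.None);

Console.WriteLine(viaRegex.Length); // 3
Console.WriteLine(viaSplit.Length); // 4
```

Whether that difference affects any real scripts is not discussed in the PR.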
This makes me wonder: if we're only trying to find the extents of trailing whitespace, there's no need to split the string at all; we could just scan through it ourselves without allocating all these strings... But that's too much for this PR!
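A rough sketch of the single-pass idea, in C# with top-level statements. FindTrailingWhitespace is a hypothetical helper that walks the text once and records the offset and length of each run of spaces or tabs that ends at a line break or at the end of the text, without allocating a string per line. It assumes only ' ' and '\t' count as whitespace, and it returns character offsets; the real rule would still have to turn those into line/column extents for its DiagnosticRecord objects.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: (offset, length) of every run of spaces/tabs that sits
// immediately before a line break or the end of the text.
List<(int Offset, int Length)> FindTrailingWhitespace(string text)
{
    var runs = new List<(int Offset, int Length)>();
    int runStart = -1; // start of the current space/tab run, or -1 if none

    for (int i = 0; i < text.Length; i++)
    {
        char c = text[i];
        if (c == ' ' || c == '\t')
        {
            if (runStart < 0) runStart = i;
        }
        else if (c == '\r' || c == '\n')
        {
            // A run that reaches a line break is trailing whitespace.
            if (runStart >= 0) runs.Add((runStart, i - runStart));
            runStart = -1;
        }
        else
        {
            runStart = -1; // a non-blank character follows, so the run was not trailing
        }
    }

    // A run at the very end of the text (no final newline) also counts.
    if (runStart >= 0) runs.Add((runStart, text.Length - runStart));
    return runs;
}

foreach (var (offset, length) in FindTrailingWhitespace("a = 1  \r\nb = 2\n\tc\t"))
{
    Console.WriteLine($"offset {offset}, length {length}");
}
```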
Hmm, yeah, I hear what you're saying. I guess for perf what counts is the 80/20 rule :-) Technically speaking, string.IndexOf would probably be the fastest way of finding the indices where \s\r or \s\n occurs...
I'm aware of lots of other small micro-optimisations that one could make, and I even tried some, but they had no measurable effect. Therefore I'm focusing only on fixes that give a measurable return.
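One way to act on the string.IndexOf idea, again as a hypothetical C# sketch with top-level statements: jump from one '\n' to the next with IndexOf, then walk backwards over spaces and tabs (skipping a preceding '\r') to find each trailing run. Unlike the regex \s, it only treats ' ' and '\t' as whitespace, and it does not treat a lone '\r' as a line break; both are assumptions for illustration. The char overload of IndexOf is vectorized in .NET Core, which is why it tends to beat a hand-written loop on long inputs, but that would need benchmarking against this rule.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: locate line feeds with IndexOf, then scan backwards
// over spaces/tabs in front of each one. Handles "\n" and "\r\n" endings.
List<(int Offset, int Length)> FindTrailingWhitespaceWithIndexOf(string text)
{
    var runs = new List<(int Offset, int Length)>();
    int lineStart = 0;

    while (lineStart <= text.Length)
    {
        int lf = text.IndexOf('\n', lineStart); // -1 when no more line feeds
        int lineEnd = lf < 0 ? text.Length : lf;

        // For "\r\n" endings, the content stops before the '\r'.
        if (lf >= 0 && lineEnd > lineStart && text[lineEnd - 1] == '\r') lineEnd--;

        // Walk backwards over the trailing spaces/tabs of this line.
        int i = lineEnd;
        while (i > lineStart && (text[i - 1] == ' ' || text[i - 1] == '\t')) i--;
        if (i < lineEnd) runs.Add((i, lineEnd - i));

        if (lf < 0) break;
        lineStart = lf + 1;
    }
    return runs;
}

foreach (var (offset, length) in FindTrailingWhitespaceWithIndexOf("a = 1  \r\nb = 2\n\tc\t"))
{
    Console.WriteLine($"offset {offset}, length {length}");
}
```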
Co-Authored-By: Robert Holt <[email protected]>
PR Summary
The whitespace-ignoring diff makes the change clearer. This was the most expensive script analysis rule when run in warm mode, and it was also easy to fix :-)
The results also show the performance improvements in .NET Core 5.
PR Checklist
- All .cs, .ps1 and .psm1 files have the correct copyright header
- If the PR is a work in progress, add WIP: to the beginning of the title and remove the prefix when the PR is ready.