
Improve performance by more than 400X #350

Merged
merged 2 commits into commandlineparser:develop from bergbria/perf on Oct 19, 2018

Conversation

bergbria

Scenarios:

  • IEnumerable option with 9000 entries: 15+ hours -> 150ms
  • Options class with 20 string options, all set on the command line: 960ms -> 2.2ms

The speedups were predominantly achieved by memoizing IEnumerables that are enumerated multiple times. A comparatively minor speedup was achieved by using a HashSet rather than LINQ's .Contains().

There should be no behavioral differences introduced by this change.

It should resolve #186.
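The diff below shows the pattern: sequences that were previously re-enumerated are wrapped with the existing Memorize() extension (apparently from the paket-managed EnumerableExtensions.cs mentioned later in the thread) so the underlying work runs only once. As a rough illustration of what such an extension does, here is a minimal, non-thread-safe sketch; the name MemorizeSketch and its implementation are illustrative only, not the library's actual code.

using System.Collections.Generic;

static class EnumerableMemoization
{
    // Sketch of memoizing an IEnumerable<T>: the source is walked at most once
    // and its elements are replayed from a cache on later enumerations.
    public static IEnumerable<T> MemorizeSketch<T>(this IEnumerable<T> source)
    {
        var cache = new List<T>();
        IEnumerator<T> enumerator = null;

        IEnumerable<T> Iterate()
        {
            for (var i = 0; ; i++)
            {
                if (i < cache.Count)
                {
                    // Already computed by an earlier enumeration: replay it.
                    yield return cache[i];
                }
                else
                {
                    if (enumerator == null)
                        enumerator = source.GetEnumerator();
                    if (!enumerator.MoveNext())
                        yield break;
                    cache.Add(enumerator.Current);
                    yield return cache[i];
                }
            }
        }

        return Iterate();
    }
}

With something along these lines, a LINQ pipeline that is consumed several times (say, once for validation and once for binding) pays for the underlying computation only once.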

@@ -28,13 +28,15 @@ static class InstanceBuilder
     var typeInfo = factory.MapValueOrDefault(f => f().GetType(), typeof(T));

     var specProps = typeInfo.GetSpecifications(pi => SpecificationProperty.Create(
-        Specification.FromProperty(pi), pi, Maybe.Nothing<object>()));
+        Specification.FromProperty(pi), pi, Maybe.Nothing<object>()))
+        .Memorize();
bergbria (Author)

I used .Memorize() throughout this PR since that seems to be the dominant existing convention.

However, it feels like an anti-pattern to me to pass IEnumerable everywhere rather than, say, IReadOnlyCollection. It makes it difficult to detect multi-enumeration issues visually or via static analysis such as ReSharper. IEnumerable definitely has its place; it just feels overused.
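To make that concern concrete (an invented example, not code from this repository): a method that accepts IEnumerable<T> can silently enumerate its argument more than once, and nothing in the signature or at the call site reveals it.

using System;
using System.Collections.Generic;
using System.Linq;

static class MultiEnumerationExample
{
    // Nothing in this signature tells the caller that 'values' is walked twice.
    static void Report(IEnumerable<int> values)
    {
        Console.WriteLine("count: " + values.Count()); // first enumeration
        Console.WriteLine("sum:   " + values.Sum());   // second enumeration re-runs the source
    }

    static void Main()
    {
        // A lazy query with a visible side effect, so the double evaluation is obvious.
        var expensive = Enumerable.Range(0, 3)
            .Select(i => { Console.WriteLine("evaluating " + i); return i; });

        Report(expensive); // prints "evaluating 0..2" twice

        // An IReadOnlyCollection<int> parameter (or a memoized sequence) would make
        // the repeated work either impossible or at least visible.
    }
}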

What is the rationale behind the current approach?

Contributor

What is the rationale behind the current approach?

IReadOnlyCollection was introduced in .NET 4.5. This library still targets 4.0 on the fat framework.

That, plus the developer was really inspired by functional languages and the "lazy evaluation" that's built into many of them. Unfortunately, there are (perf) penalties for using C# as if it were a functional language.

I initially added .ToSet() to EnumerableExtensions.cs, not realizing it was a paket file and thus not directly editable in this repo.
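On the HashSet part of the change (again only a sketch, with invented names and data): LINQ's Contains() on a plain IEnumerable scans the sequence, and may re-run a lazy query, on every call, whereas materializing the values into a HashSet once makes each later lookup roughly constant time.

using System;
using System.Collections.Generic;
using System.Linq;

static class ContainsComparison
{
    static void Main()
    {
        // Stand-in for "all registered option names"; the data here is invented.
        IEnumerable<string> specNames = Enumerable.Range(0, 9000).Select(i => "option" + i);

        // LINQ's Contains(): walks the sequence (re-evaluating the lazy query) per call.
        bool slow = specNames.Contains("option42");

        // Build a HashSet once; each subsequent lookup is an O(1) hash probe.
        var nameSet = new HashSet<string>(specNames);
        bool fast = nameSet.Contains("option42");

        Console.WriteLine(slow + " " + fast); // True True
    }
}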
@nemec nemec merged commit 8c13152 into commandlineparser:develop Oct 19, 2018
nemec (Contributor) commented Oct 19, 2018

Thank you @bergbria! These changes are awesome. I'm curious if your perf tests showed much of a difference in memory usage due to the caching, but even if there's a difference I imagine it's pretty small.

bergbria (Author) commented Oct 19, 2018 via email

bergbria (Author)
@nemec, if you get a chance today, mind commenting on my deployment question? I'll just default to creating my own package to unblock myself if I don't hear from you.

@bergbria bergbria deleted the bergbria/perf branch October 22, 2018 16:16
ericnewton76 (Member) commented Oct 22, 2018

@bergbria wow, thanks for addressing that! If your perf improvements are what you say, that's incredible.

I'm trying to get a v3 out with updated references to prevent a couple of issues that are popping up with netcore and netframework package dependencies.

We can probably issue a new v2 release to include this. (I'm a little perplexed that you can't move to v2 without it though... lol)

My only real question on these improvements is whether they use static variables (which will hang around for the entire process lifetime) or a ctor reference that can be GC'd. If it's the latter, perfect! (Haven't had a chance to peruse the PR yet.)

bergbria (Author)
@ericnewton76 I didn't introduce anything static. My only changes were to local variable definitions, so persistent memory usage shouldn't be an issue.

Our code is currently using v1 with some very large rsp files (10k+ arguments passed inside a file). It's performant enough to not be an issue. We can't really move to v2 if arg parsing starts taking 15+ hours. Removing the rsps on our end would be difficult and less desirable than taking the perf fix.

bergbria (Author)
@ericnewton76 any chance of getting a new release out today or early next week?
