Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DryIoc cold start performance #27

Closed
Fruchuxs opened this issue Sep 28, 2018 · 72 comments
Closed

DryIoc cold start performance #27

Fruchuxs opened this issue Sep 28, 2018 · 72 comments
Assignees
Labels
bug Something isn't working enhancement New feature or request
Milestone

Comments

@Fruchuxs
Copy link

Fruchuxs commented Sep 28, 2018

After one week of investigations we found out, that DryIoc 3.0.2 is rather slow in comparison to 2.12.8 at the first execution time (cold start). In most cases the first call of a C# application doesn't count. But we measured a time increase of ca. six seconds for greater object graphs in our application. So for example our asp.net core 2.1 application starts, DryIoc resolves some small Services, than the applications tries to resolve a controller and .. we can make a cup of coffee. For a web applications this is just uncomfortable (because after the JIT compiling it's fast), but for one of our console applications this is really painful.

We looked at the expression trees which dryioc generates in version 3.0.2 and 2.12.8 for one of our object graphs. Too much to post here and to analyze in depth, but we can say, that the dryioc 3.0.2 expression tree is roundabout 90% bigger and seems even more complex.

We didn't changed the service registrations much. We just replaced Reuse.InWebrequest through Reuse.Scoped and two RegisterDelegates through RegisterMany as you mentioned out here.

Maybe you can say more about this? In most of the cases we use simple service registrations like Register<IA, A>, or factories, or made: Parameters.Of.Type ... We also use the IEnumerable-resolve Feature.

@dadhi
Copy link
Owner

dadhi commented Sep 28, 2018

Hi @Fruchuxs ,

v3 does not reuse expressions across the object graph, so if you have A in multiple places in the object graph, you will have new expressions each time. Basically, the expression cache was removed, because it was really hard to figure out the invalidation pattern.

In upcoming v3.1 the expression reuse (cache) is back. But it has a different transient nature, which is simple to manage. It should improve cold-start performance.

For the other use-cases, I am not sure. Could you provide some notable parts of the object graphs, where you see an increase comparing to v2?

@dadhi
Copy link
Owner

dadhi commented Sep 28, 2018

You can use container.Resolve<LambdaExpression>(serviceType) to get an actual expression. Just don't forget to OpenScope if you are resolving a scoped service.

@Fruchuxs
Copy link
Author

Hello again, thanks for your quick responses. I think I will await v3.1. With your explanations it makes sense, why the graphs seems more complex, because of the nested graphs. So no problems here I think. Can you give a time span you think you are ready with v3.1? Just to examinate if we need to downgrade.

@dadhi
Copy link
Owner

dadhi commented Sep 28, 2018

Next week, hopefully.

@dadhi
Copy link
Owner

dadhi commented Sep 28, 2018

Another idea, I can release a preview of 3.1 now.

Would you be able to check performance using it?

@Fruchuxs
Copy link
Author

Sounds nice. I can test it, maybe till monday.

@dadhi
Copy link
Owner

dadhi commented Oct 2, 2018

The late update..

While working in 3.1 preview release, I have found (remembered) other problem with .net core targeting. That .net core won't use FEC, oops :(

That was done close to v3 because of complexity (new projects needed) to support .netstandard 1.3 and 2.0 versions.

But the main obstacle is that I have used old csproj system.

I went ahead, and started to bring everything under new csprojs with multi-targeting. At the moment I am almost complete to release the preview. Hopefully, tomorrow.

@dadhi
Copy link
Owner

dadhi commented Oct 3, 2018

Preview is out: https://www.nuget.org/packages/DryIoc.dll/3.1.0-preview-01

@Fruchuxs
Copy link
Author

Fruchuxs commented Oct 4, 2018

Thank you for the prerelease.
But we got an exception in our main project we can't explain. I assume a problem with
DryIoc.Microsoft.DependencyInjection
. A Testproject with only DryIoc works fine.

System.IO.FileLoadException: 'Could not load file or assembly 'DryIoc, Version=3.0.2.0, Culture=neutral, PublicKeyToken=null'. The located assembly's manifest definition does not match the assembly reference. (Exception from HRESULT: 0x80131040)'

I checked all our projects and can't find a reference to 3.0.2.0, but VS generates 3.0.2.0 references in *.assembly.cache and *.deps.json files. Looks strange to me.

@dadhi
Copy link
Owner

dadhi commented Oct 4, 2018

This is strange, the problem either with assembly versions in preview, or with Di.Ms.Di deps. Will check.

@dadhi
Copy link
Owner

dadhi commented Oct 4, 2018

@Fruchuxs
Copy link
Author

Fruchuxs commented Oct 4, 2018

Got the same exception with different assembly version number (3.1.0.0). We looked into the .net standard 2.0 assembly with ILSpy and it seems, that the assembly version is not set ("0.0.0.0"). Dunno how this happened, your cproj Files seems to be correct.

@dadhi
Copy link
Owner

dadhi commented Oct 4, 2018

But now I know a possible reason. Thank you. I will release new DryIoc preview 2 with fix.

@dadhi
Copy link
Owner

dadhi commented Oct 4, 2018

@Fruchuxs
Copy link
Author

Fruchuxs commented Oct 5, 2018

Not so fast as 2.12.8, but a greatly increase to 3.0 in our cases (~0,5 seconds difference). Over all it looks great. Thank you.

@dadhi
Copy link
Owner

dadhi commented Oct 5, 2018

Interesting,
I would like to look at your object graph. Maybe it something simple to fix or optimize.

@Fruchuxs
Copy link
Author

Fruchuxs commented Oct 5, 2018

Here it is.
tree.txt

@dadhi
Copy link
Owner

dadhi commented Oct 5, 2018

Thank you very much. It is a deep tree indeed.

Could you try to add the rule to the container:

new Container(rules => rules.WithoutDependencyDepthToSplitObjectGraph());

or via:

container.With(rules => rules.WithoutDependencyDepthToSplitObjectGraph());

or with DependencyDepth set to a bigger than 8 (a default value):

container.With(rules => rules.WithDependencyDepthToSplitObjectGraph(20));

and check the results.

The reason is the default Rules.DefaultDependencyDepthToSplitObjectGraph is set to 8. Maybe, it is not a good default value. It would be good to find out.

The number is used to split an object sub-graph deeper than the 8th level to the separate Resolve call,
But this nested call requires a context from the upper graph (for instance, if you need a condition based injection, or find a mismatching lifestyle), so there goes call-stack recreation with Request.Empty.Push(..).Push(..).Push(..),.., etc. So, here is the tradeoff.

BTW, The mechanizm in v2 for splitting the large object graph was different, based on whole amount of dependencies rather than on the depth.

@Fruchuxs
Copy link
Author

Fruchuxs commented Oct 5, 2018

Thank you for the information and explanation. But with WithoutDependencyDepthToSplitObjectGraph the container resolves slower and the graph seems bigger (a little bit). In conclusion it doesn't matter.

@Fruchuxs
Copy link
Author

Fruchuxs commented Oct 5, 2018

Okay, I tested with another graph (but with a similar depth).
Here the results:

WithDependencyDepthToSplitObjectGraph(20)
00:00:00.1715299

WithDependencyDepthToSplitObjectGraph(2) // just for the science
00:00:00.1683051

WithDependencyDepthToSplitObjectGraph(8)
00:00:00.1934737

With Default Constructor
00:00:00.1531935

WithoutDependencyDepthToSplitObjectGraph
00:00:00.1361311

@Fruchuxs
Copy link
Author

Fruchuxs commented Oct 5, 2018

But the measurements fluctating .. As I said, I don't think it matters in our cases.

@dadhi
Copy link
Owner

dadhi commented Oct 5, 2018

Thank you for measuring!

Seems like using the depth split does not add any value in your case.

Need to go deeper..

@dadhi
Copy link
Owner

dadhi commented Oct 5, 2018

Could you use a ExpressionToCodeLib nuget package and dump the expression without depth argument?

@Fruchuxs
Copy link
Author

Fruchuxs commented Oct 5, 2018

Here the Tree To Code output from the tree I used for measuring.

treetocode.txt

@dadhi
Copy link
Owner

dadhi commented Oct 5, 2018

That's what I need! Thank you.

@dadhi
Copy link
Owner

dadhi commented Oct 5, 2018

Shame on me, the FastExpressionCompiler was not included yet because of different compile constant :-(
Now fixed in DryIoc v3.1.0-preview-03.
Please, check.

@Fruchuxs
Copy link
Author

Fruchuxs commented Oct 5, 2018

Btw. (I don't tested pv03 yet) we investigated with pv02, that we got an lifespan exception in our web project:

ContainerException: Dependency AuthStateManager as parameter "aAuthStateManager" reuse Scoped {Lifespan=100} lifespan shorter than its parent's: scoped AuthenticationEvents #284 from Container with Scope {no name} with Rules with {TrackingDisposableTransients, SelectLastRegisteredFactory} and without {ThrowOnRegisteringDisposableTransient} with FactorySelector= with Made.Of= To turn Off this error, specify the rule with new Container(rules => rules.WithoutThrowIfDependencyHasShorterReuseLifespan()).

The problem is, that AuthStateManager is added via dryIoc as reuse: Reuse.Scoped and AuthenticationEvents via .net DI with AddScoped. We replaced the .net DI registration with a dryIoc equivalent and we don't got the exception anymore. We checked the Code in Di.Ms.Di and discovered that AddScoped is registred in dryIoc as Reuse.ScopedOrSingleton. So we changed our replaced service registration with Reuse.Scoped to Reuse.ScopedOrSingletone and the exception occured again. We tested out if our services was added as singleton by dryioc, because maybe no scope was open and the new version is just more restrictive, but we had to recognize that the object is created at every request and not just first.
In conclusion it seems, that dryioc means that ScopedOrSingleton is more healthy as Scoped which seems strange.

@dadhi
Copy link
Owner

dadhi commented Oct 5, 2018

Mm, ok. Another thing to check.

Have an idea, though...

@ahydrax
Copy link

ahydrax commented Nov 6, 2018

Same, I even tried to diff both files: exactly the same. ~17346kB

@dadhi
Copy link
Owner

dadhi commented Nov 6, 2018

Ok, could you try a different value for .WithDependencyDepthToSplitObjectGraph(5) like 5 or something?

@ahydrax
Copy link

ahydrax commented Nov 6, 2018

Tried with 2 and 5. Nothing changed. Could that be a problem with ExpressionToCode itself?

@dadhi
Copy link
Owner

dadhi commented Nov 6, 2018

Maybe... try to do just Expression.ToString()

@ahydrax
Copy link

ahydrax commented Nov 6, 2018

Tried with ToString(). Got 21mb file for 2, 5, 8, Without.

@dadhi
Copy link
Owner

dadhi commented Nov 6, 2018

Ohh sorry, a container.GenerateResolutionExpressions(...) maybe not the best tool, cause it tries to remove the run-time optimizations for generational scenarios. Instead, try to use:

var expr = container.Resolve<LambdaExpression>(typeof(MyControllerWithLotOfDependencies));

You need to open a scope if MyControllerWithLotOfDependencies registered with scope reuse:

using (var scope = container.OpenScope(Reuse.WebRequestScopeName))
    var expr = scope.Resolve<LambdaExpression>(typeof(MyControllerWithLotOfDependencies));

@ahydrax
Copy link

ahydrax commented Nov 6, 2018

@dadhi
Finally got different results:
without - 14mb
8 - 3.7mb
5 - 937kb
2 - 6kb

Does that mean that I should use 2? How it's connected with other aspects: lifetime control/peformance/etc?

@dadhi
Copy link
Owner

dadhi commented Nov 6, 2018

Thanks for results. The value is greatly depends on context. Like how many and what are types of reuse, do you have a lazy/func deps, how deep or wide a graph. Try to peek some sensible value between 5 and 8 for instance. Previous DryIoc default was 8 (v3.0). I did other changes to decrease a result expression size (see the whole thread). I am planning a new changes/options as well (#40).

Maybe I will return back the default 8. So your and @Fruchuxs inputs are very valuable here.

@dadhi
Copy link
Owner

dadhi commented Nov 6, 2018

I wanted to add on resolving as LambdaExpression. You may experiment further by compiling the expression and (optionally) caching it in whatever suitable data structure. This way you are excluding all the container "magic" and use the IoC container as an automated object creation tool.

var expr = scope.Resolve<LambdaExpression>(typeof(MyControllerWithLotOfDependencies));

var factory = (FactoryDelegate)expr.Compile(); 
var controller = (MyControllerWithLotOfDependencies)factory(container);

Replace container with scope for scoped controller.

Btw, this also excludes FEC from the equatation, see #39.

@dadhi
Copy link
Owner

dadhi commented Nov 6, 2018

Just in case, I would also love to see a speed/memory comparison on your real graph between expr.Compile() and expr.CompileFast().

@ahydrax
Copy link

ahydrax commented Nov 7, 2018

Hi @dadhi,

As you requested, here is benchmark of expr.Compile() and expr.CompileFast().
The benchmark code is here.

Benchmark parameters:

Intel Core i7-7700K CPU 4.20GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
  [Host]     : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3190.0
  Job-MZJNWR : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3190.0

InvocationCount=10001  IterationCount=2  LaunchCount=1
RunStrategy=Monitoring  UnrollFactor=1  WarmupCount=1

With .WithDependencyDepthToSplitObjectGraph(2)

        Method |      Mean | Error |    StdDev | Ratio | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
-------------- |----------:|------:|----------:|------:|------------:|------------:|------------:|--------------------:|
 CompileNormal | 634.05 us |    NA | 1.1571 us |  1.00 |      6.5993 |      2.1998 |           - |            27.07 KB |
   CompileFast |  82.54 us |    NA | 0.0558 us |  0.13 |      6.6993 |      3.2997 |      0.2000 |            27.47 KB |

With .WithDependencyDepthToSplitObjectGraph(3)

        Method |     Mean | Error |   StdDev | Ratio | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
-------------- |---------:|------:|---------:|------:|------------:|------------:|------------:|--------------------:|
 CompileNormal | 985.9 us |    NA | 8.852 us |  1.00 |     30.3970 |           - |           - |           124.76 KB |
   CompileFast | 714.4 us |    NA | 7.205 us |  0.72 |     90.5909 |     46.6953 |      5.7994 |           359.43 KB |

With .WithDependencyDepthToSplitObjectGraph(5)

        Method |      Mean | Error |    StdDev | Ratio | RatioSD | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
-------------- |----------:|------:|----------:|------:|--------:|------------:|------------:|------------:|--------------------:|
 CompileNormal |  4.312 ms |    NA | 0.0017 ms |  1.00 |    0.00 |    172.8272 |     85.9141 |           - |             1.04 MB |
   CompileFast | 17.192 ms |    NA | 0.0625 ms |  3.99 |    0.02 |   2928.0719 |    161.8382 |     52.9471 |             12.3 MB |

With .WithDependencyDepthToSplitObjectGraph(7)

        Method |     Mean | Error |    StdDev | Ratio | RatioSD | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
-------------- |---------:|------:|----------:|------:|--------:|------------:|------------:|------------:|--------------------:|
 CompileNormal | 11.17 ms |    NA | 0.1530 ms |  1.00 |    0.00 |    504.9900 |    221.5569 |     75.8483 |             2.52 MB |
   CompileFast | 87.91 ms |    NA | 0.3260 ms |  7.87 |    0.08 |  16988.0240 |   1221.5569 |    189.6208 |            71.72 MB |

Seems that CompileFast grows exponentially depending on depth of graph.

@dadhi
Copy link
Owner

dadhi commented Nov 7, 2018

Thanks for the tests, very interesting. I would try the latest FEC v2 (after it is released) and will decide what to do.

Btw, seems that actual problems in your case was caused by FEC. Hopefully, will fix this soon(ish).

@dzmitry-lahoda
Copy link
Contributor

May be related to dadhi/FastExpressionCompiler#89 (comment).

Once I proposed to get real graphs into benchmark, created issue for voting, but it was closed without consideration danielpalme/IocPerformance#86.

@dzmitry-lahoda
Copy link
Contributor

Sot it next will not work for some reasons?

var expr = scope.Resolve<FastExpressionCompiler.LightExpression.LambdaExpression>(typeof(MyControllerWithLotOfDependencies)

@dadhi
Copy link
Owner

dadhi commented Nov 7, 2018

FEC.LightExpression is not supported cause FEC 2.0 is not yet integrared into DryIoc. Originally, the ExpressionInfo was not supported as a wrapper, as I did not think it was useful for the end client. After integration with FEC 2.0 maybe I will reconsider.. maybe

@dadhi
Copy link
Owner

dadhi commented Nov 7, 2018

@ahydrax,

I have released v3.1-preview-06 which sets a default depth to 5 for now, and moreover, simplifies the expressions with root scoped service (controller in your case). The result expression should have one less nested lambda (minus one compile step).

@ahydrax
Copy link

ahydrax commented Nov 8, 2018

Hi @dadhi, here is benchmark results for preview-06.

BenchmarkDotNet=v0.11.2, OS=Windows 10.0.17763.55 (1809/October2018Update/Redstone5)
Intel Core i7-7700K CPU 4.20GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
  [Host]     : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3190.0
  Job-GOPLAQ : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3190.0

InvocationCount=1001  IterationCount=2  LaunchCount=1
RunStrategy=Monitoring  UnrollFactor=1  WarmupCount=1

        Method |      Mean | Error |    StdDev | Ratio | RatioSD | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
-------------- |----------:|------:|----------:|------:|--------:|------------:|------------:|------------:|--------------------:|
 CompileNormal |  4.138 ms |    NA | 0.0266 ms |  1.00 |    0.00 |    172.8272 |     85.9141 |           - |             1.04 MB |
   CompileFast | 16.984 ms |    NA | 0.0015 ms |  4.10 |    0.03 |   2578.4216 |    162.8372 |     52.9471 |            12.29 MB |

My observations (container created with Rules.Default):

  • My app doesn't crash anymore with StackOverflowException under IIS Express
  • Memory footprint drastically decreased (~1Gb instead of 4Gb and more - had the same with Autofac, I consider it's ok)

@Fruchuxs
Copy link
Author

Fruchuxs commented Nov 8, 2018

Yes, using .WithoutDependencyDepthToSplitObjectGraph() on 3.1.0-preview gives bigger output than stable version.

As I mentioed before, but our graph is not as big as @ahydrax Graph.

uses 14 dependencies

I don't want to critisize your code nor your approaches, but with 14 dependencies there are maybe other problems here. If your actions uses all of these dependencies (high cohesion) it's okay, but if not, you shall split your controller into different ones. The Dependency injection pattern respectively the inversion of control principle assumes, that your classes fulfill the single responsibility principle.

This doesn't mean, that DryIoc has problems here, but an IoC system has some constraints.

@ahydrax
Copy link

ahydrax commented Nov 8, 2018

I don't want to critisize your code nor your approaches, but with 14 dependencies there are maybe other problems here. If your actions uses all of these dependencies (high cohesion) it's okay, but if not, you shall split your controller into different ones. The Dependency injection pattern respectively the inversion of control principle assumes, that your classes fulfill the single responsibility principle.

I agree with you: having 14 dependencies might be an indicator that something wrong with the code itself.
Anyway, I didn't expect that changing DI container will break overall performance and/or its stability, especially that DryIoC is positioned as more performant container.

@dadhi
Copy link
Owner

dadhi commented Nov 8, 2018

Performance of containers depends on the use-case. Here, we likely hit the bug + specific case situation. But I glad for feetback, cause it pushed the boundaries for me.

@dadhi
Copy link
Owner

dadhi commented Dec 1, 2018

@ahydrax Hi, I have released a new preview v4.0.0-preview-01 with all the improvements from #45
Could you try it out in your big-bang setup?

@ahydrax
Copy link

ahydrax commented Dec 2, 2018

Hi @dadhi , sure, approximately on Monday I'll post the results.

@CoskunSunali
Copy link

@ahydrax Do you have any updates to share with us?

@dadhi dadhi modified the milestones: 3.1.0, 4.0.0 Feb 22, 2019
@dadhi dadhi self-assigned this Feb 22, 2019
@dadhi dadhi added the enhancement New feature or request label Feb 22, 2019
@dadhi
Copy link
Owner

dadhi commented Feb 22, 2019

Here is the latest results: #26 (comment)

I will consider this issue closed, please open a new one if you see the problem.

@dadhi dadhi closed this as completed Feb 22, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants