
Improve startup time by caching compiled queries on disk #16496

Closed
roji opened this issue Jul 7, 2019 · 7 comments

@roji
Member

roji commented Jul 7, 2019

In order to improve startup time, we could cache compiled delegates on disk. Technically this doesn't seem very complicated to do with LambdaExpression.CompileToMethod(). This would give EF Core some "AOT" characteristics in addition to the current "JIT" behavior.
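
For reference, a rough sketch of what this could look like on .NET Framework, where a lambda can be compiled into an assembly that is saved to disk (the QueryCache/CompiledQueries/Shaper0 names are made up for illustration):

```csharp
// .NET Framework-only sketch; CompileToMethod is not available on .NET Core.
using System;
using System.Linq.Expressions;
using System.Reflection;
using System.Reflection.Emit;

class CompileToMethodSketch
{
    static void Main()
    {
        Expression<Func<int, int>> shaper = x => x * 2;

        // Build a saveable dynamic assembly and compile the lambda into a static method.
        var asmName = new AssemblyName("QueryCache");
        var asm = AppDomain.CurrentDomain.DefineDynamicAssembly(asmName, AssemblyBuilderAccess.RunAndSave);
        var module = asm.DefineDynamicModule("QueryCache", "QueryCache.dll");
        var type = module.DefineType("CompiledQueries", TypeAttributes.Public);
        var method = type.DefineMethod("Shaper0", MethodAttributes.Public | MethodAttributes.Static);

        shaper.CompileToMethod(method);
        type.CreateType();

        // Persist; a later startup could load this assembly and bind a delegate
        // to CompiledQueries.Shaper0 instead of recompiling the expression tree.
        asm.Save("QueryCache.dll");
    }
}
```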

Some questions that would need to be answered:

  • How much startup time is really taken by query compilation - this may simply not be worth it
  • This depends on Reflection.Emit, whose status in .NET Standard isn't clear (see https://github.com/dotnet/corefx/issues/29365). It seems like this could work by checking at runtime whether emit is supported, and bypassing it in AOT environments (even though our support of AOT environments isn't ideal in any case); a rough sketch of such a check follows this list.
  • We'd have to decide when the query cache is persisted, how it's evicted (Improve query cache eviction #12905), how the user determines where it's stored, etc.
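
As a sketch of the runtime check mentioned in the second bullet, assuming RuntimeFeature.IsDynamicCodeCompiled is the right signal (the ShaperCompiler helper is a made-up name):

```csharp
// Use compiled delegates when dynamic code generation is available,
// fall back to the expression interpreter otherwise.
using System;
using System.Linq.Expressions;
using System.Runtime.CompilerServices;

static class ShaperCompiler
{
    public static Func<TIn, TOut> Compile<TIn, TOut>(Expression<Func<TIn, TOut>> shaper)
    {
        // IsDynamicCodeCompiled is false in AOT-only environments, where emitted
        // IL is either unsupported or only interpreted.
        return RuntimeFeature.IsDynamicCodeCompiled
            ? shaper.Compile()                            // JIT: compile to IL
            : shaper.Compile(preferInterpretation: true); // AOT: interpret the tree
    }
}
```

Compile(preferInterpretation: true) avoids Reflection.Emit entirely, at the cost of slower execution of the resulting delegate.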

This is conceptually similar to #1906, which is for caching the model.

@roji roji added the area-perf label Jul 7, 2019
@ajcvickers ajcvickers added this to the Backlog milestone Jul 8, 2019
@mburbea

mburbea commented Jul 9, 2019

LambdaExpression.CompileToMethod is only available in .NET Framework :(

@roji
Member Author

roji commented Jul 9, 2019

@mburbea that's true. However, since System.Reflection.Emit is supported (although that's also a bit of a mess), it may be possible to recreate the same thing in .NET Core (see https://stackoverflow.com/questions/41520319/alternatives-of-compiletomethod-in-net-standard). Definitely nothing certain here.

@divega
Contributor

divega commented Jul 26, 2019

I am going to assume we are talking about both implicitly compiled and explicitly compiled queries, but I believe most of what I am going to describe applies regardless:

At a high level, query compilation takes a source expression tree and arguments (e.g. closure variables, compiled query arguments, DbContext state that affects the query, etc.) and produces a few distinct outputs:

  1. A target query in whatever representation the target database uses (typically but not always SQL)

  2. Parameter bindings from any inputs into the target query

  3. An object shaper that can be used to convert rows of results from the target database into object instances.

Persisting the query cache then has several distinct aspects, some of which can be addressed independently:

  1. A serialized representation of the target query. This is trivial if the target query is already SQL or some other text representation.

  2. The parameter bindings; these are probably easier to rebuild every time.

  3. The object shaper itself. This is where being able to store compiled delegates would help, but in the past we have entertained the idea of generating the code of shapers at compile/design time using Roslyn and including them in the compiled model, which seems to be a more promising approach.

  4. A serialized representation of the source tree and any other input that is part of the query cache key (in our case, nullability of the arguments and mutable DbContext settings), so that implicitly compiled queries can look up already compiled queries re-hydrated from disk. A rough sketch of such an entry follows this list.
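
To make the aspects above a bit more concrete, here is a purely hypothetical shape for a persisted cache entry; every type and property name is invented for illustration:

```csharp
using System.Collections.Generic;

public class PersistedQueryCacheEntry
{
    // (4) Cache key: a serialized form of the source expression tree plus the other
    // inputs that participate in the key (parameter nullability, relevant DbContext settings).
    public string SourceExpressionHash { get; set; }
    public IList<bool> ParameterNullability { get; set; }

    // (1) The target query, trivially serializable when it is SQL text.
    public string TargetSql { get; set; }

    // (2) Parameter bindings; as noted above, these could also just be rebuilt at load time.
    public IList<string> ParameterNames { get; set; }

    // (3) A reference to the shaper: either a persisted delegate or, with the
    // Roslyn/compiled-model approach, the name of a pre-generated shaper method.
    public string ShaperTypeName { get; set; }
    public string ShaperMethodName { get; set; }
}
```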

@roji
Member Author

roji commented Jul 26, 2019

but in the past we have entertained the idea of generating the code of shapers at compile/design-time using Roslyn and including them in the compiled model which seems to be a more promising approach.

It's an interesting idea... The Roslyn-based compile/design-time approach seems like it can be called AOT, as opposed to the purely runtime approach. Here are some preliminary thoughts:

  • It's true that the purely runtime approach above has the challenge of how to dump a compiled delegate (i.e. the shaper) to disk. I think there are reasons to be optimistic, as this was fully possible in .NET Framework - but of course we'd need to check.
  • For the AOT approach, unless we intend to reimplement our entire pipeline over the Roslyn syntax/semantic model, I'm assuming this would mean some sort of translation from Roslyn models to Expression trees, no? If so, don't we run into the same problem as above, i.e. how to persist the compiled delegate?
  • The Roslyn-to-Expression translation also seems like it could be very non-trivial to do correctly for all cases; have you already gone into the details?
  • Another possible disadvantage of AOT is that it would likely require the user to specifically indicate the queries to be included at compile time - an extra step. In that sense it seems close to the compiled query concept we already have, and may even replace it.
  • On the other hand, the up-front AOT eliminates questions such as which queries to persist, when to persist, and possibly also where to persist (since AOT would likely simply go into the compiled model, but the runtime approach would instead have a sort of "cache").
  • Finally, the AOT approach may end up being faster, since it could eliminate the initial cache lookup, a bit like how today's compiled query API does (not sure if this can actually work). With a runtime approach we'll always have to take an Expression tree, funcletize it and then use it as a key in some cache lookup.

PS: One unrelated crazy thought I had on cache lookups is to use caller information as the cache key - the file name and line number could be a much more efficient lookup key. This could theoretically be implemented via DbContext.Set<TEntity>(), but unfortunately properties can't have caller information in C#. Of course there's also the matter of two different queries on the same line.
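
A purely hypothetical sketch of that idea (DbContext.Set<TEntity>() does not actually accept these parameters; TrackedSet and MyDbContext are invented names):

```csharp
using System.Linq;
using System.Runtime.CompilerServices;
using Microsoft.EntityFrameworkCore;

public class MyDbContext : DbContext
{
    // Hypothetical method-based alternative to DbSet properties
    // (properties cannot carry caller-info attributes).
    public IQueryable<TEntity> TrackedSet<TEntity>(
        [CallerFilePath] string filePath = "",
        [CallerLineNumber] int lineNumber = 0)
        where TEntity : class
    {
        // (filePath, lineNumber) could act as a cheap call-site cache key, modulo
        // the "two different queries on the same line" caveat; a real implementation
        // would look up or store the compiled query under that key here.
        return Set<TEntity>();
    }
}
```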

@GSPP

GSPP commented Jul 28, 2019

Some customer feedback:

  1. I like this idea. Latency improvements in startup time would be appreciated.
  2. On the other hand, EF compilation time tends to be a small portion of the total startup time in large applications. A large ASP.NET app can easily take 20 seconds to start (which causes quite a loss in productivity).
  3. An alternative strategy would be to persist just the queries to disk. When the application starts, low-priority background threads eagerly compile those queries so that they are ready when the critical startup path needs them. The .NET JIT does this ("Background JIT"). A sketch of the warm-up part follows this list.
  4. From a prioritization standpoint, my top priority is migration from LINQ to SQL. Next priority is steady-state throughput (mainly through reduction of CPU overhead). Next priority is quality of LINQ translation. No further priorities.
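
A rough sketch of the warm-up part of point 3, assuming the hot queries are known up front (the QueryWarmup helper and its parameters are made up; loading persisted queries from disk is left out):

```csharp
using System;
using System.Threading;
using Microsoft.EntityFrameworkCore;

public static class QueryWarmup
{
    // Runs each known query once on a low-priority background thread so that
    // EF's query cache is already populated when the critical path needs it.
    public static void Start(Func<DbContext> contextFactory, params Action<DbContext>[] warmupQueries)
    {
        var warmup = new Thread(() =>
        {
            foreach (var query in warmupQueries)
            {
                using var context = contextFactory();
                query(context); // executing (or just compiling) the query populates the cache
            }
        })
        {
            IsBackground = true,
            Priority = ThreadPriority.Lowest
        };
        warmup.Start();
    }
}
```

This assumes the warm-up contexts are created with the same options (and therefore the same internal service provider) as the real ones, so they populate the same query cache.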

Hope this helps.

@roji
Member Author

roji commented Jul 28, 2019

@GSPP thanks for your input, FWIW I agree that steady-state perf should be prioritized over startup, at least at the moment.

@roji
Member Author

roji commented Jun 1, 2021

Replacing with #25009

@roji roji closed this as completed Jun 1, 2021
@roji roji removed this from the Backlog milestone Jun 1, 2021
@ajcvickers ajcvickers reopened this Oct 16, 2022
@ajcvickers ajcvickers closed this as not planned Oct 16, 2022