
Improve startup time by caching compiled queries on disk #16496

Closed
roji opened this issue Jul 7, 2019 · 7 comments

@roji
Member

roji commented Jul 7, 2019

In order to improve startup time, we could cache compiled delegates on disk. Technically this doesn't seem very complicated to do with LambdaExpression.CompileToMethod(). This would give EF Core some "AOT" characteristics in addition to the current "JIT" behavior.
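
For reference, a rough sketch of what this could look like on .NET Framework, where a lambda can be compiled into an assembly that is saved to disk (the QueryCache/CompiledQueries/Shaper0 names are made up for illustration):

```csharp
// .NET Framework-only sketch; CompileToMethod is not available on .NET Core.
using System;
using System.Linq.Expressions;
using System.Reflection;
using System.Reflection.Emit;

class CompileToMethodSketch
{
    static void Main()
    {
        Expression<Func<int, int>> shaper = x => x * 2;

        // Build a saveable dynamic assembly and compile the lambda into a static method.
        var asmName = new AssemblyName("QueryCache");
        var asm = AppDomain.CurrentDomain.DefineDynamicAssembly(asmName, AssemblyBuilderAccess.RunAndSave);
        var module = asm.DefineDynamicModule("QueryCache", "QueryCache.dll");
        var type = module.DefineType("CompiledQueries", TypeAttributes.Public);
        var method = type.DefineMethod("Shaper0", MethodAttributes.Public | MethodAttributes.Static);

        shaper.CompileToMethod(method);
        type.CreateType();

        // Persist; a later startup could load this assembly and bind a delegate
        // to CompiledQueries.Shaper0 instead of recompiling the expression tree.
        asm.Save("QueryCache.dll");
    }
}
```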

Some questions that would need to be answered:

  • How much startup time is really taken by query compilation - this may simply not be worth it
  • This depends on Reflection.Emit, whose status in .NET Standard isn't clear (see https://github.com/dotnet/corefx/issues/29365). It seems like this could work by checking at runtime whether emit is supported, and bypassing it in AOT environments (even though our support of AOT environments isn't ideal in any case); a rough sketch of such a check follows this list.
  • We'd have to decide when the query cache is persisted, how it's evicted (Improve query cache eviction #12905), how the user determines where it's stored, etc.
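
As a sketch of the runtime check mentioned in the second bullet, assuming RuntimeFeature.IsDynamicCodeCompiled is the right signal (the ShaperCompiler helper is a made-up name):

```csharp
// Use compiled delegates when dynamic code generation is available,
// fall back to the expression interpreter otherwise.
using System;
using System.Linq.Expressions;
using System.Runtime.CompilerServices;

static class ShaperCompiler
{
    public static Func<TIn, TOut> Compile<TIn, TOut>(Expression<Func<TIn, TOut>> shaper)
    {
        // IsDynamicCodeCompiled is false in AOT-only environments, where emitted
        // IL is either unsupported or only interpreted.
        return RuntimeFeature.IsDynamicCodeCompiled
            ? shaper.Compile()                            // JIT: compile to IL
            : shaper.Compile(preferInterpretation: true); // AOT: interpret the tree
    }
}
```

Compile(preferInterpretation: true) avoids Reflection.Emit entirely, at the cost of slower execution of the resulting delegate.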

This is conceptually similar to #1906, which is for caching the model.

@roji roji added the area-perf label Jul 7, 2019
@ajcvickers ajcvickers added this to the Backlog milestone Jul 8, 2019
@mburbea

mburbea commented Jul 9, 2019

LambdaExpression.CompileToMethod is only available in .NET Framework :(

@roji
Member Author

roji commented Jul 9, 2019

@mburbea that's true. However, since System.Reflection.Emit is supported (although that's also a bit of a mess), it may be possible to recreate the same thing in .NET Core (see https://stackoverflow.com/questions/41520319/alternatives-of-compiletomethod-in-net-standard). Definitely nothing certain here.

@divega
Contributor

divega commented Jul 26, 2019

I am going to assume we are talking about both implicitly compiled and explicitly compiled queries, but I believe most of what I am going to describe applies regardless:

At a high level, query compilation takes a source expression tree and arguments (e.g. closure variables, compiled query arguments, DbContext state that affects the query, etc.) and produces a few distinct outputs:

  1. A target query in whatever representation the target database uses (typically but not always SQL)

  2. Parameter bindings from any inputs into the target query

  3. An object shaper that can be used to convert rows of results from the target database into object instances.

Persisting the query cache then has several distinct aspects, some of which can be addressed independently:

  1. A serialized representation of the target query. This is trivial if the target query is already SQL or some other text representation.

  2. The parameter bindings; these are probably easier to rebuild every time.

  3. The object shaper itself. This is where being able to store compiled delegates would help, but in the past we have entertained the idea of generating the code of shapers at compile/design time using Roslyn and including them in the compiled model, which seems to be a more promising approach.

  4. A serialized representation of the source tree and any other input that is part of the query cache key (in our case, nullability of the arguments and mutable DbContext settings), so that implicitly compiled queries can look up already compiled queries re-hydrated from disk. A rough sketch of such an entry follows this list.
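
To make the aspects above a bit more concrete, here is a purely hypothetical shape for a persisted cache entry; every type and property name is invented for illustration:

```csharp
using System.Collections.Generic;

public class PersistedQueryCacheEntry
{
    // (4) Cache key: a serialized form of the source expression tree plus the other
    // inputs that participate in the key (parameter nullability, relevant DbContext settings).
    public string SourceExpressionHash { get; set; }
    public IList<bool> ParameterNullability { get; set; }

    // (1) The target query, trivially serializable when it is SQL text.
    public string TargetSql { get; set; }

    // (2) Parameter bindings; as noted above, these could also just be rebuilt at load time.
    public IList<string> ParameterNames { get; set; }

    // (3) A reference to the shaper: either a persisted delegate or, with the
    // Roslyn/compiled-model approach, the name of a pre-generated shaper method.
    public string ShaperTypeName { get; set; }
    public string ShaperMethodName { get; set; }
}
```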

@roji
Member Author

roji commented Jul 26, 2019

but in the past we have entertained the idea of generating the code of shapers at compile/design-time using Roslyn and including them in the compiled model which seems to be a more promising approach.

It's an interesting idea... The Roslyn-based compile/design-time approach seems like it can be called AOT, as opposed to the purely runtime approach. Here are some preliminary thoughts:

  • It's true that the purely runtime approach above has the challenge of how to dump a compiled delegate (i.e. the shaper) to disk. I think there are reasons to be optimistic, as this was fully possible in .NET Framework - but of course we'd need to check.
  • For the AOT approach, unless we intend to reimplement our entire pipeline over the Roslyn syntax/semantic model, I'm assuming this would mean some sort of translation from Roslyn models to Expression trees, no? If so, don't we run into the same problem as above, i.e. how to persist the compiled delegate?
  • The Roslyn-to-Expression translation also seems like it could be very non-trivial to do correctly for all cases; have you already gone into the details?
  • Another possible disadvantage of AOT is that it would likely require the user to specifically indicate the queries to be included at compile time - an extra step. In that sense it seems close to the compiled query concept we already have, and may even replace it.
  • On the other hand, the up-front AOT eliminates questions such as which queries to persist, when to persist, and possibly also where to persist (since AOT would likely simply go into the compiled model, but the runtime approach would instead have a sort of "cache").
  • Finally, the AOT approach may end up being faster, since it could eliminate the initial cache lookup, a bit like how today's compiled query API does (not sure if this can actually work). With a runtime approach we'll always have to take an Expression tree, funcletize it and then use it as a key in some cache lookup.

PS: One unrelated crazy thought I had on cache lookups is to use caller information as the cache key - the file name and line number could be a much more efficient lookup key. This could theoretically be implemented via DbContext.Set<TEntity>(), but unfortunately properties can't have caller information in C#. Of course there's also the matter of two different queries on the same line.
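
A purely hypothetical sketch of that idea (DbContext.Set<TEntity>() does not actually accept these parameters; TrackedSet and MyDbContext are invented names):

```csharp
using System.Linq;
using System.Runtime.CompilerServices;
using Microsoft.EntityFrameworkCore;

public class MyDbContext : DbContext
{
    // Hypothetical method-based alternative to DbSet properties
    // (properties cannot carry caller-info attributes).
    public IQueryable<TEntity> TrackedSet<TEntity>(
        [CallerFilePath] string filePath = "",
        [CallerLineNumber] int lineNumber = 0)
        where TEntity : class
    {
        // (filePath, lineNumber) could act as a cheap call-site cache key, modulo
        // the "two different queries on the same line" caveat; a real implementation
        // would look up or store the compiled query under that key here.
        return Set<TEntity>();
    }
}
```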

@GSPP

GSPP commented Jul 28, 2019

Some customer feedback:

  1. I like this idea. Latency improvements in startup time would be appreciated.
  2. On the other hand, EF compilation time tends to be a small portion of the total startup time in large applications. A large ASP.NET app can easily take 20 seconds to start (which causes quite a loss in productivity).
  3. An alternative strategy would be to persist just the queries to disk. When the application starts, low-priority background threads eagerly compile those queries so that they are ready when the critical startup path needs them. The .NET JIT does this ("Background JIT"). A sketch of the warm-up part follows this list.
  4. From a prioritization standpoint, my top priority is migration from LINQ to SQL. Next priority is steady-state throughput (mainly through reduction of CPU overhead). Next priority is quality of LINQ translation. No further priorities.
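
A rough sketch of the warm-up part of point 3, assuming the hot queries are known up front (the QueryWarmup helper and its parameters are made up; loading persisted queries from disk is left out):

```csharp
using System;
using System.Threading;
using Microsoft.EntityFrameworkCore;

public static class QueryWarmup
{
    // Runs each known query once on a low-priority background thread so that
    // EF's query cache is already populated when the critical path needs it.
    public static void Start(Func<DbContext> contextFactory, params Action<DbContext>[] warmupQueries)
    {
        var warmup = new Thread(() =>
        {
            foreach (var query in warmupQueries)
            {
                using var context = contextFactory();
                query(context); // executing (or just compiling) the query populates the cache
            }
        })
        {
            IsBackground = true,
            Priority = ThreadPriority.Lowest
        };
        warmup.Start();
    }
}
```

This assumes the warm-up contexts are created with the same options (and therefore the same internal service provider) as the real ones, so they populate the same query cache.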

Hope this helps.

@roji
Member Author

roji commented Jul 28, 2019

@GSPP thanks for your input, FWIW I agree that steady-state perf should be prioritized over startup, at least at the moment.

@roji
Member Author

roji commented Jun 1, 2021

Replacing with #25009

@roji roji closed this as completed Jun 1, 2021
@roji roji removed this from the Backlog milestone Jun 1, 2021
@ajcvickers ajcvickers reopened this Oct 16, 2022
@ajcvickers ajcvickers closed this as not planned Oct 16, 2022