General QueryContext-based mechanism for caching of generated RelationalCommand #17598
Comments
First one is filed as #13617.
Oh nice, I wasn't aware of that issue (and didn't know so many people had run into it). This issue could provide a general way to deal with that.
@smitpatel @AndriySvyryd @ajcvickers I updated the above based on our recent discussions; hope it's more or less accurate.
A few things:
First, we should look into the cache design and delegates based on parameter nullability, so that parameter nullability doesn't affect the cache when the parameter isn't used. Everything else can be low priority.
The idea of caching something based on parameter values isn't necessarily specific to relational - especially as all we'd be doing is registering bool-returning lambdas there. But of course if we have another context (or possibly RelationalQCC?) that works too.
@smitpatel sure. I think that means doing part 1 (Delegate-based parameter cache lookup) and leaving QueryContextDependentExpression and client-side parameter transformation for later?
Caching anything beyond the first-level cache is not necessarily the norm. The compiled query delegate should remain the same regardless of parameter values. But relational does not work that way, since it generates a different SQL command. Let's not pollute core with relational concepts. InMemory certainly does not work that way, and the same could be true for Cosmos or other non-relational providers.
OK. In that case we could extend and have a RelationalQCC, or possibly just have the relevant visitor(s) expose these lambdas on themselves via a property, and have the calling code collect them. |
While it is an implementation detail, we don't need a RelationalQCC either. You are going to have a multiplexer to run the caching; just make it part of that. Or, as @AndriySvyryd suggested, each visitor can compose over the cache key so they don't have to register anything outside.
With the limitation that the lambda can only return a bool, I don't see how this can be used to fix #13617, as there the cache key would need to contain the length of the list parameters. Here's a counter-proposal that @smitpatel referred to. For example, the default cache key would be:

```csharp
class RelationalCommandCacheKey
{
    SelectExpression _selectExpression;
    bool _useRelationalNulls;
    BitArray _paramNullability;
}
```

with:

```csharp
class RelationalListCacheKey
{
    object _relationalCommandCacheKey;
    int _listLength;
}
```

Using value types for cache keys is not beneficial, as they are always boxed.
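As a rough illustration of the composed design above, here's a minimal sketch (not EF Core's actual implementation; `SelectExpression` is stubbed as `object`) of how such a reference-type cache key might implement equality and hashing:

```csharp
using System;
using System.Collections;

// Hypothetical sketch of the composed cache key described above.
// In EF Core the first field would be the actual SelectExpression tree.
sealed class RelationalCommandCacheKey
{
    private readonly object _selectExpression;   // stand-in for SelectExpression
    private readonly bool _useRelationalNulls;
    private readonly BitArray _paramNullability;

    public RelationalCommandCacheKey(object selectExpression, bool useRelationalNulls, BitArray paramNullability)
    {
        _selectExpression = selectExpression;
        _useRelationalNulls = useRelationalNulls;
        _paramNullability = paramNullability;
    }

    public override bool Equals(object obj)
        => obj is RelationalCommandCacheKey other
            && ReferenceEquals(_selectExpression, other._selectExpression)
            && _useRelationalNulls == other._useRelationalNulls
            && BitArraysEqual(_paramNullability, other._paramNullability);

    public override int GetHashCode()
    {
        var hash = new HashCode();
        hash.Add(_selectExpression);
        hash.Add(_useRelationalNulls);
        for (var i = 0; i < _paramNullability.Count; i++)
            hash.Add(_paramNullability[i]);
        return hash.ToHashCode();
    }

    private static bool BitArraysEqual(BitArray a, BitArray b)
    {
        if (a.Count != b.Count) return false;
        for (var i = 0; i < a.Count; i++)
            if (a[i] != b[i]) return false;
        return true;
    }
}
```

A `RelationalListCacheKey` could then wrap an instance of this key together with the list length, composing equality the same way.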
That's true. If we want to support non-boolean scenarios, the cache key can also be a simple
The first part of that makes sense to me as a mechanism for registering the lambdas the first time a query is being processed (when we're building the ParameterBasedCacheKeyGenerator). But I'm not understanding the second part: how would we match a query whose parameter values are already in our parameter-based cache? We need to have a single cache key generator somewhere that can be executed on the QueryContext once we reach the parameter-based cache, and which yields a single cache key that we can then look up in the IMemoryCache... In my mental model, things could look something like this:

```csharp
class ParameterBasedCacheKeyGeneratorBuilder
{
    // Registration methods for use while building the generator, when first compiling the query variant
    public void RegisterLambda(Func<QueryContext, bool> lambda) {...}
    public void RegisterLambda(Func<QueryContext, int> lambda) {...}
    // Possibly registration for other lambda types if we think we need them

    // At the end of compilation we call this to build the generator
    public Func<QueryContext, byte[]> Build() {
        // Based on all registered lambdas, returns a function that calculates the cache key for a given QueryContext.
        // For extra perf we could accept lambda expressions, combine them and compile out a single delegate here.
    }
}
```

For each query we just execute the lambda returned by Build to get the cache key (a byte[]).
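To make that mental model concrete, here is a hedged, self-contained sketch of what such a builder could look like. None of these names are the actual EF Core API, and `QueryContext` is stubbed with a simple parameter dictionary:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical QueryContext stand-in: exposes parameter values by name.
class QueryContext
{
    public Dictionary<string, object> ParameterValues { get; } = new();
}

// Sketch of the builder described above. Each bool lambda contributes one
// byte to the cache key; each int lambda contributes four bytes.
class ParameterBasedCacheKeyGeneratorBuilder
{
    private readonly List<Func<QueryContext, bool>> _boolLambdas = new();
    private readonly List<Func<QueryContext, int>> _intLambdas = new();

    public void RegisterLambda(Func<QueryContext, bool> lambda) => _boolLambdas.Add(lambda);
    public void RegisterLambda(Func<QueryContext, int> lambda) => _intLambdas.Add(lambda);

    public Func<QueryContext, byte[]> Build()
    {
        // Capture the registered lambdas; the returned delegate packs all
        // of their results into a single byte[] for a given QueryContext.
        var bools = _boolLambdas.ToArray();
        var ints = _intLambdas.ToArray();
        return ctx =>
        {
            var key = new byte[bools.Length + ints.Length * 4];
            var pos = 0;
            foreach (var b in bools)
                key[pos++] = b(ctx) ? (byte)1 : (byte)0;
            foreach (var i in ints)
            {
                BitConverter.GetBytes(i(ctx)).CopyTo(key, pos);
                pos += 4;
            }
            return key;
        };
    }
}
```

A nullability visitor would register something like `ctx => ctx.ParameterValues["p"] == null`, while IN-list expansion would register `ctx => ((int[])ctx.ParameterValues["list"]).Length`; `Build` then packs all results into one key.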
It may be better to simply scope RelationalCommandCache to a given query, instead of making it global (so instead of a single static dictionary we'd have a dictionary per QueryingEnumerable). This would make the dictionaries smaller, and we wouldn't have to go through useless comparisons with cache keys of other queries. The downside is managing eviction/cache explosion per-query instead of globally.
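For illustration, a per-query second-level cache could be as simple as the following sketch (the generated command is stubbed as a SQL string, and eviction is deliberately omitted, which is exactly the downside noted above):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// byte[] cache keys need a structural comparer, since arrays compare by reference.
sealed class ByteArrayComparer : IEqualityComparer<byte[]>
{
    public bool Equals(byte[] x, byte[] y) => x.AsSpan().SequenceEqual(y);

    public int GetHashCode(byte[] key)
    {
        var hash = new HashCode();
        hash.AddBytes(key); // available since .NET 6
        return hash.ToHashCode();
    }
}

// Sketch of a cache scoped to one query: each QueryingEnumerable would own
// one of these instead of sharing a single global dictionary. The command is
// stubbed as a string; no eviction or size limit is implemented.
sealed class PerQueryCommandCache
{
    private readonly ConcurrentDictionary<byte[], string> _commands = new(new ByteArrayComparer());

    public string GetOrAdd(byte[] cacheKey, Func<byte[], string> generateSql)
        => _commands.GetOrAdd(cacheKey, generateSql);
}
```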
We could simply include this kind of info inside the cache key, just like parameters on the QueryContext (i.e. as another value packed into the byte[]).
Yeah, that seems unfortunate. One allocation for the cache key is probably OK, although we could look at high-perf alternatives (I have some ideas, but we probably shouldn't look at this for now).
I am still concerned that completely decoupling the visitors might result in duplication for nullability, but we can let benchmarks dictate whether this is something that needs to be optimized out.
We still need to wrap it to override
Agree, the lambda expressions registered by the visitors can also just take the
We had already decided that we would use a single global cache to allow a single global memory limit. But we can revisit this, though we should still always have a hard limit on the 2nd level cache.
As long as we produce reasonably distributed hash codes, this shouldn't be an issue in practice.
You mean because multiple providers would register lambdas for the same thing (i.e. the nullability of a parameter)? Hopefully the visitors and pipeline can be designed so that this doesn't happen... But we can cross that bridge when we get to it.
True.
Yeah, it's possible - although we may want to involve non-parameter values in the cache. One possibility is the relational null setting, there may be others...
Yeah, that's the downside. If we really want to, we could have a hard limit without necessarily having a global cache, but that would be a custom implementation and not MemoryCache. That's just an idea; to start with we can definitely keep the current system and profile/benchmark afterwards.
I'm facing a problem with Oracle that I believe could benefit from this. The basic use case is an indexed prefix search (StartsWith or EF.Functions.Like) on a million row table.
What I've been doing so far (in EF6; it seems possible in EF Core 3 too) is creating a custom scalar function with an unguessable name and wrapping the parameter with it during query compilation; then a command interceptor inlines the actual parameter value just before the command executes. The actual query expression would be the same regardless of the parameters, but the marked parameter values should always be inlined in the command text. This is very far from ideal and probably should be handled in the RDBMS (slim chance in the foreseeable future), but for the time being I'm stuck with this hack - and with the occasional devs fleeing in terror when they come across the code. The statement cache thrashing has much less perf impact than a full-table scan.
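The core of that interceptor trick is a plain text rewrite of the command just before execution. A hedged sketch of that rewrite - the marker name `INLINE_MARKER` and the Oracle-style `:name` parameter syntax are illustrative only, and real code must also remove the corresponding `DbParameter` and escape per the provider's quoting rules:

```csharp
using System;

// Sketch of the rewrite a DbCommandInterceptor would perform just before
// execution: find the "unguessable" marker function wrapping a parameter
// and replace the whole call with a quoted literal.
static class ParameterInliner
{
    public static string InlineMarkedParameter(string commandText, string parameterName, string value)
    {
        var marker = $"INLINE_MARKER(:{parameterName})";    // hypothetical marker function
        var literal = "'" + value.Replace("'", "''") + "'"; // naive single-quote escaping
        return commandText.Replace(marker, literal);
    }
}
```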
@bachratyg - I think this issue is a bit more advanced than that particular use case. This issue tracks a way to actually cache the relational command for a set of values of a given parameter. In your case, you want to always inline the parameter value, which means the set of values the parameter can take is not finite, so this caching should not be used for that. Instead you should use the non-cached version of RelationalCommand. When
@bachratyg don't hesitate to ping us if you need some guidance. The constantization of InExpression.Values which @smitpatel mentioned above can be found here, and don't forget to disable caching (as that code does) by calling DoNotCache. |
Thanks for pointing me in the right direction. Indeed this is what I was looking for. No simple way to bypass the command cache was the main problem I was trying to work around when coming up with this solution. This will greatly simplify the code when we finally make the move to 5+. |
Split out "client-side parameter transformations" to #28028. |
Carefully consider how this works (or not) in AOT context. With nullability we only have two possibilities per parameter (null, not null), and can pregenerate them relatively easily. Once this mechanism is generalized, things become more complicated; we can't simply have a lambda producing the cache key, since we don't have actual parameters in AOT context. So we'd need some way of generating all the options. |
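For the nullability-only case, pregeneration is tractable precisely because the space is finite; a small illustrative sketch of enumerating all 2^n variants for n parameters:

```csharp
using System.Collections.Generic;

static class NullabilityVariants
{
    // In AOT we can't evaluate lambdas over live parameter values, but for
    // nullability alone there are only 2^n combinations for n parameters,
    // so every SQL variant can be generated ahead of time. Each bool in a
    // combination means "this parameter is null".
    public static IEnumerable<bool[]> All(int parameterCount)
    {
        for (var mask = 0; mask < (1 << parameterCount); mask++)
        {
            var combo = new bool[parameterCount];
            for (var i = 0; i < parameterCount; i++)
                combo[i] = (mask & (1 << i)) != 0;
            yield return combo;
        }
    }
}
```

Once the mechanism is generalized to arbitrary lambdas (list lengths, pattern contents), the variant space is no longer enumerable like this, which is the complication noted above.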
/cc @AndriySvyryd this is the issue for generalized, arbitrary SQL caching. |
Our parameter-based query cache currently operates on parameter nullability; this is a mechanism we had in previous versions which allows us to vary SQL based on actual parameter value nullability and is potentially important for perf (#17543). This issue proposes a more general design that could unlock parameter-based caching for other scenarios - not just null checks on parameters.
Scenarios

- When the query contains `listParam.Contains(x)`, we expand the list parameter to constants in the SQL, resulting in `x IN (constant1, constant2...)`, since there are no list parameters in SQL. Instead, we could expand to a parameterized list: `x IN (@p1, @p2)`, based on the number of elements in the given list (i.e. we'd cache one version for 1 param, another for 2 params, etc.). This would allow us to avoid the additional execution-time visitation, and utilise database query plans in a much better way (we need to think about cache explosion and eviction though). This is tracked by "IN() list queries are not parameterized, causing increased SQL Server CPU usage" (#13617), but this feature could be a building block.
- For `StartsWith` with a parameter pattern, we could avoid generating the `LEFT(LEN())` expression as we do now (although this would require exposure to method translators). We could even rewrite the search pattern to escape the wildcard characters.
- We could vary the SQL based on a bool parameter used in a condition (`@p ? A : B`).
- On PostgreSQL, `Contains` over a list parameter is translated to `x = ANY (@listParam)`. However, depending on whether the parameter contains any nulls, an additional null check needs to be added (`OR x IS NULL`). We could cache two command versions - one for when the list contains null, and another for when it doesn't.
- For single-element lists we could generate `x = @listParamElement`, which apparently is more efficient (PG apparently doesn't reuse generic query plans with array parameters? /cc @vonzshik). For empty arrays we could also optimize the comparison away entirely.

Proposal
Delegate-based parameter cache lookup
For the following, let's concentrate on reimplementing the existing nullability-based caching; later, for example, for `boolParam ? x : y` we would elide the condition entirely in SQL. See also "Additional refactoring of Null Semantics" (#19168).

Assumption/limitation:
QueryContextDependentExpression
The above takes care of nullability-based caching, where the visitor(s) involved run after the parameter-based cache has been consulted. However, we also want to vary SQL for parts of the tree that are handled before the parameter-based cache is consulted, and therefore cannot yet access the QueryContext. For example, we want to emit different SQLs for StartsWith, which is translated early. To support this, we can introduce a new expression type, QueryContextDependentExpression, which can be "resolved" into another, concrete expression, later in the pipeline when the QueryContext is known.
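A minimal sketch of what such an expression type might look like - these shapes are hypothetical (the real design would also need printing/visiting support), and `QueryContext` is stubbed for self-containment:

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical QueryContext stand-in carrying parameter values.
class QueryContext
{
    public Dictionary<string, object> ParameterValues { get; } = new();
}

// Sketch of the proposed expression type: a placeholder created early in
// translation (e.g. by the StartsWith translator) that is "resolved" into a
// concrete expression later in the pipeline, once the QueryContext - and
// therefore the actual parameter values - is available.
class QueryContextDependentExpression : Expression
{
    private readonly Type _type;
    private readonly Func<QueryContext, Expression> _resolver;

    public QueryContextDependentExpression(Type type, Func<QueryContext, Expression> resolver)
    {
        _type = type;
        _resolver = resolver;
    }

    public override Type Type => _type;
    public override ExpressionType NodeType => ExpressionType.Extension;

    public Expression Resolve(QueryContext queryContext) => _resolver(queryContext);
}
```

A late pipeline stage would visit the tree, call `Resolve` on every such node, and let the resolver's result participate in the parameter-based cache key.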
Optional: client-side parameter transformations
While not technically part of this proposal, this feature is important to support the scenarios above. Ideally, we'd have a way to perform arbitrary parameter transformations before sending out parameters. Scenarios for this include:
We can discuss later whether a new parameter would be added or the existing value would be mutated.