Memory Issue with EF Core 7 DbContext ServiceProviderCache #31539

Closed

easaevik opened this issue Aug 23, 2023 · 18 comments

Labels
area-type-mapping · closed-fixed (The issue has been fixed and is/will be included in the release indicated by the issue milestone.) · customer-reported · Servicing-approved · type-bug

Milestone
6.0.24
@easaevik

We have a .NET 7 app using EF Core 7 running on Azure App Service. It's an API for both our web app and mobile app. Memory usage typically grows from 50 MB to 3 GB during peak hours (8–16) and only gets released when things calm down, at approximately 16:00 in the afternoon. During these 8 hours we have around 60k requests to our API. It's a multitenant app with one Azure SQL database per tenant and approximately 180 tenants. We use a ModelCacheKeyFactory to get the correct global query filters for each tenant. Not only that, we also have to differentiate between user roles within a tenant, since we do authorization through EF Core global query filters.
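For context, the filter setup is roughly of this shape (an illustrative sketch only, not our actual code; the entity, service, and property names are placeholders):

public class XDbContext : DbContext, IXDbContext
{
	private readonly ICurrentUserService _currentUserService;

	public XDbContext(ICurrentUserService currentUserService)
		=> _currentUserService = currentUserService;

	protected override void OnModelCreating(ModelBuilder modelBuilder)
	{
		// The filter values are baked into the compiled model, so every
		// distinct (tenant, role) combination needs its own model cache entry.
		var tenantId = _currentUserService.TenantId;
		var isAdmin = _currentUserService.IsAdmin;

		modelBuilder.Entity<Order>().HasQueryFilter(
			o => o.TenantId == tenantId && (isAdmin || o.OwnerId == _currentUserService.UserId));
	}
}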

[image: memory usage graph]

Not sure what happens in the afternoon; could it be IIS recycling? I have checked that the app is not restarting. Anyway, what could possibly cause such a big pile-up of memory in Gen 2 and the DbContext ServiceProviderCache?

[images: memory profiler snapshots of Gen 2 and ServiceProviderCache retention]

As for the code, I have tried to shorten it down to the essentials. In our Startup.cs we use services.AddDbContext<IXDbContext, XDbContext>(), so nothing fancy there, but in the DbContext there is a lot more going on:

[images: DbContext code]

The EntityDbContextManager uses reflection to get the correct Authorization handler:

[image: EntityDbContextManager reflection code]

A typical Authorization handler for a given entity (i.e. User) looks like this:

[image: authorization handler code for User]
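Since the handler itself is only shown in the screenshot, its rough shape is something like this (a reconstructed illustration, not the actual code; the interface and member names are hypothetical):

using System.Linq.Expressions;

// Hypothetical handler interface; EntityDbContextManager locates
// implementations like this one via reflection, as described above.
public interface IEntityAuthorizationHandler<TEntity>
{
	Expression<Func<TEntity, bool>> GetFilter(ICurrentUserService currentUser);
}

public class UserAuthorizationHandler : IEntityAuthorizationHandler<User>
{
	public Expression<Func<User, bool>> GetFilter(ICurrentUserService currentUser)
		=> currentUser.IsAdmin
			// Admins see every user in their tenant...
			? u => u.TenantId == currentUser.TenantId
			// ...while regular users see only themselves.
			: u => u.TenantId == currentUser.TenantId && u.Id == currentUser.UserId;
}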

So, all in all, there is a lot going on here. The question is: why does ServiceProviderCache retain so much memory under load? The system works as it should; I'm just not happy with all these retained bytes ending up in Gen 2. I fear there could be problems ahead as more tenants are added. Any help would be greatly appreciated.

@roji
Member

roji commented Aug 23, 2023

It seems like what's actually occupying memory is a large number of object[] instances; can you please drill down to see the path that roots them (e.g. inside ServiceProviderCache), and also report what they actually contain?

@abydal

abydal commented Aug 23, 2023

Second this: we are having a very similar issue.

  • Multitenant app - database per tenant, ~160 tenants, Azure SQL
  • We use this pattern for switching between models:
public sealed class DynamicModelCacheKeyFactory : IModelCacheKeyFactory
{
	public object Create(DbContext context, bool designTime) =>
		context is IDynamicModelDbContext { CacheKey: not null } dynamicDbModelContext
			? new DynamicModelCacheKey(context, designTime, dynamicDbModelContext.CacheKey)
			: new ModelCacheKey(context, designTime);
}
public sealed class DynamicModelCacheKey : ModelCacheKey
{
	private readonly string cacheKey;

	public DynamicModelCacheKey(DbContext context, bool designTime, string cacheKey)
		: base(context, designTime) =>
		this.cacheKey = cacheKey;

	protected override bool Equals(ModelCacheKey other) => other is DynamicModelCacheKey otherCacheKey
			&& base.Equals(otherCacheKey)
			&& otherCacheKey.cacheKey == cacheKey;

	public override int GetHashCode() => HashCode.Combine(base.GetHashCode(), cacheKey);
}
  • We see the SqlServerValueGeneratorCache grow steadily during peak hours:
    [image: SqlServerValueGeneratorCache growth]
  • But we reboot servers in the evening in order to wipe the memory; it does not seem to get released by itself.

One difference is that we run on Linux machines.

We also have some memory analyses that we can post; I just don't have access to them right now.

@roji
Member

roji commented Aug 23, 2023

@abydal you seem to be reporting a case where the number of SqlServerValueGeneratorCache entries is growing, whereas the original report above seems to have only two instances (which themselves hold a huge number of object[] instances).

Though I might be misunderstanding here. The easiest thing really is for someone to produce a minimal, runnable code sample with a loop, showing the memory growing and not getting collected.
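Such a repro might look roughly like this (a sketch only, building on the DynamicModelCacheKeyFactory pattern posted above; the context, entity, and IDynamicModelDbContext shapes are assumptions, and SQLite is used just so it runs without a server):

using System;
using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Infrastructure;

public interface IDynamicModelDbContext { string? CacheKey { get; } }

public class TenantDbContext : DbContext, IDynamicModelDbContext
{
	public TenantDbContext(string cacheKey) => CacheKey = cacheKey;

	public string? CacheKey { get; }

	public DbSet<User> Users => Set<User>();

	protected override void OnConfiguring(DbContextOptionsBuilder options)
		=> options
			.UseSqlite("Data Source=:memory:")
			.ReplaceService<IModelCacheKeyFactory, DynamicModelCacheKeyFactory>();

	protected override void OnModelCreating(ModelBuilder modelBuilder)
		// A filter capturing per-tenant state forces a distinct model per cache key.
		=> modelBuilder.Entity<User>().HasQueryFilter(u => u.TenantId == CacheKey);
}

public class User
{
	public int Id { get; set; }
	public string? TenantId { get; set; }
}

public static class Repro
{
	public static void Main()
	{
		for (var tenant = 0; tenant < 200; tenant++)
		{
			using var context = new TenantDbContext($"tenant-{tenant}");
			_ = context.Model; // forces model building and populates the internal caches
		}

		GC.Collect();
		GC.WaitForPendingFinalizers();
		Console.WriteLine($"Heap after 200 tenant models: {GC.GetTotalMemory(forceFullCollection: true):N0} bytes");
	}
}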

@easaevik
Author

@roji It's a bit of a mystery to me what these object arrays are. The contents seem to be a bit of everything, from what I can see in dotMemory. There are 103 of them in this dump, but they are not the same kind of objects. Only one retains a lot of memory; the others don't. Drilling into the one instance with 2.4 GB retained bytes shows the following:

[images: dotMemory drill-down into the 2.4 GB object[] instance]

So it's an object array of size 4080. There is a lot of different stuff in there, including one reference to ServiceProviderCache, holding onto 2.4 GB of memory:

[image: ServiceProviderCache retaining 2.4 GB]

I haven't really been able to reproduce this scenario locally with profiling, so I'm looking at production memory dumps.

@roji
Member

roji commented Aug 24, 2023

@easaevik can you drill down into ServiceProviderCache to show the ownership path leading to the huge data?

@abydal

abydal commented Aug 24, 2023

This is our dominators tree, with SqlServerValueGeneratorCache being the top footprint.
[image: dominators tree]
We do services.AddScoped<OurDbContext, OurDbContext>(), so it seems the DI wrappings are not present in our stack.
To me these two cases seem related; do you still believe this is not the case, @roji?

@easaevik
Author

@roji Here is the drill-down into the ServiceProviderCache:

[image: ServiceProviderCache ownership path]

@abydal

abydal commented Aug 28, 2023

I think we can say that we have identified the issue on our end, at least. I suspect this is your problem as well, @easaevik.
I will try to explain.
The root issue is in the class ValueGeneratorCache here. This class sets up a cache of all value generators for all models of all DbContexts, where the CacheKey is a struct that contains IProperty and ITypeBase. When the application is designed to be multi-tenant with one database per tenant, this leads, in our case, to one entry in this cache per model per tenant. And the cache keys themselves take up quite a bit of memory each. We have over 100 entity types in our DbContext and close to 150 tenants, which leads to roughly 100 × 150 = 15,000 entries, each paying the size of the entry plus the CacheKey. It's just too big a footprint per customer in our case.

We have currently solved this by implementing our own version of this class, with a slightly different implementation of the Equals function that differentiates on EntityType.ClrType instead of the IEntityType instance.

public class MultiTenantValueGeneratorCache : IValueGeneratorCache
{
	public MultiTenantValueGeneratorCache(ValueGeneratorCacheDependencies dependencies) => Dependencies = dependencies;

	protected virtual ValueGeneratorCacheDependencies Dependencies { get; }

	private readonly ConcurrentDictionary<CacheKey, ValueGenerator> cache = new();

	private readonly struct CacheKey : IEquatable<CacheKey>
	{
		public CacheKey(IProperty property, IEntityType entityType)
		{
			Property = property;
			EntityType = entityType;
		}

		// Note: the stored IProperty/IEntityType still reference the first model
		// seen for each key; only the duplicates per tenant are eliminated.
		public IProperty Property { get; }
		public IEntityType EntityType { get; }

		// Compare by CLR type and property name rather than by IProperty/IEntityType
		// instance, so the per-tenant models all map to the same cache entry.
		public bool Equals(CacheKey other) =>
			EntityType.ClrType == other.EntityType.ClrType && Property.Name == other.Property.Name;

		public override bool Equals(object obj) => obj is CacheKey cacheKey && Equals(cacheKey);

		public override int GetHashCode() => HashCode.Combine(Property.Name, EntityType.ClrType);
	}

	public virtual ValueGenerator GetOrAdd(
		IProperty property,
		IEntityType entityType,
		Func<IProperty, IEntityType, ValueGenerator> factory)
		=> cache.GetOrAdd(new CacheKey(property, entityType), static (ck, f) => f(ck.Property, ck.EntityType), factory);
}

This eliminates duplicate entries in the cache, and it reduced our memory footprint quite drastically.
There is probably some internal functionality in EF Core that will break with this implementation. I imagine HiLo will not work (its generators keep hi/lo block state, so sharing one generator across tenant databases could hand out ranges that don't match each database's sequence), but we don't use it, so that is no issue for us. So far it seems to work for us; all tests are green. Maybe you can chime in on this, @roji.

@roji
Member

roji commented Aug 28, 2023

Thanks for the analysis... Definitely interesting and important stuff. Just to set expectations, we're heads-down stabilizing the 8.0 release, so it's likely to take a bit of time before we can look into this.

/cc @ajcvickers

@abydal

abydal commented Aug 28, 2023

So far this works for us, so there is no rush.

@easaevik
Author

Interesting, @abydal. How do you replace the ValueGeneratorCache with your custom MultiTenantValueGeneratorCache?

@ajcvickers
Contributor

@abydal Thanks for the analysis. This is certainly something we should try to fix on the EF side.

@ajcvickers ajcvickers self-assigned this Aug 29, 2023
@easaevik
Author

easaevik commented Aug 29, 2023

@abydal I've implemented this in the DbContext's OnConfiguring now. I guess this is how you do it as well? It seems to work without issues locally, so it will be interesting to see if this has the same effect as it did for you:

protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
    optionsBuilder.ReplaceService<IValueGeneratorCache, MultiTenantValueGeneratorCache>();
    optionsBuilder.UseSqlServer(ConnectionStringHelper.GetSqlServerTenantConnectionString(
        _currentUserService.DatabaseName));

    base.OnConfiguring(optionsBuilder);

    optionsBuilder.ReplaceService<IModelCacheKeyFactory, DynamicModelCacheKeyFactory>();
}

@abydal

abydal commented Aug 29, 2023

Yes, that is how we do it as well.

@abydal

abydal commented Aug 29, 2023

We found one more instance of this issue, in EntityMaterializerSource here. There are two ConcurrentDictionary caches which also use IEntityType as the key. In our case it's _emptyMaterializers that grows very big; _materializers does not seem to contain the same amount of data.

[image: _emptyMaterializers growth]

We see that our previous fix alleviates the memory pressure somewhat, and this singleton class now ranks top in retained bytes. We will probably try to apply the same type of fix to this issue as we did with the previous one.
[image: retained bytes after the ValueGeneratorCache fix]

@easaevik
Author

easaevik commented Sep 3, 2023

@abydal We've tried your solution of replacing the ValueGeneratorCache, but it didn't really make any difference to the memory consumption in our scenario. So the behavior of the ServiceProviderCache is still the problem for us.

@abydal

abydal commented Sep 6, 2023

@easaevik, did you also look into the EntityMaterializerSource issue that I pointed out in my last post?
We had to override that class in order to tweak the cache keys there as well:

public class MultiTenantEntityMaterializerSource : EntityMaterializerSource
{
	// Maps each CLR type to the first IEntityType seen for it, so the base
	// class caches one materializer per CLR type instead of one per tenant model.
	private readonly ConcurrentDictionary<Type, IEntityType> map = new();

	public MultiTenantEntityMaterializerSource(EntityMaterializerSourceDependencies dependencies) : base(dependencies) { }

	public override Func<MaterializationContext, object> GetEmptyMaterializer(IEntityType entityType) =>
		base.GetEmptyMaterializer(map.GetOrAdd(entityType.ClrType, _ => entityType));

	public override Func<MaterializationContext, object> GetMaterializer(IEntityType entityType) =>
		base.GetMaterializer(map.GetOrAdd(entityType.ClrType, _ => entityType));
}
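
It is registered the same way as the value generator cache, via ReplaceService in OnConfiguring as shown earlier:

// Registration alongside the IValueGeneratorCache replacement shown earlier:
optionsBuilder.ReplaceService<IEntityMaterializerSource, MultiTenantEntityMaterializerSource>();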

After these two changes, to EntityMaterializerSource and ValueGeneratorCache, our memory footprint has stabilized at around 40–50% during peak hours.

@ajcvickers
Contributor

Split out the EntityMaterializerSource issue to #31866.

ajcvickers added a commit that referenced this issue Sep 26, 2023
Fixes #31539

### Description

Storing references to the model resulted in a customer-reported memory leak in a multi-tenant application.

### Customer impact

Customer-reported large memory leak.

### How found

Customer reported.

### Regression

No.

### Testing

Existing tests cover that there is no regression with the cache key changed. Manual testing that the model is no longer referenced.

### Risk

Low. Also quirked.
@ajcvickers ajcvickers modified the milestones: 6.0.x, 6.0.24 Sep 26, 2023
@ajcvickers ajcvickers added the closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. label Oct 11, 2023
@ajcvickers ajcvickers removed their assignment Aug 31, 2024