
Performance issue when querying varbinary(MAX), varchar(MAX), nvarchar(MAX) or XML with Async #18221

Closed
tibitoth opened this issue Oct 4, 2019 · 12 comments

Comments

@tibitoth

tibitoth commented Oct 4, 2019

Description

I would like to resurrect the following EF6 issue: dotnet/ef6#88, because it is still present in EF Core 2.2 and 3.0 as well.

There is a huge performance issue when querying a table containing varbinary(MAX) columns with .ToListAsync(). The problem is that EF, despite the presence of a varbinary(max) column, uses CommandBehavior.Default with ExecuteReaderAsync() instead of CommandBehavior.SequentialAccess.

Related SO question and answer: https://stackoverflow.com/questions/28543293/entity-framework-async-operation-takes-ten-times-as-long-to-complete

This is a huge issue for us; we cannot simply use the synchronous ToList() as a workaround, because that would require checking at development time which queries can hit large columns, which is not practical.
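For reference, the difference at the ADO.NET level looks roughly like this. This is only a minimal sketch, not EF Core's internal code; it assumes Microsoft.Data.SqlClient and uses the connection string, table and column names from the repro below.

using System;
using System.Data;
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;

public static class SequentialAccessSketch
{
    public static async Task ReadBlobAsync(string connectionString, int id)
    {
        using var connection = new SqlConnection(connectionString);
        await connection.OpenAsync();

        using var command = new SqlCommand(
            "SELECT [Id], [LargeTextBlob] FROM [Blobs] WHERE [Id] = @id", connection);
        command.Parameters.AddWithValue("@id", id);

        // What EF Core effectively does today:
        // using var reader = await command.ExecuteReaderAsync(CommandBehavior.Default);

        // What this issue asks for when a large column is present, so the value can be streamed:
        using var reader = await command.ExecuteReaderAsync(CommandBehavior.SequentialAccess);

        while (await reader.ReadAsync())
        {
            // With SequentialAccess, columns must be read in ordinal order.
            var entityId = reader.GetInt32(0);
            var blob = reader.GetString(1);
            Console.WriteLine($"Read {entityId}: {blob.Length} characters");
        }
    }
}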

Steps to reproduce

I have created a small repro codebase and benchmarks.

using System;
using System.Linq;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using Microsoft.EntityFrameworkCore;

public class AppDbContext : DbContext
{
    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        optionsBuilder.UseSqlServer("Server=(localdb)\\mssqllocaldb;Database=EFCore-repro-ToListAsync;Trusted_Connection=True;");
    }

    public DbSet<TestEntity> Blobs { get; set; }
}

public class TestEntity
{
    public int Id { get; set; }
    public string LargeTextBlob { get; set; }
}

public class Program
{
    static void Main(string[] args)
    {
        var summary = BenchmarkRunner.Run<Benchmarks>();
        Console.ReadKey();
    }
}

public class Benchmarks
{
    private int id;

    public Benchmarks()
    {
        using (var context = new AppDbContext())
        {
            // Ensure the database and schema exist before seeding the large row.
            context.Database.EnsureCreated();

            var array = new char[5 * 1024 * 1024];
            var random = new Random();
            for (int i = 0; i < array.Length; i++)
            {
                array[i] = (char)random.Next(32, 126);
            }

            var entity = new TestEntity { LargeTextBlob = new string(array) };
            context.Blobs.Add(entity);
            context.SaveChanges();

            id = entity.Id;
        }
    }

    [Benchmark]
    public void GetWithToList()
    {
        using (var context = new AppDbContext())
        {
            var entity = context.Blobs.Where(b => b.Id == id).ToList();
        }
    }

    [Benchmark]
    public async Task GetWithToListAsync()
    {
        using (var context = new AppDbContext())
        {
            var entity = await context.Blobs.Where(b => b.Id == id).ToListAsync();
        }
    }
}

Benchmark results

Method               Mean          Error        StdDev
GetWithToList        45.63 ms      0.4871 ms    0.4318 ms
GetWithToListAsync   15,088.37 ms  178.2588 ms  158.0218 ms

Further technical details

EF Core version: 2.2, 3.0
Database provider: Microsoft.EntityFrameworkCore.SqlServer
Target framework: .NET Core 2.2, .NET Core 3.0
Operating system: Windows 10.
IDE: (e.g. Visual Studio 2019 16.3)

@AndriySvyryd
Member

This would be covered by #885

@roji
Member

roji commented Oct 4, 2019

On a hunch, I recreated a benchmark for reading large values with SqlClient, on async/sync and with CommandBehavior=Default/SequentialAccess. Results are different from those in the (very useful) Stack Overflow answer linked above:

  • Async is always slow and sync is always fast, regardless of command behavior
  • Sync eats lots more memory and async far less, again regardless of command behavior

I suspect a significant behavior difference between the .NET Framework version of System.Data.SqlClient used in the SO benchmark back in 2015 and the Microsoft.Data.SqlClient being used on .NET Core today. I've opened dotnet/SqlClient#245 to track this; the full benchmark and results are available there.

So as things stand, it looks like having EF Core pass CommandBehavior.SequentialAccess wouldn't help in any way. However, it definitely seems like the right thing to do, and once the odd SqlClient behavior is corrected it could provide a significant boost. As the ability to pass SequentialAccess is already covered by #885, I don't think there's anything else to do here, so we should close it.

One last note: CommandBehavior.SequentialAccess makes sense for large values. On SQL Server the provider can reasonably "guess" that a property will contain large values based on its store type: VARCHAR(MAX) would probably be a good candidate, whereas VARCHAR(255) obviously would not be. This doesn't necessarily hold for other providers: in PostgreSQL users are encouraged to just use the TEXT store type for all strings, regardless of length, so it would probably be a bad idea to always access it with SequentialAccess. Maybe an additional user hint makes sense here.
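For illustration, this is roughly how the store types in question would be declared on the repro's entity; a provider could, in principle, base the SequentialAccess decision on such column types. The override below would go inside the repro's AppDbContext, and the ShortText property is hypothetical.

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<TestEntity>(builder =>
        {
            // nvarchar(max): a plausible candidate for SequentialAccess.
            builder.Property(e => e.LargeTextBlob).HasColumnType("nvarchar(max)");

            // A bounded column such as nvarchar(255) would not need it.
            // ("ShortText" is hypothetical, not part of the repro entity.)
            // builder.Property(e => e.ShortText).HasMaxLength(255);
        });
    }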

@ajcvickers
Contributor

Closing this as duplicate of #885 for the EF part; the SqlClient part is tracked by dotnet/SqlClient#245

@rogerfar

@tibitoth did you ever find a solution to this issue? It looks like everyone shrugs and says it can't be fixed easily. I ran into this issue yesterday where a table hit a breaking point and essentially killed our app. Is there a workaround?

@ErikEJ
Contributor

ErikEJ commented Feb 25, 2021

@rogerfar use the non-async API?

@roji
Member

roji commented Feb 25, 2021

@rogerfar the problem really is in the SqlClient provider - I'd advise upvoting the issue there to help them prioritize it more.

@tibitoth
Author

@rogerfar Sadly, we have to keep these large fields in mind and use the sync APIs whenever such a column is involved in a query. I have not rerun the benchmark on .NET 5 or .NET 6 yet, but I will, and I'll file a new issue if the problem is still there.

@panoskj

panoskj commented Mar 1, 2021

> @tibitoth did you ever find a solution to this issue? It looks like everyone shrugs and says it can't be fixed easily. I ran into this issue yesterday where a table hit a breaking point and essentially killed our app. Is there a workaround?

@rogerfar I found a workaround. First, see #24290. The missing piece is a wrapper around DbDataReader that caches the most recently accessed value, so that no exception is thrown when EF Core reads a primary key twice.

With the suggested interceptor and the DbDataReader wrapper it worked, reaching the same performance as sync methods for my use cases.
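A minimal sketch of the interceptor half of that workaround, assuming EF Core 5 or later (where ReaderExecutingAsync returns a ValueTask). The caching DbDataReader wrapper described above is omitted; without it, EF Core can fail when it reads the same column (such as a key) more than once.

using System.Data;
using System.Data.Common;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore.Diagnostics;

public class SequentialAccessInterceptor : DbCommandInterceptor
{
    public override async ValueTask<InterceptionResult<DbDataReader>> ReaderExecutingAsync(
        DbCommand command,
        CommandEventData eventData,
        InterceptionResult<DbDataReader> result,
        CancellationToken cancellationToken = default)
    {
        // Execute the reader ourselves with SequentialAccess and hand it back to EF Core,
        // suppressing its own ExecuteReaderAsync call.
        var reader = await command.ExecuteReaderAsync(
            CommandBehavior.SequentialAccess, cancellationToken);

        // A complete workaround would wrap `reader` in a DbDataReader that caches
        // the most recently read value, as described in the comment above.
        return InterceptionResult<DbDataReader>.SuppressWithResult(reader);
    }
}

The interceptor would be registered with optionsBuilder.AddInterceptors(new SequentialAccessInterceptor()) in OnConfiguring.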

@KonnectSalon

This is still a major issue.

I have been troubleshooting spikes in CPU and slow SQL statements only to come to this thread.

GetClaimsAsync in Microsoft.AspNetCore.Identity queries a table that uses nvarchar(max) columns for claim types and values.

Changing those columns to nvarchar(255) seems to resolve the issue.

This issue is having a knock-on effect on other libraries :(

@ajcvickers reopened this Oct 16, 2022
@ajcvickers closed this as not planned Oct 16, 2022
@SokoFromNZ

Any news on this? We are still waiting for this to be fixed :(

@roji
Member

roji commented Feb 23, 2023

@SokoFromNZ and others, this has nothing to do with EF Core; the issue tracking the problem is dotnet/SqlClient#593.

@OzBob

OzBob commented Jun 5, 2024

The EF SequentialAccess performance issue is #885 (according to @ajcvickers' comment above); please upvote #885.
