Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlocks during Nest.ElasticClient.Get calls #6342

Closed
suchoss opened this issue Apr 16, 2022 · 5 comments
Closed

Deadlocks during Nest.ElasticClient.Get calls #6342

suchoss opened this issue Apr 16, 2022 · 5 comments
Labels
7.x Relates to a 7.x client version

Comments

@suchoss
Copy link

suchoss commented Apr 16, 2022

NEST/Elasticsearch.Net version:
7.17

Elasticsearch version:
7.17.1

.NET runtime version:
.NET6

Operating system version:
Windows 10

Description of the problem including expected versus actual behavior:
Deadlocks probably caused by calling async from synchronous code in

responseMessage = client.SendAsync(requestMessage, HttpCompletionOption.ResponseHeadersRead).GetAwaiter().GetResult();

This didn't appear when we used .NET 5. But if we update to .NET 6 (and change nothing else than .NET version) this problem appears.

Steps to reproduce:

  1. Unfortunatelly we do not have some simple steps to reproduce. But I am guessing that more requests/sec over some longer period of time may cause this.

Expected behavior
No deadlocks.

Provide ConnectionSettings (if relevant):

public static ConnectionSettings GetElasticSearchConnectionSettings(string indexName)
{
    var connectionStr = "http://10.10.150.15:9200;http://10.10.100.161:9200;http://10.10.100.162:9200;http://10.10.100.163:9200;";
    var pool = new Elasticsearch.Net.StaticConnectionPool(esUrl
        .Split(';')
        .Where(m => !string.IsNullOrWhiteSpace(m))
        .Select(u => new Uri(u))
        );

    var settings = new ConnectionSettings(pool)
        .DefaultIndex(indexName)
        .DisableAutomaticProxyDetection(false)
        .RequestTimeout(TimeSpan.FromMilliseconds(60000))
        .SniffLifeSpan(null)
        .OnRequestCompleted(call =>
        {
            // log out the request and the request body, if one exists for the type of request
            if (call.RequestBodyInBytes != null)
            {
                ESTraceLogger.Debug($"{call.HttpMethod}\t{call.Uri}\t" +
                    $"{Encoding.UTF8.GetString(call.RequestBodyInBytes)}");
            }
            else
            {
                ESTraceLogger.Debug($"{call.HttpMethod}\t{call.Uri}\t");
            }

        });

        settings = settings.ConnectionLimit(80);

    return settings;
}

Provide DebugInformation (if relevant):
image

@suchoss
Copy link
Author

suchoss commented Apr 17, 2022

I am wondering - why don't you use client.Send instead of client.SendAsync in synchronous call?
edit:
I am not wondering anymore - simply it is not available in .NET standard...

@suchoss
Copy link
Author

suchoss commented Apr 18, 2022

So to test it over the weekend I have created this simple solution.

It consists of several different projects:

  • Everything was tested on my laptop in Win10.
  • You can run any project yourself if you configure Common.Settings.cs.
  • When running project, press "A" in console to add new timer which creates 1 request/s

Projects which tests NEST from .NET5 source code and .NET 6 source code

  • EsTestDotNet5\
  • EsTestDotNet6\

These projects use NEST to call group of endpoints. It simply asks over and over again with 1s interval for simple index in ES. I found out here that .NET 5 works pretty fine and it is able to call over 100 requests/s without any significant increase in threads. Whereas switch to .NET 6 will lead (when firing about 40 requests/s) to application freeze because it will constantly add new threads.
image

Projects testing behavior of single HttpClient instance when sending requests to one Elasticsearch node

  • EsTestDotNet5SingleHttpClient\
  • EsTestDotNet6SingleHttpClient\

These projects use single instance of HttpClient to call just one endpoint. Again it simply asks one index on one node for a data. For both .NET 5 and .NET 6 there is the same behavior. About 40 requests/s (on my laptop) makes them fail over time - increase in threads. This is not an ES issue, I am just finding out some baseline for my computer. When exceeding 40 requests/s in this setup, it will lead to opening up new connections until some maximum is reached => new threads are created, but all threads are locked up in wait.

Projects testing behavior of HttpClientFactory, sending requests to one Elasticsearch node

  • EsTestDotNet5HttpClientFactory\
  • EsTestDotNet6HttpClientFactory\

These projects use HttpClientFactory to call just one endpoint. Same calls as before. But different behavior (partially expected).
.NET 5 project can run more than 100 requests/s, although only about 40 of them are processed. The more requests I create the less are really going through. But - It doesn't create new threads.

.NET 6 project will start failing when running about 100 requests/s, although again about 40 of them are really processed. The more requests I create the less are really going through. It does create new threads over time. Ultimately leading to fail.

Test if change from async call in ES to sync call in ES changes anything

  • EsTestDotNet6WithChangedEsNetLib\

To test my original hypothesis, I changed following line
responseMessage = client.SendAsync(requestMessage, HttpCompletionOption.ResponseHeadersRead).GetAwaiter().GetResult();
to this line
responseMessage = client.Send(requestMessage, HttpCompletionOption.ResponseHeadersRead);
and tried to check if it changes anything.

But I found out that this changes nothing and code behaves the same way.

Summary

NEST Version=7.17.0 is stable when running from .NET 5, but unstable when running from .NET 6 and leads to quite a quick death in our application (app becomes unresponsible over time) when running more requests/s.

Problem/Question

Our problem is that we are blocked by this, because we are unable to migrate to .NET 6 with this issue appearing. When we change from .NET 5 to .NET 6 our main web app goes to deadlock within 30 minutes of running. On the picture below you can find our ES cluster.
image

Question is if there are any tips for running it in .NET 6, or if we should wait for some new version?

Many thanks!

@suchoss
Copy link
Author

suchoss commented Apr 18, 2022

Also I can confirm, that it works (more than good) when using Async method client.GetAsync from .NET 6. Our only problem is the combination of .NET 6 calling client.Get.

@stevejgordon
Copy link
Contributor

Sorry for the delay here @suchoss. As mentioned on Twitter, this is due to a limitation of the .NET Standard API surface for HttpClient. For 8.x we now have more target frameworks supported so on those which support it (.NET 5+), we can leverage the .NET HttpClient sync APIs, rather than this workaround which does risk deadlocks in some cases. Our approach follows what the Azure SDK team have done in their transport layer but it's not ideal. I've reviewed the possibility of adding more target frameworks to 7.x but it's not entirely practical and unfortunately not something we'll address here. As a general best practice, preferring the async APIs is preferred for performance and throughput reasons. On this basis, I'll close out this issue.

@stevejgordon stevejgordon added Won't Fix 7.x Relates to a 7.x client version labels May 10, 2022
@suchoss
Copy link
Author

suchoss commented May 16, 2022

Hello @stevejgordon, Thanks for info. I am working on rewriting our app to Async.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
7.x Relates to a 7.x client version
Projects
None yet
Development

No branches or pull requests

2 participants