
Add Circuit breaker on Transport ResponseHandlers #66196

Closed
easyice opened this issue Dec 11, 2020 · 9 comments
Labels: :Distributed Coordination/Network, >enhancement, feedback_needed, Team:Distributed, team-discuss

Comments

easyice (Contributor) commented Dec 11, 2020

For every in-flight transport request, we add a handler to org.elasticsearch.transport.Transport.ResponseHandlers:

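public long add(ResponseContext<? extends TransportResponse> holder) {
    // Allocate a fresh ID for the outgoing request.
    long requestId = newRequestId();
    // Register the handler; it stays in the map until the response for requestId arrives.
    ResponseContext existing = handlers.put(requestId, holder);
    assert existing == null : "request ID already in use: " + requestId;
    return requestId;
}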
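To make the growth mechanism concrete, here is a minimal, simplified model of such a registry. It is a sketch, not the Elasticsearch class: ResponseHandlersModel, its Object handler type and its remove method are hypothetical stand-ins. Each in-flight request occupies one entry in a concurrent map keyed by request ID, and anything the handler references stays reachable until the entry is removed when the response or failure for that ID arrives.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model of a response-handler registry.
// The handler type is reduced to Object; whatever it references
// (e.g. request or response state) stays reachable until remove() is called.
class ResponseHandlersModel {
    private final AtomicLong requestIdGenerator = new AtomicLong();
    private final Map<Long, Object> handlers = new ConcurrentHashMap<>();

    // Called before a request is sent; the returned ID travels with the request.
    long add(Object handler) {
        long requestId = requestIdGenerator.incrementAndGet();
        Object existing = handlers.put(requestId, handler);
        if (existing != null) {
            throw new IllegalStateException("request ID already in use: " + requestId);
        }
        return requestId;
    }

    // Called when the response or failure for requestId arrives; frees the entry.
    Object remove(long requestId) {
        return handlers.remove(requestId);
    }

    int inFlight() {
        return handlers.size();
    }

    public static void main(String[] args) {
        ResponseHandlersModel handlers = new ResponseHandlersModel();
        long[] ids = new long[100];
        for (int i = 0; i < ids.length; i++) {
            // Each "handler" retains a 64 KiB buffer standing in for request state.
            ids[i] = handlers.add(new byte[64 * 1024]);
        }
        System.out.println(handlers.inFlight()); // 100 entries, about 6 MiB reachable
        for (long id : ids) {
            handlers.remove(id);
        }
        System.out.println(handlers.inFlight()); // 0
    }
}
```

If responses come back more slowly than new requests are sent, as when the CPU is saturated, inFlight() keeps rising and the retained memory rises with it.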

Sometimes users send so many requests (for example, queries) that they exceed the system's capacity. CPU utilization reaches 100%, but the client keeps sending a large number of requests, and each of them is added to ResponseHandlers#handlers.

Eventually the Elasticsearch nodes run out of memory (OOM).

This has happened to me a few times. I dumped the JVM heap and opened it with Eclipse Memory Analyzer, which showed that ResponseHandlers#handlers was using 7.7 GB of memory.
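One way to read the proposal in the title is as a cap on the number of registered handlers, rejecting new requests once the cap is reached. The sketch below assumes a simple count limit; BoundedHandlerRegistry, maxInFlight and TooManyInFlightRequestsException are hypothetical names, not Elasticsearch APIs, and Elasticsearch's own circuit breakers account for estimated bytes rather than entries.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a handler registry that "trips" once too many requests are in flight.
class BoundedHandlerRegistry {
    // Hypothetical exception standing in for a circuit-breaker trip.
    static class TooManyInFlightRequestsException extends RuntimeException {
        TooManyInFlightRequestsException(String message) {
            super(message);
        }
    }

    private final int maxInFlight;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicLong requestIdGenerator = new AtomicLong();
    private final Map<Long, Object> handlers = new ConcurrentHashMap<>();

    BoundedHandlerRegistry(int maxInFlight) {
        this.maxInFlight = maxInFlight;
    }

    long add(Object handler) {
        // Reserve a slot with compare-and-set so concurrent callers cannot overshoot the limit.
        while (true) {
            int current = inFlight.get();
            if (current >= maxInFlight) {
                throw new TooManyInFlightRequestsException(
                        "in-flight handler limit [" + maxInFlight + "] reached");
            }
            if (inFlight.compareAndSet(current, current + 1)) {
                break;
            }
        }
        long requestId = requestIdGenerator.incrementAndGet();
        handlers.put(requestId, handler);
        return requestId;
    }

    // Frees the slot only if the entry was actually present.
    Object remove(long requestId) {
        Object handler = handlers.remove(requestId);
        if (handler != null) {
            inFlight.decrementAndGet();
        }
        return handler;
    }

    int inFlight() {
        return inFlight.get();
    }

    public static void main(String[] args) {
        BoundedHandlerRegistry registry = new BoundedHandlerRegistry(2);
        long first = registry.add("h1");
        registry.add("h2");
        try {
            registry.add("h3");
        } catch (TooManyInFlightRequestsException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        registry.remove(first);
        registry.add("h3"); // succeeds once a slot has been freed
        System.out.println(registry.inFlight()); // 2
    }
}
```

A count limit fails fast instead of queueing, but it does not bound how much memory each handler retains, so on its own it would not guarantee that a figure like 7.7 GB cannot be reached.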


easyice added the >enhancement and needs:triage labels Dec 11, 2020
DaveCTurner added the :Distributed Coordination/Network label Dec 11, 2020
elasticmachine added the Team:Distributed label Dec 11, 2020
elasticmachine (Collaborator) commented:

Pinging @elastic/es-distributed (Team:Distributed)

jimczi removed the needs:triage label Jan 12, 2021
DaveCTurner (Contributor) commented:

I don't think there's a general transport-level solution for limiting the size retained by a response handler since the memory retained by a handler is pretty much independent of the transport layer. The OP doesn't tell us anything about what requests correspond with the problematic response handlers, nor what version they're using. Likely culprits include searches (in which case this duplicates #67478), stats (in which case this duplicates #55550) or bulks (in which case this is resolved by the new indexing pressure mechanisms). @easyice to what requests do the problematic handlers relate?

Hailei commented Jan 14, 2021

"I don't think there's a general transport-level solution for limiting the size retained by a response handler since the memory retained by a handler is pretty much independent of the transport layer."

I couldn't agree more.

@easyice is my coworker, and the cluster mentioned in this issue is the one I am dealing with too.

ES version: 6.8.0. In the last two incidents, the requests were bulk requests. @DaveCTurner

DaveCTurner (Contributor) commented:

OK, there is a solution for limiting the size retained by bulk response handlers, and it's already implemented (as of 7.9), so I think no further action is needed here.
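Roughly, the indexing pressure mechanism added in 7.9 accounts for the bytes of in-flight indexing requests and rejects new ones that would push the total past a limit (the indexing_pressure.memory.limit setting, which defaults to 10% of the heap). The following is a simplified, hypothetical model of that idea, not the Elasticsearch implementation; InFlightBytesLimiter, tryAcquire and release are illustrative names.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model of byte-based admission control for in-flight bulk requests.
class InFlightBytesLimiter {
    private final long limitBytes;
    private final AtomicLong inFlightBytes = new AtomicLong();

    InFlightBytesLimiter(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    // Accounts for a request of the given size, or returns false (reject) if it would exceed the limit.
    boolean tryAcquire(long requestBytes) {
        while (true) {
            long current = inFlightBytes.get();
            if (current + requestBytes > limitBytes) {
                return false;
            }
            if (inFlightBytes.compareAndSet(current, current + requestBytes)) {
                return true;
            }
        }
    }

    // Releases the bytes once the request has completed, successfully or not.
    void release(long requestBytes) {
        inFlightBytes.addAndGet(-requestBytes);
    }

    long inFlightBytes() {
        return inFlightBytes.get();
    }

    public static void main(String[] args) {
        long heapBytes = 1L << 30; // assume a 1 GiB heap for illustration
        InFlightBytesLimiter limiter = new InFlightBytesLimiter(heapBytes / 10);
        long bulk = 60L * 1024 * 1024; // a 60 MiB bulk request
        System.out.println(limiter.tryAcquire(bulk)); // true
        System.out.println(limiter.tryAcquire(bulk)); // false: 120 MiB > ~102.4 MiB
        limiter.release(bulk);
        System.out.println(limiter.tryAcquire(bulk)); // true
    }
}
```

Unlike a handler count, this bounds memory directly, because the size of a bulk request is known before it is forwarded to other nodes.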

easyice (Contributor, Author) commented Jan 15, 2021

@DaveCTurner Thanks for the reply. Let me add the related requests: this doesn't only happen with bulk requests, but also with search requests, like this:

(screenshot not preserved)

DaveCTurner (Contributor) commented:

Indeed, we're tracking the issue for search responses at #67478.

6 participants