[BUG] Increased search latency with Task Resource Consumer enabled #7146

Open
PritLadani opened this issue Apr 13, 2023 · 2 comments
Labels: bug (Something isn't working), distributed framework

@PritLadani
Contributor

Describe the bug

  • We introduced the Task Resource Consumer as part of Task consumer Integration #2293 to log task resource information at a specified interval. However, TopNSearchTasksLogger.java -> recordSearchTask() is synchronized for thread safety. Because it is called on every search task completion, completing search tasks serialize on a single lock, and the performance penalty is too high for highly concurrent search workloads. In one test, this resulted in ~20% higher latency. The penalty may grow further on clusters whose nodes have a large number of cores and large thread pools.
  • Also, since logging happens in the hot path of searches, it can introduce head-of-line (HOL) blocking, and I expect tail latencies to suffer as a result. We should also confirm that logging here is asynchronous.
  • Less concerning: because logging is triggered by search requests, it may not honor LOG_TOP_QUERIES_FREQUENCY accurately when search traffic is low or irregular. For example, if a few hundred requests arrive at t = 0 seconds and the next requests arrive only at t = 1000 seconds, logging is deferred by 16+ minutes (1000 s ≈ 16.7 min).
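One possible direction, sketched under assumptions: take both the lock and the log write off the search hot path. In the minimal Java sketch below, AsyncTopNTaskLogger, TaskUsage and the sink callback are hypothetical names (this is not OpenSearch's API), and only CPU time is tracked. Search threads only append to a lock-free ConcurrentLinkedQueue, and a single scheduled background thread drains it, keeps the top N in a bounded min-heap, and hands the result to the sink on a fixed period. Because a timer rather than incoming searches drives logging, it would also fire on time under sparse traffic.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch, not OpenSearch code: search threads only enqueue,
// and one background thread picks the top N and logs on a fixed schedule.
final class AsyncTopNTaskLogger implements AutoCloseable {

    // Hypothetical per-task record; a real one would carry more resource stats.
    record TaskUsage(long taskId, long cpuNanos) {}

    private static final Comparator<TaskUsage> BY_CPU =
            Comparator.comparingLong(TaskUsage::cpuNanos);

    private final int topN;
    private final ConcurrentLinkedQueue<TaskUsage> pending = new ConcurrentLinkedQueue<>();
    private final Consumer<List<TaskUsage>> sink;
    private final ScheduledExecutorService scheduler;

    AsyncTopNTaskLogger(int topN, long periodMillis, Consumer<List<TaskUsage>> sink) {
        this.topN = topN;
        this.sink = sink;
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "top-n-task-logger");
            t.setDaemon(true); // do not keep the JVM alive
            return t;
        });
        // Fires on wall-clock time, whether or not searches keep arriving.
        scheduler.scheduleAtFixedRate(this::flush, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    // Called on every task completion: a lock-free offer, no shared monitor.
    void recordSearchTask(TaskUsage usage) {
        pending.offer(usage);
    }

    // Drains everything recorded so far; returns the top N by CPU, highest first.
    List<TaskUsage> drainTopN() {
        PriorityQueue<TaskUsage> heap = new PriorityQueue<>(BY_CPU); // min-heap
        TaskUsage u;
        while ((u = pending.poll()) != null) {
            heap.offer(u);
            if (heap.size() > topN) {
                heap.poll(); // evict the current smallest
            }
        }
        List<TaskUsage> result = new ArrayList<>(heap);
        result.sort(BY_CPU.reversed());
        return result;
    }

    // Runs on the background thread, so slow log I/O never stalls search threads.
    private void flush() {
        try {
            List<TaskUsage> top = drainTopN();
            if (!top.isEmpty()) {
                sink.accept(top);
            }
        } catch (RuntimeException e) {
            // Swallow so a failing sink does not cancel the periodic task.
        }
    }

    @Override
    public void close() {
        scheduler.shutdown();
    }
}
```

A trade-off to weigh: the queue is unbounded, so a heavy burst between flushes grows memory. Dropping samples once a size cap is reached, or keeping small per-thread top-N heaps merged at flush time, would bound that. For the log write itself, an asynchronous appender (for example Log4j 2 async loggers) is a complementary option.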

To Reproduce
Steps to reproduce the behavior:
Compare search latency and throughput on a cluster whose nodes have a large number of cores and large thread pools, once with the Task Resource Consumer (task_resource_consumers.enabled) enabled and once with it disabled.

Expected behavior
No performance degradation, or only an acceptable amount.

@anasalkouz
Member

Hi @PritLadani, a 20% latency increase is concerning. Are you actively working on this issue? Do you have a path forward?

@PritLadani
Contributor Author

Hey @anasalkouz, I am not able to work on a fix for this issue. Let's see if @sruti1312 can help here. @sruti1312, do you have any other solutions to suggest?