We introduced Task Resource Consumer to log task resource information at a specified interval as part of Task consumer Integration #2293. However, TopNSearchTasksLogger.java -> recordSearchTask() is synchronized for thread safety. Because it is called on every search task completion, the performance penalty is too high for highly concurrent search workloads. In one test, latency was ~20% higher because of this. The penalty may grow further on clusters whose nodes have a large number of cores and thread pools.
Also, since logging happens in the hot path of searches, it will introduce head-of-line (HOL) blocking, and I expect tail latencies to suffer as a result. Let's also confirm that we use async logging here.
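One possible direction, as a rough sketch only: keep the hot path to a non-blocking enqueue and compute the top N on a single background thread. Everything below is hypothetical (the class name LockFreeTopNRecorder, the TaskStats holder, and ranking by memory are assumptions for illustration, not the actual TopNSearchTasksLogger code).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical sketch; not the actual TopNSearchTasksLogger. */
public class LockFreeTopNRecorder {

    /** Hypothetical stand-in for the resource stats of a completed search task. */
    public static final class TaskStats {
        public final long taskId;
        public final long memoryBytes;

        public TaskStats(long taskId, long memoryBytes) {
            this.taskId = taskId;
            this.memoryBytes = memoryBytes;
        }
    }

    private final int topN;
    // Lock-free queue: search threads never wait on a monitor here.
    private final ConcurrentLinkedQueue<TaskStats> pending = new ConcurrentLinkedQueue<>();

    public LockFreeTopNRecorder(int topN) {
        this.topN = topN;
    }

    /** Hot path: called on every task completion; a non-blocking enqueue only. */
    public void recordSearchTask(TaskStats stats) {
        pending.offer(stats);
    }

    /**
     * Cold path: meant to be called from one background thread.
     * Drains everything recorded so far and returns the top N by memory, largest first.
     */
    public List<TaskStats> drainTopN() {
        Comparator<TaskStats> byMemory = Comparator.comparingLong(s -> s.memoryBytes);
        PriorityQueue<TaskStats> minHeap = new PriorityQueue<>(byMemory);
        TaskStats next;
        while ((next = pending.poll()) != null) {
            minHeap.offer(next);
            if (minHeap.size() > topN) {
                minHeap.poll(); // evict the smallest so only the N largest remain
            }
        }
        List<TaskStats> result = new ArrayList<>(minHeap);
        result.sort(byMemory.reversed());
        return result;
    }
}
```

The trade-off is that the queue is unbounded between drains, so a real change would likely need a cap or sampling. Whether this actually recovers the ~20% would have to be measured.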
Less concerning: because logging is triggered by a search request, it may not honor LOG_TOP_QUERIES_FREQUENCY accurately when search traffic is low and irregularly spaced. For example, if a few hundred requests arrive at t = 0 seconds and the next requests arrive at t = 1000 seconds, logging is deferred for 16+ minutes.
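To illustrate the frequency point, here is a minimal sketch of driving the log flush from a timer instead of from request arrival, so the interval holds even when no searches come in. It uses the plain JDK ScheduledExecutorService; the class name PeriodicTopNLogger and the flush callback are hypothetical, and inside OpenSearch the node's own scheduling facilities would presumably be used instead (that part is an assumption).

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: runs a flush callback at a fixed rate, independent of search traffic. */
public class PeriodicTopNLogger implements AutoCloseable {

    private final AtomicInteger flushCount = new AtomicInteger();
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(runnable -> {
            Thread thread = new Thread(runnable, "top-n-logger");
            thread.setDaemon(true); // do not keep the JVM alive just for logging
            return thread;
        });

    /**
     * @param flush  hypothetical callback that writes the current top-N entries to the log
     * @param period stand-in for LOG_TOP_QUERIES_FREQUENCY
     */
    public PeriodicTopNLogger(Runnable flush, long period, TimeUnit unit) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                flush.run();
            } catch (RuntimeException e) {
                // Swallowed on purpose: an uncaught exception would cancel all future runs.
            } finally {
                flushCount.incrementAndGet();
            }
        }, period, period, unit);
    }

    /** Number of flush attempts so far (including ones whose callback threw). */
    public int flushCount() {
        return flushCount.get();
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

With this shape the flush fires every period regardless of whether requests arrived at t = 0 or t = 1000 seconds, and the logging work also moves off the search threads.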
To Reproduce
Steps to reproduce the behavior:
Compare search latency and throughput on a cluster whose nodes have a large number of cores and thread pools, with Task Resource Consumer (task_resource_consumers.enabled) enabled and then disabled.
Expected behavior
No or acceptable performance degradation.
Hey @anasalkouz, I am not able to work on fixing this issue. Let's see if @sruti1312 can help here. Also, @sruti1312, do you have any other solution to suggest?