Search Latency Tracking - Coordinator Slow Logs #9642

dzane17 · 2023-08-30T18:29:44Z

Is your feature request related to a problem? Please describe.
As of today, we track search request latencies at the shard level via node stats. After each query or fetch phase completes on a shard, we record the time taken, accumulate those values, and maintain an overall average that is reported under stats.
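The shard-level bookkeeping described above can be sketched roughly as follows. This is only an illustration of "accumulate and average"; the class name `PhaseStats` and its fields are hypothetical and are not taken from the OpenSearch code base.

```python
from dataclasses import dataclass


@dataclass
class PhaseStats:
    """Hypothetical accumulator for one search phase (query or fetch) on a shard."""

    count: int = 0      # number of completed phase executions
    total_ms: int = 0   # accumulated time across all executions

    def record(self, took_ms: int) -> None:
        # Called once each time the phase completes on the shard.
        self.count += 1
        self.total_ms += took_ms

    @property
    def average_ms(self) -> float:
        # Overall average; 0.0 when nothing has been recorded yet.
        return self.total_ms / self.count if self.count else 0.0
```

Because only a running total and a count survive, a single slow request is averaged away, which is part of why these stats are hard to use for debugging individual latency spikes.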

But we don't have a mechanism to track search latencies at the coordinator node. The coordinator node plays an important role: it fans out requests to individual shards/data nodes, aggregates their responses, and eventually sends the response back to the client. We have seen multiple issues in the past where it was hard or impossible to reason about latency-related problems because of the lack of insight into coordinator-level stats, and we ended up spending a lot of unnecessary time and bandwidth figuring them out. Clients using the search API can rely only on the overall took time (present as part of the search response), which doesn't offer much insight into the time taken by different phases.

Parent RFC: #7334

Describe the solution you'd like
Slow logs at coordinator level: As of now, we can only enable slow logs at the shard level for a desired search phase (query and fetch). See this. Setting this threshold is tricky when the customer usually sees latency spikes at the request level. Plus, shard-level slow logs don't offer a holistic view. So as part of this, we will also add the capability to capture slow logs at the request level, along with the different search phases, from the coordinator node's perspective.
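To make the idea of request-level timing "from the coordinator node's perspective" concrete, here is a minimal sketch. `RequestTimer`, its `phase` context manager, and the injectable `clock` parameter are all hypothetical names introduced for illustration; the actual implementation may be structured quite differently.

```python
import time
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


class RequestTimer:
    """Hypothetical per-request timer kept on the coordinator node."""

    def __init__(self, clock: Callable[[], int] = time.monotonic_ns) -> None:
        self._clock = clock
        self._start = clock()  # request start as seen by the coordinator
        self.phase_took_ns: Dict[str, int] = {}

    @contextmanager
    def phase(self, name: str) -> Iterator[None]:
        started = self._clock()
        try:
            yield
        finally:
            # Accumulate, in case a phase is entered more than once.
            elapsed = self._clock() - started
            self.phase_took_ns[name] = self.phase_took_ns.get(name, 0) + elapsed

    def total_took_ns(self) -> int:
        # Overall request time, including fan-out and response aggregation.
        return self._clock() - self._start
```

A coordinator could wrap each phase in `with timer.phase("query"): ...` and compare `timer.total_took_ns()` against the request-level threshold once the response is ready.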

Additional context
Coordinator slow logs will be governed by cluster settings. We will offer thresholds for the following 3 intervals:

Overall request
~~Query phase~~
~~Fetch phase~~

// Setting on a whole request level
cluster.search.request.slowlog.threshold.warn: 10s
cluster.search.request.slowlog.threshold.info: 5s
cluster.search.request.slowlog.threshold.debug: 2s
cluster.search.request.slowlog.threshold.trace: 500ms

// Minimum level to print
cluster.search.request.slowlog.level: "trace"
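One plausible reading of how these settings interact is sketched below. It assumes that a request is logged at the most severe level whose threshold its took time meets or exceeds, and that the entry is suppressed when that level is below `cluster.search.request.slowlog.level`. Both rules, and the function name `slowlog_level`, are assumptions made for illustration rather than confirmed behavior.

```python
from typing import Optional

# Severity order, least to most severe.
LEVELS = ["trace", "debug", "info", "warn"]

# Threshold values copied from the settings above, converted to milliseconds.
THRESHOLDS_MS = {"warn": 10_000, "info": 5_000, "debug": 2_000, "trace": 500}


def slowlog_level(took_ms: int, min_level: str = "trace") -> Optional[str]:
    """Return the level a request would be logged at, or None if it is not logged."""
    for level in reversed(LEVELS):  # check the most severe threshold first
        if took_ms >= THRESHOLDS_MS[level]:
            # Suppress the entry when it falls below the configured minimum level.
            if LEVELS.index(level) >= LEVELS.index(min_level):
                return level
            return None
    return None  # faster than every threshold
```

Under these assumptions, a 6-second request is logged at `info`, while a 3-second request is logged at `debug` only if the minimum level is `debug` or `trace`.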

macohen · 2023-10-04T13:53:15Z

Looking forward to seeing this! @dzane17, is this still slated for 2.11? Can you please link a PR for this if it is, or remove the label if it isn't?

dzane17 · 2023-10-16T22:56:23Z

Thanks @macohen. I have opened the PR, and we are now on track for 2.12.

dzane17 added enhancement Enhancement or improvement to existing feature or request untriaged labels Aug 30, 2023

kkhatua assigned dzane17 Aug 30, 2023

kkhatua added feature New feature or request Search Search query, autocomplete ...etc v2.11.0 Issues and PRs related to version 2.11.0 labels Aug 30, 2023

github-project-automation bot added this to Search Project Board Aug 30, 2023

github-project-automation bot moved this to 🆕 New in Search Project Board Aug 30, 2023

kkhatua removed the untriaged label Aug 30, 2023

dzane17 changed the title ~~Search Latency - Coordinator Slow Logs~~ Search Latency Tracking - Coordinator Slow Logs Aug 30, 2023

dzane17 mentioned this issue Oct 16, 2023

Request level coordinator slow logs #10650

Merged

7 tasks

kkhatua added v2.12.0 Issues and PRs related to version 2.12.0 and removed v2.11.0 Issues and PRs related to version 2.11.0 labels Oct 16, 2023

kkhatua moved this from 🆕 New to Now(This Quarter) in Search Project Board Oct 16, 2023

dzane17 mentioned this issue Oct 30, 2023

[DOC] Search request slow logs opensearch-project/documentation-website#5298

Merged

1 task

msfroh mentioned this issue Nov 9, 2023

[BUG] Slow query logs contain null date range #2054

Open

ansjcy mentioned this issue Nov 15, 2023

[RFC] Real-time Insights into Top N Queries by Latency and Resource Usage #11186

Closed

msfroh closed this as completed in #10650 Nov 16, 2023

github-project-automation bot moved this from Now(This Quarter) to ✅ Done in Search Project Board Nov 16, 2023
