
druid.server.http.defaultQueryTimeout does not set timeout for Historical servers #17475

Open
GWphua opened this issue Nov 14, 2024 · 0 comments


Affected Version

Environments are run on Kubernetes clusters:
Production env: 0.22.1
Test env: 27

Description

Cluster info: 2 Historicals, each with max heap 8 GB and direct memory 16 GB, under Kubernetes limits of 4 CPU and 24 GB memory.

According to the docs, Historical servers can be configured with a query timeout that cancels queries exceeding it. The default value is 300_000 ms, i.e. 5 minutes.

I first encountered this problem when my Historical servers were overloaded with slow queries, and I wanted to set a 90 s timeout on the Historicals to alleviate it. However, setting druid.server.http.defaultQueryTimeout=90000 does not seem to have any effect.

Grafana dashboards still record Historical latencies of up to 8 min, while Brokers show latencies of 5 min, which indicates the default 5 min druid.server.http.defaultQueryTimeout is honored by the Brokers. (Maybe Historicals need more time to process query cancellations? The Cancel a Query docs say Druid performs a "best-effort cancellation", but more detail would help on how this best effort is conducted, and on whether cancellation actually stops the Historical's processing.)
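For reference, the cancellation path I am referring to is Druid's documented DELETE endpoint, which requires the query to have been submitted with an explicit ID; the host, port, and IDs below are placeholders:

```shell
# Cancel a native query by the queryId set in its context (placeholder values):
curl -X DELETE "http://broker-host:8082/druid/v2/my-query-id"

# Cancel a SQL query submitted with an explicit sqlQueryId:
curl -X DELETE "http://broker-host:8082/druid/v2/sql/my-sql-query-id"
```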

Grafana Dashboard

Adding a timeout under the query context does help limit the latency. Below is a trial of varying the timeout in the query context from 500 ms to 5 s and back:
Changing Query Context
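For clarity, this is the context-level timeout documented under Query context. A minimal sketch of a native query body with a 5 s timeout (the interval, granularity, and aggregation are placeholders, not the queries from my dashboard):

```json
{
  "queryType": "timeseries",
  "dataSource": "trips_xaa",
  "intervals": ["2020-01-01/2021-01-01"],
  "granularity": "all",
  "aggregations": [{ "type": "count", "name": "rows" }],
  "context": { "timeout": 5000 }
}
```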

To further investigate, I set up a test cluster and modified the druid.server.http.defaultQueryTimeout in the historical/runtime.properties file to a very low value (5ms) to trigger a timeout condition. However, the query continued to execute as normal (Historical latencies of up to 10s) despite this change.

I added the following lines to historical/runtime.properties, and rebuilt the Druid image to run on my Kubernetes setup:

druid.server.http.defaultQueryTimeout=90000
druid.server.http.maxQueryTimeout=90000
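One way to confirm whether the rebuilt image actually picked up these properties is to dump the Historical's effective runtime properties from its status endpoint (a troubleshooting sketch; host and port are placeholders for the Historical's HTTP endpoint):

```shell
# List the effective runtime properties and filter for the timeout settings
curl -s "http://historical-host:8083/status/properties" | grep -i queryTimeout
```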

Despite these configurations, my Historical servers began experiencing issues loading segments, resulting in an inability to query Druid's provided trips_xaa datasource. The error message indicated that the configured timeout had reverted to the default of 5 minutes.

Timeout Error

In an effort to troubleshoot, I double-checked the configuration spelling and reviewed ServerConfig.java, but found no discrepancies. This raises the possibility that the documentation misrepresents the ability to configure timeouts for Historical servers, and that the setting may only apply to Broker instances.

Request for Clarification

If Druid indeed lacks the capability to adjust timeout settings for Historical servers, or if the configuration name has changed, I kindly request an update to the documentation for clarity.

Thank you for your attention to this matter!
