druid.server.http.defaultQueryTimeout does not set timeout for Historical servers #17475
Affected Version
Environments are run on Kubernetes clusters:
Production env: 0.22.1
Test env: 27
Description
Cluster Info: 2 Historicals, max heap 8G, direct memory size 16G. Limit 4cpu, 24G memory.
According to the docs, Historical servers can be configured with a query timeout after which queries are stopped. The default value is 300_000ms, i.e. 5min.
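To double-check which value a node would pick up from its properties file, a minimal Python sketch can parse the file text and fall back to the documented default. The helper name and the sample excerpt are hypothetical, and the parser only handles plain key=value lines, not the full Java properties syntax.

```python
# Minimal sketch: read druid.server.http.defaultQueryTimeout from the text of
# a runtime.properties file. Only plain "key=value" lines are handled.
DEFAULT_QUERY_TIMEOUT_MS = 300_000  # documented default (5 min)
KEY = "druid.server.http.defaultQueryTimeout"


def read_default_query_timeout(properties_text: str) -> int:
    """Return the configured timeout in ms, or the documented default."""
    timeout_ms = DEFAULT_QUERY_TIMEOUT_MS
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == KEY:
            timeout_ms = int(value.strip())  # a later definition overrides an earlier one
    return timeout_ms


# Hypothetical excerpt of historical/runtime.properties, for illustration only.
sample = """
druid.service=druid/historical
druid.server.http.defaultQueryTimeout=90000
"""

print(read_default_query_timeout(sample) / 1000, "seconds")
```

Running this against the file that is actually baked into the image helps rule out a misspelled key or a stale file, but it says nothing about whether the server honors the value.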
I first ran into this problem when my Historical servers were being overloaded with slow queries, and I wanted to set a timeout of 90s on the Historicals to alleviate it. However, setting druid.server.http.defaultQueryTimeout=90000 does not seem to work.

Grafana dashboards still record Historical latencies of up to 8min, while Brokers show latencies of 5min, which suggests that the default 5min druid.server.http.defaultQueryTimeout is in effect on the Brokers. (Maybe Historicals need more time to process query cancellations? I read in "Cancel a query" that Druid does a "best-effort cancellation", but more detail is needed on how this best effort is carried out and whether a query cancellation actually stops processing on the Historicals.)

Adding a timeout to the query context does help with limiting the latency. Here's a try of changing the timeout in the query context from 500ms to 5s, and back:

To investigate further, I set up a test cluster and changed druid.server.http.defaultQueryTimeout in the historical/runtime.properties file to a very low value (5ms) to trigger a timeout condition. However, queries continued to execute as normal (Historical latencies of up to 10s) despite this change.

I added the following lines to historical/runtime.properties, and rebuilt the Druid image to run on my Kubernetes setup:

With these configurations in place, my Historical servers began experiencing issues loading segments, which made it impossible to query the trips_xaa datasource that ships with Druid. The error message indicated that the configured timeout had reverted to the default of 5 minutes.

In an effort to troubleshoot, I checked the spelling of the configuration keys and reviewed ServerConfig.java, but found no discrepancies. This raises the possibility that the documentation misrepresents the ability to configure timeouts for Historical servers, and that the setting may only apply to Brokers.
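For reference, the per-query workaround described above (a timeout in the query context) can be sketched in Python against the Druid SQL API. This is only an illustration: the Broker URL is a hypothetical placeholder, the helper names are made up, and the 5s value simply mirrors the experiment in this report.

```python
import json
import urllib.request


def build_sql_request(sql: str, timeout_ms: int) -> dict:
    """Build a Druid SQL payload with a per-query timeout (ms) in the context."""
    return {"query": sql, "context": {"timeout": timeout_ms}}


def run_query(broker_url: str, sql: str, timeout_ms: int):
    """POST the query to the SQL endpoint of a Broker (URL is an assumption)."""
    body = json.dumps(build_sql_request(sql, timeout_ms)).encode("utf-8")
    request = urllib.request.Request(
        broker_url.rstrip("/") + "/druid/v2/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Client-side socket timeout kept slightly above the server-side one,
    # so that the Druid timeout error is what the caller normally sees.
    with urllib.request.urlopen(request, timeout=timeout_ms / 1000 + 5) as response:
        return json.load(response)


# Only the payload is built here; no network call is made.
payload = build_sql_request("SELECT COUNT(*) FROM trips_xaa", 5000)
print(json.dumps(payload))

# Example call, assuming a Broker reachable at this hypothetical address:
# rows = run_query("http://localhost:8082", "SELECT COUNT(*) FROM trips_xaa", 5000)
```

Because the timeout travels with the query itself, this approach does not depend on how each node resolves druid.server.http.defaultQueryTimeout, which is why it is useful as a comparison point for the behavior reported here.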
Request for Clarification
If Druid is indeed lacking the capability to adjust timeout settings for Historical servers or if there has been a change in the configuration naming, I kindly request an update to the documentation for clarity.
Thank you for your attention to this matter!