Understand/Improve the performance of date histograms #9310

jainankitk · 2023-08-14T17:16:12Z

I have been looking at understanding and improving the performance of date histograms in Opensearch. For this purpose, I have setup single node cluster, and ingested nyc_taxis dataset.

While running single histogram aggregation request in loop, I collected the following cpu flamegraph:

Trying to understand how can we reduce the scoreRange cost for this request

jainankitk · 2023-08-17T16:50:17Z

I did some more experiments running histogram query on one node setup and tracking the took times:

$ curl -s -X GET "localhost:9200/nyc_taxis/_search?pretty" -H 'Content-Type: application/json' -d'{"size": 0,"query": {"range": {"dropoff_datetime": {"gte": "2015-01-01 00:00:00","lt": "2016-01-01 00:00:00"}}},"aggs": {"dropoffs_over_time": {"date_histogram": {"field": "dropoff_datetime","calendar_interval": "month"}}}}' | head
{
  "took" : 8822,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {

Initially, I rewrote this as cardinality filter query and took time was significantly lesser for one month:

$ curl -s -X GET "localhost:9200/nyc_taxis/_search?pretty" -H 'Content-Type: application/json' -d'{"size": 0,"track_total_hits": "true","query": {"range": {"dropoff_datetime": {"gte": "2015-12-01 00:00:00","lt": "2015-12-31 00:00:00"}}}}' | head
{
  "took" : 162,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {

Even though the took for one month is much lesser (and overall query latency can be significantly reduced), I was not too keen on this approach as it had very limited applications (only cardinality aggregations without aggregation composition)

jainankitk · 2023-08-17T16:59:35Z

While going through the lucene code, I noticed that the critical path is date histogram collector which is very iterative in nature. But, it is very efficient at computing results for many levels of composition with single iteration. Hence, I came up with another way of rewriting the query applicable to more histogram aggregation queries with following properties:

Leading scorer is date range filter
Top level aggregation is date range

Key idea is to slice the date range itself into multiple queries using rewrite. The response transformation is quite simple - union over disjoint sets. Sharing some results below (doing the rewrite on client instead of server):

Partitioning into 2 slices:

$ curl -s -X GET "localhost:9200/nyc_taxis/_msearch?pretty" -H 'Content-Type: application/json' -d'
> {}
> {"size": 0,"query": {"range": {"dropoff_datetime": {"gte": "2015-01-01 00:00:00","lt": "2015-06-30 00:00:00"}}},"aggs": {"dropoffs_over_time": {"date_histogram": {"field": "dropoff_datetime","calendar_interval": "month"}}}}
> {}
> {"size": 0,"query": {"range": {"dropoff_datetime": {"gte": "2015-07-01 00:00:00","lt": "2015-12-31 00:00:00"}}},"aggs": {"dropoffs_over_time": {"date_histogram": {"field": "dropoff_datetime","calendar_interval": "month"}}}}
> ' | head
{
  "took" : 4225,
  "responses" : [
    {
      "took" : 4224,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,

Partitioning into 4 slices:

$ curl -s -X GET "localhost:9200/nyc_taxis/_msearch?pretty" -H 'Content-Type: application/json' -d'
> {}
> {"size": 0,"query": {"range": {"dropoff_datetime": {"gte": "2015-01-01 00:00:00","lt": "2015-03-31 00:00:00"}}},"aggs": {"dropoffs_over_time": {"date_histogram": {"field": "dropoff_datetime","calendar_interval": "month"}}}}
> {}
> {"size": 0,"query": {"range": {"dropoff_datetime": {"gte": "2015-04-01 00:00:00","lt": "2015-06-30 00:00:00"}}},"aggs": {"dropoffs_over_time": {"date_histogram": {"field": "dropoff_datetime","calendar_interval": "month"}}}}
> {}
> {"size": 0,"query": {"range": {"dropoff_datetime": {"gte": "2015-07-01 00:00:00","lt": "2015-09-30 00:00:00"}}},"aggs": {"dropoffs_over_time": {"date_histogram": {"field": "dropoff_datetime","calendar_interval": "month"}}}}
> {}
> {"size": 0,"query": {"range": {"dropoff_datetime": {"gte": "2015-10-01 00:00:00","lt": "2015-12-31 00:00:00"}}},"aggs": {"dropoffs_over_time": {"date_histogram": {"field": "dropoff_datetime","calendar_interval": "month"}}}}
> '  | head
{
  "took" : 3588,
  "responses" : [
    {
      "took" : 3583,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,

Partitioning into 12 slices:

$ curl -s -X GET "localhost:9200/nyc_taxis/_msearch?pretty" -H 'Content-Type: application/json' -d'
> {}
> {"size": 0,"query": {"range": {"dropoff_datetime": {"gte": "2015-01-01 00:00:00","lt": "2015-01-31 00:00:00"}}},"aggs": {"dropoffs_over_time": {"date_histogram": {"field": "dropoff_datetime","calendar_interval": "month"}}}}
> {}
> {"size": 0,"query": {"range": {"dropoff_datetime": {"gte": "2015-02-01 00:00:00","lt": "2015-02-28 00:00:00"}}},"aggs": {"dropoffs_over_time": {"date_histogram": {"field": "dropoff_datetime","calendar_interval": "month"}}}}
> {}
> {"size": 0,"query": {"range": {"dropoff_datetime": {"gte": "2015-03-01 00:00:00","lt": "2015-03-31 00:00:00"}}},"aggs": {"dropoffs_over_time": {"date_histogram": {"field": "dropoff_datetime","calendar_interval": "month"}}}}
> {}
> {"size": 0,"query": {"range": {"dropoff_datetime": {"gte": "2015-04-01 00:00:00","lt": "2015-04-30 00:00:00"}}},"aggs": {"dropoffs_over_time": {"date_histogram": {"field": "dropoff_datetime","calendar_interval": "month"}}}}
> {}
> {"size": 0,"query": {"range": {"dropoff_datetime": {"gte": "2015-05-01 00:00:00","lt": "2015-05-31 00:00:00"}}},"aggs": {"dropoffs_over_time": {"date_histogram": {"field": "dropoff_datetime","calendar_interval": "month"}}}}
> {}
> {"size": 0,"query": {"range": {"dropoff_datetime": {"gte": "2015-06-01 00:00:00","lt": "2015-06-30 00:00:00"}}},"aggs": {"dropoffs_over_time": {"date_histogram": {"field": "dropoff_datetime","calendar_interval": "month"}}}}
> {}
> {"size": 0,"query": {"range": {"dropoff_datetime": {"gte": "2015-07-01 00:00:00","lt": "2015-07-31 00:00:00"}}},"aggs": {"dropoffs_over_time": {"date_histogram": {"field": "dropoff_datetime","calendar_interval": "month"}}}}
> {}
> {"size": 0,"query": {"range": {"dropoff_datetime": {"gte": "2015-08-01 00:00:00","lt": "2015-08-31 00:00:00"}}},"aggs": {"dropoffs_over_time": {"date_histogram": {"field": "dropoff_datetime","calendar_interval": "month"}}}}
> {}
> {"size": 0,"query": {"range": {"dropoff_datetime": {"gte": "2015-09-01 00:00:00","lt": "2015-09-30 00:00:00"}}},"aggs": {"dropoffs_over_time": {"date_histogram": {"field": "dropoff_datetime","calendar_interval": "month"}}}}
> {}
> {"size": 0,"query": {"range": {"dropoff_datetime": {"gte": "2015-10-01 00:00:00","lt": "2015-10-31 00:00:00"}}},"aggs": {"dropoffs_over_time": {"date_histogram": {"field": "dropoff_datetime","calendar_interval": "month"}}}}
> {}
> {"size": 0,"query": {"range": {"dropoff_datetime": {"gte": "2015-11-01 00:00:00","lt": "2015-11-30 00:00:00"}}},"aggs": {"dropoffs_over_time": {"date_histogram": {"field": "dropoff_datetime","calendar_interval": "month"}}}}
> {}
> {"size": 0,"query": {"range": {"dropoff_datetime": {"gte": "2015-12-01 00:00:00","lt": "2015-12-31 00:00:00"}}},"aggs": {"dropoffs_over_time": {"date_histogram": {"field": "dropoff_datetime","calendar_interval": "month"}}}}
> ' | head
{
  "took" : 3291,
  "responses" : [
    {
      "took" : 1813,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,

jainankitk · 2023-08-17T17:05:42Z

As we can observe from took times above, the most reduction comes from 1 to 2 slices (~50%), increasing the number of slices further does not see as much jump. The limitation from single node compute might also be coming into play

If we combine the slicing approach and cardinality specific rewrite, the reduction is more than 80% in latency. Although, that approach does not give any flexibility on the number of slices and limited to specific cardinality query like above. Sample invocation:

$ curl -s -X GET "localhost:9200/nyc_taxis/_msearch?pretty" -H 'Content-Type: application/json' -d'
> {}
> {"size": 0,"track_total_hits": "true","query": {"range": {"dropoff_datetime": {"gte": "2015-01-01 00:00:00","lt": "2015-01-31 00:00:00"}}}}
> {}
> {"size": 0,"track_total_hits": "true","query": {"range": {"dropoff_datetime": {"gte": "2015-02-01 00:00:00","lt": "2015-02-28 00:00:00"}}}}
> {}
> {"size": 0,"track_total_hits": "true","query": {"range": {"dropoff_datetime": {"gte": "2015-03-01 00:00:00","lt": "2015-03-31 00:00:00"}}}}
> {}
> {"size": 0,"track_total_hits": "true","query": {"range": {"dropoff_datetime": {"gte": "2015-04-01 00:00:00","lt": "2015-04-30 00:00:00"}}}}
> {}
> {"size": 0,"track_total_hits": "true","query": {"range": {"dropoff_datetime": {"gte": "2015-05-01 00:00:00","lt": "2015-05-31 00:00:00"}}}}
> {}
> {"size": 0,"track_total_hits": "true","query": {"range": {"dropoff_datetime": {"gte": "2015-06-01 00:00:00","lt": "2015-06-30 00:00:00"}}}}
> {}
> {"size": 0,"track_total_hits": "true","query": {"range": {"dropoff_datetime": {"gte": "2015-07-01 00:00:00","lt": "2015-07-31 00:00:00"}}}}
> {}
> {"size": 0,"track_total_hits": "true","query": {"range": {"dropoff_datetime": {"gte": "2015-08-01 00:00:00","lt": "2015-08-31 00:00:00"}}}}
> {}
> {"size": 0,"track_total_hits": "true","query": {"range": {"dropoff_datetime": {"gte": "2015-09-01 00:00:00","lt": "2015-09-30 00:00:00"}}}}
> {}
> {"size": 0,"track_total_hits": "true","query": {"range": {"dropoff_datetime": {"gte": "2015-10-01 00:00:00","lt": "2015-10-31 00:00:00"}}}}
> {}
> {"size": 0,"track_total_hits": "true","query": {"range": {"dropoff_datetime": {"gte": "2015-11-01 00:00:00","lt": "2015-11-30 00:00:00"}}}}
> {}
> {"size": 0,"track_total_hits": "true","query": {"range": {"dropoff_datetime": {"gte": "2015-12-01 00:00:00","lt": "2015-12-31 00:00:00"}}}}
> ' | head
{
  "took" : 993,
  "responses" : [
    {
      "took" : 551,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,

jainankitk · 2023-08-17T22:37:25Z

Applying the same slicing concept on composite date histogram aggregation, we can see similar improvement (26s to 14s):

$ curl -s -X GET "localhost:9200/nyc_taxis/_search?pretty" -H 'Content-Type: application/json' -d'{"size": 0,"query": {"range": {"dropoff_datetime": {"gte": "2015-01-01 00:00:00","lt": "2016-01-01 00:00:00"}}},"aggs": {"dropoffs_over_time": {"date_histogram": {"field": "dropoff_datetime","calendar_interval": "month"},"aggs":{"total_amount":{"stats":{"field":"total_amount"}},"tip_amount":{"stats":{"field":"tip_amount"}},"trip_distance":{"stats":{"field":"trip_distance"}}}}}}' | head
{
  "took" : 26604,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {

$ curl -s -X GET "localhost:9200/nyc_taxis/_msearch?pretty" -H 'Content-Type: application/json' -d'
> {}
> {"size": 0,"query": {"range": {"dropoff_datetime": {"gte": "2015-01-01 00:00:00","lt": "2015-07-01 00:00:00"}}},"aggs": {"dropoffs_over_time": {"date_histogram": {"field": "dropoff_datetime","calendar_interval": "month"},"aggs":{"total_amount":{"stats":{"field":"total_amount"}},"tip_amount":{"stats":{"field":"tip_amount"}},"trip_distance":{"stats":{"field":"trip_distance"}}}}}}
> {}
> {"size": 0,"query": {"range": {"dropoff_datetime": {"gte": "2015-07-01 00:00:00","lt": "2016-01-01 00:00:00"}}},"aggs": {"dropoffs_over_time": {"date_histogram": {"field": "dropoff_datetime","calendar_interval": "month"},"aggs":{"total_amount":{"stats":{"field":"total_amount"}},"tip_amount":{"stats":{"field":"tip_amount"}},"trip_distance":{"stats":{"field":"trip_distance"}}}}}}
> ' | head
{
  "took" : 14806,
  "responses" : [
    {
      "took" : 14806,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,

backslasht · 2023-08-18T09:22:16Z

Thanks @jainankitk for the details.

While this approach of split (into multiple smaller range queries) and merge (into one result) looks promising, I think the net CPU utilization will remain the same. Or in other words, in a busy system where are all search threads are (nearly) occupied, the performance benefit may not be seen.

Also, is there any overhead caused by the number of splits, for example, consider a date aggregation query where the results are aggregated at a month level for an year and say the data is split across 20 shards in 20 different nodes. In that case, instead of 20 network call (between the nodes), this would make 20*12 network calls and there may be some overhead with respect of serializing/de-serializing the request/responses.

Have you tried this with a setup containing more number of nodes?

msfroh · 2023-08-22T18:08:24Z

@jainankitk How do you think this would compare to using concurrent search across two slices? Unless I'm mistaken, the drop in latency comes from running the two slices in parallel in the _msearch versus running them on a single thread per shard.

Using concurrent search, we would mostly end up doing the same thing, especially if segments hold disjoint contiguous time ranges (which isn't guaranteed, but can be made more likely with #9241).

jainankitk · 2023-08-22T20:32:21Z

@jainankitk How do you think this would compare to using concurrent search across two slices?

The impact will be similar.

Using concurrent search, we would mostly end up doing the same thing, especially if segments hold disjoint contiguous time ranges (which isn't guaranteed, but can be made more likely with #9241).

While the benefits are similar, this approach is more generic (for date histogram aggregations) as it is agnostic of number of segments. Also, the rewrite using track_total_hits (for cardinality date histogram aggregtions) makes the request scorer bound instead of collector, which is significant improvement

jainankitk · 2023-08-22T20:39:19Z

While this approach of split (into multiple smaller range queries) and merge (into one result) looks promising, I think the net CPU utilization will remain the same. Or in other words, in a busy system where are all search threads are (nearly) occupied, the performance benefit may not be seen.

@backslasht - Completely agree with you on this. Although I don't have any data on this, IMO date histogram aggregations are mostly generated by UI or dashboard components for visualizing the data. They are quite expensive as well (~7s) rendering them too slow or impractical for low latency applications. Hence, the request concurrency should not be high.

Also, is there any overhead caused by the number of splits, for example, consider a date aggregation query where the results are aggregated at a month level for an year and say the data is split across 20 shards in 20 different nodes. In that case, instead of 20 network call (between the nodes), this would make 20*12 network calls and there may be some overhead with respect of serializing/de-serializing the request/responses.

Assuming the throughput of such requests is not high, the overhead caused due to 20*12 network calls should not be huge. We can also evaluate the query splitting on data node (instead of coordinator) to avoid network calls overhead.

gashutos · 2023-08-23T07:49:11Z

@jainankitk how many number of shards you have in index ? Running your query with 2 slice is equivalent to doubling the shard count right ?
I think this will give even better improvement on top of concurrent segment search. with higher number of replicas since we will able to achieve more parallelization.

jainankitk · 2023-08-23T15:14:01Z

@jainankitk how many number of shards you have in index ?

I ran this experiment with single shard, single node.

Running your query with 2 slice is equivalent to doubling the shard count right ?

Yes, but doubling the shard count dynamically is not possible. It already is what it is. Given any shard count, we can always rewrite to slice the query like above

I think this will give even better improvement on top of concurrent segment search. with higher number of replicas since we will able to achieve more parallelization.

While I don't understand the correlation with higher number of replicas, the rewriting is additive to concurrent segment search, given enough CPU is available

gashutos · 2023-08-23T15:20:14Z

While I don't understand the correlation with higher number of replicas, the rewriting is additive to concurrent segment search, given enough CPU is available

Let say if we have 3 replica for a primary shard, we will have 4 shards total after all on 4 nodes.
Search request for that shard will land to any shard selected through ARS ( adaptive replica selection ). So 4 concurrent search request will be served through 4 different nodes and all individual replica will utilize available individual node CPU through concurrent segment search.
So given if we rewrite query to act like 4 different parallel query, it is latency wise benefit on top of concurrent search.
But here results wont be consistent since replicas might have different documents based on sync state with primary

mikemccand · 2023-08-28T10:12:25Z

While this approach of split (into multiple smaller range queries) and merge (into one result) looks promising, I think the net CPU utilization will remain the same. Or in other words, in a busy system where are all search threads are (nearly) occupied, the performance benefit may not be seen.

@backslasht - Completely agree with you on this. Although I don't have any data on this, IMO date histogram aggregations are mostly generated by UI or dashboard components for visualizing the data. They are quite expensive as well (~7s) rendering them too slow or impractical for low latency applications. Hence, the request concurrency should not be high.

This is indeed a zero sum game -- as you use more concurrency within a single query, latencies get much better for queries when the cluster is not near red-line (100% CPU saturation), yet red-line QPS (capacity) is no better and maybe a bit worse due to costs of thread switching (Java 19's new light weight threads (Project Loom) may help here).

But note that this tradeoff is nearly always very worthwhile. Modern CPUs/clusters have tremendous concurrency that is horribly underutilized during "peace time" (cluster not near 100% CPU saturation). Tapping into this concurrency to make single queries faster is an excellent tradeoff -- you massively reduce latency for the few in-flight queries. Most applications very rarely run their clusters at red-line.

It's crazy to me how long it took ES/OS to simply make use of Lucene's intra-concurrency feature -- it's been available in Lucene for a looooong time, and CPUs finally became concurrent enough to make it a no-brainer. In our service (Amazon's customer facing product search) we've used this feature since the very start of the (lengthy!) migration to Lucene and it was vital to the success of that migration.

If latency / capacity at red-line is really a problem, OpenSearch could add some "smart" logic to taper the concurrency when the cluster is near capacity, because at that point the added intra-query concurrency can only hurt things. But I would test empirically to see if such smart logic is really necessary -- the thread context switching cost is likely low enough (and decreasing with time with Project Loom) that it's not worth doing that. Really OS should focus on reducing query latencies during peace time (low cluster utilization).

mikemccand · 2023-08-28T10:23:03Z

But here results wont be consistent since replicas might have different documents based on sync state with primary

Note that this is true even without the added intra-query concurrency?

With document based replication there is no transactional guarantee (consistent point-in-time view of the Search index).

But segment based replication can offer this, if we choose to implement it -- a query can know, even across replicas, precisely which point-in-time view of the index it is searching. Replicas can also preserve (keep open, for a time) recent point-in-time views so that queries can be nearly 100% consistent.

Given that document based replication has forever NOT had such guarantees, we should not start adding it just yet with segment replication. But segment replication makes it quite a simple feature (transactionaly consistency) to add later on, if it really matters to users ...

jainankitk · 2023-09-14T05:09:53Z

I tried manually rewriting the query as filter aggregation, but did not notice any performance improvement:

|                                                 Min Throughput | date_histogram_calendar_interval |        0.13 |  ops/s |
|                                                Mean Throughput | date_histogram_calendar_interval |        0.13 |  ops/s |
|                                              Median Throughput | date_histogram_calendar_interval |        0.13 |  ops/s |
|                                                 Max Throughput | date_histogram_calendar_interval |        0.13 |  ops/s |
|                                        50th percentile latency | date_histogram_calendar_interval |     68206.1 |     ms |
|                                        90th percentile latency | date_histogram_calendar_interval |     90803.8 |     ms |
|                                       100th percentile latency | date_histogram_calendar_interval |     96544.5 |     ms |
|                                   50th percentile service time | date_histogram_calendar_interval |     7673.54 |     ms |
|                                   90th percentile service time | date_histogram_calendar_interval |     7741.49 |     ms |
|                                  100th percentile service time | date_histogram_calendar_interval |      7806.9 |     ms |
|                                                     error rate | date_histogram_calendar_interval |           0 |      % |
|                                                 Min Throughput |               date_histogram_agg |        0.73 |  ops/s |
|                                                Mean Throughput |               date_histogram_agg |        0.75 |  ops/s |
|                                              Median Throughput |               date_histogram_agg |        0.74 |  ops/s |
|                                                 Max Throughput |               date_histogram_agg |        0.78 |  ops/s |
|                                        50th percentile latency |               date_histogram_agg |     519.746 |     ms |
|                                        90th percentile latency |               date_histogram_agg |     522.896 |     ms |
|                                       100th percentile latency |               date_histogram_agg |     525.959 |     ms |
|                                   50th percentile service time |               date_histogram_agg |     518.406 |     ms |
|                                   90th percentile service time |               date_histogram_agg |     520.888 |     ms |
|                                  100th percentile service time |               date_histogram_agg |     524.325 |     ms |
|                                                     error rate |               date_histogram_agg |           0 |      % |
|                                                 Min Throughput |        date_histogram_agg_filter |        0.13 |  ops/s |
|                                                Mean Throughput |        date_histogram_agg_filter |        0.13 |  ops/s |
|                                              Median Throughput |        date_histogram_agg_filter |        0.13 |  ops/s |
|                                                 Max Throughput |        date_histogram_agg_filter |        0.13 |  ops/s |
|                                        50th percentile latency |        date_histogram_agg_filter |     67761.8 |     ms |
|                                        90th percentile latency |        date_histogram_agg_filter |     90681.7 |     ms |
|                                       100th percentile latency |        date_histogram_agg_filter |     96495.7 |     ms |
|                                   50th percentile service time |        date_histogram_agg_filter |     7769.12 |     ms |
|                                   90th percentile service time |        date_histogram_agg_filter |     7843.55 |     ms |
|                                  100th percentile service time |        date_histogram_agg_filter |     7888.28 |     ms |
|                                                     error rate |        date_histogram_agg_filter |           0 |      % |

jainankitk · 2023-09-14T18:36:07Z

Okay, I missed adding the range filter in my filter aggregation query. Due to that the flame graph had MatchAll query on which collector was being run:

    {   
      "name": "date_histogram_agg_filter",
      "operation-type": "search",
      "body": {
        "size": 0,
        "aggs": {
          "dropoffs_over_time": {
            "filters": {
              "filters": {
                "Jan" : {"range": {"dropoff_datetime": {"gte": "2015-01-01 00:00:00","lt": "2015-02-01 00:00:00"}}},
                "Feb" : { "range": {"dropoff_datetime": {"gte": "2015-02-01 00:00:00","lt": "2015-03-01 00:00:00"}}}
              }   
            }   
          }   
        }   
      }   
    }

jainankitk · 2023-09-14T18:39:12Z

After adding the query range filter, the flame graph does not have matchAll query:

"query": {
          "range": {
            "dropoff_datetime": {
              "gte": "2015-01-01 00:00:00",
              "lt": "2015-03-01 00:00:00"
            }
          }
        }

jainankitk · 2023-09-18T19:58:05Z

From small POC based on this, the performance improvement for above specific query is from 6947 to 172 ms:

With the rewrite and filter aggregation improvements:

$ curl -X GET "localhost:9200/nyc_taxis/_search?pretty" -H 'Content-Type: application/json' -d'
> {
>   "size": 0,
>   "aggs" : {
>     "messages" : {
>       "filters" : {
>         "filters" : {
>           "1420070400000" :   {"range": {"dropoff_datetime": {"gte": "2015-01-01 00:00:00","lt": "2015-02-01 00:00:00"}}},
>           "1422748800000" :  { "range": {"dropoff_datetime": {"gte": "2015-02-01 00:00:00","lt": "2015-03-01 00:00:00"}}},
>   "1425168000000" :  { "range": {"dropoff_datetime": {"gte": "2015-03-01 00:00:00","lt": "2015-04-01 00:00:00"}}},
>   "1427846400000" :  { "range": {"dropoff_datetime": {"gte": "2015-04-01 00:00:00","lt": "2015-05-01 00:00:00"}}},
>   "1430438400000," :  { "range": {"dropoff_datetime": {"gte": "2015-05-01 00:00:00","lt": "2015-06-01 00:00:00"}}},
>    "1433116800000" :  { "range": {"dropoff_datetime": {"gte": "2015-06-01 00:00:00","lt": "2015-07-01 00:00:00"}}},
>   "1435708800000" :  { "range": {"dropoff_datetime": {"gte": "2015-07-01 00:00:00","lt": "2015-08-01 00:00:00"}}},
>   "1438387200000" :  { "range": {"dropoff_datetime": {"gte": "2015-08-01 00:00:00","lt": "2015-09-01 00:00:00"}}},
>    "1441065600000" :  { "range": {"dropoff_datetime": {"gte": "2015-09-01 00:00:00","lt": "2015-10-01 00:00:00"}}},
>   "1443657600000" :  { "range": {"dropoff_datetime": {"gte": "2015-10-01 00:00:00","lt": "2015-11-01 00:00:00"}}},
>   "1446336000000" : { "range": {"dropoff_datetime": {"gte": "2015-11-01 00:00:00","lt": "2015-12-01 00:00:00"}}},
>   "1448928000000" : { "range": {"dropoff_datetime": {"gte": "2015-12-01 00:00:00","lt": "2016-01-01 00:00:00"}}}
>         }
>       }
>     }
>   }
> }
> '
{
  "took" : 172,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "messages" : {
      "buckets" : {
        "1420070400000" : {
          "doc_count" : 14249568
        },
        "1422748800000" : {
          "doc_count" : 14024615
        },
        "1425168000000" : {
          "doc_count" : 15078229
        },
        "1427846400000" : {
          "doc_count" : 14734472
        },
        "1430438400000," : {
          "doc_count" : 14947963
        },
        "1433116800000" : {
          "doc_count" : 13962891
        },
        "1435708800000" : {
          "doc_count" : 13101209
        },
        "1438387200000" : {
          "doc_count" : 12667603
        },
        "1441065600000" : {
          "doc_count" : 12717785
        },
        "1443657600000" : {
          "doc_count" : 13941138
        },
        "1446336000000" : {
          "doc_count" : 12849174
        },
        "1448928000000" : {
          "doc_count" : 13068510
        }
      }
    }
  }
}

Original:

$ curl -s -X GET "localhost:9200/nyc_taxis/_search?pretty" -H 'Content-Type: application/json' -d'{"size": 0,"query": {"range": {"dropoff_datetime": {"gte": "2015-01-01 00:00:00","lt": "2016-01-01 00:00:00"}}},"aggs": {"dropoffs_over_time": {"date_histogram":
 {"field": "dropoff_datetime","calendar_interval": "month"}}}}'
{
  "took" : 6947,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "dropoffs_over_time" : {
      "buckets" : [
        {
          "key_as_string" : "2015-01-01 00:00:00",
          "key" : 1420070400000,
          "doc_count" : 14249568
        },
        {
          "key_as_string" : "2015-02-01 00:00:00",
          "key" : 1422748800000,
          "doc_count" : 14024615
        },
        {
          "key_as_string" : "2015-03-01 00:00:00",
          "key" : 1425168000000,
          "doc_count" : 15078229
        },
        {
          "key_as_string" : "2015-04-01 00:00:00",
          "key" : 1427846400000,
          "doc_count" : 14734472
        },
        {
          "key_as_string" : "2015-05-01 00:00:00",
          "key" : 1430438400000,
          "doc_count" : 14947963
        },
        {
          "key_as_string" : "2015-06-01 00:00:00",
          "key" : 1433116800000,
          "doc_count" : 13962891
        },
        {
          "key_as_string" : "2015-07-01 00:00:00",
          "key" : 1435708800000,
          "doc_count" : 13101209
        },
        {
          "key_as_string" : "2015-08-01 00:00:00",
          "key" : 1438387200000,
          "doc_count" : 12667603
        },
        {
          "key_as_string" : "2015-09-01 00:00:00",
          "key" : 1441065600000,
          "doc_count" : 12717785
        },
        {
          "key_as_string" : "2015-10-01 00:00:00",
          "key" : 1443657600000,
          "doc_count" : 13941138
        },
        {
          "key_as_string" : "2015-11-01 00:00:00",
          "key" : 1446336000000,
          "doc_count" : 12849174
        },
        {
          "key_as_string" : "2015-12-01 00:00:00",
          "key" : 1448928000000,
          "doc_count" : 13068510
        }
      ]
    }
  }
}

jainankitk · 2023-10-18T03:39:14Z

Code changes for poc - jainankitk@c3b447c

backslasht · 2023-10-19T06:38:27Z

From small POC based on this, the performance improvement for above specific query is from 6947 to 172 ms

Thanks @jainankitk! The numbers are pretty impressive.

backslasht · 2023-10-19T14:34:33Z

How are you thinking of implementing this? Are you considering to build a generic query rewrite framework?

jainankitk · 2023-10-23T18:23:00Z

Are you considering to build a generic query rewrite framework?

I spent sometime on the code and seems that generic query framework across all query types might be tricky. Although, we will add generic logic which can account for different types of aggregation rewrites.

How are you thinking of implementing this?

I am planning to do this rewrite at shard level as it already has the parsing logic and we need not duplicate that on the coordinator layer. I am planning to do the rewrite in two steps:

Rewrite the date histogram into range aggregation first if possible. This has 2 advantages over rewriting into filter aggregation:

Performance of default range aggregation is similar to date histogram as they use similar logic for bucketing and much better than the default filter aggregation. Hence, this rewrite can always be done without taking any performance hit
Doing this as two step process benefits both date histogram and range aggregation

Rewrite the range aggregation into filter aggregation if segment does not have any deletions. Else, run it as date histogram aggregation

jainankitk · 2023-10-23T18:25:21Z

Thanks @jainankitk! The numbers are pretty impressive

Thanks @backslasht. I also verified that even on smaller duration date histograms like 1 day or 15-20 days, the improvement is still about 4-5x.

jainankitk · 2023-10-25T16:31:40Z

After applying the optimizations to date histogram aggregations, saw below numbers for nyc_taxis aggregation benchmark:

|                                                 Min Throughput |              distance_amount_agg |        2.02 |  ops/s |
|                                                Mean Throughput |              distance_amount_agg |        2.02 |  ops/s |
|                                              Median Throughput |              distance_amount_agg |        2.02 |  ops/s |
|                                                 Max Throughput |              distance_amount_agg |        2.04 |  ops/s |
|                                        50th percentile latency |              distance_amount_agg |     7.73538 |     ms |
|                                        90th percentile latency |              distance_amount_agg |     8.40921 |     ms |
|                                       100th percentile latency |              distance_amount_agg |     8.87938 |     ms |
|                                   50th percentile service time |              distance_amount_agg |     6.46754 |     ms |
|                                   90th percentile service time |              distance_amount_agg |     6.95528 |     ms |
|                                  100th percentile service time |              distance_amount_agg |     7.51381 |     ms |
|                                                     error rate |              distance_amount_agg |           0 |      % |
|                                                 Min Throughput | date_histogram_calendar_interval |        1.51 |  ops/s |
|                                                Mean Throughput | date_histogram_calendar_interval |        1.51 |  ops/s |
|                                              Median Throughput | date_histogram_calendar_interval |        1.51 |  ops/s |
|                                                 Max Throughput | date_histogram_calendar_interval |        1.53 |  ops/s |
|                                        50th percentile latency | date_histogram_calendar_interval |     16.8481 |     ms |
|                                        90th percentile latency | date_histogram_calendar_interval |     17.8246 |     ms |
|                                        99th percentile latency | date_histogram_calendar_interval |     24.3817 |     ms |
|                                       100th percentile latency | date_histogram_calendar_interval |     27.7471 |     ms |
|                                   50th percentile service time | date_histogram_calendar_interval |      15.439 |     ms |
|                                   90th percentile service time | date_histogram_calendar_interval |     16.2973 |     ms |
|                                   99th percentile service time | date_histogram_calendar_interval |     22.7529 |     ms |
|                                  100th percentile service time | date_histogram_calendar_interval |      26.729 |     ms |
|                                                     error rate | date_histogram_calendar_interval |           0 |      % |
|                                                 Min Throughput |    date_histogram_fixed_interval |        1.51 |  ops/s |
|                                                Mean Throughput |    date_histogram_fixed_interval |        1.52 |  ops/s |
|                                              Median Throughput |    date_histogram_fixed_interval |        1.51 |  ops/s |
|                                                 Max Throughput |    date_histogram_fixed_interval |        1.53 |  ops/s |
|                                        50th percentile latency |    date_histogram_fixed_interval |     14.2897 |     ms |
|                                        90th percentile latency |    date_histogram_fixed_interval |     14.9021 |     ms |
|                                        99th percentile latency |    date_histogram_fixed_interval |     20.7376 |     ms |
|                                       100th percentile latency |    date_histogram_fixed_interval |     55.6359 |     ms |
|                                   50th percentile service time |    date_histogram_fixed_interval |     12.7421 |     ms |
|                                   90th percentile service time |    date_histogram_fixed_interval |     13.4117 |     ms |
|                                   99th percentile service time |    date_histogram_fixed_interval |     19.2192 |     ms |
|                                  100th percentile service time |    date_histogram_fixed_interval |     53.8581 |     ms |
|                                                     error rate |    date_histogram_fixed_interval |           0 |      % |
|                                                 Min Throughput |               date_histogram_agg |        1.51 |  ops/s |
|                                                Mean Throughput |               date_histogram_agg |        1.51 |  ops/s |
|                                              Median Throughput |               date_histogram_agg |        1.51 |  ops/s |
|                                                 Max Throughput |               date_histogram_agg |        1.52 |  ops/s |
|                                        50th percentile latency |               date_histogram_agg |     27.4653 |     ms |
|                                        90th percentile latency |               date_histogram_agg |     29.1013 |     ms |
|                                        99th percentile latency |               date_histogram_agg |     34.5083 |     ms |
|                                       100th percentile latency |               date_histogram_agg |     44.9958 |     ms |
|                                   50th percentile service time |               date_histogram_agg |     26.0248 |     ms |
|                                   90th percentile service time |               date_histogram_agg |     27.9901 |     ms |
|                                   99th percentile service time |               date_histogram_agg |     33.4492 |     ms |
|                                  100th percentile service time |               date_histogram_agg |     43.4751 |     ms |
|                                                     error rate |               date_histogram_agg |           0 |      % |



---------------------------------
[INFO] SUCCESS (took 411 seconds)
---------------------------------

jainankitk · 2023-10-25T23:51:05Z

Benchmark numbers without the optimization on same setup:

|                                                 Min Throughput | date_histogram_calendar_interval |        0.11 |  ops/s |
|                                                Mean Throughput | date_histogram_calendar_interval |        0.11 |  ops/s |
|                                              Median Throughput | date_histogram_calendar_interval |        0.11 |  ops/s |
|                                                 Max Throughput | date_histogram_calendar_interval |        0.11 |  ops/s |
|                                        50th percentile latency | date_histogram_calendar_interval |      856470 |     ms |
|                                        90th percentile latency | date_histogram_calendar_interval | 1.19283e+06 |     ms |
|                                        99th percentile latency | date_histogram_calendar_interval | 1.26869e+06 |     ms |
|                                       100th percentile latency | date_histogram_calendar_interval | 1.27709e+06 |     ms |
|                                   50th percentile service time | date_histogram_calendar_interval |     9139.83 |     ms |
|                                   90th percentile service time | date_histogram_calendar_interval |     9238.84 |     ms |
|                                   99th percentile service time | date_histogram_calendar_interval |     9329.85 |     ms |
|                                  100th percentile service time | date_histogram_calendar_interval |     9352.46 |     ms |
|                                                     error rate | date_histogram_calendar_interval |           0 |      % |
|                                                 Min Throughput |    date_histogram_fixed_interval |        0.11 |  ops/s |
|                                                Mean Throughput |    date_histogram_fixed_interval |        0.11 |  ops/s |
|                                              Median Throughput |    date_histogram_fixed_interval |        0.11 |  ops/s |
|                                                 Max Throughput |    date_histogram_fixed_interval |        0.11 |  ops/s |
|                                        50th percentile latency |    date_histogram_fixed_interval |      821916 |     ms |
|                                        90th percentile latency |    date_histogram_fixed_interval | 1.14567e+06 |     ms |
|                                        99th percentile latency |    date_histogram_fixed_interval | 1.21831e+06 |     ms |
|                                       100th percentile latency |    date_histogram_fixed_interval |  1.2264e+06 |     ms |
|                                   50th percentile service time |    date_histogram_fixed_interval |     8820.67 |     ms |
|                                   90th percentile service time |    date_histogram_fixed_interval |     8895.23 |     ms |
|                                   99th percentile service time |    date_histogram_fixed_interval |     9008.92 |     ms |
|                                  100th percentile service time |    date_histogram_fixed_interval |      9014.7 |     ms |
|                                                     error rate |    date_histogram_fixed_interval |           0 |      % |
|                                                 Min Throughput |               date_histogram_agg |        1.49 |  ops/s |
|                                                Mean Throughput |               date_histogram_agg |         1.5 |  ops/s |
|                                              Median Throughput |               date_histogram_agg |         1.5 |  ops/s |
|                                                 Max Throughput |               date_histogram_agg |         1.5 |  ops/s |
|                                        50th percentile latency |               date_histogram_agg |     548.597 |     ms |
|                                        90th percentile latency |               date_histogram_agg |     558.008 |     ms |
|                                        99th percentile latency |               date_histogram_agg |     571.611 |     ms |
|                                       100th percentile latency |               date_histogram_agg |     572.667 |     ms |
|                                   50th percentile service time |               date_histogram_agg |     547.138 |     ms |
|                                   90th percentile service time |               date_histogram_agg |     557.289 |     ms |
|                                   99th percentile service time |               date_histogram_agg |     569.914 |     ms |
|                                  100th percentile service time |               date_histogram_agg |     571.978 |     ms |
|                                                     error rate |               date_histogram_agg |           0 |      % |

jainankitk · 2023-10-27T00:35:15Z

Code changes for above benchmark: https://github.com/jainankitk/OpenSearch/compare/db90a415ff2fd428b4f7b3f800a51dc229287cb4..bb2a3b8e8f0250f5e7bc5fa22ba2e3258e8fd8e0

getsaurabh02 · 2023-10-30T23:01:37Z

Thanks @jainankitk this is exciting!! The benchmark numbers for date histogram operations here are very encouraging.

While we are trying out the initial POC with the Date Histogram changes, does it makes sense to convert this issue into a META Issue and breakdown the improvements into specific tracks for improvements further? That way we can encourage some incremental/parallel help from the community while we dive into the optimization.

hdhalter · 2023-12-11T23:45:50Z

Hi @jainankitk, will documentation for this feature be required for 2.12? If so, can you please create the doc issue, let me know who will be creating the initial doc PR, and add this dev issue to the unified tracker project? Thanks!

jainankitk added enhancement Enhancement or improvement to existing feature or request untriaged labels Aug 14, 2023

Xtansia added the Search:Aggregations label Aug 14, 2023

kkhatua added the Performance This is for any performance related enhancements or bugs label Aug 17, 2023

jainankitk removed the untriaged label Aug 24, 2023

macohen added this to OpenSearch Lucene & Core Performance Tracking Sep 8, 2023

github-project-automation bot moved this to Lucene (In Progress) in OpenSearch Lucene & Core Performance Tracking Sep 8, 2023

jainankitk mentioned this issue Sep 15, 2023

[BUG] IndicesQueryCache$CachingWeightWrapper#count should delegate to internal weight #10060

Closed

getsaurabh02 added this to Performance Roadmap Oct 13, 2023

github-project-automation bot moved this to Todo in Performance Roadmap Oct 13, 2023

getsaurabh02 moved this from Todo to In Progress in Performance Roadmap Oct 16, 2023

getsaurabh02 added the v2.12.0 Issues and PRs related to version 2.12.0 label Oct 30, 2023

jainankitk mentioned this issue Nov 3, 2023

Improving the performance of date histogram aggregation (without any sub-aggregation) #11083

Merged

7 tasks

This was referenced Nov 15, 2023

[Date Histogram] Apply the optimization to AutoDateHistogram #11220

Closed

[Date Histogram] Apply the optimization to Composite aggregation with source as DateHistogram #11301

Closed

msfroh closed this as completed in #11083 Nov 28, 2023

github-project-automation bot moved this from Open to Closed in OpenSearch Lucene & Core Performance Tracking Nov 28, 2023

github-project-automation bot moved this from In Progress to Done in Performance Roadmap Nov 28, 2023

jainankitk mentioned this issue Nov 29, 2023

[Backport 2.x] Improving the performance of date histogram aggregation (without any … #11390

Merged

7 tasks

jainankitk self-assigned this Nov 29, 2023

jainankitk mentioned this issue Jan 4, 2024

Supporting fast bucket aggregation on numeric multi field aggregation #11740

Open

jainankitk mentioned this issue Feb 20, 2024

[RFC] Query Planning and Rewriting #12390

Open

This was referenced Mar 11, 2024

Investigate how to handle the deleted document in fast filter optimization #12596

Open

[Proposal] Support sub aggregation in filter rewrite optimization #12602

Open

msfroh mentioned this issue Mar 20, 2024

[RFC] Pre Compute Aggregations with Star Tree index #12498

Open

Bukhtawar mentioned this issue Aug 19, 2024

Adding @jainankitk as a Maintainer #15304

Merged

github-project-automation bot added this to OpenSearch Project Roadmap Aug 30, 2024

github-project-automation bot moved this to 2.12.0 (Launched) in OpenSearch Project Roadmap Aug 30, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Understand/Improve the performance of date histograms #9310

Understand/Improve the performance of date histograms #9310

jainankitk commented Aug 14, 2023 •

edited

Loading

jainankitk commented Aug 17, 2023 •

edited

Loading

jainankitk commented Aug 17, 2023 •

edited

Loading

jainankitk commented Aug 17, 2023 •

edited

Loading

jainankitk commented Aug 17, 2023 •

edited

Loading

backslasht commented Aug 18, 2023

msfroh commented Aug 22, 2023

jainankitk commented Aug 22, 2023

jainankitk commented Aug 22, 2023

gashutos commented Aug 23, 2023 •

edited

Loading

jainankitk commented Aug 23, 2023

gashutos commented Aug 23, 2023

mikemccand commented Aug 28, 2023 •

edited

Loading

mikemccand commented Aug 28, 2023

jainankitk commented Sep 14, 2023

jainankitk commented Sep 14, 2023

jainankitk commented Sep 14, 2023

jainankitk commented Sep 18, 2023 •

edited

Loading

jainankitk commented Oct 18, 2023

backslasht commented Oct 19, 2023

backslasht commented Oct 19, 2023

jainankitk commented Oct 23, 2023

jainankitk commented Oct 23, 2023

jainankitk commented Oct 25, 2023 •

edited

Loading

jainankitk commented Oct 25, 2023

jainankitk commented Oct 27, 2023

getsaurabh02 commented Oct 30, 2023

hdhalter commented Dec 11, 2023

Understand/Improve the performance of date histograms #9310

Understand/Improve the performance of date histograms #9310

Comments

jainankitk commented Aug 14, 2023 • edited Loading

jainankitk commented Aug 17, 2023 • edited Loading

jainankitk commented Aug 17, 2023 • edited Loading

jainankitk commented Aug 17, 2023 • edited Loading

jainankitk commented Aug 17, 2023 • edited Loading

backslasht commented Aug 18, 2023

msfroh commented Aug 22, 2023

jainankitk commented Aug 22, 2023

jainankitk commented Aug 22, 2023

gashutos commented Aug 23, 2023 • edited Loading

jainankitk commented Aug 23, 2023

gashutos commented Aug 23, 2023

mikemccand commented Aug 28, 2023 • edited Loading

mikemccand commented Aug 28, 2023

jainankitk commented Sep 14, 2023

jainankitk commented Sep 14, 2023

jainankitk commented Sep 14, 2023

jainankitk commented Sep 18, 2023 • edited Loading

jainankitk commented Oct 18, 2023

backslasht commented Oct 19, 2023

backslasht commented Oct 19, 2023

jainankitk commented Oct 23, 2023

jainankitk commented Oct 23, 2023

jainankitk commented Oct 25, 2023 • edited Loading

jainankitk commented Oct 25, 2023

jainankitk commented Oct 27, 2023

getsaurabh02 commented Oct 30, 2023

hdhalter commented Dec 11, 2023

jainankitk commented Aug 14, 2023 •

edited

Loading

jainankitk commented Aug 17, 2023 •

edited

Loading

jainankitk commented Aug 17, 2023 •

edited

Loading

jainankitk commented Aug 17, 2023 •

edited

Loading

jainankitk commented Aug 17, 2023 •

edited

Loading

gashutos commented Aug 23, 2023 •

edited

Loading

mikemccand commented Aug 28, 2023 •

edited

Loading

jainankitk commented Sep 18, 2023 •

edited

Loading

jainankitk commented Oct 25, 2023 •

edited

Loading