
Script queries using joda time getMillis() slower than using java time toInstant().toEpochMilli() #35754

Closed
conet opened this issue Nov 20, 2018 · 9 comments

conet commented Nov 20, 2018

Elasticsearch version (bin/elasticsearch --version):
Version: 6.5.0, Build: default/tar/816e6f6/2018-11-09T18:58:36.352602Z, JVM: 1.8.0_191

Plugins installed: []
No extra plugins

JVM version (java -version):
java -version
java version "1.8.0_191"
Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)

Description of the problem including expected versus actual behavior:
I had been using script queries that call getMillis() without issues until I upgraded to 6.5.0, after which I started noticing serious query time deterioration (tens of seconds, up from a few seconds). By reading the changelog I eventually realized that joda time was being deprecated; after switching the getMillis() call to toInstant().toEpochMilli() I got back the original query times (the ones from before the upgrade). See a more detailed description in this PR. Since I've been asked to open an issue with the steps to reproduce, here they are.
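For reference, the two script expressions compute the same epoch-millisecond value; here is a minimal plain-Java sketch of the equivalence (illustrative only, assuming joda-time is on the classpath for the deprecated variant):

import java.time.ZonedDateTime;
import org.joda.time.DateTime;

public class EpochMillisDemo {
    public static void main(String[] args) {
        ZonedDateTime now = ZonedDateTime.now();

        // java.time path, matching the non-deprecated script expression:
        // doc["timestamp"].value.toInstant().toEpochMilli()
        long javaTimeMillis = now.toInstant().toEpochMilli();

        // joda-time path, matching the deprecated script expression:
        // doc["timestamp"].value.getMillis()
        long jodaMillis = new DateTime(javaTimeMillis).getMillis();

        // Both yield the same epoch milliseconds; only the API differs.
        System.out.println(javaTimeMillis + " == " + jodaMillis);
    }
}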

Steps to reproduce:

  1. Download and start Elasticsearch 6.5.0
  2. Download and start Kibana 6.5.0
  3. Load the Sample web logs sample data set
  4. Increase the doc count by copying the sample index (I copied it 40 times, incrementing the destination index suffix each time):
curl -XPOST "http://localhost:9200/_reindex" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "kibana_sample_data_logs"
  },
  "dest": {
    "index": "kibana_sample_data_logs_1"
  }
}'
  5. Run the fast query:
curl -XPOST "http://localhost:9200/kibana_sample_data_logs*/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "script": {
            "script": {
              "source": "doc[\"timestamp\"].value.toInstant().toEpochMilli() < params.millis",
              "lang": "painless",
              "params": {
                "millis": 1542747500000
              }
            }
          }
        }
      ]
    }
  }
}'
---
{
  "took" : 38,
  "timed_out" : false,
  "_shards" : {
    "total" : 201,
    "successful" : 201,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 94259,
    "max_score" : 0.0,
    "hits" : [ ]
  }
}
  6. Run the slow query:
curl -XPOST "http://localhost:9200/kibana_sample_data_logs*/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "script": {
            "script": {
              "source": "doc[\"timestamp\"].value.getMillis() < params.millis",
              "lang": "painless",
              "params": {
                "millis": 1542747500000
              }
            }
          }
        }
      ]
    }
  }
}'
---
#! Deprecation: Use of the joda time method [getMillis()] is deprecated. Use [toInstant().toEpochMilli()] instead.
{
  "took" : 1088,
  "timed_out" : false,
  "_shards" : {
    "total" : 201,
    "successful" : 201,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 94259,
    "max_score" : 0.0,
    "hits" : [ ]
  }
}

We're talking about roughly 95K documents and the deprecated query is already about 30 times slower. One could argue that the overhead comes from the deprecation logging itself, but in my experience the problem becomes more noticeable with bigger indices: the slow query appears to slow down linearly with the number of documents it checks.

@rjernst rjernst self-assigned this Nov 20, 2018
@rjernst rjernst added the :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache label Nov 20, 2018

conet commented Nov 20, 2018

@rjernst I updated the description since I figured out a way to increase the number of sample documents hit by the query.


rjernst commented Dec 22, 2018

Thanks for the steps @conet. I am now able to reproduce this and have done a detailed investigation.

Your example has about 200 shards, due to the default of 5 shards per index and the 40 indexes you create. When I first tried your reproduction steps I only made 20 copies, and noticed the query took about half the time you saw (around 400-500ms). When I went up to 40 copies, I saw roughly the same times you did. This led me to hypothesize that the issue scales linearly with the number of shards. To confirm this theory, I reindexed all 40 indexes into a single index and ran the slow query. The response was immediate: no slowness. So the overhead of a single deprecation call is not an issue; it only becomes one when many such calls are made within the same request, across different shards on the same node.

I then ran a profiler while running the slow query a few times. The interesting part of the results were as follows:

--- 14030000000 ns (37.35%), 1403 samples
  [ 0] org.apache.lucene.util.CloseableThreadLocal.set
  [ 1] org.elasticsearch.common.util.concurrent.ThreadContext$ContextThreadLocal.set
  [ 2] org.elasticsearch.common.util.concurrent.ThreadContext.addResponseHeader
  [ 3] org.elasticsearch.common.logging.DeprecationLogger.deprecated
  [ 4] org.elasticsearch.common.logging.DeprecationLogger.deprecatedAndMaybeLog
  [ 5] org.elasticsearch.script.JodaCompatibleZonedDateTime.lambda$logDeprecated$0
  [ 6] org.elasticsearch.script.JodaCompatibleZonedDateTime$$Lambda$3213.1313296263.run
  [ 7] java.security.AccessController.doPrivileged
  [ 8] org.elasticsearch.script.JodaCompatibleZonedDateTime.logDeprecated
  [ 9] org.elasticsearch.script.JodaCompatibleZonedDateTime.logDeprecatedMethod
  [10] org.elasticsearch.script.JodaCompatibleZonedDateTime.getMillis

So the time is spent in the thread local we set to store warning response headers. This actually has nothing to do with scripting per se, but rather with the fact that scripts are run per shard, so the deprecation call happens many times across different ThreadContexts (because each shard's search is run independently).

As far as how to avoid this immediately, there are two options:

  • reduce the number of shards you search against
  • convert your scripts to use the non-deprecated method (which it sounds like you may have already done)

A long-term fix may be more complicated and needs some agreement from other developers here. I'm marking this issue for discussion after the holidays. There are a few options to consider:

  1. Don't do anything: this is only for deprecations, and there is a workaround (and in general there always should be, i.e. whatever replaces the deprecated usage). This option isn't great from a user perspective, as upgrading should generally not cause a slowdown just because you were using something that was previously OK to use.
  2. Create tracking across threads for deprecations: this would most likely be incredibly complex, and I'm unsure how it could work effectively in practice, as the ThreadContext's intention is to avoid cross-thread barriers.
  3. Use the same logging guard for warning headers: we currently guard against logging the same deprecation warning multiple times by using a unique key per deprecation site and keeping track of the last 128 unique deprecations on the node. We made the decision at that time to continue writing all warning headers all the time. If deprecations can be truncated in the log, why not also in warning headers?

My vote is for 3, as I think this is the simplest to implement.
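For illustration only (this is not the actual Elasticsearch implementation): a minimal sketch of the kind of bounded deduplication guard option 3 describes, remembering the last 128 unique deprecation keys on a node and skipping repeat emissions:

import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical guard: tracks the last 128 unique deprecation keys seen on this
// node and reports whether a given key should still emit a warning header.
class DeprecationGuard {
    private static final int MAX_KEYS = 128;

    // Access-ordered LinkedHashMap used as a tiny LRU, wrapped for thread
    // safety since shard searches on the same node run concurrently.
    private final Set<String> seenKeys = Collections.newSetFromMap(
        Collections.synchronizedMap(new LinkedHashMap<String, Boolean>(MAX_KEYS, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > MAX_KEYS;
            }
        }));

    // Returns true only the first time a key is seen (until it ages out).
    boolean shouldEmit(String deprecationKey) {
        return seenKeys.add(deprecationKey);
    }
}

In the real DeprecationLogger the key would come from the deprecation call site; the sketch only shows the bounded "last 128 unique keys" idea.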


conet commented Dec 22, 2018

Thanks for the thorough investigation; I agree that 3 sounds like the best solution.

pcsanwald (Contributor) commented:

@rjernst we discussed this in FixIt Thursday, and there was general agreement that 3 seems like the right approach. Obviously up to you whether you want to leave the discuss label on or remove it.

colings86 (Contributor) commented:

I'm not sure suppressing warnings is a good idea here. When we suppress deprecation warnings in the logs I think it's OK, because it's the same consumer (the log file) that would get all the warnings. If we suppress the warning headers in the same way, then only one in every 128 callers will get any given deprecation warning, meaning some clients may never receive the warnings. This seems like it might turn deprecation warnings into a lottery, making it hard to know that you are not hitting any deprecations when parsing a response, because you may have hit a suppressed deprecation site.

If we are going to suppress deprecation warnings, I think we should only do it within the same request, so every request that hits a deprecation site is guaranteed to emit at least one of those deprecation warning headers. I fear, though, that this will probably not be enough to solve the original performance issue.
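A hedged sketch (again, not Elasticsearch code) of the per-request variant suggested here: a set of deprecation keys scoped to a single request, so the first hit of each key still emits a header while later hits within the same request become cheap no-ops:

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical per-request scope: one instance per search request, shared by
// all shard-level searches belonging to that request.
class RequestDeprecationScope {
    private final Set<String> emittedKeys = ConcurrentHashMap.newKeySet();

    // Emit the warning only the first time this key is hit within the request.
    void warnOnce(String deprecationKey, String warning, Consumer<String> headerSink) {
        if (emittedKeys.add(deprecationKey)) {
            headerSink.accept(warning);
        }
    }
}

Every request would still see each distinct warning at least once, which addresses the lottery concern above, though as noted this may not remove all of the per-shard overhead.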


rjernst commented Jan 8, 2019

@colings86 IMO the deprecation log is the "official" source of truth for deprecations that occurred. Deprecations may not occur on a request: for example, they could happen when reading a setting on startup. In your example, the deprecation log would already have an entry. I really wonder if deprecation warnings should be removed altogether from responses.

@pcsanwald Given @colings86's concerns, I think this should be discussed again.


colings86 commented Jan 9, 2019

I really wonder if deprecation warnings should be removed altogether from responses.

I think the warnings in responses are useful because the caller is often not the administrator and so often will not have access to the deprecation logs. Having an indication in the response means the caller can be made aware they are using a deprecated feature and have the chance to rectify things without the admin needing to identify and alert each user themselves (though they may still need to do this if some users ignore the warnings).

I agree that it would be good to discuss this again, though.

jasontedor (Member) commented:

We discussed this in our team discussion meeting today. Because responses are ephemeral in nature, it seems that the warning headers should stay on each response. What we do in the log messages is okay because the log files are generally durable. Instead of addressing this by circumventing the issue, we want to go after the performance problems. We investigated some of them during our meeting and found room for improvement in a few areas. I opened #37590 and #37597 for these. What remains is half explained by the de-duplication (#37530). Let's attack that now.

ebadyano added a commit that referenced this issue Feb 22, 2019
1. Setting the length for the formatWarning String to avoid AbstractStringBuilder.ensureCapacityInternal calls
2. Adding an extra check for parameter array length == 0 to avoid unnecessarily creating a StringBuilder in LoggerMessageFormat.format

Helps to narrow the performance gap in throughput for the geonames benchmark (#37411) by 3%. For more details: #37530 (comment)

Relates to #37530
Relates to #37411
Relates to #35754
ebadyano added a commit to ebadyano/elasticsearch that referenced this issue Feb 25, 2019 (same change as above)
ebadyano added a commit that referenced this issue Feb 25, 2019 (same change as above)
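For readers unfamiliar with the pattern described in the commit message, here is a generic, hedged sketch (not the actual LoggerMessageFormat code) of both micro-optimizations: returning early when there are no arguments so no StringBuilder is created, and pre-sizing the StringBuilder so its backing array does not need to grow while appending:

// Hypothetical helper illustrating the two optimizations mentioned above.
final class MessageFormatSketch {

    // Early return: nothing to substitute, so skip the StringBuilder entirely.
    static String format(String pattern, Object... args) {
        if (args == null || args.length == 0) {
            return pattern;
        }
        // Pre-size the builder with a rough estimate of the final length so
        // the backing array does not need to be grown during appends.
        StringBuilder sb = new StringBuilder(pattern.length() + 16 * args.length);
        int start = 0;
        for (Object arg : args) {
            int idx = pattern.indexOf("{}", start);
            if (idx < 0) {
                break;
            }
            sb.append(pattern, start, idx).append(arg);
            start = idx + 2;
        }
        sb.append(pattern, start, pattern.length());
        return sb.toString();
    }

    private MessageFormatSketch() {}
}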
stu-elastic (Contributor) commented:

With the changes mentioned by @jasontedor, and as @ebadyano notes in a comment, the performance hit has been reduced to about 4%.

Since Joda removal is planned for version 8 and the performance improvements landed in 7.x, we recommend moving to the latest 7.x release to avoid this performance trap.
