Script queries using joda time getMillis() slower than using java time toInstant().toEpochMilli() #35754
Comments
@rjernst I updated the description since I figured out a way to increase the portion of the sample dataset hit by the query.
Thanks for the steps @conet. I am now able to reproduce this and have done a detailed investigation. Your example has about 200 shards, due to the 5 default shards per index and the 40 indexes you create. When I first tried running your reproduction steps, I only did 20 copies, and noticed the query took about half the time you saw (around 400-500ms). When I went up to 40 copies, I then saw about the same times you saw. This led me to hypothesize that the issue scales linearly with the number of shards. To confirm this theory, I reindexed all 40 indexes into a single index and ran the slow query. The response was immediate: no slowness. So the overhead of a single deprecation call is not an issue; it only matters when many are made within the same request, across different shards on the same node. I then ran a profiler while running the slow query a few times. The interesting part of the results was as follows:
So the time is spent in the thread local we set to store warning response headers. This actually has nothing to do with scripting per se, but rather with the fact that the scripts are run per shard, and the deprecation call happens many times across different ThreadContexts (because each shard's search is run independently). As far as how to avoid this immediately, there are two options:
A long-term fix may be more complicated and needs some agreement from other developers here. I'm marking this issue for discussion after the holidays. There are a few options to consider:
My vote is for 3, as I think this is the simplest to implement.
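To make the mechanism described above concrete, here is a minimal, self-contained sketch of how per-shard deprecation calls add up. The class, field, and method names are illustrative assumptions, not the actual Elasticsearch ThreadContext or DeprecationLogger code.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Simplified model of per-context response warning headers; names are
// illustrative, NOT the real org.elasticsearch ThreadContext API.
public class WarningHeaderSketch {

    // Each shard search runs with its own context, so de-duplication only
    // happens within a single shard's set, not across the whole request.
    static final ThreadLocal<Set<String>> RESPONSE_WARNINGS =
            ThreadLocal.withInitial(LinkedHashSet::new);

    static void deprecationWarning(String message) {
        // The header string is built on every call; the Set only prevents
        // duplicates inside one context.
        String header = "299 - \"" + message + "\"";
        RESPONSE_WARNINGS.get().add(header);
    }

    public static void main(String[] args) {
        int shards = 200; // ~40 indexes * 5 default shards, as in the reproduction
        for (int shard = 0; shard < shards; shard++) {
            RESPONSE_WARNINGS.set(new LinkedHashSet<>()); // fresh context per shard search
            for (int doc = 0; doc < 1_000; doc++) {
                // One deprecated getMillis() call per matching document
                deprecationWarning("getMillis() is deprecated, use toInstant().toEpochMilli()");
            }
        }
        System.out.println("formatted the same warning in " + shards + " separate contexts");
    }
}
```

Because the de-duplicating set lives in a per-shard context, the warning string is rebuilt on every call and kept once per shard, so the work repeats across all ~200 shards instead of happening once per request.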
Thanks for the thorough investigation, I agree that 3 sounds like the best solution.
@rjernst we discussed this in FixIt Thursday, and there was general agreement that 3 seems like the right approach. Obviously up to you if you want to leave it.
I'm not sure if suppressing warnings is a good idea here. When we suppress deprecation warnings in logs I think it's OK because it's the same consumer (the log file) that would get all the warnings. If we suppress the warning headers in the same way, then only 1 in every 128 callers will get any given deprecation warning, meaning some clients may never receive the warnings. This seems like it might turn the deprecation warnings into a lottery, making it hard to know that you are not hitting any deprecations when parsing a response, because you may have hit some suppressed deprecation site. If we are going to suppress deprecation warnings I think we should only do it within the same request, so every request that hits a deprecation site is guaranteed to emit at least one of those deprecation warning headers, though I fear this will probably not be enough to solve the original performance issue.
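To illustrate the "lottery" concern, here is a hedged sketch of what a simple 1-in-128 suppression (the ratio mentioned above) could look like; this is not the actual de-duplication code, just an illustration of why most callers would see no warning at all.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative throttle, not Elasticsearch code: only one call in every 128
// actually emits the warning, so 127 of 128 responses carry nothing.
public class ThrottledWarning {
    private static final AtomicLong CALLS = new AtomicLong();

    static boolean shouldEmit() {
        return CALLS.getAndIncrement() % 128 == 0;
    }

    public static void main(String[] args) {
        int emitted = 0;
        for (int i = 0; i < 1_000; i++) {
            if (shouldEmit()) {
                emitted++;
            }
        }
        // ~8 of 1000 calls emit the warning; the rest see a clean response.
        System.out.println("emitted " + emitted + " of 1000");
    }
}
```

With roughly 8 warnings emitted out of 1000 calls, a client issuing only a handful of requests could easily never see the header.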
@colings86 IMO the deprecation log is the "official" source of truth for deprecations that occurred. Deprecations may not occur on a request: for example, they could happen when reading a setting on startup. In your example, the deprecation log would already have an entry. I really wonder if deprecation warnings should be removed altogether from responses. @pcsanwald Given @colings86's concerns, I think this should be discussed again.
I think the warnings in responses are useful because the caller is often not the administrator and so often will not have access to the deprecation logs. Having an indication in the response means the caller can be made aware they are using a deprecated feature and have the chance to rectify things without the admin needing to identify and alert each user themselves (though they may still need to do this if some users ignore the warnings). I agree that it would be good to discuss this again though.
We discussed this in our team discussion meeting today. Because responses are ephemeral in nature, it seems that the warning headers should stay on each response. What we do in the log messages is okay because the log files are generally durable. Instead of addressing this by circumventing the issue, we want to go after the performance problems. We investigated some of them during our meeting and found room for improvement in a few areas. I opened #37590 and #37597 for these. What remains is half explained by the de-duplication (#37530). Let's attack that now.
1. Setting length for formatWarning String to avoid AbstractStringBuilder.ensureCapacityInternal calls
2. Adding extra check for parameter array length == 0 to avoid unnecessarily creating StringBuilder in LoggerMessageFormat.format
Helps to narrow the performance gap in throughput for the geonames benchmark (#37411) by 3%. For more details: #37530 (comment). Relates to #37530, #37411, #35754
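A hedged Java sketch of the two micro-optimizations described in the commit message above; the class and method bodies are simplified assumptions, not the actual LoggerMessageFormat or warning-formatting code.

```java
// Illustrative versions of the two optimizations; not the real implementations.
public class FormatSketch {

    // 1. Pre-size the StringBuilder so appends never trigger
    //    AbstractStringBuilder.ensureCapacityInternal() growth/copies.
    static String formatWarning(String message) {
        String prefix = "299 Elasticsearch- ";
        StringBuilder sb = new StringBuilder(prefix.length() + message.length() + 2);
        sb.append(prefix).append('"').append(message).append('"');
        return sb.toString();
    }

    // 2. Skip the formatting machinery entirely when there are no parameters.
    static String format(String messagePattern, Object... args) {
        if (args == null || args.length == 0) {
            return messagePattern; // nothing to substitute, no StringBuilder needed
        }
        // ... placeholder substitution would go here ...
        return messagePattern;
    }

    public static void main(String[] args) {
        System.out.println(formatWarning("getMillis() is deprecated"));
        System.out.println(format("plain message with no params"));
    }
}
```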
With the changes mentioned by @jasontedor, and as @ebadyano notes in a comment, the performance hit has been reduced to 4%. Since Joda removal is planned for version 8 and the performance improvements landed in 7, we recommend moving to the latest 7 to avoid this performance trap.
Elasticsearch version (bin/elasticsearch --version): Version: 6.5.0, Build: default/tar/816e6f6/2018-11-09T18:58:36.352602Z, JVM: 1.8.0_191
Plugins installed: [] (no extra plugins)
JVM version (java -version):
java version "1.8.0_191"
Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)
Description of the problem including expected versus actual behavior:
I've been using script queries that called getMillis() without issues until I upgraded to 6.5.0, after which I started noticing serious query time deterioration (I'm talking about tens of seconds, up from a few seconds). Eventually, by reading the changelog, I realized that Joda time was being deprecated; after switching the getMillis() call to toInstant().toEpochMilli() I got back the original query times (the ones before the upgrade). See a more detailed description in this PR. Since I've been asked to open an issue with the steps to reproduce, here they are.

Steps to reproduce:
Use the Sample web logs sample dataset. We're talking about 95K documents and the deprecated query is already 30 times slower. One could argue that this is because of the deprecation logging itself, but from my experience with bigger indexes the problem becomes more noticeable: the slow query seems to slow down linearly with the number of documents checked by it.
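For reference, a minimal sketch of the change that restored the original query times. The field name timestamp is a hypothetical placeholder, and the Painless snippets in the comments are illustrative rather than copied from the actual reproduction query.

```java
import java.time.ZonedDateTime;

// Same millisecond value, obtained the java.time way instead of the Joda way.
// In a Painless script this corresponds (illustratively) to replacing
//   doc['timestamp'].value.getMillis()                   // Joda-style, deprecated, triggers warnings per shard
// with
//   doc['timestamp'].value.toInstant().toEpochMilli()    // java.time-style, no deprecation
// ('timestamp' is a placeholder field name, not taken from the reproduction steps).
public class EpochMillis {
    public static void main(String[] args) {
        ZonedDateTime value = ZonedDateTime.now();
        long epochMillis = value.toInstant().toEpochMilli();
        System.out.println(epochMillis);
    }
}
```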