Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Profiler: Don’t profile NEXTDOC for ConstantScoreQuery. #33196

Merged
merged 6 commits into from
Sep 19, 2018

Conversation

iverase
Copy link
Contributor

@iverase iverase commented Aug 28, 2018

A ConstantScore query will return the iterator of its inner query.
However, when profiling, the constant score query is wrapped separately
from its inner query, which distorts the times emitted by the profiler.
Return the iterator directly in such a case.

Closes #23430

A ConstantScore query will return the iterator of its inner query.
However, when profiling, the constant score query is wrapped separately
from its inner query, which distorts the times emitted by the profiler.
Return the iterator directly in such a case.

Closes elastic#23430
@iverase iverase requested a review from jpountz August 28, 2018 09:28
@iverase iverase added the :Search/Search Search-related issues that do not fall into other categories label Aug 28, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs


assertThat(breakdownConstantScoreQuery.get(QueryTimingType.CREATE_WEIGHT.toString() + "_count").longValue(), greaterThan(0L));
assertThat(breakdownConstantScoreQuery.get(QueryTimingType.BUILD_SCORER.toString() + "_count").longValue(), greaterThan(0L));
assertThat(breakdownConstantScoreQuery.get(QueryTimingType.NEXT_DOC.toString() + "_count").longValue(), equalTo(0L));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually it shouldn't be 0 but the same value as the wrapped query since timings are supposed to include timings of children

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Adrien! I see what you mean, somewhat the constant score query should inherit the timers from their children.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, or in that particular case, from their child since constant-score queries can only have one child.

Copy link
Contributor Author

@iverase iverase Sep 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems there are at least three cases here when building a profile scorer of a constant score query:

  1. Normal case: The provided scorer is a normal scorer with one child that is an instance of ProfileScorer.
  2. Total hits query: The provided scorer is already a profile scorer with no children.
  3. Cached total hits query: The provided scorer is a normal scorer with no children.

I think the difference in total hits queries come from this method inside Lucene:

 public Weight createWeight(Query query, boolean needsScores, float boost) throws IOException {
    final QueryCache queryCache = this.queryCache;
    Weight weight = query.createWeight(this, needsScores, boost);
    if (needsScores == false && queryCache != null) {
      weight = queryCache.doCache(weight, queryCachingPolicy);
    }
    return weight;
  }

Where the weight created is a profiled weight but get transformed into a CachingWrapperWeight. Then depending if the query is cached or not the scorer gets wrapped into a profile scorer or not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't want to do anything if the underlying scorer is cached since it will be consumed differently from this scorer. We should only try to reuse timing information if the underlying scorer is a ProfileScorer too?

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a question.

profileScorer = (ProfileScorer) scorer;
} else if (w.getQuery() instanceof ConstantScoreQuery && scorer.getChildren().size() == 1) {
profileScorer = (ProfileScorer) scorer.getChildren().iterator().next().child;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand when the if case may happen but I'm less sure about the second else if, when does it happen? Can you leave comments explaining?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, here is where the complexity is:

  1. The first if is hit when you have a totalHits query that is not cached. (Test testConstantScoreTotalHitsQuery and cache ==false).

  2. The else if is hit when you have a top N query. (Test testConstantScoreQuery)

  3. There is a third case where we have a totalHits query and it is cached. In this case the provided scorer is not an instance of ProfileScorer and has no children. In that case we do not do any special treatment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Can you add a comment to explain this? Also should we do an instanceof check on the scorer in the second case to be safe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add a comment. Random test shows that the second check is always true but I will add a instanceof check to be on the safe side.

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @iverase !

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@iverase iverase merged commit 7f473b6 into elastic:master Sep 19, 2018
iverase added a commit that referenced this pull request Sep 19, 2018
* Profiler: Don’t profile NEXTDOC for ConstantScoreQuery.

A ConstantScore query will return the iterator of its inner query.
However, when profiling, the constant score query is wrapped separately
from its inner query, which distorts the times emitted by the profiler.
Return the iterator directly in such a case.

Closes #23430
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Sep 19, 2018
* master: (46 commits)
  Fixing assertions in integration test (elastic#33833)
  [CCR] Rename idle_shard_retry_delay to poll_timout in auto follow patterns (elastic#33821)
  HLRC: Delete ML calendar (elastic#33775)
  Move DocsStats into Engine (elastic#33835)
  [Docs] Clarify accessing Date methods in painless (elastic#33560)
  add elasticsearch-shard tool (elastic#32281)
  Cut over to unwrap segment reader (elastic#33843)
  SQL: Fix issue with options for QUERY() and MATCH(). (elastic#33828)
  Emphasize that filesystem-level backups don't work (elastic#33102)
  Use the global doc id to generate a random score (elastic#33599)
  Add minimal sanity checks to custom/scripted similarities. (elastic#33564)
  Profiler: Don’t profile NEXTDOC for ConstantScoreQuery. (elastic#33196)
  [CCR] Change FollowIndexAction.Request class to be more user friendly (elastic#33810)
  SQL: day and month name functions tests locale providers enforcement (elastic#33653)
  TESTS: Set SO_LINGER = 0 for MockNioTransport (elastic#32560)
  Test: Relax jarhell gradle test (elastic#33787)
  [CCR] Fail with a descriptive error if leader index does not exist (elastic#33797)
  Add ES version 6.4.2 (elastic#33831)
  MINOR: Remove Some Dead Code in Scripting (elastic#33800)
  Ensure realtime `_get` and `_termvectors` don't run on the network thread (elastic#33814)
  ...
jimczi added a commit to jimczi/elasticsearch that referenced this pull request Apr 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories v6.5.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants