Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: Reduce number of ranges generated for comparisons #30267

Merged
merged 6 commits into from
May 2, 2018

Conversation

costin
Copy link
Member

@costin costin commented Apr 30, 2018

Rewrote optimization rule for combining ranges by improving the
detection of binary comparisons in a tree to better combine
them in a range, regardless of their place inside an expression.
Additionally, improve the comparisons of Numbers of different types
Also, improve reassembly of conjunction/disjunction into balanced
trees.

Fix #30017
Closes #30019

Rewrote optimization rule for combining ranges by improving the
detection of binary comparisons in a tree to better combine
them in a range, regardless of their place inside an expression.
Additionally, improve the comparisons of Numbers of different types
Also, improve reassembly of conjunction/disjunction into balanced
trees.

Fix elastic#30017
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few things. The actual range optimizer code makes my head spin a bit so I'm going to have to come back to it.

}
} catch (ClassCastException cce) {
// types are not compatible
// return null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the side effects on not catching this and letting it bubble out? Or even wrapping it? I think it'd be nice to have a big comment here about why returning null is ok.

Also, can you move the try into the if statement so it looks smaller? That'd just make me a bit more comfortable. If returning null is right here then I think you should return it from the catch rather than fall out. That feels a little safer even if it isn't.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CCE is thrown when the types are not compatible; I've expanded the comment and returns the null directly from catch.

}

/**
* Build a binary 'pyramid' - while a bit longer this method creates a balanced tree as oppose to a plain
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe

Builds a "pyramid" out of all clauses by combining them pairwise. So
{@code combine(List(a, b, c, d, e), AND)` becomes
<pre><code>
       AND
   |---/ \---|
  AND        AND
 /  \        / \
e   AND     b   c
    / \    
   a   b   
</pre></code>

The picture would make me feel better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return null;
}

List<Expression> result = new ArrayList<>(exps);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should be Queue<Expression> work = new ArrayDeque<>(exps); and then you do work.addLast(combiner.apply(work.removeFirst(), work.removeFirst())); The for loop and remove together scare me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I talked with @costin earlier about this - he wants to keep the order the same and my proposal doesn't.

What about this?

while (result.size() > 1) {
  ListIterator<Expression> itr = result.iterator();
  while (itr.hasNext()) {
    itr.add(combiner.apply(itr.remove(), itr.remove()));
  }
}

Your version works but for (int i = 0; i < result.size() - 1; i++) { make me think it'll be a normal loop and then you remove and add and I'm confused. Using the ListIterator forces the reader to think.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you'd like we can follow this up in a different PR.
The approach above doesn't work since remove can only be called once per next/previous.

I've added a comment to the loop explaining that it updates the list (and thus why it uses the temporary variable).
Considering the loops are fairly contained I think that conveys the message without the gymnastics of iterator.

P.S. The list is created just above as a copy for this specific reason.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with leaving it, yeah. I did want a prettier one but if this is what we can do, it'll do.

@@ -66,11 +65,19 @@ public boolean includeUpper() {

@Override
public boolean foldable() {
return value.foldable() && lower.foldable() && upper.foldable();
if (lower.foldable() && upper.foldable()) {
return (excludingBoundaries() || value.foldable());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove the extra ( and )? I don't think they help here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -79,6 +86,13 @@ public Object fold() {
return lowerComparsion && upperComparsion;
}

private boolean excludingBoundaries() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move the explanation to javadoc? I do like that you catch this case. I might rename this method. I'd call it isDegenerate but I think that isn't a great name either. matchesNothing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

static class CombineComparisonsIntoRange extends OptimizerExpressionRule {
/**
* Propagate Equals to eliminate conjuncted Ranges.
* When encountering a different Equals or exclusive {@link Range}, the conjunction becomes false.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give an example, maybe in terms of SQL? Like,
converts {@code SELECT * FROM a < 10 OR a == 10} into {@code SELECT * FROM a <= 10}??

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. This one converts {@code SELECT * FROM a < 10 AND a = 11} into {@SELECT * FROM FALSE}.

if (ex instanceof Range) {
ranges.add((Range) ex);
} else if (ex instanceof Equals) {
Equals equ = (Equals) ex;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you name equ and eq a little more differently? They sort of blur together to me when I read this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -217,6 +221,10 @@ public void testReplaceFoldableAttributes() {
assertEquals(10, lt.right().fold());
}

//
// Constant folding
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe in a follow up move each of these chunks into separate tests? OptimizerConstantFoldingTests and OptimizerLogicalSimplificationsTests and OptimizerRangeOptimizationTests or something.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to extract the optimizer and the tests and have the rules for BinaryComparisons separate.

@nik9000 nik9000 added the v6.4.0 label Apr 30, 2018
@nik9000
Copy link
Member

nik9000 commented Apr 30, 2018

I added the v6.4.0 label because I think this is worth backporting to the 6.x branch. Since we're trying to keep 6.3 stable I think we shouldn't pull it back to the 6.3 branch.

@costin
Copy link
Member Author

costin commented Apr 30, 2018

Definitely this goes to 6.x, I typically put them in after I merge the code it to avoid race conditions (typically applies for 6.3 though).

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like an improvement to me. I'd love a nicer loop to build the tree and I left a suggestion - I'm not 100% sure it'll work, but it is a thing. Also, maybe a unit test for that method too? Otherwise, I think this is good.

costin added 4 commits May 2, 2018 00:22
Do not promote BinaryComparisons to Ranges since it introduces NULL
boundaries and thus a corner-case that needs too much handling
Compare BinaryComparisons directly between themselves and to Ranges
@costin
Copy link
Member Author

costin commented May 2, 2018

retest this

@costin
Copy link
Member Author

costin commented May 2, 2018

run tests

@costin
Copy link
Member Author

costin commented May 2, 2018

test this please

@costin costin merged commit 7790cb5 into elastic:master May 2, 2018
@costin costin deleted the better-binary-comparison-combination branch May 2, 2018 16:35
costin added a commit to costin/elasticsearch that referenced this pull request May 2, 2018
Rewrote optimization rule for combining ranges by improving the
detection of binary comparisons in a tree to better combine
them in a range, regardless of their place inside an expression.
Additionally, improve the comparisons of Numbers of different types
Also, improve reassembly of conjunction/disjunction into balanced
trees.
Do not promote BinaryComparisons to Ranges since it introduces NULL
boundaries and thus a corner-case that needs too much handling
Compare BinaryComparisons directly between themselves and to Ranges

Fix elastic#30017
dnhatn added a commit that referenced this pull request May 4, 2018
* master:
  Set the new lucene version for 6.4.0
  [ML][TEST] Clean up jobs in ModelPlotIT
  Upgrade to 7.4.0-snapshot-1ed95c097b (#30357)
  Watcher: Ensure trigger service pauses execution (#30363)
  [DOCS] Added coming qualifiers in changelog
  [DOCS] Commented out empty sections in the changelog to fix the doc build. (#30372)
  Security: reduce garbage during index resolution (#30180)
  Make RepositoriesMetaData contents unmodifiable (#30361)
  Change quad tree max levels to 29. Closes #21191 (#29663)
  Test: use trial license in qa tests with security
  [ML] Add integration test for model plots (#30359)
  SQL: Fix bug caused by empty composites (#30343)
  [ML] Account for gaps in data counts after job is reopened (#30294)
  InternalEngineTests.testConcurrentOutOfOrderDocsOnReplica should use two documents (#30121)
  Change signature of Get Repositories Response (#30333)
  Tests: Use different watch ids per test in smoke test (#30331)
  [Docs] Add term query with normalizer example
  Adds Eclipse config for xpack licence headers (#30299)
  Watcher: Make start/stop cycle more predictable and synchronous (#30118)
  [test] add debug logging for packaging test
  [DOCS] Removed X-Pack Breaking Changes
  [DOCS] Fixes link to TLS LDAP info
  Update versions for start_trial after backport (#30218)
  Packaging: Set elasticsearch user to have non-existent homedir (#29007)
  [DOCS] Fixes broken links to bootstrap user (#30349)
  Fix NPE when CumulativeSum agg encounters null/empty bucket (#29641)
  Make licensing FIPS-140 compliant (#30251)
  [DOCS] Reorganizes authentication details in Stack Overview (#30280)
  Network: Remove http.enabled setting (#29601)
  Fix merging logic of Suggester Options (#29514)
  [DOCS] Adds LDAP realm configuration details (#30214)
  [DOCS] Adds native realm configuration details (#30215)
  ReplicationTracker.markAllocationIdAsInSync may hang if allocation is cancelled (#30316)
  [DOCS] Enables edit links for X-Pack pages (#30278)
  Packaging: Unmark systemd service file as a config file (#29004)
  SQL: Reduce number of ranges generated for comparisons (#30267)
  Tests: Simplify VersionUtils released version splitting (#30322)
  Cancelling a peer recovery on the source can leak a primary permit (#30318)
  Added changelog entry for deb prerelease version change (#30184)
  Convert server javadoc to html5 (#30279)
  Create default ES_TMPDIR on Windows (#30325)
  [Docs] Clarify `fuzzy_like_this` redirect (#30183)
  Post backport of #29658.
  Fix docs of the `_ignored` meta field.
  Remove MapperService#types(). (#29617)
  Remove useless version checks in REST tests. (#30165)
  Add a new `_ignored` meta field. (#29658)
  Move repository-azure fixture test to QA project (#30253)

# Conflicts:
#	buildSrc/version.properties
#	server/src/test/java/org/elasticsearch/index/engine/InternalEngineTests.java
dnhatn added a commit that referenced this pull request May 8, 2018
* 6.x:
  Stop forking javac (#30462)
  Fix tribe tests
  Docs: Use task_id in examples of tasks (#30436)
  Security: Rename IndexLifecycleManager to SecurityIndexManager (#30442)
  Packaging: Set elasticsearch user to have non-existent homedir (#29007)
  [Docs] Fix typo in cardinality-aggregation.asciidoc (#30434)
  Avoid NPE in `more_like_this` when field has zero tokens (#30365)
  Build: Switch to building javadoc with html5 (#30440)
  Add a quick tour of the project to CONTRIBUTING (#30187)
  Add stricter geohash parsing (#30376)
  Reindex: Use request flavored methods (#30317)
  Silence SplitIndexIT.testSplitIndexPrimaryTerm test failure.  (#30432)
  Auto-expand replicas when adding or removing nodes (#30423)
  Silence IndexUpgradeIT test failures. (#30430)
  Fix line length violation in cache tests
  Add failing test for core cache deadlock
  [DOCS] convert forcemerge snippet
  Update forcemerge.asciidoc (#30113)
  Added zentity to the list of API extension plugins (#29143)
  Fix the search request default operation behavior doc (#29302) (#29405)
  Watcher: Mark watcher as started only after loading watches (#30403)
  Correct wording in log message (#30336)
  Do not fail snapshot when deleting a missing snapshotted file (#30332)
  AwaitsFix testCreateShrinkIndexToN
  DOCS: Correct mapping tags in put-template api
  DOCS: Fix broken link in the put index template api
  Add put index template api to high level rest client (#30400)
  [Docs] Add snippets for POS stop tags default value
  Remove entry inadvertently picked into changelog
  Move respect accept header on no handler to 6.3.1
  Respect accept header on no handler (#30383)
  [Test] Add analysis-nori plugin to the vagrant tests
  [Rollup] Validate timezone in range queries (#30338)
  [Docs] Fix bad link
  [Docs] Fix end of section in the korean plugin docs
  add the Korean nori plugin to the change logs
  Expose the Lucene Korean analyzer module in a plugin (#30397)
  Docs: remove transport_client from CCS role example (#30263)
  Test: remove cluster permission from CCS user (#30262)
  Watcher: Remove unneeded index deletion in tests
  fix docs branch version
  fix lucene snapshot version
  Upgrade to 7.4.0-snapshot-1ed95c097b (#30357)
  [ML][TEST] Clean up jobs in ModelPlotIT
  Watcher: Ensure trigger service pauses execution (#30363)
  [DOCS] Fixes ordering of changelog sections
  [DOCS] Commented out empty sections in the changelog to fix the doc build. (#30372)
  Make RepositoriesMetaData contents unmodifiable (#30361)
  Change signature of Get Repositories Response (#30333)
  6.x Backport: Terms query validate bug  (#30319)
  InternalEngineTests.testConcurrentOutOfOrderDocsOnReplica should use two documents (#30121)
  Security: reduce garbage during index resolution (#30180)
  Test: use trial license in qa tests with security
  [ML] Add integration test for model plots (#30359)
  SQL: Fix bug caused by empty composites (#30343)
  [ML] Account for gaps in data counts after job is reopened (#30294)
  [ML] Refactor DataStreamDiagnostics to use array (#30129)
  Make licensing FIPS-140 compliant (#30251)
  Do not load global state when deleting a snapshot (#29278)
  Don't load global state when only restoring indices (#29239)
  Tests: Use different watch ids per test in smoke test (#30331)
  Watcher: Make start/stop cycle more predictable and synchronous (#30118)
  [Docs] Add term query with normalizer example
  Adds Eclipse config for xpack licence headers (#30299)
  Fix message content in users tool (#30293)
  [DOCS] Removed X-Pack breaking changes page
  [DOCS] Added security breaking change
  [DOCS] Fixes link to TLS LDAP info
  [DOCS] Merges X-Pack release notes into changelog (#30350)
  [DOCS] Fixes broken links to bootstrap user (#30349)
  [Docs] Remove errant changelog line
  Fix NPE when CumulativeSum agg encounters null/empty bucket (#29641)
  [DOCS] Reorganizes authentication details in Stack Overview (#30280)
  Tests: Simplify VersionUtils released version splitting (#30322)
  Fix merging logic of Suggester Options (#29514)
  ReplicationTracker.markAllocationIdAsInSync may hang if allocation is cancelled (#30316)
  [DOCS] Adds LDAP realm configuration details (#30214)
  [DOCS] Adds native realm configuration details (#30215)
  Disable SSL on testing old BWC nodes (#30337)
  [DOCS] Enables edit links for X-Pack pages
  Cancelling a peer recovery on the source can leak a primary permit (#30318)
  SQL: Reduce number of ranges generated for comparisons (#30267)
  [DOCS] Adds links to changelog sections
  Convert server javadoc to html5 (#30279)
  REST Client: Add Request object flavored methods (#29623)
  Create default ES_TMPDIR on Windows (#30325)
  [Docs] Clarify `fuzzy_like_this` redirect (#30183)
  Fix docs of the `_ignored` meta field.
  Add a new `_ignored` meta field. (#29658)
  Move repository-azure fixture test to QA project (#30253)
@jimczi jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants