LUCENE-8746: Refactor EdgeTree #878

Merged
merged 21 commits into from
Oct 14, 2019


iverase
Contributor

@iverase iverase commented Sep 13, 2019

Another attempt at refactoring EdgeTree. This PR splits the EdgeTree class into two and adds a new interface:

  • Component2D: Interface defining an object that knows its bounding box and can perform some spatial operations.
  • ComponentTree: An interval tree containing the different components (e.g. polygons or lines).
  • EdgeTree: An interval tree containing the edges of a component (polygon or line edges).
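
To make the split concrete, here is a minimal sketch of the first two roles. It is not the code in this PR: `Shape2D`, `Box`, and `ShapeTree` are hypothetical names, the only spatial operation is a point-containment test, and the tree is a plain balanced binary tree sorted by minimum X that prunes on each subtree's bounding box. An edge tree could follow the same pattern with edges instead of components at the nodes.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical stand-in for the Component2D idea: a shape that knows its bounding box. */
interface Shape2D {
  double minX();
  double maxX();
  double minY();
  double maxY();
  boolean contains(double x, double y);
}

/** Simplest possible component: an axis-aligned rectangle. */
final class Box implements Shape2D {
  private final double minX, maxX, minY, maxY;

  Box(double minX, double maxX, double minY, double maxY) {
    this.minX = minX; this.maxX = maxX; this.minY = minY; this.maxY = maxY;
  }

  public double minX() { return minX; }
  public double maxX() { return maxX; }
  public double minY() { return minY; }
  public double maxY() { return maxY; }

  public boolean contains(double x, double y) {
    return x >= minX && x <= maxX && y >= minY && y <= maxY;
  }
}

/** Hypothetical stand-in for a component tree: a Shape2D built from several components. */
final class ShapeTree implements Shape2D {
  private final Shape2D component;              // component stored at this node
  private final ShapeTree left, right;          // children, sorted by minX
  private final double minX, maxX, minY, maxY;  // box of this node and its whole subtree

  private ShapeTree(Shape2D[] sorted, int lo, int hi) {
    int mid = (lo + hi) >>> 1;
    component = sorted[mid];
    left = lo < mid ? new ShapeTree(sorted, lo, mid - 1) : null;
    right = mid < hi ? new ShapeTree(sorted, mid + 1, hi) : null;
    double x0 = component.minX(), x1 = component.maxX();
    double y0 = component.minY(), y1 = component.maxY();
    for (ShapeTree child : new ShapeTree[] {left, right}) {
      if (child != null) {
        x0 = Math.min(x0, child.minX); x1 = Math.max(x1, child.maxX);
        y0 = Math.min(y0, child.minY); y1 = Math.max(y1, child.maxY);
      }
    }
    minX = x0; maxX = x1; minY = y0; maxY = y1;
  }

  /** A single component needs no tree; several are sorted by minX and balanced. */
  static Shape2D create(Shape2D... components) {
    if (components.length == 0) throw new IllegalArgumentException("no components");
    if (components.length == 1) return components[0];
    Shape2D[] sorted = components.clone();
    Arrays.sort(sorted, Comparator.comparingDouble(Shape2D::minX));
    return new ShapeTree(sorted, 0, sorted.length - 1);
  }

  public double minX() { return minX; }
  public double maxX() { return maxX; }
  public double minY() { return minY; }
  public double maxY() { return maxY; }

  public boolean contains(double x, double y) {
    // Outside the subtree's bounding box: nothing below this node can match.
    if (x < minX || x > maxX || y < minY || y > maxY) return false;
    if (component.contains(x, y)) return true;
    if (left != null && left.contains(x, y)) return true;
    // Everything on the right has minX >= component.minX(), so skip it when x is smaller.
    return right != null && x >= component.minX() && right.contains(x, y);
  }
}
```

Because the tree implements the same interface as its components, a caller can hold a single `Shape2D` and does not need to know whether it wraps one shape or many.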

Unfortunately the PR touches quite a lot of files, but most of them are test files. The benchmark results look good: points show the same performance, and there is an increase in performance for shapes (we no longer compute the bounding box of the triangle many times).

| Approach | Shape | M hits/sec Dev | M hits/sec Base | Diff | QPS Dev | QPS Base | Diff | Hit count Dev | Hit count Base | Diff |
|---|---|---|---|---|---|---|---|---|---|---|
| points | polyRussia | 13.97 | 13.98 | -0% | 3.98 | 3.99 | -0% | 3508846 | 3508846 | 0% |
| points | poly 10 | 73.35 | 71.84 | 2% | 46.39 | 45.43 | 2% | 355809475 | 355809475 | 0% |
| points | polyMedium | 8.73 | 8.67 | 1% | 106.99 | 106.24 | 1% | 2693559 | 2693559 | 0% |
| shapes | polyRussia | 6.88 | 5.73 | 20% | 1.96 | 1.63 | 20% | 3508846 | 3508846 | 0% |
| shapes | poly 10 | 27.98 | 27.03 | 4% | 17.69 | 17.10 | 4% | 355809475 | 355809475 | 0% |
| shapes | polyMedium | 2.79 | 2.75 | 2% | 34.19 | 33.66 | 2% | 2693559 | 2693559 | 0% |

@nknize
Member

nknize commented Oct 8, 2019

I checked out the PR and got the following test failures:

NOTE: reproduce with: ant test  -Dtestcase=TestLatLonPointQueries -Dtests.method=testAllLatEqual -Dtests.seed=DF1A4B86CC81C3AB -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=seh-MZ -Dtests.timezone=Antarctica/Syowa -Dtests.asserts=true -Dtests.file.encoding=UTF-8

NOTE: reproduce with: ant test  -Dtestcase=TestLatLonPointQueries -Dtests.method=testSmallSetPolyWholeMap -Dtests.seed=DF1A4B86CC81C3AB -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=seh-MZ -Dtests.timezone=Antarctica/Syowa -Dtests.asserts=true -Dtests.file.encoding=UTF-8

NOTE: reproduce with: ant test  -Dtestcase=TestLatLonPointQueries -Dtests.method=testLowCardinality -Dtests.seed=DF1A4B86CC81C3AB -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=seh-MZ -Dtests.timezone=Antarctica/Syowa -Dtests.asserts=true -Dtests.file.encoding=UTF-8

NOTE: reproduce with: ant test  -Dtestcase=TestLatLonPointQueries -Dtests.method=testAllLonEqual -Dtests.seed=DF1A4B86CC81C3AB -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=seh-MZ -Dtests.timezone=Antarctica/Syowa -Dtests.asserts=true -Dtests.file.encoding=UTF-8

NOTE: reproduce with: ant test  -Dtestcase=TestLatLonPointQueries -Dtests.method=testRandomMedium -Dtests.seed=DF1A4B86CC81C3AB -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=seh-MZ -Dtests.timezone=Antarctica/Syowa -Dtests.asserts=true -Dtests.file.encoding=UTF-8

NOTE: reproduce with: ant test  -Dtestcase=TestLatLonPointQueries -Dtests.method=testRandomTiny -Dtests.seed=DF1A4B86CC81C3AB -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=seh-MZ -Dtests.timezone=Antarctica/Syowa -Dtests.asserts=true -Dtests.file.encoding=UTF-8

@iverase
Contributor Author

iverase commented Oct 8, 2019

@nknize totally right, that was a bug I introduced when trying to improve it. It should be fixed now.

Member

@nknize nknize left a comment

Gave it a first go and have a few thoughts / suggestions...

}
};
}

@Override
public Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, float boost) throws IOException {

final Component2D tree = Polygon2D.create(polygons);
Member

Can we push these variables into getIntersectVisitor? That would delay creating the bounding box, Component2D, and PolygonPredicate objects until the visitor is needed and save unnecessary computation when no docs contain point fields.

Member

See above suggestion: I think it makes the code here a bit cleaner?

Contributor Author

Not sure if we should do that. I think we create one ScorerSupplier per segment, so moving that logic into the IntersectVisitor means we would be creating these objects once per segment.
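
A toy illustration of that concern, under stated assumptions: `ExpensiveTree` is a hypothetical object whose constructor counts how often it runs, and the loop over `segments` stands in for one visitor being requested per segment. None of these names come from Lucene.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical expensive object; the counter records how many times it is built. */
final class ExpensiveTree {
  static final AtomicInteger BUILDS = new AtomicInteger();

  ExpensiveTree() {
    BUILDS.incrementAndGet();
  }
}

final class PerSegmentCostSketch {
  /** Builds the tree once up front and shares it across all segments. */
  static int buildOnce(int segments) {
    ExpensiveTree.BUILDS.set(0);
    ExpensiveTree shared = new ExpensiveTree();
    for (int s = 0; s < segments; s++) {
      visit(shared);
    }
    return ExpensiveTree.BUILDS.get();
  }

  /** Builds the tree inside the per-segment step, as the suggested change would. */
  static int buildPerSegment(int segments) {
    ExpensiveTree.BUILDS.set(0);
    for (int s = 0; s < segments; s++) {
      visit(new ExpensiveTree());
    }
    return ExpensiveTree.BUILDS.get();
  }

  private static void visit(ExpensiveTree tree) {
    // Placeholder for the per-segment work that would use the tree.
  }

  public static void main(String[] args) {
    System.out.println("built once up front: " + buildOnce(8));
    System.out.println("built per segment:   " + buildPerSegment(8));
  }
}
```

The trade-off raised in the review is between this repeated construction and the cost of building the objects eagerly even when no segment ends up needing them.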

Member

👍 Thanks for reminding me. I ran into that same problem ages ago as well.

@@ -84,7 +84,7 @@ public void visit(QueryVisitor visitor) {
}
}

-private IntersectVisitor getIntersectVisitor(DocIdSetBuilder result, Polygon2D tree, GeoEncodingUtils.PolygonPredicate polygonPredicate,
+private IntersectVisitor getIntersectVisitor(DocIdSetBuilder result, Component2D tree, GeoEncodingUtils.PolygonPredicate polygonPredicate,
Member

Suggested change
-private IntersectVisitor getIntersectVisitor(DocIdSetBuilder result, Component2D tree, GeoEncodingUtils.PolygonPredicate polygonPredicate,
+private IntersectVisitor getIntersectVisitor(DocIdSetBuilder result) {

@@ -84,7 +84,7 @@ public void visit(QueryVisitor visitor) {
}
}

-private IntersectVisitor getIntersectVisitor(DocIdSetBuilder result, Polygon2D tree, GeoEncodingUtils.PolygonPredicate polygonPredicate,
+private IntersectVisitor getIntersectVisitor(DocIdSetBuilder result, Component2D tree, GeoEncodingUtils.PolygonPredicate polygonPredicate,
 byte[] minLat, byte[] maxLat, byte[] minLon, byte[] maxLon) {
Member

Suggested change
-byte[] minLat, byte[] maxLat, byte[] minLon, byte[] maxLon) {
+final Component2D tree = Polygon2D.create(polygons);
+final GeoEncodingUtils.PolygonPredicate polygonPredicate = GeoEncodingUtils.createComponentPredicate(tree);
+// bounding box over all polygons, this can speed up tree intersection/cheaply improve approximation for complex multi-polygons
+final byte minLat[] = new byte[Integer.BYTES];
+final byte maxLat[] = new byte[Integer.BYTES];
+final byte minLon[] = new byte[Integer.BYTES];
+final byte maxLon[] = new byte[Integer.BYTES];
+NumericUtils.intToSortableBytes(encodeLatitude(tree.getMinY()), minLat, 0);
+NumericUtils.intToSortableBytes(encodeLatitude(tree.getMaxY()), maxLat, 0);
+NumericUtils.intToSortableBytes(encodeLongitude(tree.getMinX()), minLon, 0);
+NumericUtils.intToSortableBytes(encodeLongitude(tree.getMaxX()), maxLon, 0);
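
For readers unfamiliar with the byte encoding used for the bounding box above, the following is a self-contained sketch of the idea behind `intToSortableBytes`: flip the sign bit and write the int big-endian, so that an unsigned byte-by-byte comparison orders values the same way as a signed int comparison. This is an illustrative reimplementation in a hypothetical `SortableBytesSketch` class, not Lucene's source, and it works on plain ints rather than encoded latitudes and longitudes.

```java
import java.util.Arrays;

final class SortableBytesSketch {
  /** Writes value into dest so that unsigned byte order matches signed int order. */
  static void intToSortableBytes(int value, byte[] dest, int offset) {
    int flipped = value ^ 0x80000000; // flip the sign bit: negatives now sort first
    dest[offset] = (byte) (flipped >>> 24); // most significant byte first (big-endian)
    dest[offset + 1] = (byte) (flipped >>> 16);
    dest[offset + 2] = (byte) (flipped >>> 8);
    dest[offset + 3] = (byte) flipped;
  }

  /** Convenience wrapper that allocates a fresh 4-byte array. */
  static byte[] encode(int value) {
    byte[] bytes = new byte[Integer.BYTES];
    intToSortableBytes(value, bytes, 0);
    return bytes;
  }

  public static void main(String[] args) {
    // A negative value must compare below a positive one after encoding.
    System.out.println(Arrays.compareUnsigned(encode(-5), encode(3)) < 0);
  }
}
```

With the bounds stored this way, a visitor can reject a cell or point by comparing byte ranges directly, without decoding back to ints or doubles.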

lucene/core/src/java/org/apache/lucene/geo/EdgeTree.java (5 outdated review threads, resolved)

@iverase
Contributor Author

iverase commented Oct 14, 2019

@nknize do you think we can move this forward?

Member

@nknize nknize left a comment

LGTM. Thank you @iverase

@iverase iverase merged commit 68a3886 into apache:master Oct 14, 2019
asfgit pushed a commit that referenced this pull request Oct 14, 2019
Introduce a Component tree that represents the tree of components (e.g polygons).
 Edge tree is now just a tree of edges.

2 participants