LUCENE-8903: Add LatLonShape point query #762

Merged (13 commits) on Jan 15, 2020

@iverase iverase commented Jul 4, 2019

Adds a query to LatLonShape that filters by a provided point.

@jpountz jpountz left a comment

How much does it help? :)

@@ -94,9 +94,18 @@ private LatLonShape() {
return new Field[] {new LatLonTriangle(fieldName, lat, lon, lat, lon, lat, lon)};
}

/** create a query to find all indexed shapes that comply the {@link QueryRelation} with the provided point
**/
public static Query newPointQuery(String field, QueryRelation queryRelation, double lat, double lon) {
Contributor

do we really need a relation or can we just assume INTERSECTS?

@iverase iverase Jul 5, 2019

Not sure, I would keep it like that for two reasons:

  • Keep it consistent with all the other queries in LatLonShape

  • You can build queries such as "give me all my shapes that do not contain this point". For WITHIN, it becomes a term query matching all indexed points whose encoded value equals the encoded value of the query point.
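
As a rough illustration of the WITHIN case above, the sketch below quantizes coordinates to 32-bit integers and treats two points as matching only when their encoded values are equal. The class name `EncodedPointSketch`, its method names, and the assumed step sizes are made up for this example; it is not the code in this PR.

```java
/** Illustrative sketch only: names and step sizes are assumptions, not Lucene's code. */
final class EncodedPointSketch {
  // Assumed quantization: 32 bits per dimension.
  static final double LAT_STEP = 180.0 / (1L << 32);
  static final double LON_STEP = 360.0 / (1L << 32);

  static int encodeLat(double lat) {
    if (lat < -90.0 || lat > 90.0) {
      throw new IllegalArgumentException("latitude out of range: " + lat);
    }
    // The top of the range has no cell of its own, so fold it into the last one.
    if (lat == 90.0) {
      lat = Math.nextDown(lat);
    }
    return (int) Math.floor(lat / LAT_STEP);
  }

  static int encodeLon(double lon) {
    if (lon < -180.0 || lon > 180.0) {
      throw new IllegalArgumentException("longitude out of range: " + lon);
    }
    if (lon == 180.0) {
      lon = Math.nextDown(lon);
    }
    return (int) Math.floor(lon / LON_STEP);
  }

  /** WITHIN for a query point: both encoded coordinates must be equal. */
  static boolean sameEncodedPoint(double lat1, double lon1, double lat2, double lon2) {
    return encodeLat(lat1) == encodeLat(lat2) && encodeLon(lon1) == encodeLon(lon2);
  }

  public static void main(String[] args) {
    System.out.println(sameEncodedPoint(0.0, 0.0, 1e-9, 1e-9));
    System.out.println(sameEncodedPoint(0.0, 0.0, -1e-9, 0.0));
  }
}
```

Under this model, (0.0, 0.0) and (1e-9, 1e-9) land in the same cell and match, while (-1e-9, 0.0) falls into the cell below and does not.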

@@ -147,6 +147,11 @@ protected Query newRectQuery(String field, QueryRelation queryRelation, double m
return LatLonShape.newBoxQuery(field, queryRelation, minLat, maxLat, minLon, maxLon);
}

/** factory method to create a new bounding box query */
Contributor

s/bounding box/point/

} else {
System.out.println(b.toString());
fail = true;
}
Contributor

leftovers?

Contributor Author

It seems all these verify functions contain this piece of logic. I guess it was introduced for debugging, so I don't want to change it in this PR.

iverase commented Jul 5, 2019

In the dataset I am working with I get speed ups of around 10%:

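| op | Approach | Shape | M hits/sec (Dev) | M hits/sec (Base) | M hits/sec (Diff) | QPS (Dev) | QPS (Base) | QPS (Diff) | Hit count (Dev) | Hit count (Base) | Hit count (Diff) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| intersects | bkd | point | 39.89 | 35.52 | 12% | 49.71 | 44.26 | 12% | 180557100 | 180557100 | 0% |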

iverase commented Jul 5, 2019

Going a bit more in detail, I wrote the following test to prove a difference between LatLonPoint and LatLonShape:

public void testPointShapeEquivalence() throws Exception {
  Directory dir = newDirectory();
  RandomIndexWriter writer = new RandomIndexWriter(random(), dir);
  Document document = new Document();
  BaseLatLonShapeTestCase.Point p = (BaseLatLonShapeTestCase.Point) BaseLatLonShapeTestCase.ShapeType.POINT.nextShape();
  Polygon polygon = GeoTestUtil.nextPolygon();

  Field[] fields = LatLonShape.createIndexableFields("LatLonShape", polygon);
  for (Field f : fields) {
    document.add(f);
  }
  LatLonPoint latLonPoint = new LatLonPoint("LatLonPoint", p.lat, p.lon);
  document.add(latLonPoint);
  writer.addDocument(document);

  //// search
  IndexReader r = writer.getReader();
  writer.close();
  IndexSearcher s = newSearcher(r);

  // search both and check same result
  Query q1 = LatLonShape.newPointQuery("LatLonShape", QueryRelation.INTERSECTS, p.lat, p.lon);
  Query q2 = LatLonPoint.newPolygonQuery("LatLonPoint", polygon);
  assertEquals(s.count(q1), s.count(q2));
  IOUtils.close(r, dir);
}

The test indexes a polygon as a shape and runs a point query against it; at the same time it indexes the same point and runs a polygon query with the same polygon. It will eventually fail because in LatLonShape both the polygon and the point work in the encoded space, whereas for LatLonPoint the query polygon does not work in the encoded space.
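
To make the mismatch concrete, here is a minimal standalone model of the two code paths, assuming a 32-bit quantization per dimension. Everything in it (the class `EncodingSpaceSketch`, the step sizes, the simplified triangle test) is an assumption for illustration and not the actual Lucene implementation. The triangle's lower edge sits at latitude 1e-9 and the point at 0.5e-9, so both collapse to the same encoded latitude.

```java
/** Illustrative model only: names and step sizes are assumptions, not Lucene's code. */
final class EncodingSpaceSketch {
  // Assumed quantization: 32 bits per dimension.
  static final double LAT_STEP = 180.0 / (1L << 32);
  static final double LON_STEP = 360.0 / (1L << 32);

  // No range checks here; the example inputs stay far away from the poles and the dateline.
  static int encodeLat(double lat) { return (int) Math.floor(lat / LAT_STEP); }
  static int encodeLon(double lon) { return (int) Math.floor(lon / LON_STEP); }
  static double decodeLat(int enc) { return enc * LAT_STEP; }
  static double decodeLon(int enc) { return enc * LON_STEP; }

  /** Sign of the cross product (b - a) x (p - a): 1 if p is left of a->b, -1 if right, 0 if collinear. */
  static int orient(double ax, double ay, double bx, double by, double px, double py) {
    double det = (bx - ax) * (py - ay) - (by - ay) * (px - ax);
    return det > 0 ? 1 : (det < 0 ? -1 : 0);
  }

  /** Inclusive test for a non-degenerate triangle: fails only if p is strictly left of one edge and strictly right of another. */
  static boolean pointInTriangle(double px, double py, double[] xs, double[] ys) {
    int o1 = orient(xs[0], ys[0], xs[1], ys[1], px, py);
    int o2 = orient(xs[1], ys[1], xs[2], ys[2], px, py);
    int o3 = orient(xs[2], ys[2], xs[0], ys[0], px, py);
    boolean hasNeg = o1 < 0 || o2 < 0 || o3 < 0;
    boolean hasPos = o1 > 0 || o2 > 0 || o3 > 0;
    return !(hasNeg && hasPos);
  }

  /** Shape-like check: point and triangle are both quantized before testing. */
  static boolean matchesInEncodedSpace(double lat, double lon, double[] triLats, double[] triLons) {
    double[] xs = new double[3];
    double[] ys = new double[3];
    for (int i = 0; i < 3; i++) {
      xs[i] = encodeLon(triLons[i]);
      ys[i] = encodeLat(triLats[i]);
    }
    return pointInTriangle(encodeLon(lon), encodeLat(lat), xs, ys);
  }

  /** Point-like check: the point takes an encode/decode round trip, the triangle stays in raw doubles. */
  static boolean matchesAgainstRawTriangle(double lat, double lon, double[] triLats, double[] triLons) {
    return pointInTriangle(decodeLon(encodeLon(lon)), decodeLat(encodeLat(lat)), triLons, triLats);
  }

  public static void main(String[] args) {
    double[] triLats = {1e-9, 1e-9, 10.0};
    double[] triLons = {0.0, 10.0, 0.0};
    double lat = 0.5e-9;
    double lon = 1.0;
    System.out.println("encoded space: " + matchesInEncodedSpace(lat, lon, triLats, triLons));
    System.out.println("raw triangle:  " + matchesAgainstRawTriangle(lat, lon, triLats, triLons));
  }
}
```

With these inputs the fully encoded check reports a match, because the point ends up on the quantized edge, while the raw-triangle check does not, because the decoded point still lies below the unquantized edge.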

nknize commented Jul 8, 2019

I think this is a duplicate of LUCENE-8670, which I opened with a patch back at the end of January. If I remember right, the only reason we were holding off on that patch was that querying by MULTIPOINT (an array of points) was done by brute force, and we had discussed ways of speeding it up with a simple in-memory R-tree. This looks like a slimmed-down version that only accepts a single point. Perhaps we iterate on LUCENE-8670 and improve this query for multiple points?

iverase commented Jul 9, 2019

I think that can be solved by #770 (LUCENE-8746). That is actually what is proposed there: building an R-tree structure when creating a multi-shape.
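
For readers following the brute-force versus tree discussion, here is a small sketch of the general idea: a binary bounding-box tree over the query points, so that a lookup can skip whole groups of points whose box does not overlap the area of interest. It is a simplified stand-in for an R-tree, written only for this thread; `PointTreeSketch` and its methods are hypothetical names and do not reflect the code in LUCENE-8670 or #770.

```java
/** Illustrative bounding-box tree over query points; a simplified stand-in for an R-tree, not Lucene code. */
final class PointTreeSketch {
  final double minX, maxX, minY, maxY; // bounding box of every point below this node
  final PointTreeSketch left, right;   // both null for a leaf holding a single point

  // Builds the subtree for sorted[from, to); points are {x, y} pairs sorted by x.
  private PointTreeSketch(double[][] sorted, int from, int to) {
    if (to - from == 1) {
      minX = sorted[from][0];
      maxX = sorted[from][0];
      minY = sorted[from][1];
      maxY = sorted[from][1];
      left = null;
      right = null;
    } else {
      int mid = (from + to) >>> 1;
      left = new PointTreeSketch(sorted, from, mid);
      right = new PointTreeSketch(sorted, mid, to);
      minX = Math.min(left.minX, right.minX);
      maxX = Math.max(left.maxX, right.maxX);
      minY = Math.min(left.minY, right.minY);
      maxY = Math.max(left.maxY, right.maxY);
    }
  }

  static PointTreeSketch build(double[][] points) {
    if (points.length == 0) {
      throw new IllegalArgumentException("need at least one point");
    }
    double[][] sorted = points.clone(); // leave the caller's array order untouched
    java.util.Arrays.sort(sorted, (p, q) -> Double.compare(p[0], q[0]));
    return new PointTreeSketch(sorted, 0, sorted.length);
  }

  /** True if at least one point lies in the closed box [qMinX, qMaxX] x [qMinY, qMaxY]. */
  boolean anyPointInBox(double qMinX, double qMaxX, double qMinY, double qMaxY) {
    if (maxX < qMinX || minX > qMaxX || maxY < qMinY || minY > qMaxY) {
      return false; // boxes are disjoint, so the whole subtree is skipped
    }
    if (left == null) {
      return true; // a leaf's box is the point itself, and it overlaps the query box
    }
    return left.anyPointInBox(qMinX, qMaxX, qMinY, qMaxY)
        || right.anyPointInBox(qMinX, qMaxX, qMinY, qMaxY);
  }

  /** Brute-force baseline: visits every point. */
  static boolean anyPointInBoxBruteForce(double[][] points, double qMinX, double qMaxX, double qMinY, double qMaxY) {
    for (double[] p : points) {
      if (p[0] >= qMinX && p[0] <= qMaxX && p[1] >= qMinY && p[1] <= qMaxY) {
        return true;
      }
    }
    return false;
  }
}
```

In a multi-point query, a structure like this could be probed with the bounding box of each indexed triangle to find candidate points before running the exact point-in-triangle test, instead of testing every point against every triangle.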

# Conflicts:
#	lucene/sandbox/src/java/org/apache/lucene/document/LatLonShape.java
#	lucene/sandbox/src/test/org/apache/lucene/document/BaseLatLonShapeTestCase.java

@nknize nknize left a comment

LGTM

//This should be moved when LatLonShape is moved from sandbox!
/**
* Compute whether the given x, y point is in a triangle; uses the winding order method */
private static boolean pointInTriangle (double x, double y, double ax, double ay, double bx, double by, double cx, double cy) {
Member

duplicate of?

public static boolean pointInTriangle (double x, double y, double ax, double ay, double bx, double by, double cx, double cy) {

@iverase iverase merged commit ff365a0 into apache:master Jan 15, 2020
asfgit pushed a commit that referenced this pull request Jan 15, 2020
@iverase iverase deleted the LatLonShapePointQuery branch February 7, 2020 20:20