Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor GeoPoint and GeoShape with generics #89388

Merged
merged 9 commits into from
Sep 7, 2022

Conversation

craigtaverner
Copy link
Contributor

In preparation for supporting CartesianPoint and CartesianShape
in aggregations, this PR adds a common interface between GeoPoint
and CartesianPoint, and then uses that to split out some key common
code that will be used in CartesianPoint and CartesianShape aggregations

@craigtaverner craigtaverner added :Analytics/Geo Indexing, search aggregations of geo points and shapes :Analytics/Aggregations Aggregations >refactoring >tech debt labels Aug 16, 2022
@craigtaverner craigtaverner requested a review from iverase August 16, 2022 14:51
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Aug 16, 2022
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@craigtaverner craigtaverner force-pushed the elastic_point_generics branch 3 times, most recently from 7d30449 to 38e4dcb Compare August 17, 2022 14:03
Copy link
Contributor

@iverase iverase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a first pass on the points part and there is one thing I think we can improve. We are currently mixing geo classes with generic classes which makes the code confusing.

For example AbstractLeafGeoPointFieldData is using internally MultiPointValues and I think it should be using MultiGeoPointValues instead. I think the trick is not to use the generics extending ElasticPoint but using the generic extending MultiPointValues<?>. like:

public abstract class LeafPointFieldData<T extends MultiPointValues<?>> implements LeafFieldData {

    /**
     * Return geo-point or point values.
     */
    public abstract T getPointValues();

    /**
     * Return the internal representation of geo_point or point doc values as a {@link SortedNumericDocValues}.
     * A point is encoded as a long that can be decoded by using
     * {@link org.elasticsearch.common.geo.GeoPoint#resetFromEncoded(long)}
     */
    public abstract SortedNumericDocValues getSortedNumericDocValues();

}

I think this will simplify quite a bit things and it will prevent some unnecessary castings.
This applies to other classes like IndexPointFieldData and AbstractPointIndexFieldData.

@@ -744,9 +744,9 @@ public final MultiGeoPointValues geoPointValues(LeafReaderContext context) {

public static class Fielddata extends GeoPoint {

protected final IndexGeoPointFieldData indexFieldData;
protected final IndexPointFieldData<org.elasticsearch.common.geo.GeoPoint> indexFieldData;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cannot we use here IndexGeoPointFieldData instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is resolved by your commit at 26b11a6

* To facilitate maximizing the use of common code between GeoPoint and projected CRS
* we introduced this ElasticPoint as an interface of commonality.
*/
public interface ElasticPoint {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should rename this interface to represent better what it does. Maybe Point2d or BasePoint or SpatialPoint?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

There's nothing elastic-specific about this interface, as far as I can see, so it should have a broader name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Let's go for SpatialPoint. I think BasePoint sounds more like an abstract class than an interface, and Point2d sounds too much like the 2d classes in Lucene, which this is not related to at all.

@@ -118,11 +118,9 @@ public String toString() {
StringBuilder buf = new StringBuilder("[");
boolean first = true;
for (E e : enums) {
if (first == false) buf.append(", ");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems not to belong to this PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++. Stray automatic refactoring? I think the change is okay, but confusing that it's being done here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, ha! This was part of an earlier PR that Ignacio also pointed out was not relevant to that PR, so I pulled it into a separate, isolated PR at #88505. But that has yet to be approved. Perhaps you could review it, and this trivial change can stop popping up in my PRs? 😸

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least it is removed from this PR now.

GeoPoint bottomRight = new GeoPoint(bottom, right);
return new GeoBoundingBox(topLeft, bottomRight);
GeoBoundsParser bounds = new GeoBoundsParser(parser);
return (GeoBoundingBox) bounds.parseBoundingBox();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this casting not avoidable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example making the parser like: protected abstract static class BoundsParser<T extends BoundingBox<?>>?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Done in c8b9ee3

@@ -53,7 +53,7 @@ public final boolean isFloatingPoint() {
@Override
public final SortedNumericDocValues longValues(LeafReaderContext ctx) {
final MultiGeoPointValues multiGeoPointValues = valuesSource.geoPointValues(ctx);
final GeoPointValues values = org.elasticsearch.index.fielddata.FieldData.unwrapSingleton(multiGeoPointValues);
final GeoPointValues values = (GeoPointValues) org.elasticsearch.index.fielddata.FieldData.unwrapSingleton(multiGeoPointValues);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this casting necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is resolved by your commit at 26b11a6

final GeoPoint... fromPoints
) {
final GeoPointValues singleValues = FieldData.unwrapSingleton(geoPointValues);
final GeoPointValues singleValues = (GeoPointValues) FieldData.unwrapSingleton(geoPointValues);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this casting necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is resolved by your commit at 26b11a6


public AbstractLeafGeoPointFieldData(ToScriptFieldFactory<MultiGeoPointValues> toScriptFieldFactory) {
public AbstractLeafGeoPointFieldData(ToScriptFieldFactory<MultiPointValues<GeoPoint>> toScriptFieldFactory) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if at this level we were using MultiGeoPointValues instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is resolved by your commit at 26b11a6

@@ -22,7 +23,7 @@ final class LatLonPointDVLeafFieldData extends AbstractLeafGeoPointFieldData {
private final LeafReader reader;
private final String fieldName;

LatLonPointDVLeafFieldData(LeafReader reader, String fieldName, ToScriptFieldFactory<MultiGeoPointValues> toScriptFieldFactory) {
LatLonPointDVLeafFieldData(LeafReader reader, String fieldName, ToScriptFieldFactory<MultiPointValues<GeoPoint>> toScriptFieldFactory) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if at this level we were using MultiGeoPointValues instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is resolved by your commit at 26b11a6

ValuesSourceType valuesSourceType,
ToScriptFieldFactory<MultiPointValues<GeoPoint>> toScriptFieldFactory
) {
super(fieldName, valuesSourceType, toScriptFieldFactory);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if at this level we were using MultiGeoPointValues instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is resolved by your commit at 26b11a6

Copy link
Member

@not-napoleon not-napoleon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks mostly okay to me, but the Writable ValuesSourceType is extremely concerning. If it needs to be there, let's talk, and if it doesn't need to be there, I'll approve once it's dropped.

static final ParseField LEFT_FIELD = new ParseField("left");
static final ParseField RIGHT_FIELD = new ParseField("right");
static final ParseField WKT_FIELD = new ParseField("wkt");
public static final ParseField BOUNDS_FIELD = new ParseField("bounds");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's obvious to folks who work with this code a lot, but I'm unclear why some of these are public and others aren't. I'd appreciate a comment. (I see that it was like this in GeoBoundingBox where you relocated this from, so it's possible the reasoning is lost to the mists of time.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I look at the usages of these, it seems the public ones are used by classes outside this package (production code as well as test code). In particular ParsedGeoBounds uses these for parsing bounds used in aggregations.

So I think the root cause is that there are multiple queries that have a partially shared concept of geo bounds. Ideally it would be nice to imagine them all having the same, single, implementation. But that could be a behaviour change, which I think is certainly out of scope for this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't think we need to change it. I just want a comment saying it's on purpose and why.

* To facilitate maximizing the use of common code between GeoPoint and projected CRS
* we introduced this ElasticPoint as an interface of commonality.
*/
public interface ElasticPoint {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

There's nothing elastic-specific about this interface, as far as I can see, so it should have a broader name.

builder.field(LAT_FIELD.getPreferredName(), topLeft.lat());
builder.field(LON_FIELD.getPreferredName(), topLeft.lon());
builder.field(LAT_FIELD.getPreferredName(), topLeft.getY());
builder.field(LON_FIELD.getPreferredName(), topLeft.getX());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason to think that someone in the future might get confused if longitude maps to X or Y? Should we have some static function that wraps this logic? Right now, we replicate this knowledge in a few spots (e.g. the GeoPoint constructor below), so maybe there's some sense in putting that mapping in one place?

I feel like it's probably obvious to folks who work with this stuff, but I don't trust that things which are obvious today will be obvious in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree 100%. I want to move in a direction that isolates this kind of mapping to a single place. Right now there are many, many places that depend on secret knowledge of the mapping and the correct order (for example some methods take (double y, double x) and others take (double x, double y). In the earlier Elastic code (and earlier Lucene code), y, x was the norm, but over time things have swung over to x, y being much more common. What I'm keen to do is iteratively minimize the differences. Some earlier PR's I did in the Spring already downplayed the y, x cases (in code and documentation), but that was a small step. This PR here now actually does improve things quite a bit (IMHO); but it also brings some things into the light, like the one you observe here. I think more can (and should) be done, but I do not want to change too much in one go. So this PR priorizes the changes required to enable the work on Cartesian, but does not do everything that could be done. This feels like a safer compromise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the creation of the SpatialPoint (neé ElasticPoint) interface leads to one option for where we could place this kind of knowledge. But I think we should think more about that before actually taking that step. There could be better options.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I'm happy to keep this on the roadmap and address it in a future PR after more thought.


long count();
}
public interface GeoCentroid extends CentroidAggregation {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this interface still necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was because I wrote the CartesianCentroid stuff, and then Ignacio wanted the pure refactoring parts pulled out. The Cartesian work is in a separate PR on top of this work. If I move this change into that PR it will cause that PR to contain a lot of test code changes that are geo-specific (not cartesian). My goal is for this PR to contain only refactoring (moving Geo-specific code around) and the other PR to contain primarily Cartesian specific stuff (minimize Geo changes). So for the purpose of that split, this is a good change :-)


protected abstract ElasticPoint centroidFromStream(StreamInput in) throws IOException;

protected abstract void centroidToStream(StreamOutput out) throws IOException;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like for abstract functions to have some java doc explaining what implementations need to do. In this particular case, I'm unclear what these are for since we already have a stream constructor and a doWriteTo method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These methods encapsulate the parts of the Geo vs. Cartesian that differ. In particular this is the y, x versus x, y we discussed in another comment. This move is a step towards trying to get that particular issue isolated. I think more can be done (eg. moving that knowledge into the SpatialPoint or some other class), but this step feels like a step in the right direction.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think they're fine, I just want them to have javadoc.

import org.elasticsearch.xpack.spatial.index.fielddata.IndexShapeFieldData;
import org.elasticsearch.xpack.spatial.index.fielddata.ShapeValues;

public abstract class AbstractShapeIndexFieldData implements IndexShapeFieldData {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Javadoc please, ideally with some hints as to why and how one would build a concrete implementation of this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Added some.

XFieldComparatorSource.Nested nested,
boolean reverse
) {
throw new IllegalArgumentException("can't sort on geo_shape field without using specific sorting feature, like geo_distance");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it correct to specify geo_shape and geo_distance here? might this abstraction have a non-geo specialization down the road? I would prefer if it got the field name from the concrete instance, and didn't offer a suggestion for the sorting feature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved the exception into the extending class, where the specific names have meaning.

int bucketSize,
BucketedSort.ExtraData extra
) {
throw new IllegalArgumentException("can't sort on geo_shape field without using specific sorting feature, like geo_distance");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, I'd prefer not to hard code the type names in the error message. Also, with the repeated error message here, consider centralizing it to AggregationErrors - I'm trying to make that a thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done a minimal change, but not yet made use of AggregationErrors. I'll read up on that and learn about it.


import java.io.IOException;

public abstract class ShapeValuesProvider extends ValuesSource {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we calling this a ValuesProvider instead of a ValuesSource? I really don't want to add another distinction of terms to an already confusing part of the code unless we're very clear what the new terminology means.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea. Seems wrong to me too. I've fixed it in 33cb378.


import java.io.IOException;

public abstract class ShapeValuesSourceType implements Writeable, ValuesSourceType {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need to be Writeable? ValuesSourceType implementations are intended to be stateless, which means they should never need to be written. It's actually quite concerning to me if we need a writeable values source type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I traced this back and it was introduced by Tal Levy in 2020 as:

public class GeoShapeValuesSourceType implements Writeable, ValuesSourceType

I do not know why it was writable. My vote is we investigate that separate from this PR because my goal with this PR was to leave behaviour as unchanged as possible, just moving around code to enable the Cartesian work.

I notice that the writeTo method is empty and it would be interesting to simply remove it and see what PR building does. My default thinking is to save that for another PR, but if you have a strong opinion here, we can do it here directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I thought a CI test of this would be a good idea. I pushed that in 541e47f. Let's see how it goes and then make a decision.

craigtaverner and others added 3 commits September 5, 2022 17:11
In preparation for supporting CartesianPoint and CartesianShape
in aggregations, this PR adds a common interface between GeoPoint
and CartesianPoint, and then uses that to split out some key common
code that will be used in CartesianPoint and CartesianShape aggregations
It extends ValuesSource, and is extended by GeoShapeValuesSource.
There is no reason for the suffix `Provider`.
throw new IllegalArgumentException("can't sort on geo_shape field without using specific sorting feature, like geo_distance");
}

public static class GeoBuilder implements IndexFieldData.Builder {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This builder is not needed, we can just use a lambda.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted!

Based on Ignacio's work in 050df95,
we fix the BoundingBox generics, and also add a little more specificity
to the previous generics (replace <?> with <? extends SpatialPoint>).
builder.endObject();
return builder;
}

public XContentBuilder toXContentFragment(XContentBuilder builder, boolean buildLatLonFields) throws IOException {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value of buildLatLonFields is true almost always, and is also something geo-specific. We should get rid of it and have another method for the false case that only exists in the geo-implementation.

return builder;
}

public abstract XContentBuilder toXContentFragment(XContentBuilder builder, boolean buildLatLonFields) throws IOException;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buildLatLonField sounds very geo-specific. We should be able to get rid of this for now.

static final ParseField RIGHT_FIELD = new ParseField("right");
static final ParseField WKT_FIELD = new ParseField("wkt");
public static final ParseField BOUNDS_FIELD = new ParseField("bounds");
public static final ParseField LAT_FIELD = new ParseField("lat");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LAT and LON are geo-specific and should be in GeoBoundingBox

Copy link
Member

@not-napoleon not-napoleon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. As noted, there's follow up work, but this looks like a big improvement to me, so let's get it in. Thanks for taking this on!

@craigtaverner craigtaverner merged commit b68d62b into elastic:main Sep 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations :Analytics/Geo Indexing, search aggregations of geo points and shapes >refactoring Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) >tech debt v8.5.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants