-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor GeoPoint and GeoShape with generics #89388
Refactor GeoPoint and GeoShape with generics #89388
Conversation
Pinging @elastic/es-analytics-geo (Team:Analytics) |
7d30449
to
38e4dcb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a first pass on the points part and there is one thing I think we can improve. We are currently mixing geo classes with generic classes which makes the code confusing.
For example AbstractLeafGeoPointFieldData
is using internally MultiPointValues and I think it should be using MultiGeoPointValues
instead. I think the trick is not to use the generics extending ElasticPoint but using the generic extending MultiPointValues<?>. like:
public abstract class LeafPointFieldData<T extends MultiPointValues<?>> implements LeafFieldData {
/**
* Return geo-point or point values.
*/
public abstract T getPointValues();
/**
* Return the internal representation of geo_point or point doc values as a {@link SortedNumericDocValues}.
* A point is encoded as a long that can be decoded by using
* {@link org.elasticsearch.common.geo.GeoPoint#resetFromEncoded(long)}
*/
public abstract SortedNumericDocValues getSortedNumericDocValues();
}
I think this will simplify quite a bit things and it will prevent some unnecessary castings.
This applies to other classes like IndexPointFieldData and AbstractPointIndexFieldData.
@@ -744,9 +744,9 @@ public final MultiGeoPointValues geoPointValues(LeafReaderContext context) { | |||
|
|||
public static class Fielddata extends GeoPoint { | |||
|
|||
protected final IndexGeoPointFieldData indexFieldData; | |||
protected final IndexPointFieldData<org.elasticsearch.common.geo.GeoPoint> indexFieldData; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cannot we use here IndexGeoPointFieldData instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is resolved by your commit at 26b11a6
* To facilitate maximizing the use of common code between GeoPoint and projected CRS | ||
* we introduced this ElasticPoint as an interface of commonality. | ||
*/ | ||
public interface ElasticPoint { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should rename this interface to represent better what it does. Maybe Point2d or BasePoint or SpatialPoint?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
There's nothing elastic-specific about this interface, as far as I can see, so it should have a broader name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Let's go for SpatialPoint
. I think BasePoint
sounds more like an abstract class than an interface, and Point2d
sounds too much like the 2d classes in Lucene, which this is not related to at all.
@@ -118,11 +118,9 @@ public String toString() { | |||
StringBuilder buf = new StringBuilder("["); | |||
boolean first = true; | |||
for (E e : enums) { | |||
if (first == false) buf.append(", "); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems not to belong to this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++. Stray automatic refactoring? I think the change is okay, but confusing that it's being done here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha, ha! This was part of an earlier PR that Ignacio also pointed out was not relevant to that PR, so I pulled it into a separate, isolated PR at #88505. But that has yet to be approved. Perhaps you could review it, and this trivial change can stop popping up in my PRs? 😸
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least it is removed from this PR now.
GeoPoint bottomRight = new GeoPoint(bottom, right); | ||
return new GeoBoundingBox(topLeft, bottomRight); | ||
GeoBoundsParser bounds = new GeoBoundsParser(parser); | ||
return (GeoBoundingBox) bounds.parseBoundingBox(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this casting not avoidable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example making the parser like: protected abstract static class BoundsParser<T extends BoundingBox<?>>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Done in c8b9ee3
@@ -53,7 +53,7 @@ public final boolean isFloatingPoint() { | |||
@Override | |||
public final SortedNumericDocValues longValues(LeafReaderContext ctx) { | |||
final MultiGeoPointValues multiGeoPointValues = valuesSource.geoPointValues(ctx); | |||
final GeoPointValues values = org.elasticsearch.index.fielddata.FieldData.unwrapSingleton(multiGeoPointValues); | |||
final GeoPointValues values = (GeoPointValues) org.elasticsearch.index.fielddata.FieldData.unwrapSingleton(multiGeoPointValues); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this casting necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is resolved by your commit at 26b11a6
final GeoPoint... fromPoints | ||
) { | ||
final GeoPointValues singleValues = FieldData.unwrapSingleton(geoPointValues); | ||
final GeoPointValues singleValues = (GeoPointValues) FieldData.unwrapSingleton(geoPointValues); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this casting necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is resolved by your commit at 26b11a6
|
||
public AbstractLeafGeoPointFieldData(ToScriptFieldFactory<MultiGeoPointValues> toScriptFieldFactory) { | ||
public AbstractLeafGeoPointFieldData(ToScriptFieldFactory<MultiPointValues<GeoPoint>> toScriptFieldFactory) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer if at this level we were using MultiGeoPointValues instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is resolved by your commit at 26b11a6
@@ -22,7 +23,7 @@ final class LatLonPointDVLeafFieldData extends AbstractLeafGeoPointFieldData { | |||
private final LeafReader reader; | |||
private final String fieldName; | |||
|
|||
LatLonPointDVLeafFieldData(LeafReader reader, String fieldName, ToScriptFieldFactory<MultiGeoPointValues> toScriptFieldFactory) { | |||
LatLonPointDVLeafFieldData(LeafReader reader, String fieldName, ToScriptFieldFactory<MultiPointValues<GeoPoint>> toScriptFieldFactory) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer if at this level we were using MultiGeoPointValues instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is resolved by your commit at 26b11a6
ValuesSourceType valuesSourceType, | ||
ToScriptFieldFactory<MultiPointValues<GeoPoint>> toScriptFieldFactory | ||
) { | ||
super(fieldName, valuesSourceType, toScriptFieldFactory); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer if at this level we were using MultiGeoPointValues instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is resolved by your commit at 26b11a6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks mostly okay to me, but the Writable ValuesSourceType is extremely concerning. If it needs to be there, let's talk, and if it doesn't need to be there, I'll approve once it's dropped.
static final ParseField LEFT_FIELD = new ParseField("left"); | ||
static final ParseField RIGHT_FIELD = new ParseField("right"); | ||
static final ParseField WKT_FIELD = new ParseField("wkt"); | ||
public static final ParseField BOUNDS_FIELD = new ParseField("bounds"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's obvious to folks who work with this code a lot, but I'm unclear why some of these are public and others aren't. I'd appreciate a comment. (I see that it was like this in GeoBoundingBox
where you relocated this from, so it's possible the reasoning is lost to the mists of time.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I look at the usages of these, it seems the public ones are used by classes outside this package (production code as well as test code). In particular ParsedGeoBounds
uses these for parsing bounds used in aggregations.
So I think the root cause is that there are multiple queries that have a partially shared concept of geo bounds. Ideally it would be nice to imagine them all having the same, single, implementation. But that could be a behaviour change, which I think is certainly out of scope for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I don't think we need to change it. I just want a comment saying it's on purpose and why.
* To facilitate maximizing the use of common code between GeoPoint and projected CRS | ||
* we introduced this ElasticPoint as an interface of commonality. | ||
*/ | ||
public interface ElasticPoint { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
There's nothing elastic-specific about this interface, as far as I can see, so it should have a broader name.
builder.field(LAT_FIELD.getPreferredName(), topLeft.lat()); | ||
builder.field(LON_FIELD.getPreferredName(), topLeft.lon()); | ||
builder.field(LAT_FIELD.getPreferredName(), topLeft.getY()); | ||
builder.field(LON_FIELD.getPreferredName(), topLeft.getX()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason to think that someone in the future might get confused if longitude maps to X or Y? Should we have some static function that wraps this logic? Right now, we replicate this knowledge in a few spots (e.g. the GeoPoint
constructor below), so maybe there's some sense in putting that mapping in one place?
I feel like it's probably obvious to folks who work with this stuff, but I don't trust that things which are obvious today will be obvious in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree 100%. I want to move in a direction that isolates this kind of mapping to a single place. Right now there are many, many places that depend on secret knowledge of the mapping and the correct order (for example some methods take (double y, double x)
and others take (double x, double y)
. In the earlier Elastic code (and earlier Lucene code), y, x
was the norm, but over time things have swung over to x, y
being much more common. What I'm keen to do is iteratively minimize the differences. Some earlier PR's I did in the Spring already downplayed the y, x
cases (in code and documentation), but that was a small step. This PR here now actually does improve things quite a bit (IMHO); but it also brings some things into the light, like the one you observe here. I think more can (and should) be done, but I do not want to change too much in one go. So this PR priorizes the changes required to enable the work on Cartesian, but does not do everything that could be done. This feels like a safer compromise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the creation of the SpatialPoint
(neé ElasticPoint
) interface leads to one option for where we could place this kind of knowledge. But I think we should think more about that before actually taking that step. There could be better options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I'm happy to keep this on the roadmap and address it in a future PR after more thought.
|
||
long count(); | ||
} | ||
public interface GeoCentroid extends CentroidAggregation {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this interface still necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was because I wrote the CartesianCentroid stuff, and then Ignacio wanted the pure refactoring parts pulled out. The Cartesian work is in a separate PR on top of this work. If I move this change into that PR it will cause that PR to contain a lot of test code changes that are geo-specific (not cartesian). My goal is for this PR to contain only refactoring (moving Geo-specific code around) and the other PR to contain primarily Cartesian specific stuff (minimize Geo changes). So for the purpose of that split, this is a good change :-)
|
||
protected abstract ElasticPoint centroidFromStream(StreamInput in) throws IOException; | ||
|
||
protected abstract void centroidToStream(StreamOutput out) throws IOException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like for abstract functions to have some java doc explaining what implementations need to do. In this particular case, I'm unclear what these are for since we already have a stream constructor and a doWriteTo
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These methods encapsulate the parts of the Geo vs. Cartesian that differ. In particular this is the y, x
versus x, y
we discussed in another comment. This move is a step towards trying to get that particular issue isolated. I think more can be done (eg. moving that knowledge into the SpatialPoint
or some other class), but this step feels like a step in the right direction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think they're fine, I just want them to have javadoc.
import org.elasticsearch.xpack.spatial.index.fielddata.IndexShapeFieldData; | ||
import org.elasticsearch.xpack.spatial.index.fielddata.ShapeValues; | ||
|
||
public abstract class AbstractShapeIndexFieldData implements IndexShapeFieldData { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Javadoc please, ideally with some hints as to why and how one would build a concrete implementation of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Added some.
XFieldComparatorSource.Nested nested, | ||
boolean reverse | ||
) { | ||
throw new IllegalArgumentException("can't sort on geo_shape field without using specific sorting feature, like geo_distance"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it correct to specify geo_shape
and geo_distance
here? might this abstraction have a non-geo specialization down the road? I would prefer if it got the field name from the concrete instance, and didn't offer a suggestion for the sorting feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've moved the exception into the extending class, where the specific names have meaning.
int bucketSize, | ||
BucketedSort.ExtraData extra | ||
) { | ||
throw new IllegalArgumentException("can't sort on geo_shape field without using specific sorting feature, like geo_distance"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, I'd prefer not to hard code the type names in the error message. Also, with the repeated error message here, consider centralizing it to AggregationErrors
- I'm trying to make that a thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done a minimal change, but not yet made use of AggregationErrors
. I'll read up on that and learn about it.
|
||
import java.io.IOException; | ||
|
||
public abstract class ShapeValuesProvider extends ValuesSource { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we calling this a ValuesProvider
instead of a ValuesSource
? I really don't want to add another distinction of terms to an already confusing part of the code unless we're very clear what the new terminology means.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No idea. Seems wrong to me too. I've fixed it in 33cb378.
|
||
import java.io.IOException; | ||
|
||
public abstract class ShapeValuesSourceType implements Writeable, ValuesSourceType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this need to be Writeable
? ValuesSourceType
implementations are intended to be stateless, which means they should never need to be written. It's actually quite concerning to me if we need a writeable values source type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I traced this back and it was introduced by Tal Levy in 2020 as:
public class GeoShapeValuesSourceType implements Writeable, ValuesSourceType
I do not know why it was writable. My vote is we investigate that separate from this PR because my goal with this PR was to leave behaviour as unchanged as possible, just moving around code to enable the Cartesian work.
I notice that the writeTo method is empty and it would be interesting to simply remove it and see what PR building does. My default thinking is to save that for another PR, but if you have a strong opinion here, we can do it here directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I thought a CI test of this would be a good idea. I pushed that in 541e47f. Let's see how it goes and then make a decision.
In preparation for supporting CartesianPoint and CartesianShape in aggregations, this PR adds a common interface between GeoPoint and CartesianPoint, and then uses that to split out some key common code that will be used in CartesianPoint and CartesianShape aggregations
Co-authored-by: Ignacio Vera <[email protected]>
26b11a6
to
2e592b5
Compare
It extends ValuesSource, and is extended by GeoShapeValuesSource. There is no reason for the suffix `Provider`.
throw new IllegalArgumentException("can't sort on geo_shape field without using specific sorting feature, like geo_distance"); | ||
} | ||
|
||
public static class GeoBuilder implements IndexFieldData.Builder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This builder is not needed, we can just use a lambda.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted!
Based on Ignacio's work in 050df95, we fix the BoundingBox generics, and also add a little more specificity to the previous generics (replace <?> with <? extends SpatialPoint>).
builder.endObject(); | ||
return builder; | ||
} | ||
|
||
public XContentBuilder toXContentFragment(XContentBuilder builder, boolean buildLatLonFields) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The value of buildLatLonFields is true almost always, and is also something geo-specific. We should get rid of it and have another method for the false case that only exists in the geo-implementation.
return builder; | ||
} | ||
|
||
public abstract XContentBuilder toXContentFragment(XContentBuilder builder, boolean buildLatLonFields) throws IOException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
buildLatLonField
sounds very geo-specific. We should be able to get rid of this for now.
static final ParseField RIGHT_FIELD = new ParseField("right"); | ||
static final ParseField WKT_FIELD = new ParseField("wkt"); | ||
public static final ParseField BOUNDS_FIELD = new ParseField("bounds"); | ||
public static final ParseField LAT_FIELD = new ParseField("lat"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LAT
and LON
are geo-specific and should be in GeoBoundingBox
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. As noted, there's follow up work, but this looks like a big improvement to me, so let's get it in. Thanks for taking this on!
In preparation for supporting CartesianPoint and CartesianShape
in aggregations, this PR adds a common interface between GeoPoint
and CartesianPoint, and then uses that to split out some key common
code that will be used in CartesianPoint and CartesianShape aggregations