-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make Geo Context Parsing More Strict #32412
Make Geo Context Parsing More Strict #32412
Conversation
Currently, if geo context is represented by something other than geo_point or an object with lat and lon fields, the parsing of it as a geo context can result in ignoring the context altogether, returning confusing errors such as number_format_exception or trying to parse the number specifying as long-encoded hash code. It would also fail if the geo_point was stored. This commit makes the parsing more strict and will fail the indexing if the context is not specified correctly. It also adds support for stored geo_points. Closes elastic#32202
Pinging @elastic/es-search-aggs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM Geo Context Suggester hasn't had much work on it in a while. Thanks for doing this @imotov
} else { | ||
// Does this object exist? If it does and we didn't find what we need - we need to warn user | ||
for (IndexableField field : document.getFields()) { | ||
if (field.name().startsWith(fieldName + ".")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know the "dots in field names" discussion has been a long running one. Do we not yet have a more general/graceful way of throwing these exceptions? This is more of a question out of my own curiosity and not intended to hold up the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure we should do that, can we restrict the geo context to geo
fields and validate this assumption from the mapping (context.mapperService
) directly ? If the field has the correct type in the mapping we don't have to check why it's missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can definitely remove that and clean this up even more, but this will be much more breaking change, since today you can have this sudo geo object and it works, if we will remove this in 7, then working indices with this, created in 6 are going to break. I am not sure we can do that. We will probably have to disable it for indices in 7 but I don't think we can fully remove it until 8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd treat this case like the geohash string version, this is not documented nor tested so we should remove it. The documentation is clear, you can use a path that references a geo_point field or set the point directly in the context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay @imotov . I left some comments.
@@ -200,18 +201,29 @@ protected XContentBuilder toInnerXContent(XContentBuilder builder, Params params | |||
geohashes.add(stringEncode(spare.getLon(), spare.getLat(), precision)); | |||
} | |||
} | |||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the code above seems useless in 7 (even in 6.x ?) since we have a LatLonPoint
and a LatLonDocValuesField
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way I understood the code above is if you want to represent lat and lon as an object, but don't want to index it as an actual geopoint you could do that.
} else { | ||
// Does this object exist? If it does and we didn't find what we need - we need to warn user | ||
for (IndexableField field : document.getFields()) { | ||
if (field.name().startsWith(fieldName + ".")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure we should do that, can we restrict the geo context to geo
fields and validate this assumption from the mapping (context.mapperService
) directly ? If the field has the correct type in the mapping we don't have to check why it's missing.
@jimczi I pushed some changes. Not really sure if I understood your idea correctly. But it might be easier to collaborate on the code. Could you take another look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed with @imotov, this is a breaking change so we should apply the strict parsing only on new indices and log a deprecation warning for indices created in 6x.
Discussed this a bit more with @nik9000. It looks like there might be a way to move this check into mapping phase instead of indexing phase. I will give it a try. |
I think I figured it out, but since it's complete rewrite with a completely new approach and no shared history I am going to open another PR. Closing this one for now. |
Currently, if geo context is represented by something other than geo_point or an object with lat and lon fields, the parsing of it as a geo context can result in ignoring the context altogether, returning confusing errors such as number_format_exception or trying to parse the number specifying as long-encoded hash code. It would also fail if the geo_point was stored. This commit makes the mapping parsing more strict and will fail during mapping update or index creation if the geo context doesn't point to a geo_point field. Supersedes #32412 Closes #32202
Currently, if geo context is represented by something other than
geo_point or an object with lat and lon fields, the parsing of it
as a geo context can result in ignoring the context altogether,
returning confusing errors such as number_format_exception or trying
to parse the number specifying as long-encoded hash code. It would also
fail if the geo_point was stored.
This commit makes the parsing more strict and will fail the indexing
if the context is not specified correctly. It also adds support for
stored geo_points and removes support for long geo_hashes in numeric
and string types since it doesn't seem to be intentional.
In case of objects the new change will accept objects that contain
lat and lon fields:
It will reject objects that contain any other fields without lat and
lon:
And it will ignore empty objects if other contexts for this suggestion are
provided:
Closes #32202