-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create aliases for PassThroughObjectMapper subfields in FieldTypeLookup #106829
Conversation
Pinging @elastic/es-search (Team:Search) |
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did an initial review round. I like the fact that this avoids updating the mapping with alias field mappers as part of using passthrough field mapper..
// PassThroughObjectMapper name. This is achieved by adding a second reference to their | ||
// MappedFieldType using the remaining suffix. | ||
Map<String, PassThroughObjectMapper> passThroughFieldAliases = new HashMap<>(); | ||
for (FieldMapper fieldMapper : fieldMappers) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can potentially be very expensive loop (depending on the number of fields and passthrough fields). Is there a less expensive way to do this?
This is achieved by adding a second reference to their MappedFieldType using the remaining suffix
Can you share a mapping that would look like this? I'm trying to understand this better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, switched to just check the passthrough subfields.
This is achieved by adding a second reference to their MappedFieldType using the remaining suffix
This is the alias that's added to the lookup, just like through the FieldAliasMappers before.
server/src/main/java/org/elasticsearch/index/mapper/PassThroughObjectMapper.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/PassThroughObjectMapper.java
Outdated
Show resolved
Hide resolved
PassThroughObjectMapper( | ||
String name, | ||
String fullPath, | ||
Explicit<Boolean> enabled, | ||
Dynamic dynamic, | ||
Map<String, Mapper> mappers, | ||
Explicit<Boolean> timeSeriesDimensionSubFields | ||
Explicit<Boolean> timeSeriesDimensionSubFields, | ||
Set<String> supersededBy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe an alternative way to deal with conflicts could be having a priority
mapping attribute instead? (range from: 0 to max integer)
If there are multiple passthrough fields then a priority attribute is required on all passthrough fields. If there're passthough fields with the same priority we fail with a validation error. If multiple passthrough fields match we pick the passthough field with the highest priority.
I think this could simplify to validation logic / logic in FieldTypeLookup
?
Note that index templates have a similar notion of priority.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with this is that a user may start with one passthrough field, then add another one, then notice that there are conflicts. I don't think we should require users to have a finalized list of pass-through objects from the get go, @felixbarny too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with this is that a user may start with one passthrough field, then add another one, then notice that there are conflicts.
If the second passthrough field is added, the new passthrough field will need a priority attribute and the existing passthrough field as well. I don't think that this is an issue? Passthrough fields are defined on templates, and template validation will fail when the priorities are missing or misaligned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without having thought very deeply about this, I was also thinking of a numeric priority
mapping parameter. But need to read more to get a better understanding of what supersededBy
is doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For OTel mappings, the precedence (from highest to lowest) would look something like this:
- Top-level/non passthrough fields
resource.attributes.*
scope.attributes.*
attributes.*
To me, it seems that could be achieved with a simple numeric priority
mapping parameter, or, at least with a single value for supersededBy
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to using priorities. An issue that I see is what happens with existing pass-through objects missing priorities.. I can check for the existence of aliases and bypass the check for priorities, but that's not great either.. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we haven't really used the passthrough field type outside of testing, so this breaking change is probably fine. Alternatively, we could not require passthrough fields to have unique priorities and fall back to lexicographical ordering if the priority is the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make priority required. This field type is undocumented and not ga-ed. This field type is only used for cross team testing at the moment.
if (subfield instanceof FieldMapper fieldMapper) { | ||
String name = fieldMapper.simpleName(); | ||
// Check for conflict between PassThroughObjectMapper subfields. | ||
PassThroughObjectMapper conflict = passThroughFieldAliases.put(name, passThroughMapper); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, you could also sort the passThroughMappers in ascending order and simply let mappers with a higher priority override i.e. put without checking for conflicts.
Your solution works just as well so feel free to ignore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If priority is going to be required, this does feel simpler to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to check for conflicts, to avoid cases with inconsistent behavior, depending on the order they show up in the FieldTypeLookup.
@@ -172,13 +184,14 @@ private MappingLookup( | |||
} | |||
} | |||
|
|||
PassThroughObjectMapper.checkForInvalidPriorities(passThroughMappers); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this is called when creating or updating an index template as it doesn't affect a concrete index, yet. In any case, I think it would be prudent to add a test case for an index template with two passthrough field types that don't specify a priority to see if this gets caught.
If this indeed turns out not to be the appropriate place for the validation, an alternative may be to validate this in the root object mapper. Yet another approach could be to allow duplicate priorities and then fall back to lexicographical ordering if the priority is the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So templates are validated in MetaIndexTemplateServervice#validateTemplate()
line 1812.
The merge method at some point creates a new DocumentMapper
instance and then the MappingLookup gets created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thanks for double-checking!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is getting close, let's make priority attribute required?
|
||
public static class Builder extends ObjectMapper.Builder { | ||
|
||
// Controls whether subfields are configured as time-series dimensions. | ||
protected Explicit<Boolean> timeSeriesDimensionSubFields = Explicit.IMPLICIT_FALSE; | ||
|
||
// Controls which pass-through fields take precedence in case of conflicting aliases. | ||
protected int priority = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make this attribute required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to put another idea out there. We could define that the default is 0, basically this is already what it does now. When adding another passthrough field, you don't need to update the existing one if the new field has a higher priority. Not defining the priority on the second field would fail because of the validation that the priority needs to be unique.
But I'd also be fine with making the priority required, even if there's only one passthrough field in the mappings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's cleaner to just make it required.
if (subfield instanceof FieldMapper fieldMapper) { | ||
String name = fieldMapper.simpleName(); | ||
// Check for conflict between PassThroughObjectMapper subfields. | ||
PassThroughObjectMapper conflict = passThroughFieldAliases.put(name, passThroughMapper); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If priority is going to be required, this does feel simpler to me.
* | ||
* In case different pass-through objects contain subfields with the same name (excluding the pass-through prefix), their aliases conflict. | ||
* To resolve this, the pass-through spec specifies which object takes precedence through required parameter "priority"; non-negative | ||
* integer values are accepted, with the highest priority value winning in case of conflicting aliases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Priority 1 is the highest priority, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No 0 is the lowest. I can change it if you feel strongly, do we have another case where we use priority in a field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see. No strong opinion here, mainly wanted to understand whether priority is ascending or descending. Maybe a slight preference for only allowing positive integers (excluding zero).
do we have another case where we use priority in a field?
I've found
https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/common/Priority.java. Interesting it does have a priority 0 (IMMEDIATE
). But that seems to be a bit of a different use case.
I've also found we're using priority
for index templates and the higher the number, the higher the priority, so priority 1 would be a very low priority.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I think Martijn suggested to use the same approach as the last example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left two comments. LGTM otherwise.
import java.util.Locale; | ||
import java.util.Map; | ||
|
||
import static org.elasticsearch.common.xcontent.support.XContentMapValues.nodeBooleanValue; | ||
import static org.elasticsearch.common.xcontent.support.XContentMapValues.nodeIntegerValue; | ||
|
||
/** | ||
* Mapper for pass-through objects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a disclaimer that this is a field type that is experimental, undocumented and shouldn't be used in production and that it exists today for prototyping purposes only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
PassThroughObjectMapper conflict = passThroughFieldAliases.put(name, passThroughMapper); | ||
if (conflict != null) { | ||
// Check the priorities of the conflicting objects. | ||
if (conflict.priority() > passThroughMapper.priority()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's reverse the priority? A field from a passthrough field with a higher priority wins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the case? If the inserted object has higher priority we leave it there, otherwise the conflicting one is put back. In both cases, fullNameToFieldType
will contain the alias for the object with the highest priority?
@elasticsearchmachine run elasticsearch-ci/packaging-tests-unix-sample |
@elasticsearchmachine run elasticsearch-ci/bwc-snapshots |
# Conflicts: # modules/data-streams/src/yamlRestTest/resources/rest-api-spec/test/data_stream/150_tsdb.yml # server/src/main/java/org/elasticsearch/index/mapper/MappingLookup.java
Gentle ping for @javanna, ptal. |
# Conflicts: # server/src/main/java/org/elasticsearch/index/mapper/MapperFeatures.java
# Conflicts: # server/src/test/java/org/elasticsearch/index/mapper/RootObjectMapperTests.java
@javanna I'm submitting this to get some mileage with testing. Please let me know if you have comments, I'm happy to address them in a follow-up. |
I just reviewed this and I am happy with the changes made, I think it's a net improvement. Thanks for taking the time! |
Great to hear Luca, thanks for the guidance here! |
Move the logic for creating aliases from
RootObjectMapper
toFieldTypeLookup
. We no longer create FieldAliasMapper fields but just the field names under the pass-through path to the sameMappedFieldType
- just as it happens for FieldAliasMapper objects. As a bonus, since aliases are created on-the-fly, it's now possible to handle conflicts gracefully using asuperseded_by
param in thePassThroughObjectMapper
spec.FieldTypeLookup
is used in fieldcaps API so the aliases are expected to show up there, but not in the index mappings.Related to #103648, #103567