Allowing dots in field names #15951
I spoke to @rjernst, and it might be simpler to disallow inconsistent dots in fields. So the first full path dots we allow, and the second one, we reject (similar to what we do with conflict on types). As an example: first allow If we can do this, it might also be simpler to do it in 2.x, and it is more constraint compared to the above solution, and later we can extend (if we need to) to implement the above. |
@kimchy I don't think that will actually work, because of how we parse values and just append them. Without some nasty-ish logic, I don't think we could distinguish whether a field is appending to an existing field or is another field with the same path (and the same goes for the mapper service itself when storing the mappers).

@clintongormley There are two things that bug me. First, why do we have _source filtering at all? We already have stored fields, which can serve the same purpose (returning a subset of the document on search). The second thing that bugs me is that your example works at all. We allow duplicate values for a field to append instead of erroring? That is leniency at its best: I don't know of any JSON parsers that emit arrays as duplicate keys (at least not by default), which means the user is probably serializing themselves, and very likely has a bug in their serialization. I don't think we should support either of those features, but dropping _source filtering would at least remove your concern, so we could go with the dots-as-paths option?
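The point about duplicate keys can be checked directly: standard JSON parsers do not append duplicate keys into arrays. Python's `json` module, for example, simply keeps the last value:

```python
import json

# A document with a duplicate key, the shape Elasticsearch treats as an append.
doc = '{"foo": "val1", "foo": "val2"}'

# Python's json module, like most parsers, keeps only the last value
# ("last one wins") rather than building an array.
parsed = json.loads(doc)
print(parsed)  # {'foo': 'val2'}
```

Anyone producing such a document is therefore almost certainly hand-rolling their serialization, which is the concern raised above.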
I wonder if we can delay a decision on how to handle updates by not supporting dots in the document-merge use case. You can always use scripts to be totally explicit there if you need to.
Actually this should already work today. The second document will trigger a dynamic mapping update that will be rejected since the mapping would have two mappers that have the same path: #15243 |
If we go with treating dots as paths, then this won't work correctly, eg with a document containing both forms (eg
@rjernst because users want to be able to get back what they put in, and to be able to distinguish between values such as:
You can't do this with stored fields.
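The round-trip argument can be illustrated with a small sketch (the `stored_view` helper below is hypothetical, standing in for the way stored fields return every field as a list of values):

```python
import json

# Two documents that must stay distinguishable on the way back out:
doc_a = {"foo": "bar"}      # scalar value
doc_b = {"foo": ["bar"]}    # single-element array

# _source round-trips the exact shape, so the distinction survives:
assert json.loads(json.dumps(doc_a)) != json.loads(json.dumps(doc_b))

# A stored-fields-style view (hypothetical sketch) returns each field as a
# list of values, collapsing both shapes to the same thing:
def stored_view(doc):
    return {k: v if isinstance(v, list) else [v] for k, v in doc.items()}

print(stored_view(doc_a) == stored_view(doc_b))  # True -- distinction lost
```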
Where do you see duplicate keys?
The above is perfectly valid JSON - no duplicate keys there. The fact that
The only way I can see this working is as follows. Fields with dots are mapped with dots, so
When adding a new field:
This logic would prevent conflicting paths from being added. When looking up a field (eg
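The details of the insertion logic did not survive formatting, but the kind of conflict check being described might look roughly like this (all names here are hypothetical):

```python
# Hypothetical sketch of the conflict check described above: a dotted field
# name may only be added if no prefix of its path is already mapped as a
# leaf field, and the full path is not already mapped as an object.
def can_add_field(existing_leaves, existing_objects, new_name):
    parts = new_name.split(".")
    # Every proper prefix must not already be a leaf field ...
    for i in range(1, len(parts)):
        if ".".join(parts[:i]) in existing_leaves:
            return False
    # ... and the full path must not already be an object.
    return new_name not in existing_objects

leaves, objects = {"foo.bar"}, {"foo"}
print(can_add_field(leaves, objects, "foo.bar.baz"))  # False: foo.bar is a leaf
print(can_add_field(leaves, objects, "foo.qux"))      # True
print(can_add_field(leaves, objects, "foo"))          # False: foo is an object
```

This also shows why the lookup cost worries raised below are real: every level of the path has to be checked.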
By the way, the decision about dots also affects the node ingest plugin, which treats dots as steps in a path hierarchy and has no support for escaping. This may not be a problem as long as the
@clintongormley The logic you described there for adding new fields and searching is exactly why I don't like that approach. It is much more complicated than what we have today (and I especially don't like that the lookup of a field for search becomes linear in the object depth of the field). I am still convinced that doing

As for your concerns about
@jpountz has expressed a concern with this approach and the edge cases it brings, in particular with nested fields. I think in the case, for example, where

While discussing with @jpountz he also made me realize escaping might be simpler than I originally thought. However, I still think this
Agreed on both counts.
Good to hear. As long as the implemented solution is known to deal with the edge cases correctly, I'm happy.
Note, mappings in 1.x indices have field names like:
So that structure would need to be updated on upgrade to:
Any update on the likelihood of implementing something around this?
@GlenRSmith I know @rjernst is currently exploring treating dots in field names as sub-objects.
Another use case where I need dots in field names is for tracking request parameters. I currently store them like this:
I don't really have control over the names of the request parameters, so the only option is to de_dot the parameter names. But then I can't use the stored information to reproduce/replay the captured request. Converting the parameters into
isn't an option either, because I want to do aggregations on specific parameters in Grafana. Yet another use case of mine is that I store configuration parameters in Elasticsearch, where the config keys are field names and contain dots. So a big +1 from my side.
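For reference, the de_dot rewrite mentioned here can be sketched as a recursive key transform (a hypothetical helper; the real Logstash de_dot filter replaces dots with underscores by default):

```python
# Hypothetical sketch of a de_dot transform: recursively replace dots in
# map keys with another character (underscore by default).
def de_dot(value, replacement="_"):
    if isinstance(value, dict):
        return {k.replace(".", replacement): de_dot(v, replacement)
                for k, v in value.items()}
    if isinstance(value, list):
        return [de_dot(v, replacement) for v in value]
    return value

doc = {"params": {"utm.source": "newsletter", "page.id": 42}}
print(de_dot(doc))  # {'params': {'utm_source': 'newsletter', 'page_id': 42}}
```

As the comment above notes, this transform is lossy: once the dots are gone, the original parameter names cannot be recovered to replay a request.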
I hate to pile on, but our use case is identical to @felixbarny's. Perhaps I'm not understanding the need to treat these as sub-objects, but if that could be an option (even if not the default) it would be much better than the current way of handling fields that contain dots.
I also have a use case similar to @felixbarny. |
In 2.0 we began restricting field names to not contain dots. This change adds back partial support for dots in field names. Specifically, it allows indexing documents that contain dots in field names when the correct corresponding mappers exist. For example, if the mappings contain an object field `foo` and a subfield `bar`, then indexing a document with `foo.bar` will work. See elastic#15951
Nice! Could you explain/document how this works now?
Hi, I am using the 5.0.0.4 alpha release and tried to create an index with the below mapping (which has dots in field names):

```
{"mappings": {"first.Name": {"type": "string", "index": "not_analyzed"}}}
```

But this fails as below:

```
{"type": "mapper_parsing_exception", "status": 400}
```

Am I missing anything?
The current support for dots in field names covers dynamic mappings and document parsing. When specifying mappings directly, you still need to split up the dotted fields recursively. I opened #19443 to address this.
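Splitting a dotted field up "recursively" means turning each dot into a level of object mapping with a `properties` block. A hypothetical helper:

```python
# Hypothetical helper: expand a dotted field name into the nested object
# mapping structure that Elasticsearch expects ("properties" at each level).
def expand_dotted(name, definition):
    parts = name.split(".")
    mapping = definition
    for part in reversed(parts[1:]):
        mapping = {"properties": {part: mapping}}
    return {parts[0]: mapping}

# "first.Name" becomes an object "first" with a sub-field "Name":
print(expand_dotted("first.Name", {"type": "string", "index": "not_analyzed"}))
# {'first': {'properties': {'Name': {'type': 'string', 'index': 'not_analyzed'}}}}
```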
Can dots in field names be patched into 2.3.x? Otherwise it will require 1.x -> 2.x (re-work to undo all the dots in field names), then 2.x -> 5.x (allow dots back).
@cdenneen We are looking into possible solutions. I will update the issue when we have more to say.
@cdenneen I would like to clarify (it might be clear to you but not necessarily to other readers) that data will need to be reindexed anyway between 1.x and 5.x, since Elasticsearch only supports one major version back, and the version that matters in that case is the version that was used to create the index. So 5.x will not be able to read any index created in 1.x.
@s1monw thanks. @jpountz yes, that's why I was saying the 1.x -> 5.x upgrade path isn't supported, but 1.x -> 2.x and 2.x -> 5.x are. But in order to do that you'd have to undo the dotted fields for the 2.x upgrade and then put them back in 5.x after that upgrade. So unless there is a 1.x -> 5.x upgrade path, I would think there needs to be a 2.x patch to support this, to allow the upgrade to work (stepping up the major versions).
@cdenneen I think you're missing the point. A 2.x patch wouldn't help you. Indices created in 1.x can't be read in 5.x. Full stop. Not even if you had no conflicts and upgraded to 2.x first. |
Glen,

Upgrades from 1.x -> 2.x wouldn't convert the index to the 2.x standard, so you could do a 5.x upgrade later?

-Chris
@cdenneen No. An index that lives in a 2.x cluster but was created with 1.x cannot be upgraded to 5.x. |
@cdenneen Just to clarify: if we get support for dots in fields into 2.4, you'd be able to upgrade to 2.4, reindex to a new index, then upgrade to 5.x. An alternative route would be to create a new 5.x cluster, then use reindex-from-remote to pull the indices you want to take with you into 5.x directly.
@clintongormley Graylog is affected by this; is it still being considered for inclusion in a hypothetical 2.4.2 release?
Support for dots in field names was added in 2.4.0: |
As part of the Great Mapping Refactoring (#8870), we had to reject field names containing dots (#12068), eg:
The behaviour was undefined and resulted in ambiguities when trying to reference fields with the dot notation used in queries and aggregations.
Removing support for dots has caused pain for a number of users and especially as Elasticsearch is being used more and more for the metrics use case (where dotted fields are common), we should consider what we can do to improve this situation. Now that mappings are much stricter (and immutable), it becomes feasible to revisit the question of whether to allow dots to occur in field names.
Replace dots with another character

The first and simplest solution is to simply replace dots in field names with another character (eg `_`), as is done by the Logstash de_dot filter and as will be supported natively in Elasticsearch by the node ingest `de_dot` processor.

Treat dots as paths
Another solution would be to treat fields with dots in them as "paths" rather than field names. In other words, these two documents would be equivalent:
To use an edge case as an example, the following document:

would result in the following mapping:
The Lucene field would be called `foo.bar.baz` and would contain the terms `["val1", "val2"]`. Stored fields or doc values (for supported datatypes) would both contain `["val1", "val2"]`.
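Under this model, both the dotted and the expanded form of a document feed the same full field path. A sketch (the combined document is an assumption, reconstructed from the `foo.bar.baz` / `["val1", "val2"]` description, since the original example did not survive formatting):

```python
# Hypothetical sketch of the dots-as-paths model: flatten a document into
# full field paths, so dotted keys and nested objects land on the same
# Lucene field name.
def flatten(doc, prefix=""):
    fields = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            for p, vals in flatten(value, path).items():
                fields.setdefault(p, []).extend(vals)
        else:
            fields.setdefault(path, []).append(value)
    return fields

# Both forms of the edge case contribute to the same field:
doc = {"foo.bar": {"baz": "val1"}, "foo": {"bar": {"baz": "val2"}}}
print(flatten(doc))  # {'foo.bar.baz': ['val1', 'val2']}
```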
Issues with this approach
This solution works well for search and aggregations, but leaves us with two incongruities:
`_source=`

The first occurs when using the `_source=` parameter to do source filtering on the response. The reason for this is that the `_source` field is stored as provided - it is not normalized before being stored. For instance:

would return:
rather than:
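A sketch of this incongruity, assuming a naive filter that walks the raw `_source` as nested maps (all names here are hypothetical):

```python
# Hypothetical sketch of why _source filtering clashes with dots-as-paths:
# _source is stored exactly as provided, so a filter that walks it as
# nested maps finds the expanded form but misses a literal dotted key.
def source_filter(source, path):
    node = source
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node

nested = {"foo": {"bar": "val"}}
dotted = {"foo.bar": "val"}   # the same field under dots-as-paths

print(source_filter(nested, "foo.bar"))  # 'val'
print(source_filter(dotted, "foo.bar"))  # None -- the raw key is 'foo.bar'
```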
Update requests
The second occurs during update requests, which use the `_source` as a map-of-maps. Running an update like:

could result (depending on how it is implemented) in any of the following:
Version 1:
Version 2:
Version 3:
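Two of these possible outcomes can be sketched with a pair of hypothetical merge strategies, assuming a nested `_source` and a dotted partial document (the document shapes are assumptions for illustration):

```python
import copy

# Hypothetical sketch of the update ambiguity: merging a partial document
# with a dotted key into a nested _source map-of-maps.
def merge_literal(source, update):
    # Keys compared literally: 'foo.bar' is just another top-level key.
    merged = copy.deepcopy(source)
    merged.update(update)
    return merged

def merge_as_paths(source, update):
    # Keys treated as paths: 'foo.bar' overwrites source['foo']['bar'].
    merged = copy.deepcopy(source)
    for key, value in update.items():
        parts = key.split(".")
        node = merged
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return merged

source = {"foo": {"bar": "old"}}
update = {"foo.bar": "new"}

print(merge_literal(source, update))
# {'foo': {'bar': 'old'}, 'foo.bar': 'new'} -- two coexisting keys

print(merge_as_paths(source, update))
# {'foo': {'bar': 'new'}} -- the nested value is overwritten
```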