Allowing dots in field names #15951
I spoke to @rjernst, and it might be simpler to disallow inconsistent dots in fields. So the first full path dots we allow, and the second one, we reject (similar to what we do with conflict on types). As an example: first allow If we can do this, it might also be simpler to do it in 2.x, and it is more constraint compared to the above solution, and later we can extend (if we need to) to implement the above. |
@kimchy I don't think that will actually work, because of how we parse values and just append them. Without some nasty-ish logic, I don't think we could distinguish whether a field is appending to an existing field or is another field with the same path (and the same goes for the mapper service itself when storing the mappers).

@clintongormley There are two things that bug me. First, why do we have _source filtering at all? We already have stored fields, which can serve the same purpose (returning a subset of the document on search). The second thing that bugs me is that your example works at all. We allow duplicate values for a field to append instead of erroring? That is leniency at its best: I don't know of any JSON parsers that emit arrays as duplicate keys (at least not by default), which means the user is probably serializing themselves, and very likely has a bug in their serialization. I don't think we should support either of those features, but dropping _source filtering would at least remove your concern, so we could go with the dots-as-paths option?
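The point about duplicate keys can be checked directly: standard JSON parsers do not append duplicate keys into arrays. Python's `json` module, for example, simply keeps the last value:

```python
import json

# A document with a duplicate key, the shape Elasticsearch treats as an append.
doc = '{"foo": "val1", "foo": "val2"}'

# Python's json module, like most parsers, keeps only the last value
# ("last one wins") rather than building an array.
parsed = json.loads(doc)
print(parsed)  # {'foo': 'val2'}
```

Anyone producing such a document is therefore almost certainly hand-rolling their serialization, which is the concern raised above.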
I wonder if we can delay a decision on how to handle updates by not supporting dots in the document-merge use case. You can always use scripts to be totally explicit there if you need to.
Actually this should already work today. The second document will trigger a dynamic mapping update that will be rejected since the mapping would have two mappers that have the same path: #15243 |
If we go with treating dots as paths, then this won't work correctly, eg with a document containing both forms (eg
@rjernst because users want to be able to get back what they put in, and to be able to distinguish between values such as:
You can't do this with stored fields.
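The round-trip argument can be illustrated with a small sketch (the `stored_view` helper below is hypothetical, standing in for the way stored fields return every field as a list of values):

```python
import json

# Two documents that must stay distinguishable on the way back out:
doc_a = {"foo": "bar"}      # scalar value
doc_b = {"foo": ["bar"]}    # single-element array

# _source round-trips the exact shape, so the distinction survives:
assert json.loads(json.dumps(doc_a)) != json.loads(json.dumps(doc_b))

# A stored-fields-style view (hypothetical sketch) returns each field as a
# list of values, collapsing both shapes to the same thing:
def stored_view(doc):
    return {k: v if isinstance(v, list) else [v] for k, v in doc.items()}

print(stored_view(doc_a) == stored_view(doc_b))  # True -- distinction lost
```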
Where do you see duplicate keys?
The above is perfectly valid JSON - no duplicate keys there. The fact that
The only way I can see this working is as follows. Fields with dots are mapped with dots, so
When adding a new field:
This logic would prevent conflicting paths from being added. When looking up a field (eg
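The details of the insertion logic did not survive formatting, but the kind of conflict check being described might look roughly like this (all names here are hypothetical):

```python
# Hypothetical sketch of the conflict check described above: a dotted field
# name may only be added if no prefix of its path is already mapped as a
# leaf field, and the full path is not already mapped as an object.
def can_add_field(existing_leaves, existing_objects, new_name):
    parts = new_name.split(".")
    # Every proper prefix must not already be a leaf field ...
    for i in range(1, len(parts)):
        if ".".join(parts[:i]) in existing_leaves:
            return False
    # ... and the full path must not already be an object.
    return new_name not in existing_objects

leaves, objects = {"foo.bar"}, {"foo"}
print(can_add_field(leaves, objects, "foo.bar.baz"))  # False: foo.bar is a leaf
print(can_add_field(leaves, objects, "foo.qux"))      # True
print(can_add_field(leaves, objects, "foo"))          # False: foo is an object
```

This also shows why the lookup cost worries raised below are real: every level of the path has to be checked.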
By the way, the decision about dots also affects the node ingest plugin, which treats dots as steps in a path hierarchy and has no support for escaping. This may not be a problem as long as the
@clintongormley The logic you described there for adding new fields and searching is exactly why I don't like that approach. It is much more complicated than what we have today (and I especially don't like that the lookup of a field for search becomes linear in the object depth of the field). I am still convinced that doing

As for your concerns about
@jpountz has expressed a concern with this approach and the edge cases it brings, in particular with nested fields. I think in the case, for example, where

While discussing with @jpountz he also made me realize escaping might be simpler than I originally thought. However, I still think this
Agreed on both counts.
Good to hear. As long as the implemented solution is known to deal with the edge cases correctly, I'm happy.
Note, mappings in 1.x indices have field names like:
So that structure would need to be updated on upgrade to:
Any update on the likelihood of implementing something around this?
@GlenRSmith I know @rjernst is currently exploring treating dots in field names as sub-objects.
Another use case where I need dots in field names is for tracking request parameters. I currently store them like this:
I don't really have control over the names of the request parameters, so the only option is to de_dot the parameter names. But then I can't use the stored information to reproduce/replay the captured request. Converting the parameters into
isn't an option either, because I want to do aggregations on specific parameters in Grafana. Yet another use case of mine is that I store configuration parameters in Elasticsearch, where the config keys are field names and contain dots. So a big +1 from my side.
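For reference, the de_dot rewrite mentioned here can be sketched as a recursive key transform (a hypothetical helper; the real Logstash de_dot filter replaces dots with underscores by default):

```python
# Hypothetical sketch of a de_dot transform: recursively replace dots in
# map keys with another character (underscore by default).
def de_dot(value, replacement="_"):
    if isinstance(value, dict):
        return {k.replace(".", replacement): de_dot(v, replacement)
                for k, v in value.items()}
    if isinstance(value, list):
        return [de_dot(v, replacement) for v in value]
    return value

doc = {"params": {"utm.source": "newsletter", "page.id": 42}}
print(de_dot(doc))  # {'params': {'utm_source': 'newsletter', 'page_id': 42}}
```

As the comment above notes, this transform is lossy: once the dots are gone, the original parameter names cannot be recovered to replay a request.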
I hate to pile on, but our use case is identical to @felixbarny's. Perhaps I'm not understanding the need to treat these as sub-objects, but if that could be an option (even if not the default) it would be much better than the current way of handling fields that contain dots.
I also have a use case similar to @felixbarny. |
In 2.0 we began restricting field names to not contain dots. This change adds back partial support for dots in field names. Specifically, it allows indexing documents that contain dots in field names when the correct corresponding mappers exist. For example, if the mappings contain an object field `foo` and a subfield `bar`, then indexing a document with `foo.bar` will work. See elastic#15951
Nice! Could you explain/document how this works now?
Hi, I am using the 5.0.0.4 alpha release and tried to create an index with the below mapping (which has dots in field names):

```
{"mappings": {"first.Name": {"type": "string", "index": "not_analyzed"}}}
```

But this fails as below:

```
{"type": "mapper_parsing_exception", "status": 400}
```

Am I missing anything?
The current support for dots in field names covers dynamic mappings and document parsing. When specifying mappings directly, you still need to split up the dotted fields recursively. I opened #19443 to address this.
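Splitting a dotted field up "recursively" means turning each dot into a level of object mapping with a `properties` block. A hypothetical helper:

```python
# Hypothetical helper: expand a dotted field name into the nested object
# mapping structure that Elasticsearch expects ("properties" at each level).
def expand_dotted(name, definition):
    parts = name.split(".")
    mapping = definition
    for part in reversed(parts[1:]):
        mapping = {"properties": {part: mapping}}
    return {parts[0]: mapping}

# "first.Name" becomes an object "first" with a sub-field "Name":
print(expand_dotted("first.Name", {"type": "string", "index": "not_analyzed"}))
# {'first': {'properties': {'Name': {'type': 'string', 'index': 'not_analyzed'}}}}
```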
Can dots in field names be patched into 2.3.x? Otherwise it will require 1.x -> 2.x (re-work to undo all the dots in field names), then 2.x -> 5.x (allow dots back).
@cdenneen We are looking into possible solutions. I will update the issue when we have more to say.
@cdenneen I would like to clarify (it might be clear to you but not necessarily to other readers) that data will need to be reindexed anyway between 1.x and 5.x, since Elasticsearch only supports one major version back, and the version that matters in that case is the version that was used to create the index. So 5.x will not be able to read any index created in 1.x.
@s1monw thanks. @jpountz yes, that's why I was saying the 1.x -> 5.x upgrade path isn't supported, but 1.x -> 2.x and 2.x -> 5.x are. But in order to do that you'd have to undo the dotted fields for the 2.x upgrade and then put them back in 5.x after that upgrade. So unless there is a 1.x -> 5.x upgrade path, I would think there needs to be a 2.x patch to support this, to allow the upgrade to work (stepping up the major versions).
@cdenneen I think you're missing the point. A 2.x patch wouldn't help you. Indices created in 1.x can't be read in 5.x. Full stop. Not even if you had no conflicts and upgraded to 2.x first. |
Glen,

Upgrades from 1.x -> 2.x wouldn't convert the index to the 2.x standard, so you could do a 5.x upgrade later?

-Chris
@cdenneen No. An index that lives in a 2.x cluster but was created with 1.x cannot be upgraded to 5.x. |
@cdenneen Just to clarify: if we get support for dots in fields into 2.4, you'd be able to upgrade to 2.4, reindex to a new index, then upgrade to 5.x. An alternative route would be to create a new 5.x cluster, then use reindex-from-remote to pull the indices you want to take with you into 5.x directly.
@clintongormley Graylog is affected by this; is it still being considered for inclusion in a hypothetical 2.4.2 release?
Support for dots in field names was added in 2.4.0: |
As part of the Great Mapping Refactoring (#8870), we had to reject field names containing dots (#12068), eg:
The behaviour was undefined and resulted in ambiguities when trying to reference fields with the dot notation used in queries and aggregations.
Removing support for dots has caused pain for a number of users and especially as Elasticsearch is being used more and more for the metrics use case (where dotted fields are common), we should consider what we can do to improve this situation. Now that mappings are much stricter (and immutable), it becomes feasible to revisit the question of whether to allow dots to occur in field names.
Replace dots with another character

The first and simplest solution is to simply replace dots in field names with another character (eg `_`), as is done by the Logstash de_dot filter and as will be supported natively in Elasticsearch by the node ingest `de_dot` processor.

Treat dots as paths
Another solution would be to treat fields with dots in them as "paths" rather than field names. In other words, these two documents would be equivalent:
To use an edge case as an example, the following document:

would result in the following mapping:
The Lucene field would be called `foo.bar.baz` and would contain the terms `["val1", "val2"]`. Stored fields or doc values (for supported datatypes) would both contain `["val1", "val2"]`.
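Under this model, both the dotted and the expanded form of a document feed the same full field path. A sketch (the combined document is an assumption, reconstructed from the `foo.bar.baz` / `["val1", "val2"]` description, since the original example did not survive formatting):

```python
# Hypothetical sketch of the dots-as-paths model: flatten a document into
# full field paths, so dotted keys and nested objects land on the same
# Lucene field name.
def flatten(doc, prefix=""):
    fields = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            for p, vals in flatten(value, path).items():
                fields.setdefault(p, []).extend(vals)
        else:
            fields.setdefault(path, []).append(value)
    return fields

# Both forms of the edge case contribute to the same field:
doc = {"foo.bar": {"baz": "val1"}, "foo": {"bar": {"baz": "val2"}}}
print(flatten(doc))  # {'foo.bar.baz': ['val1', 'val2']}
```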
Issues with this approach
This solution works well for search and aggregations, but leaves us with two incongruities:
`_source=`

The first occurs when using the `_source=` parameter to do source filtering on the response. The reason for this is that the `_source` field is stored as provided - it is not normalized before being stored. For instance:

would return:
rather than:
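A sketch of this incongruity, assuming a naive filter that walks the raw `_source` as nested maps (all names here are hypothetical):

```python
# Hypothetical sketch of why _source filtering clashes with dots-as-paths:
# _source is stored exactly as provided, so a filter that walks it as
# nested maps finds the expanded form but misses a literal dotted key.
def source_filter(source, path):
    node = source
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node

nested = {"foo": {"bar": "val"}}
dotted = {"foo.bar": "val"}   # the same field under dots-as-paths

print(source_filter(nested, "foo.bar"))  # 'val'
print(source_filter(dotted, "foo.bar"))  # None -- the raw key is 'foo.bar'
```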
Update requests
The second occurs during update requests, which use the `_source` as a map-of-maps. Running an update like:

could result (depending on how it is implemented) in any of the following:
Version 1:
Version 2:
Version 3:
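Two of these possible outcomes can be sketched with a pair of hypothetical merge strategies, assuming a nested `_source` and a dotted partial document (the document shapes are assumptions for illustration):

```python
import copy

# Hypothetical sketch of the update ambiguity: merging a partial document
# with a dotted key into a nested _source map-of-maps.
def merge_literal(source, update):
    # Keys compared literally: 'foo.bar' is just another top-level key.
    merged = copy.deepcopy(source)
    merged.update(update)
    return merged

def merge_as_paths(source, update):
    # Keys treated as paths: 'foo.bar' overwrites source['foo']['bar'].
    merged = copy.deepcopy(source)
    for key, value in update.items():
        parts = key.split(".")
        node = merged
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return merged

source = {"foo": {"bar": "old"}}
update = {"foo.bar": "new"}

print(merge_literal(source, update))
# {'foo': {'bar': 'old'}, 'foo.bar': 'new'} -- two coexisting keys

print(merge_as_paths(source, update))
# {'foo': {'bar': 'new'}} -- the nested value is overwritten
```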