Dynamic Field Mapping and Templates using numeric_detection does not match documentation #30939

StevenToth · 2018-05-29T18:37:33Z

Describe the feature: Bug in Dynamic field mapping of floating point numbers with numeric_detection enabled.

Elasticsearch version (bin/elasticsearch --version): 6.2.2

Plugins installed: []

JVM version (java -version): 1.8.0_66

OS version (uname -a if on a Unix-like system): RHEL6

Description of the problem including expected versus actual behavior:
According to the documentation on Dynamic field mapping, when numeric_detection is enabled passing a floating point number as a string will map the field to a double.

PUT my_index/_doc/1
{
  "my_float":   "1.0",
  "my_integer": "1"
}

The my_float field is added as a double field.

However, this actually results in the field being mapped as a float.

In addition, the documentation on Dynamic templates indicates that only the following datatypes can be dynamically mapped using match_mapping_type:

Only the following datatypes can be automatically detected: boolean, date, double, long, object, string.

Therefore, based on the absence of the float datatype in that list and the fact that the floating point numbers are being dynamically mapped with a float datatype, it would seem that dynamically mapped floating point numbers cannot be mapped to a double datatype using match_mapping_type.

There is a workaround using match_mapping_type. I verified dynamic templates will not allow a value of float for the match_mapping_type, but I found that using a dynamic template with a match_mapping_type of double will map the fields that would have been dynamically mapped as float to double. See Steps to reproduce for an example of the workaround.

*DISCLAIMER The following is a shameless plug for 64-bit Unsigned Integer support in Elasticsearch
The dynamic mapping to double is needed as there is no support for 64-bit unsigned integers in Elasticsearch, whereas the system publishing the data (a custom Elastic Beat written in Go) does support 64-bit unsigned integers. The Beat was coded to work around this limitation by publishing 64-bit unsigned integer values as strings with a trailing '.0', so that they are dynamically mapped as doubles. Otherwise, calculations in Elasticsearch, including those done in aggregations, generated inaccurate results due to overflows, even allowing for some expected precision loss (precision loss is one thing; getting a negative result when that is not mathematically possible is another). I've only been working with the stack for a short time, but I appreciate its complexity, capabilities and power. However, it feels "hackish" to have to treat 64-bit unsigned integers (not an uncommon thing) as floating-point numbers masquerading as strings in order to dynamically map and store them in a way that can be used efficiently in calculations and aggregations.
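
The overflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names as_signed_int64 and as_double_string are hypothetical, and the wraparound is modelled with plain Python integers rather than with Elasticsearch itself.

```python
# Illustrative sketch: models 64-bit signed wraparound with plain Python
# integers. The helper names are hypothetical, not part of any Beat or
# Elasticsearch API.

UINT64_MAX = 2**64 - 1


def as_signed_int64(value: int) -> int:
    """Reinterpret an integer as a two's-complement signed 64-bit value."""
    return (value + 2**63) % 2**64 - 2**63


def as_double_string(value: int) -> str:
    """Format an unsigned integer with a trailing '.0', as the issue describes."""
    return f"{value}.0"


if __name__ == "__main__":
    # A large unsigned value read as a signed 64-bit integer turns negative.
    print(as_signed_int64(UINT64_MAX))
    # Adding 1 to the largest signed 64-bit value wraps to the most negative one.
    print(as_signed_int64((2**63 - 1) + 1))
    # As a double the value stays positive, at the cost of some precision.
    print(float(as_double_string(UINT64_MAX)))
    print(int(float(UINT64_MAX)) - UINT64_MAX)
```

The last two lines show the trade-off mentioned above: the double representation is off by one here, but it never changes sign.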

Steps to reproduce:

Turn on numeric detection

PUT /_template/my_index_template
{
  "index_patterns": ["my_index"],
  "mappings": {
    "doc": {
      "numeric_detection": true
    }
  }
}

Add document

POST my_index/doc/
{
  "start":        "1527613753042816000.0",
  "end":          "1527613753110000128.0",
  "iterations":   "100"
}

Get mapping

GET my_index/_mapping/

start and end fields are dynamically mapped as float

{
  "my_index": {
    "mappings": {
      "doc": {
        "numeric_detection": true,
        "properties": {
          "end": {
            "type": "float"
          },
          "iterations": {
            "type": "long"
          },
          "start": {
            "type": "float"
          }
        }
      }
    }
  }
}

WORKAROUND

Delete index

DELETE /my_index

Replace template [with one that maps double to double]

PUT /_template/my_index_template
{
  "index_patterns": ["my_index"],
  "mappings": {
    "doc": {
      "numeric_detection": true,
      "dynamic_templates": [
        {
          "not_so_double_to_double": {
            "match_mapping_type": "double",
            "mapping": {
              "type": "double"
            }
          }
        }
      ]
    }
  }
}

Add [same] document

POST my_index/doc/
{
  "start":        "1527613753042816000.0",
  "end":          "1527613753110000128.0",
  "iterations":   "100"
}

Get the mapping

GET my_index/_mapping/

start and end fields are dynamically mapped as double

{
  "my_index": {
    "mappings": {
      "doc": {
        "dynamic_templates": [
          {
            "not_so_double_to_double": {
              "match_mapping_type": "double",
              "mapping": {
                "type": "double"
              }
            }
          }
        ],
        "numeric_detection": true,
        "properties": {
          "end": {
            "type": "double"
          },
          "iterations": {
            "type": "long"
          },
          "start": {
            "type": "double"
          }
        }
      }
    }
  }
}

EPILOGUE
To illustrate the impact of the mapping, consider the following query:

GET my_index/_search?size=0
{
  "aggs": {
    "avgDurationPerIteration": {
      "avg": {
        "script": "(doc['end'].value - doc['start'].value) / doc['iterations'].value"
      }
    }
  }
}

When fields are mapped as float

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "avgDurationPerIteration": {
      "value": 0
    }
  }
}

When fields are mapped as double

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "avgDurationPerIteration": {
      "value": 671841.28
    }
  }
}

NOTE: Results are the same even when using BigDecimal to do the calculations:

GET my_index/_search?size=0
{
  "aggs": {
    "avgDurationPerIteration": {
      "avg": {
        "script": "BigDecimal.valueOf(doc['end'].value).subtract(BigDecimal.valueOf(doc['start'].value)).divide(BigDecimal.valueOf(doc['iterations'].value*1.0)).doubleValue()"
      }
    }
  }
}
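
The two results above can be reproduced outside Elasticsearch. The following is a minimal sketch, assuming that a float field holds a 32-bit IEEE 754 value and a double field a 64-bit one; the helper to_float32 is a hypothetical name, and Python's struct module stands in for the storage step. Near 1.5e18, adjacent 32-bit floats are 2^37 (about 1.4e11) apart, while adjacent doubles are 256 apart, so start and end collapse to the same float but remain distinct as doubles.

```python
import struct


def to_float32(value: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float and back."""
    return struct.unpack("f", struct.pack("f", value))[0]


# Values taken from the document indexed in the steps to reproduce.
start = float("1527613753042816000.0")
end = float("1527613753110000128.0")
iterations = 100

# Same arithmetic as the aggregation script, at two storage precisions.
as_double = (end - start) / iterations
as_float = (to_float32(end) - to_float32(start)) / iterations

print(as_double)  # 671841.28
print(as_float)   # 0.0
```

This is also consistent with the BigDecimal note: if the precision is already gone by the time the stored float value reaches the script, more precise arithmetic afterwards cannot bring it back.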

Provide logs (if relevant):

elasticmachine · 2018-05-30T08:12:05Z

Pinging @elastic/es-search-aggs

colings86 · 2018-05-30T08:12:48Z

@jpountz could you confirm if this is a bug in the code or the documentation?

jpountz · 2018-05-30T08:16:37Z

According to the documentation on Dynamic field mapping, when numeric_detection is enabled passing a floating point number as a string will map the field to a double. [...] However, this actually results in the field being mapped as a float.

The documentation is wrong indeed. I'll fix.

Therefore, based on the absence of the float datatype in that list and the fact that the floating point numbers are being dynamically mapped with a float datatype, it would seem that dynamically mapped floating point numbers cannot be mapped to a double datatype using match_mapping_type.

That part probably needs clarification though I'm not sure how to do it. Basically, the json parser assumes the wider datatype that is not a BigDecimal/BigInteger. So floating-point numbers will be detected as a double rather than a float and integers will be detected as a long rather than an integer. You can dynamically map floating-point numbers to doubles by using double as a match_mapping_type and double as a type in the mapping.
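
The behaviour described in this thread can be modelled roughly as follows. This is a simplified sketch, not Elasticsearch's actual implementation: the function names, the template representation as (match_mapping_type, type) pairs, and the default table are all assumptions made for illustration. Date and object detection are left out, and Python's own number parsing stands in for the real parser.

```python
# Simplified model of dynamic type detection; all names are hypothetical.

# Default field type chosen for each detected type when no template matches.
# "double" -> "float" reflects the behaviour reported in this issue.
DEFAULT_FIELD_TYPE = {
    "boolean": "boolean",
    "long": "long",
    "double": "float",
    "string": "text",
}


def detected_type(value, numeric_detection=False):
    """Return the match_mapping_type-style name detected for a JSON value."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"      # the wider integer type is assumed
    if isinstance(value, float):
        return "double"    # the wider floating-point type is assumed
    if isinstance(value, str) and numeric_detection:
        try:
            int(value)
            return "long"
        except ValueError:
            pass
        try:
            float(value)
            return "double"
        except ValueError:
            pass
    return "string"


def mapped_type(value, numeric_detection=False, templates=()):
    """Pick the field type: first matching template wins, else the default."""
    detected = detected_type(value, numeric_detection)
    for match_mapping_type, field_type in templates:
        if match_mapping_type == detected:
            return field_type
    return DEFAULT_FIELD_TYPE[detected]


if __name__ == "__main__":
    value = "1527613753042816000.0"
    print(mapped_type(value, numeric_detection=True))
    print(mapped_type(value, numeric_detection=True,
                      templates=[("double", "double")]))
```

In this model the detected type is always double, which is why float never appears as a match_mapping_type, while the field type written to the mapping is float unless a template says otherwise.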

*DISCLAIMER The following is a shameless plug for 64-bit Unsigned Integer support in Elasticsearch
The dynamic mapping to double is needed as there is no support for 64-bit unsigned integers in Elasticsearch, whereas the system publishing the data (a custom Elastic Beat written in Go) does support 64-bit unsigned integers.

This feature has already been requested at #30939, but it has taken some time because most of the needs that were expressed initially, such as aggregating big decimals, couldn't be supported. However, there seem to be a number of people who are interested in uint64 support and accept that aggregations would work by dynamically converting the values to doubles, which might incur some accuracy loss.

colings86 added the :Search Foundations/Mapping Index mappings, including merging and defining field types label May 30, 2018

colings86 assigned jpountz May 30, 2018

jpountz added >bug >docs General docs changes labels May 30, 2018

jpountz added a commit to jpountz/elasticsearch that referenced this issue May 30, 2018

Improve documentation of dynamic mappings.

8c7b656

Closes elastic#30939

jpountz mentioned this issue May 30, 2018

Improve documentation of dynamic mappings. #30952

Merged

jpountz closed this as completed in #30952 Jun 5, 2018

jpountz added a commit that referenced this issue Jun 5, 2018

Improve documentation of dynamic mappings. (#30952)

500094f

Closes #30939

jpountz added a commit that referenced this issue Jun 5, 2018

Improve documentation of dynamic mappings. (#30952)

693865e

Closes #30939

javanna added the Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch label Jul 16, 2024
