Dynamic Field Mapping and Templates using numeric_detection does not match documentation #30939

Closed
StevenToth opened this issue May 29, 2018 · 3 comments
Labels
>bug >docs General docs changes :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch

Comments

@StevenToth

Describe the feature: Bug in Dynamic field mapping of floating point numbers with numeric_detection enabled.

Elasticsearch version (bin/elasticsearch --version): 6.2.2

Plugins installed: []

JVM version (java -version): 1.8.0_66

OS version (uname -a if on a Unix-like system): RHEL6

Description of the problem including expected versus actual behavior:
According to the documentation on Dynamic field mapping, when numeric_detection is enabled, passing a floating-point number as a string maps the field to a double.

PUT my_index/_doc/1
{
  "my_float":   "1.0",
  "my_integer": "1" 
}

The my_float field is added as a double field.

However, this actually results in the field being mapped as a float.

In addition, the documentation on Dynamic templates indicates that only the following datatypes can be dynamically mapped using match_mapping_type:

Only the following datatypes can be automatically detected: boolean, date, double, long, object, string.

Therefore, based on the absence of the float datatype in that list and the fact that the floating point numbers are being dynamically mapped with a float datatype, it would seem that dynamically mapped floating point numbers cannot be mapped to a double datatype using match_mapping_type.

There is a workaround using match_mapping_type. I verified that dynamic templates do not accept float as a value for match_mapping_type, but I found that a dynamic template with a match_mapping_type of double maps the fields that would otherwise have been dynamically mapped as float to double. See Steps to reproduce for an example of the workaround.

*DISCLAIMER The following is a shameless plug for 64-bit Unsigned Integer support in Elasticsearch
The dynamic mapping to double is needed because Elasticsearch has no support for 64-bit unsigned integers, whereas the system publishing the data (a custom Elastic Beat written in Go) does. The Beat was coded to work around this limitation by publishing 64-bit unsigned integer values as strings with a trailing '.0', so that they are dynamically mapped as doubles. Otherwise, calculations in Elasticsearch, including those done in aggregations, produced inaccurate results due to overflows, even allowing for some expected precision loss: precision loss is one thing, getting a negative result when it is not mathematically possible is another. I've only been working with the stack for a short time, and I appreciate its complexity, capabilities and power. However, it feels "hackish" to have to treat 64-bit unsigned integers (not an uncommon thing) as floating-point numbers masquerading as strings in order to dynamically map and store them in a way that can be used efficiently in calculations and aggregations.
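To make the overflow concern concrete, here is a minimal Python sketch of what happens when an unsigned 64-bit value above 2^63 - 1 is reinterpreted as a signed 64-bit integer. The helper `as_signed_int64` and the sample `counter` value are hypothetical and only illustrate two's-complement wraparound; this is not a claim about what Elasticsearch does internally.

```python
def as_signed_int64(value: int) -> int:
    """Reinterpret the low 64 bits of value as a two's-complement signed integer."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value


# Hypothetical uint64 reading that does not fit in a signed 64-bit long.
counter = 2**63 + 5

stored = as_signed_int64(counter)
print(stored)          # -9223372036854775803, a negative value
print(float(counter))  # positive, but only approximately equal to counter
```

Storing the value as a double keeps the sign and magnitude right at the cost of some low-order digits, which is the trade-off the Beat relies on.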

Steps to reproduce:

  1. Turn on numeric detection
PUT /_template/my_index_template
{
  "index_patterns": ["my_index"],
  "mappings": {
    "doc": {
      "numeric_detection": true
    }
  }
}
  2. Add document
POST my_index/doc/
{
  "start":        "1527613753042816000.0",
  "end":          "1527613753110000128.0",
  "iterations":   "100"
}
  3. Get mapping
GET my_index/_mapping/

start and end fields are dynamically mapped as float

{
  "my_index": {
    "mappings": {
      "doc": {
        "numeric_detection": true,
        "properties": {
          "end": {
            "type": "float"
          },
          "iterations": {
            "type": "long"
          },
          "start": {
            "type": "float"
          }
        }
      }
    }
  }
}

WORKAROUND

  1. Delete index
DELETE /my_index
  2. Replace template [with one that maps double to double]
PUT /_template/my_index_template
{
  "index_patterns": ["my_index"],
  "mappings": {
    "doc": {
      "numeric_detection": true,
      "dynamic_templates": [
        {
          "not_so_double_to_double": {
            "match_mapping_type": "double",
            "mapping": {
              "type": "double"
            }
          }
        }
       ]
    }
  }
}
  3. Add [same] document
POST my_index/doc/
{
  "start":        "1527613753042816000.0",
  "end":          "1527613753110000128.0",
  "iterations":   "100"
}
  4. Get the mapping
GET my_index/_mapping/

start and end fields are dynamically mapped as double

{
  "my_index": {
    "mappings": {
      "doc": {
        "dynamic_templates": [
          {
            "not_so_double_to_double": {
              "match_mapping_type": "double",
              "mapping": {
                "type": "double"
              }
            }
          }
        ],
        "numeric_detection": true,
        "properties": {
          "end": {
            "type": "double"
          },
          "iterations": {
            "type": "long"
          },
          "start": {
            "type": "double"
          }
        }
      }
    }
  }
}

EPILOGUE
To illustrate the impact of the mapping, consider the following query:

GET my_index/_search?size=0
{
  "aggs": {
    "avgDurationPerIteration": {
      "avg": {
        "script": "(doc['end'].value - doc['start'].value) / doc['iterations'].value"
      }
    }
  }
}
  1. When fields are mapped as float
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "avgDurationPerIteration": {
      "value": 0
    }
  }
}
  2. When fields are mapped as double
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "avgDurationPerIteration": {
      "value": 671841.28
    }
  }
}
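The two results above can be reproduced outside Elasticsearch. The following Python sketch assumes that a float field behaves like an IEEE 754 single-precision value and a double field like a double-precision value; `to_float32` is a hypothetical helper that round-trips a number through 32-bit storage using the standard `struct` module.

```python
import struct


def to_float32(value: float) -> float:
    """Round a Python float (double precision) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


start = 1527613753042816000.0
end = 1527613753110000128.0
iterations = 100

# As float: both timestamps collapse to the same 32-bit value.
as_float = (to_float32(end) - to_float32(start)) / iterations

# As double: both timestamps are stored exactly, so the difference survives.
as_double = (end - start) / iterations

print(as_float)   # 0.0
print(as_double)  # 671841.28
```

At this magnitude, adjacent 32-bit floats are 2^37 (about 1.4e11) apart, far more than the 67184128 difference between the two timestamps, so the subtraction has nothing left to work with.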

NOTE: Results are the same even when using BigDecimal to do the calculations:

GET my_index/_search?size=0
{
  "aggs": {
    "avgDurationPerIteration": {
      "avg": {
        "script": "BigDecimal.valueOf(doc['end'].value).subtract(BigDecimal.valueOf(doc['start'].value)).divide(BigDecimal.valueOf(doc['iterations'].value*1.0)).doubleValue()"
      }
    }
  }
}
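A likely reason arbitrary-precision arithmetic does not help is that the precision is already gone by the time the script reads the stored values. The Python sketch below uses `decimal.Decimal` as a stand-in for BigDecimal and the same hypothetical `to_float32` helper to mimic a float field; it is an illustration under those assumptions, not a trace of what Painless does.

```python
import struct
from decimal import Decimal


def to_float32(value: float) -> float:
    """Round a Python float (double precision) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


# Values as they would come back from a field stored as a 32-bit float.
start_f32 = to_float32(1527613753042816000.0)
end_f32 = to_float32(1527613753110000128.0)

# Exact decimal arithmetic on inputs that are already identical.
per_iteration = (Decimal(end_f32) - Decimal(start_f32)) / Decimal(100)
print(per_iteration)
```

Exact arithmetic on two identical inputs still yields zero, so the fix has to happen in the mapping rather than in the script.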

Provide logs (if relevant):

@colings86 colings86 added the :Search Foundations/Mapping Index mappings, including merging and defining field types label May 30, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

@colings86
Contributor

@jpountz could you confirm if this is a bug in the code or the documentation?

@jpountz
Contributor

jpountz commented May 30, 2018

According to the documentation on Dynamic field mapping, when numeric_detection is enabled passing a floating point number as a string will map the field to a double. [...] However, this actually results in the field being mapped as a float.

The documentation is wrong indeed. I'll fix.

Therefore, based on the absence of the float datatype in that list and the fact that the floating point numbers are being dynamically mapped with a float datatype, it would seem that dynamically mapped floating point numbers cannot be mapped to a double datatype using match_mapping_type.

That part probably needs clarification, though I'm not sure how to do it. Basically, the JSON parser assumes the wider datatype that is not a BigDecimal/BigInteger, so floating-point numbers are detected as a double rather than a float, and integers are detected as a long rather than an integer. You can dynamically map floating-point numbers to doubles by using double as the match_mapping_type and double as the type in the mapping.
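As a rough mental model of the behaviour described in this thread, the following Python sketch separates the detected category (what match_mapping_type sees) from the field type that ends up in the mapping. Everything here is a toy: `detect_category`, `resolve_field_type` and `DEFAULT_MAPPING` are hypothetical names, and the defaults are taken from what was observed in this issue on 6.2.2, not from the Elasticsearch source.

```python
from typing import Dict, List, Optional

# Assumed defaults, based on the mappings reported in this issue.
DEFAULT_MAPPING: Dict[str, str] = {"double": "float", "long": "long"}


def detect_category(text: str) -> Optional[str]:
    """Return the wide category a numeric string is detected as, or None."""
    try:
        int(text)
        return "long"
    except ValueError:
        pass
    try:
        float(text)
        return "double"
    except ValueError:
        return None


def resolve_field_type(text: str, templates: List[dict]) -> Optional[str]:
    """Pick the mapped type: the first matching template wins, else the default."""
    category = detect_category(text)
    if category is None:
        return None  # non-numeric strings are out of scope for this sketch
    for template in templates:
        if template["match_mapping_type"] == category:
            return template["mapping"]["type"]
    return DEFAULT_MAPPING[category]


workaround = [{"match_mapping_type": "double", "mapping": {"type": "double"}}]
print(resolve_field_type("1527613753042816000.0", []))          # float
print(resolve_field_type("1527613753042816000.0", workaround))  # double
print(resolve_field_type("100", workaround))                    # long
```

Read this way, a template keyed on double is not redundant: it matches the detected category and overrides the narrower default.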

*DISCLAIMER The following is a shameless plug for 64-bit Unsigned Integer support in Elasticsearch
The dynamic mapping to double is needed as there is no support for 64-bit unsigned integers in Elasticsearch, whereas the system publishing the data (a custom Elastic Beat written in Go) does support 64-bit unsigned integers.

This feature has already been requested at #30939, but it has taken some time because most of the needs that were expressed initially couldn't be supported, such as aggregating big decimals. However, there seem to be a number of people who are interested in uint64 support and accept that aggregations would work by dynamically converting the values to doubles, which might incur some accuracy loss.
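For a sense of the accuracy loss involved in converting unsigned 64-bit values to doubles, here is a small Python sketch. It relies only on the fact that a double has a 53-bit significand; `UINT64_MAX` is a name introduced for this illustration.

```python
UINT64_MAX = 2**64 - 1

# Integers up to 2**53 convert to double exactly.
print(int(float(2**53)) == 2**53)            # True

# Above 2**53, neighbouring integers start to share a double.
print(float(2**53 + 1) == float(2**53))      # True

# The largest uint64 rounds up to 2**64 when converted.
print(float(UINT64_MAX) == float(2**64))     # True
```

The error is relative rather than absolute, so aggregations over large counters stay close to the true value but are not exact.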

@jpountz jpountz added >bug >docs General docs changes labels May 30, 2018
jpountz added a commit to jpountz/elasticsearch that referenced this issue May 30, 2018
@javanna javanna added the Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch label Jul 16, 2024