missing value not considered in min/max aggregations #48905

Closed
mattweber opened this issue Nov 7, 2019 · 8 comments · Fixed by #48970

@mattweber
Contributor

The `missing` value is not considered in min/max aggregations in ES 7 as it was in previous versions, and I don't see this documented as a breaking change. I believe this is due to the optimization that uses segment min/max values.

To reproduce, run the following against ES 6 and ES 7. The `missing` values should be returned as the min and max values of the aggregation.

curl -XPUT -H"Content-Type: application/json" 'localhost:9200/testmissing/_doc/1' -d '{"title": "with value", "value": 1}'
curl -XPUT -H"Content-Type: application/json" 'localhost:9200/testmissing/_doc/2?refresh' -d '{"title": "missing value"}'
curl -XPOST -H"Content-Type: application/json" 'localhost:9200/testmissing/_search?pretty' -d '{
    "size": 0,
    "aggs": {
        "min_missing": {
            "min": {
                "field": "value",
                "missing": -1
            }
        },
        "max_missinng": {
            "max": {
                "field": "value",
                "missing": 2
            }
        }
    }
}'
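
For reference, the expected behavior can be sketched in a few lines of Python. This is only an illustrative model of the documented `missing` semantics (a document without the field is treated as if it had the `missing` value), not Elasticsearch code; the helper name `agg_with_missing` is hypothetical.

```python
from typing import Callable, Dict, Iterable, List


def agg_with_missing(docs: Iterable[Dict], field: str, missing: float,
                     reducer: Callable[[List[float]], float]) -> float:
    """Reduce a field over all docs, substituting `missing` where the field is absent."""
    values: List[float] = []
    for doc in docs:
        value = doc.get(field)
        if value is None:
            # Document has no value for the field: use the `missing` substitute.
            values.append(missing)
        elif isinstance(value, (list, tuple)):
            # Multi-valued field: every value takes part in the aggregation.
            values.extend(value)
        else:
            values.append(value)
    return float(reducer(values))


# Same two documents as in the curl commands above.
docs = [
    {"title": "with value", "value": 1},
    {"title": "missing value"},
]
min_missing = agg_with_missing(docs, "value", -1, min)
max_missing = agg_with_missing(docs, "value", 2, max)
```

With these two documents, the model yields a min of -1.0 and a max of 2.0, which is what the aggregation is expected to return.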
@imotov imotov added the :Analytics/Aggregations Aggregations label Nov 8, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

@imotov
Contributor

imotov commented Nov 8, 2019

Which version of Elasticsearch did you test it on? I just tried it on 7.4.2 and I am getting:

{
  "took" : 12,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "max_missinng" : {
      "value" : 2.0
    },
    "min_missing" : {
      "value" : -1.0
    }
  }
}

What am I missing? (some pun intended)

@mattweber
Contributor Author

You are right, it is fixed on 7.4.2. I was testing on 7.4.1, thanks!

@mattweber mattweber reopened this Nov 8, 2019
@mattweber
Contributor Author

@imotov Actually no, the issue still exists in 7.4.2. It appears to be inconsistent: run the following a couple of times, deleting the index between runs. Min should be -1 and max should be 10.

curl -XPUT -H"Content-Type: application/json" 'localhost:9200/testmissing/_doc/1' -d '{"title": "with values", "value": [2, 3, 5]}'
curl -XPUT -H"Content-Type: application/json" 'localhost:9200/testmissing/_doc/2' -d '{"title": "with values", "value": [7, 1]}'
curl -XPUT -H"Content-Type: application/json" 'localhost:9200/testmissing/_doc/3' -d '{"title": "with values", "value": [8, 2, 5, 4]}'
curl -XPUT -H"Content-Type: application/json" 'localhost:9200/testmissing/_doc/4?refresh' -d '{"title": "missing value"}'
curl -XPOST -H"Content-Type: application/json" 'localhost:9200/testmissing/_search?pretty' -d '{
    "size": 0,
    "aggs": {
        "min_missing": {
            "min": {
                "field": "value",
                "missing": -1
            }
        },
        "max_missinng": {
            "max": {
                "field": "value",
                "missing": 10
            }
        }
    }
}'

I actually have the above in a test using ESIntegTestCase, and it reproduces the problem every time I run it.

@imotov
Contributor

imotov commented Nov 8, 2019

Could you share this ESIntegTestCase? I cannot reproduce it with REST commands.

@mattweber
Contributor Author

@imotov imotov self-assigned this Nov 11, 2019
@imotov imotov added the >bug label Nov 11, 2019
@imotov
Contributor

imotov commented Nov 11, 2019

@mattweber Thanks! I was able to reproduce it using your test, and @polyfractal suggested the fix. It turned out that the reproduction depends on the distribution of documents across segments. That's why I wasn't able to reproduce it in Kibana. I am going to open a PR soon.
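
To illustrate why the result can depend on how documents land in segments, here is a toy Python model. It is a simplified sketch, not the actual Elasticsearch or Lucene code, and it rests on one assumption: the shortcut is taken per segment whenever that segment has indexed values for the field, and otherwise the aggregation falls back to visiting documents one by one. The names `exact_min` and `shortcut_min` are hypothetical.

```python
from typing import List, Optional, Sequence

Doc = Optional[Sequence[float]]  # None models a document without the field
Segment = List[Doc]


def exact_min(segments: List[Segment], missing: float) -> float:
    """Visit every document, substituting `missing` for absent fields."""
    best = float("inf")
    for segment in segments:
        for doc in segment:
            values = doc if doc is not None else [missing]
            best = min(best, min(values))
    return best


def shortcut_min(segments: List[Segment], missing: float) -> float:
    """Toy model of a per-segment shortcut that trusts the indexed minimum."""
    best = float("inf")
    for segment in segments:
        indexed = [v for doc in segment if doc is not None for v in doc]
        if indexed:
            # Assumed shortcut: the indexed minimum only covers documents that
            # have a value, so documents without the field are never visited
            # and their `missing` substitute is ignored.
            best = min(best, min(indexed))
        else:
            # No indexed values in this segment: visit the documents normally.
            best = min(best, exact_min([segment], missing))
    return best


# Documents from the second reproduction; the last one has no "value" field.
docs: List[Doc] = [[2, 3, 5], [7, 1], [8, 2, 5, 4], None]
one_segment = [docs]
two_segments = [docs[:3], [docs[3]]]

same_segment_result = shortcut_min(one_segment, -1)   # missing doc shares a segment
own_segment_result = shortcut_min(two_segments, -1)   # missing doc sits alone
expected_result = exact_min(one_segment, -1)
```

In this model, the shortcut gives the expected -1 only when the document without a value sits in a segment of its own. When it shares a segment with documents that have values, the shortcut returns 1 instead, which matches the inconsistent behavior reported above.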

@mattweber
Contributor Author

Great! Thanks for working on it!

imotov added a commit to imotov/elasticsearch that referenced this issue Nov 11, 2019
Fixes the issue when the missing values can be ignored in min/max
due to BKD optimization.

Fixes elastic#48905
imotov added a commit that referenced this issue Nov 12, 2019
Fixes the issue when the missing values can be ignored in min/max
due to BKD optimization.

Fixes #48905
imotov added a commit that referenced this issue Nov 13, 2019
Fixes the issue when the missing values can be ignored in min/max
due to BKD optimization.

Fixes #48905
imotov added a commit that referenced this issue Nov 15, 2019
Fixes the issue when the missing values can be ignored in min/max
due to BKD optimization.

Fixes #48905