sum and stats.sum return different values when doc_count is 0. #26893

Closed
walterra opened this issue Oct 5, 2017 · 3 comments

Comments

@walterra

walterra commented Oct 5, 2017

Elasticsearch version: 7.0.0-alpha1-SNAPSHOT, Build: 9e36764/2017-10-03T12:16:42.018Z

Plugins installed: [x-pack]

JVM version: 1.8.0_144

OS version: macOS 10.13

Description of the problem including expected versus actual behavior:

A standalone sum aggregation and the sum within the stats aggregation return different results. When no document in a bucket has a value for the field (stats count of 0), the sum aggregation returns 0 whereas the sum within the stats aggregation is null.

Steps to reproduce:

  1. Create some documents:
PUT my_test/test/1
{
    "category" : "c1",
    "value": 1
}
PUT my_test/test/2
{
    "category" : "c2"
}
PUT my_test/test/3
{
    "category" : "c3",
    "value": 1
}
PUT my_test/test/4
{
    "category" : "c3",
    "value": 1
}
  2. Run a terms aggregation with metric sub-aggregations:
POST my_test/_search
{
  "size": 0,
  "aggregations": {
    "categories": {
      "terms": {
        "field": "category.keyword"
      },
      "aggs": {
        "my_min": {
          "min": {
            "field": "value"
          }
        },
        "my_max": {
          "max": {
            "field": "value"
          }
        },
        "my_avg": {
          "avg": {
            "field": "value"
          }
        },
        "my_sum": {
          "sum": {
            "field": "value"
          }
        },
        "my_stats": {
          "stats": {
            "field": "value"
          }
        }
      }
    }
  }
}
  3. The result looks like this:
  "aggregations": {
    "categories": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "c3",
          "doc_count": 2,
          "my_stats": {
            "count": 2,
            "min": 1,
            "max": 1,
            "avg": 1,
            "sum": 2
          },
          "my_max": {
            "value": 1
          },
          "my_avg": {
            "value": 1
          },
          "my_min": {
            "value": 1
          },
          "my_sum": {
            "value": 2
          }
        },
        {
          "key": "c1",
          "doc_count": 1,
          "my_stats": {
            "count": 1,
            "min": 1,
            "max": 1,
            "avg": 1,
            "sum": 1
          },
          "my_max": {
            "value": 1
          },
          "my_avg": {
            "value": 1
          },
          "my_min": {
            "value": 1
          },
          "my_sum": {
            "value": 1
          }
        },
        {
          "key": "c2",
          "doc_count": 1,
          "my_stats": {
            "count": 0,
            "min": null,
            "max": null,
            "avg": null,
            "sum": null
          },
          "my_max": {
            "value": null
          },
          "my_avg": {
            "value": null
          },
          "my_min": {
            "value": null
          },
          "my_sum": {
            "value": 0
          }
        }
      ]
    }
  }

The document in category c2 doesn't include the value field. For that bucket, min, max, avg and every metric inside stats (including stats.sum) are null, whereas the standalone sum aggregation alone returns 0.
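A client could detect this discrepancy by comparing the two sums for each bucket. The following is a minimal sketch in Python, assuming the response has already been parsed into plain dicts. The helper name `find_sum_mismatches` is hypothetical, and the sample buckets are trimmed from the result shown in this issue to the fields being compared.

```python
# Hypothetical helper, not part of any Elasticsearch client.
# Lists bucket keys where the standalone sum and stats.sum disagree.

def find_sum_mismatches(buckets):
    """Return the keys of buckets whose my_sum.value differs from my_stats.sum."""
    mismatched = []
    for bucket in buckets:
        standalone = bucket["my_sum"]["value"]
        from_stats = bucket["my_stats"]["sum"]
        # JSON null is parsed as None, so 0 and None compare as different.
        if standalone != from_stats:
            mismatched.append(bucket["key"])
    return mismatched


# Buckets trimmed from the response in this issue.
buckets = [
    {"key": "c3", "doc_count": 2,
     "my_stats": {"count": 2, "sum": 2}, "my_sum": {"value": 2}},
    {"key": "c1", "doc_count": 1,
     "my_stats": {"count": 1, "sum": 1}, "my_sum": {"value": 1}},
    {"key": "c2", "doc_count": 1,
     "my_stats": {"count": 0, "sum": None}, "my_sum": {"value": 0}},
]

print(find_sum_mismatches(buckets))  # prints ['c2']
```

Only c2 is reported, because its single document has no value field.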

@colings86
Contributor

We should change the stats aggregation so that stats.sum is 0 when no documents are collected, like value_count.
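The proposed behavior can be modeled outside Elasticsearch. This is a pure-Python sketch, not the actual aggregation code: the function name `stats_summary` is hypothetical, and it assumes that min, max and avg stay null for an empty bucket while only sum becomes 0.

```python
# Illustrative model of the proposed stats output; not Elasticsearch code.

def stats_summary(values):
    """Summarize a list of numbers the way the proposed stats output would.

    Assumption: for an empty input, sum is 0 (matching the standalone sum
    aggregation) while min, max and avg remain None (JSON null).
    """
    count = len(values)
    if count == 0:
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0}
    total = sum(values)
    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "avg": total / count,
        "sum": total,
    }


# Category c2 from this issue: one document, no "value" field collected.
print(stats_summary([])["sum"])      # prints 0
# Category c3 from this issue: two documents with value 1.
print(stats_summary([1, 1])["sum"])  # prints 2
```

With this rule, stats.sum and the standalone sum agree for every bucket in the reproduction, including c2.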

@PammyS

PammyS commented Oct 18, 2017

I will be happy to fix this issue! Please assign it to me.

@colings86
Contributor

colings86 commented Oct 23, 2017

@PammyS We are not able to assign issues to users who are not part of the Elastic org, but we would love it if you could work on this fix. Please feel most welcome to open a PR. Thanks

liketic added a commit to liketic/elasticsearch that referenced this issue Nov 2, 2017
martijnvg added a commit that referenced this issue Nov 3, 2017
* master:
  Fixed byte buffer leak in Netty4 request handler
  Avoid uid creation in ParsedDocument (#27241)
  Rander sum as zero if count is zero for stats aggregation (#26893) (#27193)
  Add additional explanations around discovery.zen.ping_timeout (#27231)
  Remove unused searcher parameter in SearchService#createContext (#27227)
  Upgrade to Lucene 7.1 (#27225)
  Move IndexShard#getWritingBytes() under InternalEngine (#27209)
  Adjust bwc version for exists query tests
  Introducing took time for _msearch