buckets_path cannot route through nested aggregation? #29287

Open
webbnh opened this issue Mar 28, 2018 · 7 comments
Labels
:Analytics/Aggregations Aggregations >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@webbnh

webbnh commented Mar 28, 2018

Elasticsearch version (bin/elasticsearch --version):

Version: 6.2.1, Build: 7299dc3/2018-02-07T19:34:26.990113Z, JVM: 1.8.0_25

Plugins installed: []
None?

JVM version (java -version):

java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)

OS version (uname -a if on a Unix-like system):

Darwin mynode.local 14.5.0 Darwin Kernel Version 14.5.0: Sun Jun  4 21:40:08 PDT 2017; root:xnu-2782.70.3~1/RELEASE_X86_64 x86_64

Description of the problem including expected versus actual behavior:
The idea is to pick out a bunch of documents from the index which have interesting data in a few of their fields (I've removed some of the fields from the query below for simplicity), organize those documents by the contents of the sourceId field, and then discard buckets which are empty or otherwise drawn from data which doesn't match.

I had a query similar to the below which worked. I then modified the document structure such that most of the interesting data moved to a nested mapping. Attempting to modify the query to match results in an error:

{
    "error": {
        "root_cause": [],
        "type": "search_phase_execution_exception",
        "reason": "",
        "phase": "fetch",
        "grouped": true,
        "failed_shards": [],
        "caused_by": {
            "type": "class_cast_exception",
            "reason": "org.elasticsearch.search.aggregations.bucket.nested.InternalNested cannot be cast to org.elasticsearch.search.aggregations.InternalMultiBucketAggregation"
        }
    },
    "status": 503
}

I tried several variations on the theme, and the commonality seems to be that a buckets_path cannot route through a nested aggregation.

New query:

{
    "size": 0,
    "query": {
        "nested": {
            "path": "eventData.6_2",
            "query": {
                "dis_max": {
                    "queries": [
                        { "term": { "eventData.6_2.6_2_1_2": "093" } },
                        { "exists": { "field": "eventData.6_2.6_2_3" } },
                        { "range": { "eventData.6_2.6_2_3": { "lt": "1000" } } }
                    ]
                }
            }
        }
    },
    "aggs": {
        "flights": {
            "terms": {
                "size": 100000,
                "field": "sourceId"
            },
            "aggs": {
                "subA": {
                    "nested": { "path": "eventData.6_2" },
                    "aggs": {
                        "TargetCount": {
                            "cardinality": {
                                "field": "eventData.6_2.6_2_1_2",
                                "precision_threshold": 10
                            }
                        },
                        "MaxCC": { "max": { "field": "eventData.6_2.6_2_3" } },
                        "FindIt": {
                            "bucket_selector": {
                                "buckets_path": { "foundRecs": "TargetCount" },
                                "script": "params.foundRecs > 0"
                            }
                        }
                    }
                }
            }
        },
        "CC": { "max_bucket": { "buckets_path": "flights>subA>MaxCC" } }
    }
}

Steps to reproduce:

I'm willing to go scrape this together, but first I'd like confirmation that (a) it's not a fault in my query and (b) it's not just an implementation restriction.

Thanks!

@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

@colings86
Contributor

@webbnh There is a bug here, but the bug is that we should catch the problem at parsing time and output a much better error, instead of failing when we try to run the pipeline aggregation.

The problem is that the request is trying to run the bucket_selector aggregation inside the nested aggregation, which is a single-bucket aggregation, and the bucket_selector agg only works on multi-bucket aggregations. I think what you intend to do is remove the entire terms bucket if the TargetCount of subA is 0? If so, you need to move the bucket selector up one level so that it is a direct sub-agg of the terms aggregation, and then modify the buckets_path. Something like the following:

{
  "size":0,
  "query":{
    "nested":{
      "path":"eventData.6_2",
      "query":{
        "dis_max":{
          "queries":[
            {
              "term":{
                "eventData.6_2.6_2_1_2":"093"
              }
            },
            {
              "exists":{
                "field":"eventData.6_2.6_2_3"
              }
            },
            {
              "range":{
                "eventData.6_2.6_2_3":{
                  "lt":"1000"
                }
              }
            }
          ]
        }
      }
    }
  },
  "aggs":{
    "flights":{
      "terms":{
        "size":100000,
        "field":"sourceId"
      },
      "aggs":{
        "subA":{
          "nested":{
            "path":"eventData.6_2"
          },
          "aggs":{
            "TargetCount":{
              "cardinality":{
                "field":"eventData.6_2.6_2_1_2",
                "precision_threshold":10
              }
            },
            "MaxCC":{
              "max":{
                "field":"eventData.6_2.6_2_3"
              }
            }
          }
        }
      },
      "aggs": {
        "FindIt":{
          "bucket_selector":{
            "buckets_path":{
              "foundRecs":"subA>TargetCount"
            },
            "script":"params.foundRecs > 0"
          }
        }
      }
    },
    "CC":{
      "max_bucket":{
        "buckets_path":"flights>subA>MaxCC"
      }
    }
  }
}

One unrelated thing to note is that your max_bucket aggregation will also not work. Pipeline aggregations need to be inside multi-bucket aggregations and cannot live at the top level. There is a separate issue for this: #14600. For now you will need to calculate the max bucket on the client side.
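A minimal sketch of the client-side maximum suggested above, assuming the usual response layout in which each terms bucket carries its sub-aggregations by name (`aggregations.flights.buckets[].subA.MaxCC.value`). The function name `max_over_buckets` and the sample response are hypothetical and hand-written for illustration; they are not output from a real cluster.

```python
from typing import Any, Optional


def max_over_buckets(response: dict) -> Optional[float]:
    """Return the largest flights>subA>MaxCC value, or None if no bucket has one."""
    buckets = response["aggregations"]["flights"]["buckets"]
    values = [
        bucket["subA"]["MaxCC"]["value"]
        for bucket in buckets
        # A max over zero documents comes back as null (None in Python), so skip it.
        if bucket["subA"]["MaxCC"]["value"] is not None
    ]
    return max(values) if values else None


# Hypothetical response: keys and numbers are invented for this example.
sample_response: dict = {
    "aggregations": {
        "flights": {
            "buckets": [
                {"key": "A1", "doc_count": 3, "subA": {"doc_count": 5, "MaxCC": {"value": 412.0}}},
                {"key": "B2", "doc_count": 1, "subA": {"doc_count": 2, "MaxCC": {"value": 987.0}}},
                {"key": "C3", "doc_count": 2, "subA": {"doc_count": 0, "MaxCC": {"value": None}}},
            ]
        }
    }
}

print(max_over_buckets(sample_response))
```

With this in place the top-level CC aggregation can be dropped from the request and the maximum taken from the parsed response instead.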

@webbnh
Author

webbnh commented Mar 29, 2018

@colings86, thanks for the quick reply!

Your suggestion has a duplicate aggs key under flights, but when I remove that and place FindIt in the aggs with subA, then it seems to work! Thanks!!
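The corrected shape described here can be sketched as a Python dict (ready to serialize as the request body), with subA and FindIt sharing a single aggs object under flights. This is a sketch, not the exact query that was run: the nested query clause and the top-level CC aggregation are left out for brevity, and the variable names `flights_sub_aggs` and `request_body` are made up for the example.

```python
import json

# Sub-aggregations of "flights": subA and FindIt live in ONE "aggs" object.
flights_sub_aggs = {
    "subA": {
        "nested": {"path": "eventData.6_2"},
        "aggs": {
            "TargetCount": {
                "cardinality": {
                    "field": "eventData.6_2.6_2_1_2",
                    "precision_threshold": 10,
                }
            },
            "MaxCC": {"max": {"field": "eventData.6_2.6_2_3"}},
        },
    },
    # The selector is a direct child of the multi-bucket terms aggregation
    # and reaches into the nested aggregation through its buckets_path.
    "FindIt": {
        "bucket_selector": {
            "buckets_path": {"foundRecs": "subA>TargetCount"},
            "script": "params.foundRecs > 0",
        }
    },
}

request_body = {
    "size": 0,
    "aggs": {
        "flights": {
            "terms": {"size": 100000, "field": "sourceId"},
            "aggs": flights_sub_aggs,
        }
    },
}

print(json.dumps(request_body, indent=2))
```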

I ran across #14600 looking for other reports of the problem I was encountering, but with your suggested change I'm not hitting the problem reported there. (I can't tell yet whether the query is actually working properly, as I don't have enough data in the new format yet, but my corrected query is producing values and no errors...so that seems positive! ;-) )

Thanks again for your help!

@colings86
Contributor

@webbnh ok, glad it's working for you. I'll leave this issue open to fix the validation problem so that a clearer error is returned at parsing time.
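To illustrate what such a parse-time check might look like, here is a rough client-side sketch in Python. It is not how Elasticsearch implements or plans to implement validation; it only walks a request's aggs tree and reports parent pipeline aggregations placed directly under a single-bucket aggregation. The two type sets are hand-picked and not exhaustive, and the function name `find_misplaced_pipelines` is hypothetical.

```python
# Assumption: non-exhaustive, hand-picked lists of aggregation types.
SINGLE_BUCKET_TYPES = {"nested", "reverse_nested", "filter", "global", "missing", "sampler"}
PARENT_PIPELINE_TYPES = {"bucket_selector", "bucket_script", "bucket_sort"}
NON_TYPE_KEYS = {"aggs", "aggregations", "meta"}


def find_misplaced_pipelines(aggs, parent_name=None, parent_type=None, path=""):
    """Return one message per pipeline agg sitting under a single-bucket parent."""
    problems = []
    for name, body in aggs.items():
        # The aggregation type is the one key that is not aggs/aggregations/meta.
        agg_type = next((key for key in body if key not in NON_TYPE_KEYS), None)
        full_path = f"{path}>{name}" if path else name
        if agg_type in PARENT_PIPELINE_TYPES and parent_type in SINGLE_BUCKET_TYPES:
            problems.append(
                f"{agg_type} [{full_path}] sits under single-bucket {parent_type} [{parent_name}]"
            )
        sub_aggs = body.get("aggs") or body.get("aggregations") or {}
        problems.extend(find_misplaced_pipelines(sub_aggs, name, agg_type, full_path))
    return problems


selector = {
    "bucket_selector": {
        "buckets_path": {"foundRecs": "TargetCount"},
        "script": "params.foundRecs > 0",
    }
}
target_count = {"cardinality": {"field": "eventData.6_2.6_2_1_2"}}

# Shape of the failing request: FindIt inside the nested aggregation.
broken_aggs = {
    "flights": {
        "terms": {"field": "sourceId"},
        "aggs": {
            "subA": {
                "nested": {"path": "eventData.6_2"},
                "aggs": {"TargetCount": target_count, "FindIt": selector},
            }
        },
    }
}

for message in find_misplaced_pipelines(broken_aggs):
    print(message)
```

Running the same function on a tree where FindIt is a direct child of flights returns an empty list.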

@biji-padhy

Hi Team,
I am also facing a similar issue. Pasting my code here; it would be a great help if someone could help me out. Thanks in advance.

"aggs": {
  "business": {
    "composite": {
      "sources": [
        { "competency_name": { "terms": { "field": "busn_competency_name.keyword" } } },
        { "component_name": { "terms": { "field": "busn_component_name.keyword" } } },
        { "busn_srvc_name": { "terms": { "field": "busn_srvc_name.keyword" } } }
      ]
    },
    "aggs": {
      "comp": {
        "filter": { "term": { "automata_status.keyword": "Completed" } },
        "aggs": {
          "sum1": { "sum": { "field": "p_manual_exe_time" } },
          "sum2": { "sum": { "field": "a_actual_exe_time" } },
          "effort_saved": {
            "bucket_selector": {
              "buckets_path": {
                "var1": "sum1",
                "var2": "sum2"
              },
              "script": "params.var1 - params.var2"
            }
          }
        }
      }
    }
  }
}

the error I am receiving is:

{
    "error": {
        "root_cause": [],
        "type": "search_phase_execution_exception",
        "reason": "",
        "phase": "fetch",
        "grouped": true,
        "failed_shards": [],
        "caused_by": {
            "type": "class_cast_exception",
            "reason": "org.elasticsearch.search.aggregations.bucket.filter.InternalFilter cannot be cast to org.elasticsearch.search.aggregations.InternalMultiBucketAggregation"
        }
    },
    "status": 503
}

@polyfractal
Contributor

@biji-padhy Known limitation, unfortunately. See: #14600

Usually you can get around this by using a filters agg instead of filter. Irritating but it's a quirk of how the framework works at the moment :(
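A rough sketch of that workaround, written as a Python dict for the request body and not tested against a live cluster. The single-bucket filter is replaced by a filters aggregation with one named filter; the bucket name "completed" is made up. One further assumption is labeled in the code: a bucket_selector script must return a boolean, while `params.var1 - params.var2` yields a number, so this sketch guesses that bucket_script was what was intended.

```python
import json

comp_agg = {
    # "filters" is a multi-bucket aggregation, unlike the single-bucket "filter".
    "filters": {
        "filters": {
            # Assumption: "completed" is an invented bucket name.
            "completed": {"term": {"automata_status.keyword": "Completed"}}
        }
    },
    "aggs": {
        "sum1": {"sum": {"field": "p_manual_exe_time"}},
        "sum2": {"sum": {"field": "a_actual_exe_time"}},
        # Assumption: bucket_script (numeric result) in place of bucket_selector,
        # because the script computes a difference instead of a true/false test.
        "effort_saved": {
            "bucket_script": {
                "buckets_path": {"var1": "sum1", "var2": "sum2"},
                "script": "params.var1 - params.var2",
            }
        },
    },
}

business_agg = {
    "composite": {
        "sources": [
            {"competency_name": {"terms": {"field": "busn_competency_name.keyword"}}},
            {"component_name": {"terms": {"field": "busn_component_name.keyword"}}},
            {"busn_srvc_name": {"terms": {"field": "busn_srvc_name.keyword"}}},
        ]
    },
    "aggs": {"comp": comp_agg},
}

print(json.dumps({"aggs": {"business": business_agg}}, indent=2))
```

In the response, the sums and effort_saved would then appear under the "completed" bucket of comp inside each composite bucket.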

@polyfractal polyfractal added v7.2.0 and removed v7.0.0 labels Apr 9, 2019
@jakelandis jakelandis added v7.3.0 and removed v7.2.0 labels Jun 17, 2019
@rjernst rjernst added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 4, 2020
@martijnvg
Member

martijnvg commented Sep 15, 2022

The plan is still to address the problem described in the description of this issue by catching it at parse time and returning a meaningful error (instead of letting the class cast error happen at execution time), just as Colin described in his comment.

I am also facing similar issue. Pasting my code here.. it will a great help if someone can help me out. Thanks in advance.

@biji-padhy This is a different issue from the one described in the description of this issue, though I agree it is similar. It also relates to #90076, and once this issue has been addressed, that should fix the class cast exception that you've reported.
