Bucket path name resolution fails with siblings and child aggregations with the same name #30608
Comments
Pinging @elastic/es-search-aggs
@colings86 do you have an idea whether this is something that should be supported or a usage problem?
@ruizmarc This does indeed look like a bug at first glance. Do you have the full stack trace for the ClassCastException from your server logs?
Thanks for your quick response :) Sure, here is the stack trace:
@ruizmarc I hope you don't mind, but I reformatted your stack trace to make it a bit easier to read.
I see the bug. It's not just when the aggs are the same name, but when they line up in just the correct manner:
It blows up because we overwrite the bucket path with the sublist. So when we start iterating, we match the agg in A, sublist the path and recurse down. But when the loop comes back around to check agg C, the sublisted path now matches because it went from

The fix should be pretty straightforward. I'll work something up. Thanks for the bug report @ruizmarc!
When processing a top-level sibling pipeline, we destructively sublist the path by assigning back onto the same variable. But if aggs are specified as such:

A. Multi-bucket agg in the first entry of our internal list
B. Regular agg as the immediate child of the multi-bucket in A
C. Regular agg with the same name as B at the top level, listed as the second entry in our internal list
D. Finally, a pipeline agg with the path down to B

we'll get a class cast exception. The first agg will sublist the path from [A,B] to [B], and then when we loop around to check agg C, the sublisted path [B] matches the name of C and it fails.

The fix is simple: we just need to store the sublist in a new object so that the old path remains valid for the rest of the aggs in the loop.

Closes elastic#30608
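The failure mode described in this commit message can be reproduced in isolation. The following is a minimal sketch, not the actual Elasticsearch code: the `Agg` class, `sampleAggs`, `resolve`, and the `reusePathVariable` flag are all hypothetical names invented for illustration. It assumes a loop that keeps iterating after a match, as the description implies.

```java
import java.util.Arrays;
import java.util.List;

public class BucketPathDemo {

    /** Hypothetical stand-in for an aggregation result: a name plus child aggs. */
    static final class Agg {
        final String name;
        final boolean multiBucket;
        final List<Agg> children;

        Agg(String name, boolean multiBucket, Agg... children) {
            this.name = name;
            this.multiBucket = multiBucket;
            this.children = Arrays.asList(children);
        }
    }

    /** Top-level list: multi-bucket A (with child B), then a regular agg C that is also named "B". */
    static List<Agg> sampleAggs() {
        Agg a = new Agg("A", true, new Agg("B", false));
        Agg c = new Agg("B", false);
        return Arrays.asList(a, c);
    }

    /**
     * Walks the top-level aggs looking for the first path element.
     * With reusePathVariable = true the sublist is assigned back onto `path`
     * (the destructive behaviour); with false it is kept in a separate variable.
     */
    static String resolve(List<Agg> topLevel, List<String> path, boolean reusePathVariable) {
        String resolved = null;
        for (Agg agg : topLevel) {
            if (!agg.name.equals(path.get(0))) {
                continue;
            }
            if (!agg.multiBucket) {
                // Mirrors the cast to a multi-bucket aggregation failing.
                throw new ClassCastException(agg.name + " is not a multi-bucket aggregation");
            }
            List<String> rest = path.subList(1, path.size());
            for (Agg child : agg.children) {
                if (child.name.equals(rest.get(0))) {
                    resolved = agg.name + ">" + child.name;
                }
            }
            if (reusePathVariable) {
                path = rest; // the old path is lost for the remaining iterations
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        List<String> path = Arrays.asList("A", "B");
        System.out.println(resolve(sampleAggs(), path, false));
        try {
            resolve(sampleAggs(), path, true);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```

With the sublist kept in its own variable, the second top-level agg is compared against "A" and skipped. With the path overwritten, it is compared against "B", matches, and the multi-bucket check fails.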
I'm glad it helped! Thanks for your quick fix, it will be very helpful! 💯
It seems like I stumbled across the same bug, so I backported the fix to 5.6.8 to check. But unfortunately the fix isn't working for me:
The query is:
Hey @jmuscireum, I believe you're running into a triplet of unrelated issues that constrain pipeline aggs right now.

First, You can "workaround" it by using

Finally, and I'm not sure there's an issue for this, but pipeline aggs can't easily aggregate across multiple levels of terms aggregations. You can sometimes work around it by "proxying" the value out of the terms agg with an intermediate pipeline agg (e.g. a min_bucket on the same level as the terms, to "roll up" the value at that level, then another min_bucket at a higher level to roll up all the previous min_buckets). But you can't normally use one pipeline agg to aggregate across multiple levels of terms aggs.

Sorry for all the bad news... pipeline aggs have some fundamental limitations based on how the framework works. :(
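The two-stage roll-up mentioned above can be pictured with plain collections. This is only a conceptual sketch, assuming each outer terms bucket holds a list of inner values; `RollUpSketch`, `minBucket`, and `twoStageMin` are hypothetical names and not part of any Elasticsearch API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class RollUpSketch {

    /** Stand-in for a min_bucket over a single level of buckets. */
    static double minBucket(List<Double> values) {
        return Collections.min(values);
    }

    /**
     * Stage 1: roll up each inner level to one value per outer bucket.
     * Stage 2: roll up those per-bucket minimums at the higher level.
     */
    static double twoStageMin(List<List<Double>> outerBuckets) {
        List<Double> perOuterBucket = new ArrayList<>();
        for (List<Double> innerValues : outerBuckets) {
            perOuterBucket.add(minBucket(innerValues));
        }
        return minBucket(perOuterBucket);
    }

    public static void main(String[] args) {
        List<List<Double>> buckets = Arrays.asList(
                Arrays.asList(5.0, 3.0),
                Arrays.asList(7.0, 2.5));
        System.out.println(twoStageMin(buckets));
    }
}
```

Each stage only ever looks one level down, which is the constraint the comment describes: no single step reaches across two levels of buckets.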
Hey @polyfractal, thank you for the detailed answer! As a workaround, we can manually fetch the min and max price from the aggregated bucket. But I have the feeling that there is a way that is better suited to our use case, and I can't think of it. Maybe you have an idea how we can achieve this without having so many nested aggregations.

Our products can have different prices in multiple catalogs. Depending on the user, different catalogs are accessible, so the prices differ from user to user. On our search page, every product filter shows the active filter values and the values that are additionally possible. For this we use the post filter to filter the products after the filter aggregations are done. That's why

Thank you for doing awesome work and have a nice weekend!
Elasticsearch version (bin/elasticsearch --version): 6.2.4
Plugins installed: No plugins installed
JVM version (java -version): java version "1.8.0_25"
OS version (uname -a if on a Unix-like system): Darwin Kernel Version 17.5.0
Description of the problem including expected versus actual behavior:
When using a pipeline aggregation on a date histogram, buckets_path cannot be resolved properly if it points to a child aggregation of the date histogram and a sibling of the pipeline aggregation has the same name as that child aggregation (see example).
I would expect Elasticsearch to resolve the path properly, as there doesn't seem to be any ambiguity. Maybe I'm wrong about this and it is behaving as intended...
Steps to reproduce:
Here is a simple query example that reproduces the problem:
And this is the error that I receive:
"{\"error\":{\"root_cause\":[],\"type\":\"search_phase_execution_exception\",\"reason\":\"\",\"phase\":\"fetch\",\"grouped\":true,\"failed_shards\":[],\"caused_by\":{\"type\":\"class_cast_exception\",\"reason\":\"org.elasticsearch.search.aggregations.bucket.filter.InternalFilter cannot be cast to org.elasticsearch.search.aggregations.InternalMultiBucketAggregation\"}},\"status\":503}"
Either aggregation (sessionsCount or monthlyAverageSessions) works fine on its own. It also works if I rename the first aggregation (sessionsCount) to something else (sessions, for example). So it looks like a naming problem.