Slightly more accurate `terms` sorting on sub-aggs
#72684
Comments
Pinging @elastic/es-analytics-geo (Team:Analytics)
A few of us got to talking about this and figured out we don't really know how often we get into this state. We document that this sort of thing can be an error. We're not super clear about the side effects here. I think we use kind of dense language. I think we can improve that. Anyway! It'd be useful to collect a counter of how often this happens. I'm not sure it's worth the effort right now. We're kind of swamped and we're not sure what we'd really do with the counter. We don't have a good way to fix this. The errors are still unbounded, even if we bump the `shard_size`. It'd help. Some. But we don't know how much and it'd absolutely come with performance changes. And, even better, you can do this yourself. Just set the So! I'm going to convert this into a docs issue and just update those docs and leave the rest as it is.
The `terms` agg picks the top `size` terms in a single scatter/gather pass across all the shards. For the default `order` and if you `order` by `_key` this works quite well. Some errors creep in, but it's fairly easy to point to them and understand them. But ordering by doc count ascending is like inviting the error vampire into your agg. It's super easy to get inaccurate results. This updates the docs to be more stark about it. Closes elastic#72684
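To make the failure mode concrete, here is a toy model of a single scatter/gather pass. It is only a sketch, not Elasticsearch's implementation: the function names (`top_terms`, `scatter_gather`, `exact`) and the two-shard data set are made up for illustration, and each shard is assumed to report only its own top `shard_size` terms.

```python
from collections import Counter


def top_terms(counts, n, ascending=False):
    """Return the first n (term, count) pairs ordered by count, ties broken by term."""
    sign = 1 if ascending else -1
    return sorted(counts.items(), key=lambda kv: (sign * kv[1], kv[0]))[:n]


def scatter_gather(shards, size, shard_size, ascending=False):
    """One pass: each shard sends its top shard_size terms, the coordinator merges."""
    merged = Counter()
    for shard in shards:
        for term, count in top_terms(shard, shard_size, ascending):
            merged[term] += count
    return top_terms(merged, size, ascending)


def exact(shards, size, ascending=False):
    """Ground truth: sum every term across every shard before picking the top."""
    total = Counter()
    for shard in shards:
        total.update(shard)
    return top_terms(total, size, ascending)


# Hypothetical per-shard doc counts.
shards = [
    {"a": 1, "b": 50, "c": 2},
    {"a": 60, "b": 1, "d": 3},
]

# Descending: right term, count slightly low.
print(scatter_gather(shards, size=1, shard_size=1), exact(shards, size=1))
# Ascending: wrong term and wildly wrong count.
print(scatter_gather(shards, size=1, shard_size=1, ascending=True),
      exact(shards, size=1, ascending=True))
```

In this toy data set, term `a` is rare on one shard and common on the other, so each shard's "rarest term" is misleading and the merged ascending result reports `a` with a count of 1 when its true count is 61. Raising `shard_size` until every shard ships all its terms makes the two results agree, at the cost of moving more data.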
Right now we default the `shard_size` to `1.5*size+10` in `terms` in an effort to keep the doc count errors lower. When you sort the terms by something other than doc_count descending the error is unbounded. But larger `shard_size` will lead to more accurate results because we ship more data back to the coordinating node. Maybe we should bump the `shard_size` if you are sorting on anything other than doc_count descending to get you more accurate results by default. Sure, it'll be slower, but it's the price you pay for a little more accuracy. And the performance when sorting by the default won't change.
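As a rough illustration of the default quoted above and of overriding it by hand, the sketch below computes `1.5*size+10` and builds a `terms` request body with an explicit `shard_size`. The helper names, the `genre` field, the `by_term` aggregation name, and the choice of ten times the default are all assumptions made for the example; truncating the formula to an integer is also an assumption.

```python
import json


def default_shard_size(size):
    """Default quoted in this issue: 1.5 * size + 10 (int truncation is assumed)."""
    return int(1.5 * size + 10)


def terms_agg(field, size, order, shard_size=None):
    """Build a search body with one terms agg; shard_size is only set when given."""
    terms = {"field": field, "size": size, "order": order}
    if shard_size is not None:
        terms["shard_size"] = shard_size
    # "size": 0 at the top level skips returning hits; only the agg is wanted.
    return {"size": 0, "aggs": {"by_term": {"terms": terms}}}


size = 10
# Hypothetical bump: ask each shard for ten times the default number of terms.
bumped = 10 * default_shard_size(size)
request = terms_agg("genre", size, {"_count": "asc"}, shard_size=bumped)
print(json.dumps(request, indent=2))
```

How far to raise `shard_size` is a judgment call: a larger value ships more terms back to the coordinating node, which should reduce the error but does not bound it and costs performance.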