Add multi terms aggregation feature #1629
Comments
When will this feature get added? |
AFAIK nobody is working on this, please contribute / raise your hand if you are. |
SQL/PPL require this feature also. opensearch-project/sql#124. |
Any updates on this feature? |
Working on a PR now. Here is a quick demo of the feature; I will post the first version soon.
|
🥳 🥳 🥳 |
Nice, can't wait for this feature |
1. Performance Improvement
As described in #2687, multi_terms aggregation is 20x slower than terms aggregation. After profiling, we found that encode is the major contributor.
2. Experiments
2.1. Benchmark the average time of hashCode vs. encode
Tested with integer values, hashCode() is around 30x faster than encode().
2.2. Test the performance of globalOrdinal and hashCode
We did a POC to verify the idea and test the performance. In general, it is 4x faster than the existing multi_terms aggregation implementation.
|
We need to work out how to showcase visualizations and UI changes in Visualize. |
@penghuo my apologies if I am missing something, but I suspect the idea to use
There are some in-depth details in [1], but we should not rely on hash code uniqueness [2]. [1] https://dzone.com/articles/what-is-wrong-with-hashcode-in-javalangstring |
Good point. Instead of using String.hashCode for strings, I think we can use globalOrdinal for string values, but this is not verified yet. |
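To make the collision concern concrete, here is a minimal Java sketch. It shows two distinct strings that share the same `String.hashCode()` value, next to a dictionary that gives each distinct term its own integer. `OrdinalDictionary` is a hypothetical class written for this illustration; it is not OpenSearch's global ordinals implementation, only the general idea of replacing a hash with a collision-free integer id.

```java
import java.util.HashMap;
import java.util.Map;

final class HashVsOrdinal {
    /** Hypothetical stand-in for an ordinal lookup: each distinct term gets its own int. */
    static final class OrdinalDictionary {
        private final Map<String, Integer> ordinals = new HashMap<>();

        /** Returns the existing ordinal for the term, or assigns the next free one. */
        int ordinalOf(String term) {
            Integer existing = ordinals.get(term);
            if (existing != null) {
                return existing;
            }
            int next = ordinals.size();
            ordinals.put(term, next);
            return next;
        }
    }

    /** True when two strings produce the same hashCode, whether or not they are equal. */
    static boolean sameHash(String a, String b) {
        return a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // "Aa" and "BB" are different strings, yet both hash to 2112.
        System.out.println("same hash: " + sameHash("Aa", "BB"));

        // The dictionary never hands the same ordinal to two different terms.
        OrdinalDictionary dict = new OrdinalDictionary();
        System.out.println("distinct ordinals: " + (dict.ordinalOf("Aa") != dict.ordinalOf("BB")));
    }
}
```

If bucket keys were built from hash codes alone, "Aa" and "BB" would land in the same bucket; keys built from ordinals keep them apart.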
@penghuo this issue is tagged 2.1.0. |
Yes, as @joshuali925 mentioned, this feature is present in the 2.1 feature branch. We will close this issue after 2.1 is released. |
Is your feature request related to a problem? Please describe.
I'd like to have an aggregation feature that lets me sort by document count or by a metric aggregation on a composite key and get the top N results.
This feature was already implemented in version 7.15 of Elasticsearch.
Here is the link to the documentation:
https://www.elastic.co/guide/en/elasticsearch//reference/master/search-aggregations-bucket-multi-terms-aggregation.html
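As a rough illustration of the requested behavior, the following Java sketch groups documents by a composite key built from several fields, counts documents per key, and returns the top N buckets by count. The class and method names are hypothetical, documents are modeled as plain string maps, and this is not how OpenSearch or Elasticsearch implements the aggregation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MultiTermsSketch {
    /** Counts documents per composite key and returns the top n buckets by document count. */
    static List<Map.Entry<List<String>, Long>> topN(
            List<Map<String, String>> docs, List<String> fields, int n) {
        Map<List<String>, Long> counts = new HashMap<>();
        for (Map<String, String> doc : docs) {
            // The composite key is the ordered list of the document's values for each field.
            List<String> key = new ArrayList<>();
            for (String field : fields) {
                key.add(doc.get(field));
            }
            if (key.contains(null)) {
                continue; // assumption for this sketch: skip documents missing any field
            }
            counts.merge(key, 1L, Long::sum);
        }
        List<Map.Entry<List<String>, Long>> buckets = new ArrayList<>(counts.entrySet());
        // Sort by document count, largest first; the order of ties is unspecified.
        buckets.sort(Map.Entry.<List<String>, Long>comparingByValue().reversed());
        return buckets.subList(0, Math.min(n, buckets.size()));
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
                Map.of("region", "eu", "host", "a"),
                Map.of("region", "eu", "host", "a"),
                Map.of("region", "us", "host", "b"));
        // The top bucket is the (eu, a) pair, which occurs in two documents.
        System.out.println(topN(docs, List.of("region", "host"), 1));
    }
}
```

Doing this on the consumer side means pulling every matching document out of the cluster first, which is the inefficiency the request is trying to avoid.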
Describe the solution you'd like
It would be perfect to see this feature as part of the official solution (as a service). Currently, we would have to run ES 7.15 ourselves and maintain it as a container.
Describe alternatives you've considered
As an alternative, we've considered a custom implementation of this mechanism on the consumer side (our service), but it seems inefficient. Moreover, it would duplicate an already existing feature, so it makes no sense to do it that way.