Add pagination to HCAD feature query #76
Conversation
HCAD issues a query to fetch feature data for each entity at regular intervals. HCAD v1 limits the number of entity values returned per query to 1,000, both for efficiency (fetching all entity values is expensive and may not finish before the next scheduled query run starts) and for ease of load shedding (the top-1,000 limit, in turn, curtails the memory used to host models, the disk access needed to read/write model checkpoints and anomaly results, the CPU used for entity metadata maintenance and model training/inference, and the garbage collection for deleted models and metadata). Users can raise the maximum-entity setting at the cost of linear growth in resource usage, which opens the door to cluster instability. Such a heuristic "best-effort" limit is detrimental to the scalability of entity monitoring: even if a user scales out the cluster, the number of monitored entities does not increase.
This PR switches to pagination for fetching entities. If there are more than 1,000 entities, we fetch the rest in subsequent pages. We implement pagination with a composite aggregation query. Results are decomposed into a set of 1,000-entity pages. Each page encapsulates aggregated values for each entity and is sent to model nodes according to the hash-ring mapping from entity model ID to a data node.
Testing done:
1. Manually tested that pagination works.
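The paging flow described above (fetch up to 1,000 entities, then continue from an after-key until the pages run out) can be sketched with a self-contained simulation. The class and method names here are illustrative, not the actual CompositeRetriever API, and a sorted list of integers stands in for the composite aggregation buckets:

```java
import java.util.ArrayList;
import java.util.List;

// Self-contained sketch of composite-style pagination: each call to
// fetchPage returns up to PAGE_SIZE entities that sort after `afterKey`,
// and fetchAll loops until a page comes back empty.
class PaginationSketch {
    static final int PAGE_SIZE = 1000;

    // Simulates one composite-aggregation page over a sorted entity list.
    static List<Integer> fetchPage(List<Integer> sortedEntities, Integer afterKey) {
        List<Integer> page = new ArrayList<>();
        for (Integer entity : sortedEntities) {
            if (afterKey != null && entity <= afterKey) {
                continue; // already returned in an earlier page
            }
            page.add(entity);
            if (page.size() == PAGE_SIZE) {
                break;
            }
        }
        return page;
    }

    // Pages through all entities, carrying the last key forward as afterKey.
    static int fetchAll(List<Integer> sortedEntities) {
        Integer afterKey = null;
        int total = 0;
        while (true) {
            List<Integer> page = fetchPage(sortedEntities, afterKey);
            if (page.isEmpty()) {
                break; // no more entities: pagination is done
            }
            total += page.size();
            afterKey = page.get(page.size() - 1);
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> entities = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            entities.add(i);
        }
        // 2,500 entities are fetched across three pages (1000 + 1000 + 500).
        System.out.println(fetchAll(entities));
    }
}
```

In the real retriever the after-key comes back from OpenSearch in the composite aggregation response rather than from the last element of the page, but the loop structure is the same.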
public class Page {
    // a map from categorical field name to values (type: java.lang.Comparable)
    Map<String, Object> afterKey;
Does the listener need to know the afterKey and source? IMO, the listener should only consume the results.
No, the listener does not need to know the afterKey and source.
    .aggregation(composite)
    .trackTotalHits(false);

Page page = new Page(null, searchSourceBuilder, null);
Would Iterable fit this use case? e.g. class IterablePage implements Iterable
You meant https://docs.oracle.com/javase/8/docs/api/java/lang/Iterable.html ? I need parameters in the next method, so it does not seem to fit.
I tried to separate Page and PageIterator. Please check the new commit. Also, I cannot extend the JDK interface since the signature of next() does not match my need: I have to return results through an async listener callback, while Iterator.next() in the JDK is a synchronous method.
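A minimal sketch of what such an asynchronous iterator can look like (the names are hypothetical, and a plain java.util.function.Consumer stands in for the listener type used in the real code): next() hands the page to a callback instead of returning it, which is exactly why java.util.Iterator cannot be extended here.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical async page iterator: unlike java.util.Iterator.next(), which
// returns its element synchronously, next() here delivers the page through a
// callback, matching the listener style used for asynchronous search calls.
class AsyncPageIterator {
    private final List<List<String>> pages;
    private int cursor = 0;

    AsyncPageIterator(List<List<String>> pages) {
        this.pages = pages;
    }

    boolean hasNext() {
        return cursor < pages.size();
    }

    // In the real retriever this would issue the next composite search
    // request; here we simply invoke the callback with an in-memory page.
    void next(Consumer<List<String>> onResponse) {
        onResponse.accept(pages.get(cursor++));
    }

    public static void main(String[] args) {
        AsyncPageIterator it = new AsyncPageIterator(
            List.of(List.of("a", "b"), List.of("c"))
        );
        while (it.hasNext()) {
            it.next(page -> System.out.println(page));
        }
    }
}
```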
src/main/java/org/opensearch/ad/feature/CompositeRetriever.java
    return null;
}
Aggregation agg = response.getAggregations().get(AGG_NAME_COMP);
if (agg == null) {
Consider using Optional to force the caller to handle the null case?
yeah, changed to Optional.
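The change can be illustrated with a small, self-contained sketch (the class and method names are illustrative, and a plain Map stands in for the response's aggregations object): wrapping the possibly-null lookup in Optional makes every caller handle the missing-aggregation case explicitly.

```java
import java.util.Map;
import java.util.Optional;

// Sketch of the Optional change: a plain Map stands in for
// response.getAggregations(), whose get(name) can return null.
class AggregationLookup {
    // Before: callers must remember to null-check the returned aggregation.
    static Object getCompositeNullable(Map<String, Object> aggs, String name) {
        return aggs.get(name);
    }

    // After: Optional makes the missing-aggregation case part of the type,
    // so callers handle it with isPresent()/map()/orElse() instead of
    // risking a NullPointerException.
    static Optional<Object> getComposite(Map<String, Object> aggs, String name) {
        return Optional.ofNullable(aggs.get(name));
    }

    public static void main(String[] args) {
        Map<String, Object> aggs = Map.of("composite_agg", new Object());
        System.out.println(getComposite(aggs, "composite_agg").isPresent());
        System.out.println(getComposite(aggs, "missing").isPresent());
    }
}
```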
Thanks for the change!
This PR is a conglomerate of the following PRs. #60 #64 #65 #67 #68 #69 #70 #71 #74 #75 #76 #77 #78 #79 #82 #83 #84 #92 #94 #93 #95 kaituo#1 kaituo#2 kaituo#3 kaituo#4 kaituo#5 kaituo#6 kaituo#7 kaituo#8 kaituo#9 kaituo#10 This spreadsheet contains the mappings from files to PR number (bug fix in my AD fork and tests are not included): https://gist.github.com/kaituo/9e1592c4ac4f2f449356cb93d0591167
Note: since there are a lot of dependencies, I only list the main class and test code to save reviewers' time. The build will fail due to missing dependencies, so this PR is for review only and will not be merged. I will open one big PR at the end and merge it once all review PRs are approved. The code is currently missing unit tests; I will add unit tests, run performance tests, and fix bugs before the official release.
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.