Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pagination to HCAD feature query #76

Closed
wants to merge 4 commits into from

Conversation

kaituo
Copy link
Collaborator

@kaituo kaituo commented May 26, 2021

Note: since there are a lot of dependencies, I only list the main class and test code to save reviewers' time. The build will fail due to missing dependencies. I will use that PR just for review. will not merge it. Will have a big one in the end and merge once after all review PRs get approved. Now the code is missing unit tests. Will add unit tests, run performance tests, and fix bugs before the official release.

Description

HCAD issues a query to fetch feature data for each entity regularly. HCAD v1 limit the number of entity values returned per query to 1000, both for reasons of efficiency (fetching the overall entity values is expensive and may not finish before the next scheduled query run starts) and for reasons of ease-of-load-shedding (the top 1000 limit, in turn, curtails the memory used to host models, the disk access needed to read/write model checkpoints and anomaly results, the CPU used for entity metadata maintenance and model training/inference, and the garbage collection for deleted models and metadata). Users can increase the maximum entity setting at the cost of a linear growth resource usage, which opens the door to cluster instability. However, such a heuristic “best-effort” limit is detrimental to the scalability of entity monitoring. Even if a user scales out, the number of monitored entities does not increase.

This PR changes to use pagination to fetch entities. If there are more than 1000 entities, we will fetch them in the next page. We implement pagination with composite query. Results are decomposed into a set of 1000-entity pages. Each page encapsulates aggregated values for each entity and is sent to model nodes according to the hash ring mapping from entity model Id to a data node.

Testing done:

  1. Manually tested pagination works.

Check List

  • [ Y ] Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

HCAD issues a query to fetch feature data for each entity regularly. HCAD v1 limit the number of entity values returned per query to 1000, both for reasons of efficiency (fetching the overall entity values is expensive and may not finish before the next scheduled query run starts) and for reasons of ease-of-load-shedding (the top 1000 limit, in turn, curtails the memory used to host models, the disk access needed to read/write model checkpoints and anomaly results, the CPU used for entity metadata maintenance and model training/inference, and the garbage collection for deleted models and metadata). Users can increase the maximum entity setting at the cost of a linear growth resource usage, which opens the door to cluster instability. However, such a heuristic “best-effort” limit is detrimental to the scalability of entity monitoring. Even if a user scales out, the number of monitored entities does not increase.

This PR changes to use pagination to fetch entities.  If there are more than 1000 entities, we will fetch them in the next page. We implement pagination with composite query. Results are decomposed into a set of 1000-entity pages. Each page encapsulates aggregated values for each entity and is sent to model nodes according to the hash ring mapping from entity model Id to a data node.

Testing done:
1. Manually tested pagination works.
@kaituo kaituo requested review from ylwu-amzn, dai-chen and penghuo May 26, 2021 21:44

public class Page {
// a map from categorical field name to values (type: java.lang.Comparable)
Map<String, Object> afterKey;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the listener need to know the afterKey and source? IMO, the listener should only consume the results.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, the listener does not need to know the afterkey and source.

.aggregation(composite)
.trackTotalHits(false);

Page page = new Page(null, searchSourceBuilder, null);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the iterable fit to this use case? e.g. class IterablePage extends Iterable

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You meant https://docs.oracle.com/javase/8/docs/api/java/lang/Iterable.html ? I need parameters in the next method. Does not seem fit.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to separate page and pageIterator. Please check the new commit. Also, cannot extend JDK interface since the method signature next() does not match my need. I need to return result using async listener callback. Iterator.next() in JDK is a synchronous method.

return null;
}
Aggregation agg = response.getAggregations().get(AGG_NAME_COMP);
if (agg == null) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using Optional to enforece the calller to handle the null case?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, changed to Optional.

Copy link

@penghuo penghuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the change!

@kaituo kaituo closed this Jun 11, 2021
kaituo added a commit that referenced this pull request Jul 12, 2021
This PR is a conglomerate of the following PRs.

#60
#64
#65
#67
#68
#69
#70
#71
#74
#75
#76
#77
#78
#79
#82
#83
#84
#92
#94
#93
#95
kaituo#1
kaituo#2
kaituo#3
kaituo#4
kaituo#5
kaituo#6
kaituo#7
kaituo#8
kaituo#9
kaituo#10

This spreadsheet contains the mappings from files to PR number (bug fix in my AD fork and tests are not included):
https://gist.github.com/kaituo/9e1592c4ac4f2f449356cb93d0591167
ohltyler pushed a commit that referenced this pull request Sep 1, 2021
This PR is a conglomerate of the following PRs.

#60
#64
#65
#67
#68
#69
#70
#71
#74
#75
#76
#77
#78
#79
#82
#83
#84
#92
#94
#93
#95
kaituo#1
kaituo#2
kaituo#3
kaituo#4
kaituo#5
kaituo#6
kaituo#7
kaituo#8
kaituo#9
kaituo#10

This spreadsheet contains the mappings from files to PR number (bug fix in my AD fork and tests are not included):
https://gist.github.com/kaituo/9e1592c4ac4f2f449356cb93d0591167
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants