Add queue for cold entities #64

Closed

kaituo wants to merge 3 commits into opensearch-project:main from kaituo:coldEntityQueue

Collaborator

kaituo commented May 24, 2021 (edited)

Note: since there are a lot of dependencies, I only list the main class and test code to save reviewers' time. The build will fail due to missing dependencies. I will use this PR only for review and will not merge it. I will open one big PR at the end and merge it once all review PRs are approved. The code is currently missing unit tests; I am posting these PRs now to meet the cutoff date (June 1). I will add unit tests, run performance tests, and fix bugs before the official release.

Description

In HCAD v1, if we are short of memory, only a subset of hot entities can use their models to make predictions. In v2, we allocate a queue to absorb cold entities. As with hot entities, we load a cold entity's model checkpoint from disk, train models if the checkpoint is not found, query for missing features to complete a shingle, use the models to check whether the incoming feature is normal, update the models, and save the detection results to disk.

This PR adds the implementation of the cold entity queue. Implementation-wise, we reuse the queues we have developed for hot entities. There are two differences: we release cold entity requests at a controlled pace, and cold entity requests have low priority, so we process them only when there are no hot entity requests to process.
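The scheduling policy described above can be illustrated with a minimal Java sketch. It assumes two plain in-memory FIFO queues and a fixed per-pull cap on cold requests; HotColdDispatcher, offerHot, offerCold, nextBatch and coldBatchSize are hypothetical names, not the plugin's API, and the real queues are rate-limited and size-bounded in ways this sketch ignores.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch: HotColdDispatcher and its methods are illustrative names,
// not the plugin's API. Not thread-safe; a queue shared across threads would need
// synchronization or concurrent collections.
final class HotColdDispatcher<T> {
    private final Queue<T> hot = new ArrayDeque<>();
    private final Queue<T> cold = new ArrayDeque<>();
    // Upper bound on cold requests released per pull (the "controlled pace").
    private final int coldBatchSize;

    HotColdDispatcher(int coldBatchSize) {
        if (coldBatchSize <= 0) {
            throw new IllegalArgumentException("coldBatchSize must be positive");
        }
        this.coldBatchSize = coldBatchSize;
    }

    void offerHot(T request) {
        hot.add(request);
    }

    void offerCold(T request) {
        cold.add(request);
    }

    // Drains pending hot requests first; cold requests are released only when
    // no hot request is waiting, and at most coldBatchSize of them per call.
    List<T> nextBatch() {
        List<T> batch = new ArrayList<>();
        if (!hot.isEmpty()) {
            while (!hot.isEmpty()) {
                batch.add(hot.poll());
            }
            return batch;
        }
        while (batch.size() < coldBatchSize && !cold.isEmpty()) {
            batch.add(cold.poll());
        }
        return batch;
    }
}
```

A pull that finds any hot request returns only hot requests, so cold requests wait until the hot queue drains; the cap keeps a large cold backlog from being released all at once.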

Testing done:

Manual tests using 10 HCAD detectors and 12,000 entities in a 3-node cluster.

Check List

[ Y ] Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.


          Add queue for cold entities

a9e5794

kaituo requested review from jmazanec15 and weicongs-amazon

May 24, 2021 22:47

ylwu-amzn reviewed

src/main/java/org/opensearch/ad/ratelimit/ColdEntityQueue.java Outdated

+              public class ColdEntityQueue extends RateLimitedQueue<EntityFeatureRequest> {
+                  private static final Logger LOG = LogManager.getLogger(ColdEntityQueue.class);
+                  private int batchSize;

Collaborator

ylwu-amzn May 26, 2021

Add volatile?

Collaborator Author

kaituo May 26, 2021

added
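For context on the volatile question: a field like batchSize is typically written by a settings-update callback (in OpenSearch, one registered through ClusterSettings#addSettingsUpdateConsumer) and read by the thread that pulls requests. A minimal sketch, assuming that single-writer pattern; BatchSizeHolder and its methods are hypothetical names.

```java
// Hypothetical sketch: BatchSizeHolder, onBatchSizeUpdate and currentBatchSize
// are illustrative names, not the plugin's API.
final class BatchSizeHolder {
    // volatile: a write by the settings-update thread is guaranteed to be
    // visible to the pulling thread on its next read, without locking.
    private volatile int batchSize;

    BatchSizeHolder(int initialBatchSize) {
        this.batchSize = initialBatchSize;
    }

    // Would be invoked by a dynamic-setting update callback.
    void onBatchSizeUpdate(int newBatchSize) {
        this.batchSize = newBatchSize;
    }

    // Read by the thread that pulls requests off the queue.
    int currentBatchSize() {
        return batchSize;
    }
}
```

volatile suffices here because each update is a plain write of a new value; a read-modify-write such as incrementing the field would need an AtomicInteger or a lock instead.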

ylwu-amzn reviewed

src/main/java/org/opensearch/ad/ratelimit/ColdEntityQueue.java Outdated

+                  public ColdEntityQueue(
+                      long heapSizeInBytes,
+                      int singleRequestSizeInBytes,
+                      Setting<Float> maxHeapPercentForQueueSetting,

Collaborator

ylwu-amzn May 26, 2021

Why not get this setting the same way as line 109?

Collaborator Author

kaituo May 26, 2021

I did it in the super class to reduce redundant code.
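A minimal sketch of the pattern the author describes, where the subclass passes the Setting through and the super class resolves it once. FloatSetting is a hypothetical stand-in for OpenSearch's Setting<Float>, the setting key is made up, and the capacity formula (heap budget divided by per-request size) is an assumption for illustration, not necessarily what RateLimitedQueue computes.

```java
import java.util.Map;

// Hypothetical stand-in for OpenSearch's Setting<Float>: a key with a default value.
final class FloatSetting {
    private final String key;
    private final float defaultValue;

    FloatSetting(String key, float defaultValue) {
        this.key = key;
        this.defaultValue = defaultValue;
    }

    // Resolves the value from a flat key/value settings map, falling back to the default.
    float get(Map<String, String> settings) {
        String raw = settings.get(key);
        return raw == null ? defaultValue : Float.parseFloat(raw);
    }
}

// The base class resolves the setting once, so subclasses just pass it through.
// Assumption: the value is a fraction of the heap (0.25 = 25%), and capacity is
// that heap budget divided by the size of a single request.
abstract class RateLimitedQueueSketch {
    private final int capacity;

    RateLimitedQueueSketch(long heapSizeInBytes, int singleRequestSizeInBytes,
                           FloatSetting maxHeapPercentSetting, Map<String, String> settings) {
        double maxHeapPercent = maxHeapPercentSetting.get(settings);
        this.capacity = (int) (heapSizeInBytes * maxHeapPercent / singleRequestSizeInBytes);
    }

    int capacity() {
        return capacity;
    }
}

// No setting lookup here: the subclass forwards the Setting to the super class.
final class ColdQueueSketch extends RateLimitedQueueSketch {
    ColdQueueSketch(long heapSizeInBytes, int singleRequestSizeInBytes,
                    FloatSetting maxHeapPercentSetting, Map<String, String> settings) {
        super(heapSizeInBytes, singleRequestSizeInBytes, maxHeapPercentSetting, settings);
    }
}
```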

ylwu-amzn reviewed

src/main/java/org/opensearch/ad/ratelimit/ColdEntityQueue.java Outdated

+                  private int batchSize;
+                  private final CheckpointReadQueue checkpointReadQueue;
+                  private boolean scheduled;

Collaborator

ylwu-amzn May 26, 2021

What does scheduled mean? Add some comments?

Collaborator Author

kaituo May 26, 2021

added comment "// indicate whether a future pull over cold entity queues is scheduled"
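The flag described in that comment can be sketched with an AtomicBoolean so that concurrent callers schedule at most one pending pull. This assumes a plain java.util.concurrent ScheduledExecutorService in place of OpenSearch's ThreadPool; PullScheduler and maybeSchedule are hypothetical names.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: PullScheduler and maybeSchedule are illustrative names.
final class PullScheduler {
    // Plays the role of the "scheduled" field: true while a future pull is pending.
    private final AtomicBoolean scheduled = new AtomicBoolean(false);
    private final ScheduledExecutorService executor;
    private final Runnable pull;

    PullScheduler(ScheduledExecutorService executor, Runnable pull) {
        this.executor = executor;
        this.pull = pull;
    }

    // Schedules one pull after delayMillis unless a pull is already pending.
    // Returns true only if this call did the scheduling. If schedule(...) throws,
    // the flag stays set and no further pull will be scheduled.
    boolean maybeSchedule(long delayMillis) {
        if (!scheduled.compareAndSet(false, true)) {
            return false;
        }
        executor.schedule(() -> {
            // Clear the flag first so the pull itself may schedule the next one.
            scheduled.set(false);
            pull.run();
        }, delayMillis, TimeUnit.MILLISECONDS);
        return true;
    }
}
```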

ylwu-amzn reviewed

src/main/java/org/opensearch/ad/ratelimit/ColdEntityQueue.java Outdated

+                      try {
+                          threadPool.schedule(this::pullRequests, delay, AnomalyDetectorPlugin.AD_THREAD_POOL_NAME);
+                      } catch (Exception e) {
+                          LOG.error("Fail to schedule cold entity pulling", e);

Collaborator

ylwu-amzn May 26, 2021

What will happen if scheduling fails? It seems we will just poll some of the requests, put them into checkpointReadQueue, and ignore the remaining requests.

Collaborator Author

kaituo May 26, 2021

I am logging failures. threadPool.schedule is a basic API provided by OpenSearch. If it fails, I don't have a fallback, as something is fundamentally wrong.
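The reviewer's concern can be made concrete with a hypothetical alternative, not what this PR does: leave unsent requests in the queue and clear the pending flag when scheduling throws, so the next enqueue retries. The sketch assumes a java.util.concurrent ScheduledExecutorService, which rejects tasks with RejectedExecutionException after shutdown; RetryingColdPuller, put and pullOnce are hypothetical names, and forwarding to the downstream queue is omitted.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical alternative sketch; names and retry behavior are assumptions.
final class RetryingColdPuller {
    private static final Logger LOG = Logger.getLogger(RetryingColdPuller.class.getName());
    private final Deque<String> pending = new ArrayDeque<>();
    private final AtomicBoolean scheduled = new AtomicBoolean(false);
    private final ScheduledExecutorService executor;
    private final long delayMillis;

    RetryingColdPuller(ScheduledExecutorService executor, long delayMillis) {
        this.executor = executor;
        this.delayMillis = delayMillis;
    }

    synchronized void put(String request) {
        pending.add(request);
        trySchedule();
    }

    synchronized int pendingCount() {
        return pending.size();
    }

    private void trySchedule() {
        if (!scheduled.compareAndSet(false, true)) {
            return; // a pull is already pending
        }
        try {
            executor.schedule(this::pullOnce, delayMillis, TimeUnit.MILLISECONDS);
        } catch (RejectedExecutionException e) {
            LOG.log(Level.SEVERE, "Fail to schedule cold entity pulling", e);
            // Requests stay in `pending`; clearing the flag lets the next put() retry.
            scheduled.set(false);
        }
    }

    // Forwards one request per run (downstream hand-off omitted) and
    // reschedules while requests remain.
    private synchronized void pullOnce() {
        scheduled.set(false);
        pending.poll();
        if (!pending.isEmpty()) {
            trySchedule();
        }
    }
}
```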

jmazanec15 reviewed

src/main/java/org/opensearch/ad/ratelimit/ColdEntityQueue.java Outdated (resolved)

src/main/java/org/opensearch/ad/ratelimit/ColdEntityQueue.java Outdated (resolved)

src/main/java/org/opensearch/ad/ratelimit/ColdEntityQueue.java Outdated (resolved)

src/main/java/org/opensearch/ad/ratelimit/ColdEntityQueue.java Outdated (resolved)

kaituo added 2 commits

May 26, 2021 12:20


          add comment, volatile keyword

113634e


          added an extra filtering to safeguard we only send low priority requests

5d48e29
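The extra filtering in the last commit can be pictured as a priority check applied to each batch before it is forwarded. A minimal sketch; the Priority enum, PrioritizedRequest and LowPriorityFilter are hypothetical names, not the plugin's types.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical priority levels; not the plugin's enum.
enum Priority { LOW, MEDIUM, HIGH }

// Hypothetical request carrying an entity id and its priority.
final class PrioritizedRequest {
    final String entityId;
    final Priority priority;

    PrioritizedRequest(String entityId, Priority priority) {
        this.entityId = entityId;
        this.priority = priority;
    }
}

final class LowPriorityFilter {
    // Safeguard: keep only LOW-priority requests, preserving their order,
    // so nothing of higher priority is forwarded through the cold path.
    static List<PrioritizedRequest> onlyLow(List<PrioritizedRequest> batch) {
        return batch.stream()
                .filter(r -> r.priority == Priority.LOW)
                .collect(Collectors.toList());
    }
}
```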

jmazanec15 approved these changes

kaituo requested a review from jngz-es

June 1, 2021 18:36

jngz-es approved these changes

kaituo closed this

kaituo mentioned this pull request

multi-category support, rate limiting, and pagination #121

Merged

kaituo added a commit that referenced this pull request


          multi-category support, rate limiting, and pagination (#121)

ea22d59

This PR is a conglomerate of the following PRs.

#60
#64
#65
#67
#68
#69
#70
#71
#74
#75
#76
#77
#78
#79
#82
#83
#84
#92
#94
#93
#95
kaituo#1
kaituo#2
kaituo#3
kaituo#4
kaituo#5
kaituo#6
kaituo#7
kaituo#8
kaituo#9
kaituo#10

This spreadsheet contains the mappings from files to PR numbers (bug fixes in my AD fork and tests are not included):
https://gist.github.com/kaituo/9e1592c4ac4f2f449356cb93d0591167

ohltyler pushed a commit to ohltyler/anomaly-detection-2 that referenced this pull request


          multi-category support, rate limiting, and pagination (opensearch-project#121)

705a2fc

ohltyler pushed a commit to ohltyler/anomaly-detection-2 that referenced this pull request


          multi-category support, rate limiting, and pagination (opensearch-project#121)

c9fe262

ohltyler pushed a commit that referenced this pull request


          multi-category support, rate limiting, and pagination (#121)

f65d070
