This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Add memory tracker #258

Closed

kaituo (Member) commented Oct 14, 2020

Note: because this change has many dependencies, I list only the main class and test code to save reviewers' time. The build will fail because of the missing dependencies. I will use this PR only for review and will not merge it. I will open one large PR at the end and merge it once all review PRs are approved.

Issue #, if available:

Description of changes:

Previously, when creating a model, we evaluated the memory size of all existing models and compared the total with the limit of 10% of heap memory. If the total was within the limit, we created the model; otherwise, we threw an exception. This approach does not work for multi-entity detectors. First, the cache can hold many models, and re-evaluating all of them every time we want to add a model is inefficient. Second, memory is now consumed by two sources, single-entity and multi-entity detectors, so we need a central place to track memory usage across the board as we add more kinds of detectors. This PR provides that central place.
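
A minimal sketch of the central-tracker idea, assuming one shared budget equal to a fraction of the heap (the 10% limit described above). `MemoryTracker`, `Origin`, `canAllocate`, `consume`, and `release` are hypothetical names, not the plugin's actual API. The point it illustrates is that admission becomes a constant-time check against a running total instead of a re-evaluation of every cached model, while usage is still recorded per source of memory.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch: class and method names are illustrative, not the plugin's API.
class MemoryTracker {
    // Assumed sources of memory usage; more kinds of detectors would add entries here.
    enum Origin { SINGLE_ENTITY_DETECTOR, MULTI_ENTITY_DETECTOR }

    private final long limitBytes;                                      // shared budget for all models
    private final Map<Origin, Long> usage = new EnumMap<>(Origin.class); // bytes per source
    private long totalBytes = 0;                                        // running total, never re-summed

    // heapBytes: total heap size; fraction: share reserved for models (e.g. 0.1 for 10%).
    MemoryTracker(long heapBytes, double fraction) {
        this.limitBytes = (long) (heapBytes * fraction);
    }

    // Constant-time admission check against the running total.
    synchronized boolean canAllocate(long bytes) {
        return bytes >= 0 && totalBytes + bytes <= limitBytes;
    }

    // Reserves memory for a new model; returns false and changes nothing if it does not fit.
    synchronized boolean consume(Origin origin, long bytes) {
        if (!canAllocate(bytes)) {
            return false;
        }
        totalBytes += bytes;
        usage.merge(origin, bytes, Long::sum);
        return true;
    }

    // Returns memory when a model is evicted or deleted.
    synchronized void release(Origin origin, long bytes) {
        totalBytes -= bytes;
        usage.merge(origin, -bytes, Long::sum);
    }

    synchronized long totalBytes() { return totalBytes; }

    synchronized long bytesFor(Origin origin) { return usage.getOrDefault(origin, 0L); }

    long limitBytes() { return limitBytes; }

    public static void main(String[] args) {
        MemoryTracker tracker = new MemoryTracker(1_000_000L, 0.1); // 100,000-byte budget
        System.out.println(tracker.consume(Origin.SINGLE_ENTITY_DETECTOR, 60_000)); // true
        System.out.println(tracker.consume(Origin.MULTI_ENTITY_DETECTOR, 50_000));  // false: over budget
        tracker.release(Origin.SINGLE_ENTITY_DETECTOR, 60_000);
        System.out.println(tracker.totalBytes()); // 0
    }
}
```

Keeping a single running total is what makes adding a model cheap; the per-origin map only exists so usage can be reported separately for single-entity and multi-entity detectors.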

This PR also updates RCF model size estimation. Previously, we underestimated the size.

This PR also adds threshold model size estimation. Previously, we didn't consider it.
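
The two estimation changes can be illustrated with a hypothetical estimator that charges every RCF tree for each sample it stores and then adds a fixed footprint for the threshold model. `ModelSizeEstimator`, the linear per-sample cost model, and every constant are assumptions supplied by the caller; they are not the formula or values used in this PR.

```java
// Hypothetical estimator: the cost model and all constants are placeholders, not the PR's values.
final class ModelSizeEstimator {
    private final long bytesPerCoordinate;  // assumed cost of one stored point coordinate
    private final long bytesPerSampleNode;  // assumed per-sample overhead (tree node, references)
    private final long thresholdModelBytes; // assumed fixed footprint of the threshold model

    ModelSizeEstimator(long bytesPerCoordinate, long bytesPerSampleNode, long thresholdModelBytes) {
        this.bytesPerCoordinate = bytesPerCoordinate;
        this.bytesPerSampleNode = bytesPerSampleNode;
        this.thresholdModelBytes = thresholdModelBytes;
    }

    // RCF forest: each of numTrees trees stores up to numSamples points of `dimension` values.
    long estimateRcfBytes(int dimension, int numTrees, int numSamples) {
        long perSample = bytesPerCoordinate * dimension + bytesPerSampleNode;
        return (long) numTrees * numSamples * perSample; // long arithmetic avoids int overflow
    }

    // Total per model: RCF forest plus the threshold model, which was previously not counted.
    long estimateModelBytes(int dimension, int numTrees, int numSamples) {
        return estimateRcfBytes(dimension, numTrees, numSamples) + thresholdModelBytes;
    }

    public static void main(String[] args) {
        ModelSizeEstimator est = new ModelSizeEstimator(8, 100, 4_096); // illustrative constants only
        System.out.println(est.estimateRcfBytes(8, 100, 256));   // 4198400
        System.out.println(est.estimateModelBytes(8, 100, 256)); // 4202496
    }
}
```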

This PR also adds a customized hashmap that automatically consumes and releases memory. This keeps changes to our single-entity code minimal, since we only have to replace the map implementation.
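
A minimal sketch of a map that reserves memory when an entry is added and gives it back when the entry is removed, assuming a caller-supplied per-value size function. `MemoryTrackedMap` is a hypothetical name, and a plain byte budget stands in for the central tracker so the example compiles on its own.

```java
import java.util.HashMap;
import java.util.Objects;
import java.util.function.ToLongFunction;

// Hypothetical sketch: a plain byte budget stands in for the central memory tracker.
class MemoryTrackedMap<K, V> {
    private final HashMap<K, V> delegate = new HashMap<>();
    private final ToLongFunction<V> sizeOf; // caller-supplied size estimate per value (assumed)
    private final long limitBytes;
    private long usedBytes = 0;

    MemoryTrackedMap(long limitBytes, ToLongFunction<V> sizeOf) {
        this.limitBytes = limitBytes;
        this.sizeOf = Objects.requireNonNull(sizeOf);
    }

    // Consumes memory for the new value and releases a replaced value's memory.
    // Returns false, leaving the map unchanged, if the budget would be exceeded.
    synchronized boolean put(K key, V value) {
        Objects.requireNonNull(value, "null values are not supported");
        V old = delegate.get(key);
        long oldBytes = old == null ? 0 : sizeOf.applyAsLong(old);
        long newBytes = sizeOf.applyAsLong(value);
        if (usedBytes - oldBytes + newBytes > limitBytes) {
            return false;
        }
        delegate.put(key, value);
        usedBytes += newBytes - oldBytes;
        return true;
    }

    // Removes the entry and releases its memory.
    synchronized V remove(K key) {
        V old = delegate.remove(key);
        if (old != null) {
            usedBytes -= sizeOf.applyAsLong(old);
        }
        return old;
    }

    synchronized V get(K key) { return delegate.get(key); }

    synchronized int size() { return delegate.size(); }

    synchronized long usedBytes() { return usedBytes; }

    public static void main(String[] args) {
        MemoryTrackedMap<String, double[]> models = new MemoryTrackedMap<>(1_000L, v -> 8L * v.length);
        System.out.println(models.put("a", new double[100])); // true: 800 bytes used
        System.out.println(models.put("b", new double[50]));  // false: 800 + 400 > 1000
        models.remove("a");
        System.out.println(models.usedBytes());               // 0
    }
}
```

Here `put` returns `boolean` so that a rejected insert is visible to the caller. A true drop-in replacement, as the PR describes, would instead implement `java.util.Map` and route every mutating method (`putAll`, `compute`, `clear`, and so on) through the same accounting.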

Testing done:

  1. Added unit tests.
  2. End-to-end testing passes.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

codecov bot commented Oct 14, 2020

Codecov Report

Merging #258 into master will decrease coverage by 0.20%.
The diff coverage is 34.21%.


@@             Coverage Diff              @@
##             master     #258      +/-   ##
============================================
- Coverage     73.01%   72.81%   -0.21%     
- Complexity     1461     1464       +3     
============================================
  Files           164      164              
  Lines          6834     6867      +33     
  Branches        527      533       +6     
============================================
+ Hits           4990     5000      +10     
- Misses         1594     1615      +21     
- Partials        250      252       +2     
| Flag | Coverage Δ | Complexity Δ |
|---|---|---|
| #cli | 79.27% <ø> (ø) | 0.00 <ø> (ø) |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| ...est/handler/IndexAnomalyDetectorActionHandler.java | 51.17% <0.00%> (-0.25%) | 26.00 <0.00> (ø) |
| .../handler/IndexAnomalyDetectorJobActionHandler.java | 11.44% <0.00%> (-0.22%) | 4.00 <0.00> (ø) |
| ...stroforelasticsearch/ad/model/AnomalyDetector.java | 62.06% <35.71%> (-1.96%) | 52.00 <0.00> (+1.00) ⬇️ |
| ...oforelasticsearch/ad/model/AnomalyDetectorJob.java | 58.97% <42.85%> (-2.20%) | 24.00 <1.00> (+2.00) ⬇️ |
| ...oforelasticsearch/ad/AnomalyDetectorJobRunner.java | 76.59% <100.00%> (+0.12%) | 35.00 <0.00> (ø) |
| ...ransport/SearchAnomalyDetectorTransportAction.java | 77.77% <0.00%> (-22.23%) | 2.00 <0.00> (ø) |

kaituo added a commit that referenced this pull request Oct 16, 2020
* Add support for filtering the data by one categorical variable

This PR is a conglomerate of the following PRs.

#247
#249
#250
#252
#253
#256
#257
#258
#259
#260
#261
#262
#263
#264
#265
#266
#267
#268
#269

This spreadsheet maps each file to its PR number: https://quip-amazon.com/DiHkAmz9oSLu/HC-PR

Testing done:
1. Added unit tests for all classes except four (excluded in build.gradle); tests for those will be added in a later PR.
2. Manual testing passes.
kaituo closed this Oct 16, 2020