Add Support for Handling Missing Data in Anomaly Detection
This PR introduces enhanced handling of missing data, giving customers the flexibility to choose how to address gaps in their data. Options include ignoring missing data (the default behavior) or filling gaps with customer-specified fixed values, zeros, or previous values. These options can improve recall in anomaly detection scenarios. For example, in the situation raised in this forum discussion, https://forum.opensearch.org/t/do-missing-buckets-ruin-anomaly-detection/16535, customers can now opt to fill missing values with zeros to maintain detection accuracy.

Key Changes:

1. Enhanced Missing Data Handling: Switched to ThresholdedRandomCutForest.process(double[] inputPoint, long timestamp, int[] missingValues) to support missing data in both real-time and historical analyses. The preview mode remains unchanged for efficiency and keeps using the existing linear imputation techniques. (See classes: ADColdStart, ModelColdStart, ModelManager, ADBatchTaskRunner).
2. Refactoring Imputation & Processing: Refactored the imputation process, failure handling, statistics collection, and result saving in Inferencer.
3. Improved Imputed Value Reconstruction: Reconstructed imputed values using the existing mean and standard deviation, ensuring they are accurately stored in AnomalyResult. Added a featureImputed boolean tag to flag imputed values. (See class: AnomalyResult).
4. Broadcast Support for HC Detectors: Added a broadcast mechanism for HC detectors to identify entity models that haven't received data in a given interval. This ensures models in memory process all relevant data before imputation begins. Single stream detectors handle this within existing transport messages. (See classes: ADHCImputeTransportAction, ADResultProcessor, ResultProcessor).
5. Introduction of ActionListenerExecutor: Added ActionListenerExecutor to wrap response and failure handlers in an ActionListener, executing them asynchronously using the provided ExecutorService. This allows us to handle responses in the AD thread pool.
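To make the fill options concrete, here is a minimal sketch of what "fill with zeros", "fill with fixed values", and "fill with previous values" could mean for a single feature vector. Every name in it (ImputationOption, MissingValueFiller, fill) is hypothetical and chosen for illustration; it is not the plugin's code. According to the commit message, the actual change passes the missing indexes to ThresholdedRandomCutForest.process. The default "ignore" option is left out because in that mode there is nothing to fill.

```java
import java.util.Arrays;

// Hypothetical names throughout: an illustration of the fill options
// listed in the commit message, not the plugin's implementation.
enum ImputationOption { ZERO, FIXED_VALUES, PREVIOUS }

final class MissingValueFiller {
    private MissingValueFiller() {}

    /**
     * Returns a copy of {@code point} in which every index listed in
     * {@code missingIndexes} is replaced according to {@code option}.
     * {@code fixedValues} is read only for FIXED_VALUES and
     * {@code previousPoint} only for PREVIOUS.
     */
    static double[] fill(
        double[] point,
        int[] missingIndexes,
        ImputationOption option,
        double[] fixedValues,
        double[] previousPoint
    ) {
        double[] filled = Arrays.copyOf(point, point.length);
        for (int i : missingIndexes) {
            switch (option) {
                case ZERO:
                    filled[i] = 0.0;
                    break;
                case FIXED_VALUES:
                    filled[i] = fixedValues[i];
                    break;
                case PREVIOUS:
                    filled[i] = previousPoint[i];
                    break;
                default:
                    throw new IllegalArgumentException("Unknown option: " + option);
            }
        }
        return filled;
    }
}
```

For instance, calling fill with the point {1.0, NaN, 3.0}, missing index 1, and ImputationOption.ZERO yields a copy whose second entry is 0.0 while the input array is left untouched.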
Testing: Comprehensive testing was conducted, including both integration and unit tests. Of the 7135 lines added and 1683 lines removed, 4926 additions and 749 deletions are in tests, ensuring robust coverage.

Signed-off-by: Kaituo Li <[email protected]>
Showing 139 changed files with 7,139 additions and 1,683 deletions.
@@ -0,0 +1,50 @@
/*
 * Copyright OpenSearch Contributors
 * SPDX-License-Identifier: Apache-2.0
 */

package org.opensearch.ad.ml;

import static org.opensearch.timeseries.TimeSeriesAnalyticsPlugin.AD_THREAD_POOL_NAME;

import org.opensearch.ad.caching.ADCacheProvider;
import org.opensearch.ad.caching.ADPriorityCache;
import org.opensearch.ad.indices.ADIndex;
import org.opensearch.ad.indices.ADIndexManagement;
import org.opensearch.ad.model.AnomalyResult;
import org.opensearch.ad.ratelimit.ADCheckpointWriteWorker;
import org.opensearch.ad.ratelimit.ADColdStartWorker;
import org.opensearch.ad.ratelimit.ADSaveResultStrategy;
import org.opensearch.threadpool.ThreadPool;
import org.opensearch.timeseries.ml.Inferencer;
import org.opensearch.timeseries.stats.StatNames;
import org.opensearch.timeseries.stats.Stats;

import com.amazon.randomcutforest.parkservices.ThresholdedRandomCutForest;

public class ADInferencer extends
    Inferencer<ThresholdedRandomCutForest, AnomalyResult, ThresholdingResult, ADIndex, ADIndexManagement, ADCheckpointDao, ADCheckpointWriteWorker, ADColdStart, ADModelManager, ADSaveResultStrategy, ADPriorityCache, ADColdStartWorker> {

    public ADInferencer(
        ADModelManager modelManager,
        Stats stats,
        ADCheckpointDao checkpointDao,
        ADColdStartWorker coldStartWorker,
        ADSaveResultStrategy resultWriteWorker,
        ADCacheProvider cache,
        ThreadPool threadPool
    ) {
        super(
            modelManager,
            stats,
            StatNames.AD_MODEL_CORRUTPION_COUNT.getName(),
            checkpointDao,
            coldStartWorker,
            resultWriteWorker,
            cache,
            threadPool,
            AD_THREAD_POOL_NAME
        );
    }

}
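The constructor above hands threadPool and AD_THREAD_POOL_NAME to Inferencer, and the commit message describes ActionListenerExecutor as wrapping response and failure handlers so that they run on a provided ExecutorService. The sketch below shows that general pattern using only the JDK. The Listener interface is a hypothetical stand-in for OpenSearch's ActionListener, and ListenerExecutorSketch.wrap is an assumed shape, not the class added in this commit. Routing an exception thrown by the response handler to the failure handler is likewise an assumption of this sketch.

```java
import java.util.concurrent.ExecutorService;
import java.util.function.Consumer;

// Hypothetical stand-in for OpenSearch's ActionListener so the sketch
// compiles with the JDK alone.
interface Listener<T> {
    void onResponse(T response);

    void onFailure(Exception e);
}

final class ListenerExecutorSketch {
    private ListenerExecutorSketch() {}

    /**
     * Wraps the two handlers in a Listener whose callbacks are handed to
     * {@code executor} instead of running on the calling thread.
     */
    static <T> Listener<T> wrap(
        Consumer<T> responseHandler,
        Consumer<Exception> failureHandler,
        ExecutorService executor
    ) {
        return new Listener<T>() {
            @Override
            public void onResponse(T response) {
                executor.execute(() -> {
                    try {
                        responseHandler.accept(response);
                    } catch (Exception e) {
                        // Assumption: handler errors go to the failure path.
                        failureHandler.accept(e);
                    }
                });
            }

            @Override
            public void onFailure(Exception e) {
                executor.execute(() -> failureHandler.accept(e));
            }
        };
    }
}
```

With a wrapper of this kind, the thread that delivers a transport response only enqueues the work; the handlers themselves run on whichever pool the caller supplied.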