AD model performance benchmark #729
Conversation
Force-pushed from 71b2d16 to b4acc8a
This PR adds an AD model performance benchmark so that we can compare model performance across versions. For benchmark data, we randomly generated synthetic data with known anomalies inserted throughout the signal. In particular, these are one-, two-, and four-dimensional data sets where each dimension is a noisy cosine wave. Anomalies are inserted into one dimension with 0.003 probability, and anomalies across dimensions can be independent or dependent. Each data set has approximately 5,000 observations. The data sets are generated with a fixed random seed, so results are comparable across versions. We also backported opensearch-project#600 so that we can capture the performance data in CI output. Testing done: * added unit tests to run the benchmark. Signed-off-by: Kaituo Li <[email protected]>
Force-pushed from b4acc8a to d67ee59
@@ -134,7 +134,7 @@ configurations.all {
     if (it.state != Configuration.State.UNRESOLVED) return
     resolutionStrategy {
         force "joda-time:joda-time:${versions.joda}"
-        force "com.fasterxml.jackson.core:jackson-core:2.13.4"
+        force "com.fasterxml.jackson.core:jackson-core:2.14.0"
should we maybe just inherit jackson from core?
Will check if it is feasible in the main-branch PR and then merge this branch's PR.
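If inheriting from core proves feasible, a minimal sketch of what it could look like, assuming OpenSearch's build exposes the pinned Jackson version through the same versions map already used for versions.joda above (an assumption, not confirmed in this PR):

resolutionStrategy {
    force "joda-time:joda-time:${versions.joda}"
    // hypothetical: reuse the Jackson version pinned by OpenSearch core
    // instead of hard-coding 2.14.0 in this plugin's build script
    force "com.fasterxml.jackson.core:jackson-core:${versions.jackson}"
}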
The backport to 2.x failed. To backport manually, run these commands in your terminal:
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-729-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 c6bf52ced608689a0a10f3a81afaa53738aad5a0
# Push it to GitHub
git push --set-upstream origin backport/backport-729-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x Then, create a pull request where the |
Description
This PR adds an AD model performance benchmark so that we can compare model performance across versions.
For the single stream detector, we refactored the tests in DetectionResultEvalutationIT and moved them to SingleStreamModelPerfIT.
For the HCAD detector, we randomly generated synthetic data with known anomalies inserted throughout the signal. In particular, these are one-, two-, and four-dimensional data sets where each dimension is a noisy cosine wave. Anomalies are inserted into one dimension with 0.003 probability, and anomalies across dimensions can be independent or dependent. Each data set has approximately 5,000 observations. The data sets are generated with a fixed random seed, so results are comparable across versions.
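As an illustration, here is a minimal Groovy sketch of the kind of generator described above; the cosine period, noise scale, and anomaly magnitude are assumed values, not taken from this PR:

// Hypothetical generator: noisy cosine waves with Bernoulli-injected anomalies.
def generate(int dimensions, int observations = 5000, long seed = 42L) {
    Random rng = new Random(seed)   // fixed seed keeps data sets identical across versions
    List<double[]> series = []
    for (int t = 0; t < observations; t++) {
        double[] point = new double[dimensions]
        for (int d = 0; d < dimensions; d++) {
            // each dimension is a noisy cosine wave (assumed period 50, noise scale 0.1)
            point[d] = Math.cos(2 * Math.PI * t / 50) + rng.nextGaussian() * 0.1
        }
        // insert an anomaly into one dimension with probability 0.003
        if (rng.nextDouble() < 0.003) {
            point[0] += 5.0         // assumed anomaly magnitude
        }
        series << point
    }
    return series
}

def oneDim  = generate(1)
def twoDim  = generate(2)
def fourDim = generate(4)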
The PR also fixes a Jackson version conflict introduced by https://github.com/opensearch-project/OpenSearch/pull/5105/files, which caused runtime test failures.
Testing done:
* Added unit tests to run the benchmark.
Signed-off-by: Kaituo Li [email protected]
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.