AD opendistro 1.6 support #87
isSnapshot = "true" == System.getProperty("build.snapshot", "true")
if (System.properties['os.name'].toLowerCase().contains('windows')) {
After removing these lines, can we build on both Windows and Linux?
Previously we installed the plugin using setupCommand, which took an absolute file path that was different on Windows. Now we only need to pass a fileTree.
Example:
plugin(fileTree("src/test/resources/job-scheduler").getSingleFile())
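For readers unfamiliar with the "single file in a directory" idea behind the example above, here is a minimal Java sketch of the same lookup. It is only an analogue, not the Gradle implementation: the class name `SingleFileLookup`, the helper `singleFile`, and the file name `job-scheduler-plugin.zip` are hypothetical, and a temporary directory stands in for `src/test/resources/job-scheduler`. Because the lookup starts from a directory rather than a hard-coded absolute path, it needs no OS-specific branch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SingleFileLookup {

    /**
     * Returns the only regular file found under dir (searched recursively).
     * Throws IllegalStateException when there is not exactly one such file.
     */
    public static Path singleFile(Path dir) throws IOException {
        try (Stream<Path> walk = Files.walk(dir)) {
            List<Path> files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
            if (files.size() != 1) {
                throw new IllegalStateException(
                    "Expected exactly one file under " + dir + " but found " + files.size());
            }
            return files.get(0);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a temp directory stands in for src/test/resources/job-scheduler.
        Path dir = Files.createTempDirectory("job-scheduler");
        // Hypothetical plugin archive name, used only for illustration.
        Path zip = Files.createFile(dir.resolve("job-scheduler-plugin.zip"));
        System.out.println(singleFile(dir).equals(zip));
    }
}
```

Failing loudly when the directory holds zero or several files is a deliberate choice in this sketch: a silently wrong plugin archive would be harder to diagnose than a build error.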
great
@@ -120,80 +119,6 @@ thirdPartyAudit.enabled = false
// See package README.md for details on using these tasks.
def _numNodes = findProperty('numNodes') as Integer ?: 1

def getSeedHosts = { int num ->
Why do we remove the following, which is used to start and stop a multi-node cluster?
I think this was added as a hack to install the job scheduler plugin on the test clusters. From ES 7.5 onwards, Elasticsearch provides a way to install dependent plugins on a test cluster, so we no longer need it.
To run a multi-node cluster, use:
./gradlew run -PnumNodes=<numberOfNodesYouWant>
Good to know. Could you keep runSingleNode and stopMultiNode (lines 165-196 in the old version) so that I can start nodes one by one and kill all nodes? I need them for fault tolerance tests.
Created issue #90 to add this back.
Updated README.md to reflect changes to run multi node cluster
Author: Kaituo Li <[email protected]>
Date: Wed Apr 15 15:45:13 2020 -0700

Add state and error to profile API (opendistro-for-elasticsearch#84)

* Add state and error to profile API

We want to make it easy for customers and oncalls to identify a detector’s state and error if any. This PR adds such information to our new profile API.

We expect three kinds of states:
- Disabled: if get ad job api says the job is disabled;
- Init: if anomaly score after the last update time of the detector is larger than 0
- Running: if neither of the above applies and no exceptions.

Error is populated if error of the latest anomaly result is not empty.

Testing done:
- manual testing during a detector’s life cycle: not created, created but not started, started, during initialization, after initialization, stopped, restarted
- added unit tests to cover above scenario

commit 0c33050
Author: Kaituo Li <[email protected]>
Date: Tue Apr 14 11:52:20 2020 -0700

Use callbacks and bug fix (opendistro-for-elasticsearch#83)

* Use callbacks and bug fix

This PR includes the following changes:
1. remove classes that are not needed in jacocoExclusions since we have enough coverage for those classes.
2. Use ClientUtil instead of Elasticsearch’s client in AD job runner
3. Use one function to get the number of partitioned forests. Previously, we have redundant code in both ModelManager and ADStateManager.
4. Change ADStateManager.getAnomalyDetector to use callback.
5. Change AnomalyResultTransportAction to use callback to get features.
6. Add in AnomalyResultTransportAction to handle the case where all features have been disabled, and users' index does not exist.
7. Change get RCF and threshold result methods to use callback and add exception handling of IndexNotFoundException due to the change. Previously, getting RCF and threshold result methods won’t throw IndexNotFoundException.
8. Remove unused fields in StopDetectorTransportAction and AnomalyResultTransportAction
9. Unwrap EsRejectedExecutionException as it can be nested inside RemoteTransportException. Previously, we would not recognize EsRejectedExecutionException and thus miss anomaly results write retrying.
10. Add error in anomaly result schema.
11. Fix broken tests due to my changes.

Testing done:
1. unit/integration tests pass
2. do end-to-end testing and make sure my fix achieves the purpose
* timeout issue is gone
* when all features have been disabled or index does not exist, we will retry a few more times and disable AD jobs.
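Item 9 of the commit message above describes looking for a rejection exception that may be nested inside a transport wrapper. The following is a minimal, self-contained Java sketch of that idea, not the code from the commit: `RemoteWrapperException` and `RejectedException` are hypothetical stand-ins for `RemoteTransportException` and `EsRejectedExecutionException`, and `findCause`, `shouldRetryWrite`, and `MAX_DEPTH` are illustrative names.

```java
import java.util.Optional;

public class CauseUnwrapper {

    // Hypothetical stand-in for a transport-layer wrapper such as RemoteTransportException.
    public static class RemoteWrapperException extends RuntimeException {
        public RemoteWrapperException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Hypothetical stand-in for a thread-pool rejection such as EsRejectedExecutionException.
    public static class RejectedException extends RuntimeException {
        public RejectedException(String message) {
            super(message);
        }
    }

    // Assumed bound on how far to walk the cause chain; guards against cyclic chains.
    private static final int MAX_DEPTH = 10;

    /** Walks the cause chain and returns the first throwable of the requested type, if any. */
    public static <T extends Throwable> Optional<T> findCause(Throwable error, Class<T> type) {
        Throwable current = error;
        for (int depth = 0; current != null && depth < MAX_DEPTH; depth++) {
            if (type.isInstance(current)) {
                return Optional.of(type.cast(current));
            }
            current = current.getCause();
        }
        return Optional.empty();
    }

    /** A write is retried when a rejection appears anywhere in the cause chain. */
    public static boolean shouldRetryWrite(Throwable error) {
        return findCause(error, RejectedException.class).isPresent();
    }

    public static void main(String[] args) {
        Throwable nested = new RemoteWrapperException(
            "remote node failed", new RejectedException("write queue full"));
        System.out.println(shouldRetryWrite(nested));
        System.out.println(shouldRetryWrite(new IllegalStateException("unrelated")));
    }
}
```

Checking only the top-level exception type would miss the wrapped rejection, which matches the failure mode the commit message describes (missed write retries).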
Issue #, if available:
#86
Description of changes:
Make the AD plugin compatible with ODFE 1.6, which uses Elasticsearch OSS 7.6.1.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.