This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

AD opendistro 1.6 support #87

Merged
merged 12 commits into from
Apr 23, 2020


@vamshin (Member) commented Apr 14, 2020

Issue #, if available:
#86

Description of changes:
Make the AD plugin compatible with ODFE 1.6, which uses Elasticsearch OSS 7.6.1.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

isSnapshot = "true" == System.getProperty("build.snapshot", "true")
if (System.properties['os.name'].toLowerCase().contains('windows')) {
Member

After removing these lines, can we build on both Windows and Linux?

Member Author @vamshin, Apr 15, 2020

Previously, installation went through setupCommand with an absolute path to the file, and that path was different on Windows. Now we only need to pass a file tree.
Example:
plugin(fileTree("src/test/resources/job-scheduler").getSingleFile())

Member

great

@@ -120,80 +119,6 @@ thirdPartyAudit.enabled = false
// See package README.md for details on using these tasks.
def _numNodes = findProperty('numNodes') as Integer ?: 1

def getSeedHosts = { int num ->
Member

Why do we remove the following, which is used to start and stop a multi-node cluster?

Member Author

I think this was added as a hack to install the job scheduler plugin on the test clusters. From ES 7.5 onwards, Elasticsearch provides a way to install dependent plugins on a test cluster, so we no longer need it. To run a multi-node cluster, use:
./gradlew run -PnumNodes=<numberOfNodesYouWant>

Member

@kaituo kaituo Apr 15, 2020

Good to know. Could you keep runSingleNode and stopMultiNode (lines 165-196 in the old version) so that I can start nodes one by one and kill all nodes? I need them for fault tolerance tests.

Member Author

Created issue #90 to add this back.

vamshin and others added 4 commits April 14, 2020 21:24
Updated README.md to reflect changes to run multi node cluster
Author: Kaituo Li <[email protected]>
Date:   Wed Apr 15 15:45:13 2020 -0700

    Add state and error to profile API (opendistro-for-elasticsearch#84)

    * Add state and error to profile API

    We want to make it easy for customers and on-call engineers to identify a detector’s state and error, if any. This PR adds that information to our new profile API.

    We expect three kinds of states:
    - Disabled: if the get AD job API says the job is disabled;
    - Init: if anomaly score after the last update time of the detector is larger than 0;
    - Running: if neither of the above applies and there are no exceptions.

    Error is populated if the error of the latest anomaly result is not empty.

    Testing done:
    - manual testing during a detector’s life cycle: not created, created but not started, started, during initialization, after initialization, stopped, restarted
    - added unit tests to cover the above scenarios

commit 0c33050
Author: Kaituo Li <[email protected]>
Date:   Tue Apr 14 11:52:20 2020 -0700

    Use callbacks and bug fix (opendistro-for-elasticsearch#83)

    * Use callbacks and bug fix

    This PR includes the following changes:

    1. Remove classes that are no longer needed in jacocoExclusions, since we have enough coverage for those classes.
    2. Use ClientUtil instead of Elasticsearch’s client in the AD job runner.
    3. Use one function to get the number of partitioned forests. Previously, we had redundant code in both ModelManager and ADStateManager.
    4. Change ADStateManager.getAnomalyDetector to use a callback.
    5. Change AnomalyResultTransportAction to use a callback to get features.
    6. Add handling in AnomalyResultTransportAction for the case where all features have been disabled, and users' index does not exist.
    7. Change the methods that get RCF and threshold results to use callbacks, and add handling for IndexNotFoundException as a result of this change. Previously, these methods would not throw IndexNotFoundException.
    8. Remove unused fields in StopDetectorTransportAction and AnomalyResultTransportAction.
    9. Unwrap EsRejectedExecutionException, as it can be nested inside RemoteTransportException. Previously, we would not recognize EsRejectedExecutionException and thus miss retrying anomaly result writes.
    10. Add error to the anomaly result schema.
    11. Fix tests broken by my changes.
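
Items 4, 5, and 7 move lookups from a return-value style to a callback style. The sketch below illustrates that general pattern only. Every name in it (`DetectorLookupSketch`, `Listener`, `getDetector`) is hypothetical and stands in for the plugin's real classes, which are not shown in this conversation.

```java
import java.util.Map;
import java.util.concurrent.Executor;

import java.util.function.Consumer;

// Hypothetical sketch of a callback-style lookup; not the plugin's actual code.
final class DetectorLookupSketch {

    // Assumed listener shape: one method for success, one for failure.
    interface Listener<T> {
        void onResponse(T value);

        void onFailure(Exception e);

        // Builds a listener from two lambdas.
        static <T> Listener<T> wrap(Consumer<T> responseHandler, Consumer<Exception> failureHandler) {
            return new Listener<T>() {
                @Override
                public void onResponse(T value) {
                    responseHandler.accept(value);
                }

                @Override
                public void onFailure(Exception e) {
                    failureHandler.accept(e);
                }
            };
        }
    }

    private final Map<String, String> detectorsById;
    private final Executor executor;

    DetectorLookupSketch(Map<String, String> detectorsById, Executor executor) {
        this.detectorsById = detectorsById;
        this.executor = executor;
    }

    // Reports the result through the listener instead of returning it,
    // so the caller does not block while the lookup runs on the executor.
    void getDetector(String id, Listener<String> listener) {
        executor.execute(() -> {
            String detector = detectorsById.get(id);
            if (detector == null) {
                listener.onFailure(new IllegalArgumentException("no detector with id " + id));
            } else {
                listener.onResponse(detector);
            }
        });
    }

    public static void main(String[] args) {
        // Runnable::run executes the task on the calling thread, which keeps the demo deterministic.
        DetectorLookupSketch lookup = new DetectorLookupSketch(Map.of("d1", "cpu-detector"), Runnable::run);
        lookup.getDetector("d1", Listener.wrap(
            name -> System.out.println("found " + name),
            e -> System.out.println("failed: " + e.getMessage())));
    }
}
```

With this shape, a missing detector arrives in `onFailure` rather than as an exception thrown at the call site, so error handling has to be written into the listener.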

    Testing done:
    1. unit/integration tests pass
    2. end-to-end testing to make sure the fix achieves its purpose:
       * the timeout issue is gone
       * when all features have been disabled or the index does not exist, we retry a few more times and then disable AD jobs.
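
Item 9 of the commit message describes recognizing a rejection exception that is nested inside a remote transport exception so that the write can be retried. A minimal sketch of that idea, walking the cause chain, follows. The exception classes here are hypothetical stand-ins for the Elasticsearch types named above, and `shouldRetryWrite` is an assumed helper name, not the plugin's method.

```java
import java.util.Optional;

// Hypothetical sketch of cause-chain unwrapping; not the plugin's actual code.
final class CauseUnwrapSketch {

    // Stand-in for a transport-level wrapper exception.
    static class RemoteWrapperException extends RuntimeException {
        RemoteWrapperException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Stand-in for a "thread pool rejected the task" exception.
    static class RejectedException extends RuntimeException {
        RejectedException(String message) {
            super(message);
        }
    }

    // Upper bound on how far we walk, as a guard against cyclic cause chains.
    private static final int MAX_DEPTH = 10;

    // Returns the first exception of the given type found in the cause chain, if any.
    static <T extends Throwable> Optional<T> findCause(Throwable error, Class<T> type) {
        int depth = 0;
        for (Throwable t = error; t != null && depth < MAX_DEPTH; t = t.getCause(), depth++) {
            if (type.isInstance(t)) {
                return Optional.of(type.cast(t));
            }
        }
        return Optional.empty();
    }

    // A write is retried when a rejection appears anywhere in the chain,
    // not only when it is the outermost exception.
    static boolean shouldRetryWrite(Throwable error) {
        return findCause(error, RejectedException.class).isPresent();
    }

    public static void main(String[] args) {
        Throwable nested = new RemoteWrapperException("remote failure", new RejectedException("queue full"));
        System.out.println("retry nested rejection: " + shouldRetryWrite(nested));
        System.out.println("retry unrelated error: " + shouldRetryWrite(new IllegalStateException("other")));
    }
}
```

Checking only `error instanceof RejectedException` on the outermost exception would miss the nested case, which matches the missed-retry behavior the commit message describes.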
@vamshin vamshin merged commit 4285d51 into opendistro-for-elasticsearch:development Apr 23, 2020
3 participants