Skip to content
This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Add initial support for dynamic config overriding #148

Merged
merged 5 commits into from
Jul 31, 2020
Merged

Conversation

ktkrg
Copy link
Contributor

@ktkrg ktkrg commented Jul 20, 2020

Issue #, if available:
Fixes #147
Description of changes:

This PR will fail to build. It is contingent on the reader PR: opendistro-for-elasticsearch/performance-analyzer-rca#294

We want to support turning on/off various components like rcas, deciders, and actions dynamically.

This PR contains the following changes to accomplish the task

  • Add a new cluster setting that will keep track of what settings are overridden from their default value as specified in the rca.conf
  • Add a new API _opendistro/_performanceanalyzer/override/cluster/config
    • Request body looks like this:
{
  "enable": {
     "rcas": ["rca1", "rca2"],
     "deciders": ["dec1", "dec2"],
     "actions": ["act1", "act2"]
   },
  "disable": {
     "rcas": ["rca3", "rca4"],
     "deciders": ["dec3", "dec4"],
     "actions": ["act3", "act4"]
   },
}

We merge the overrides from the request with what was already present in the cluster setting and write the merged value in the NodeDetails collector for the reader process to consume.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@ktkrg ktkrg requested review from yojs and khushbr July 20, 2020 18:28
@ktkrg ktkrg self-assigned this Jul 20, 2020
clusterSettingsManager = new ClusterSettingsManager(
Arrays.asList(PerformanceAnalyzerClusterSettings.COMPOSITE_PA_SETTING,
PerformanceAnalyzerClusterSettings.PA_NODE_STATS_SETTING),
Collections.singletonList(PerformanceAnalyzerClusterSettings.CONFIG_OVERRIDES_SETTING));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering if there is any documentation change we need for adding this new setting in cluster settings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add it to the readme in the reader repo.

Comment on lines 168 to 174
configOverridesClusterSettingHandler = new ConfigOverridesClusterSettingHandler(configOverridesWrapper, clusterSettingsManager,
PerformanceAnalyzerClusterSettings.CONFIG_OVERRIDES_SETTING);
clusterSettingsManager.addSubscriberForStringSetting(PerformanceAnalyzerClusterSettings.CONFIG_OVERRIDES_SETTING,
configOverridesClusterSettingHandler);
perfAnalyzerClusterSettingHandler = new PerformanceAnalyzerClusterSettingHandler(
performanceAnalyzerController,
clusterSettingsManager);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can we keep the indention consistent, either have function parameters starting next line over in same line as function name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it now.

final List<ClusterSettingListener<String>> listeners = stringSettingListenerMap.get(setting);
if (listeners != null) {
for (ClusterSettingListener<String> listener : listeners) {
LOG.error("Calling string listener with value: {}", settingValue);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to use DEBUG as the log level here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the log, I was using it only for verifying my changes.

@@ -0,0 +1,176 @@
package com.amazon.opendistro.elasticsearch.performanceanalyzer.http_action.config;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing License Information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can you update the year to 2020.

return true;
}

// Check if any RCA nodes are present in both enabled and disabled lists.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also check that the values provided are string type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The deserialize fails if the provided values are not string types.

@ktkrg ktkrg requested a review from khushbr July 27, 2020 17:44
Copy link
Contributor

@khushbr khushbr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A minor comment, please take care of it while pushing the change.

@@ -0,0 +1,176 @@
package com.amazon.opendistro.elasticsearch.performanceanalyzer.http_action.config;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can you update the year to 2020.

Copy link
Contributor

@yojs yojs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a label to the PR.

I think we won't be able to validate the RCA list they provided in the API is correct (spelling mistakes etc.). So, we should just call it out in the README/doc change.

combinedEnabled.addAll(addList);
combinedEnabled.removeAll(removeList);

return ImmutableList.copyOf(combinedEnabled);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be more efficient to use a SetBuilder instead of having to copy the hashSet over again in the end.

@ktkrg ktkrg added the enhancement New feature or request label Jul 29, 2020
@ktkrg ktkrg merged commit dd7f58a into master Jul 31, 2020
@ktkrg ktkrg deleted the ktkrg-en-disable branch July 31, 2020 01:48
sruti1312 added a commit that referenced this pull request Aug 24, 2020
* Fix NullPointerException when PA starts collecting metrics before master node is fully up

* Fix unit tests on Mac. Fix NPE during MasterServiceEventMetrics collection.

* Reorder imports, refactor unit tests

* add RFC for RCA

* Create README.md

* Split Elasticsearch version independent code (#75)

This commit has 3 major changes -

1) Performance Analyzer code that is Elasticsearch version independent.
3) Performance optimization in the Elasticsearch plugin to emit events
   to a single event log file. This brings down CPU utilization by
   an order of magnitude on large clusters.

* Update Performance Analyzer to support ElasticSearch version 7.3.2

* This commit merges some of the fixes and features that should have been in the split version of PA.

Features and fixes introduced in this PR:

    Allow performance analyzer to be en(dis)abled through a cluster setting across the cluster.
    Allow logging to be controllable through the cluster setting
    Capture a node's role along with the host address and the node name
    Checkstyle compliance

Some issues that are still not addressed in this PR:

    Update build scripts to start the agent from the reader location instead of the plugin location
    Remove pa_config, pa_bin and other folders that are already present in the reader

* Adding the dependency on the renamed jar performanceanalyzer-rca from performanceanalyzer

* Delete Unused test class

Remove NewFormatProcessorTest class which is not used.

* Adding shardsPerCollection REST API to update the shards Per Collection in node stats collector (#83)

* Update gradle wrapper

* Add isMasterNode to NodeDetailsStatus (#84)

* make the unit test backward compatible with the isMasterNode in NodeDetailsStatus

* Create gradle.yml (#87)

* Create gradle.yml

* Update gradle.yml

* Update gradle.yml

* Added the bouncy castle jars

* Added the licenses file

* Add cd.yml and enable CD pipeline to upload artifact to S3 (#90)

* add cd.yml

* upgrade ospackage version to 8.2.0

* change s3

* update cd.yml

* removing the -i flag

* fix a bug in StatsTests.java (#97)

* Update CONTRIBUTORS.md

* Update CONTRIBUTORS.md

* We must handle all exceptions while intercepting ES requests (#99)

* Making sure that we don't throw exceptions while intercepting ES requests

PerformanceAnalyzer intercepts various ES request paths toget detailed metrics. But today if we throw an exception, then it will bubble all the way upto ES and fail the request.

* Addressing the PR comments

* Updating the .gitignore

* style changes

* Adding Shard Size Metric as a part of Node Stats (#101)

* Adding Shard Size Metric as a part of Node Stats

* removing the -i flag

* fix a bug in StatsTests.java (#97)

* Update CONTRIBUTORS.md

* Update CONTRIBUTORS.md

* Adressing Typos

Co-authored-by: Aditya Jindal <[email protected]>
Co-authored-by: Joydeep Sinha <[email protected]>
Co-authored-by: Ruizhen Guo <[email protected]>
Co-authored-by: Balaji <[email protected]>

* collect queue latency metric in PerformanceAnalyzer (#111)

Authored-By: rguo-aws

* Remove unnecessary string formatting (#112)

* Odfe it framework release (#107)

* ODFE IT Framework POC

* Testing to see if Dockerstuff is set up

* Modified workflow to set DOCKER_COMPOSE_LOCATION

* Modify workflow to include stacktrace and no symbolic linkage

* Set DOCKER_COMPOSE_LOCATION using set-env

* Try set-env in a different location

* Attempt to fix docker-compose set-env

* Make workflow set vm.max_map_count

* Use sudo when setting vm.max_map_count

* Make performance-analyzer execute integration tests on checkin

* Clean up PerformanceAnalyzerIT and build.gradle script

* Add newline to end of gradle.properties file

* Modify gradle.yml and checkMetrics

* Fix ObjectMapper allocation and move TestUtils definition

Co-authored-by: Sid Narayan <[email protected]>

* Add github badges (#114)

* Add github badges

* Add github badges

* Run integ tests as part of git workflow instead of build (#115)

This commit makes it so that you can build PA without running
integration tests. This is useful for many reasons, including being able
to build RCA without depending on PA's integration tests and
dramatically reducing build times.

Integration Testing has been added as part of the Github Actions
workflow

* Fixup ITs and binding issues (#119)

This commit makes IT execution much more robust and only executes ITs if
the user passes the -Dtests.enableIT flag to the Gradle environment.

This commit also ensures that we bind to all interfaces when we spin up
a local Docker cluster for testing.

* collect queue capacity on writer (#118)

* Pa build fix (#122)

* Remove * junit import

* Fix logger usage

* Ignore JsonKeyTests

* Remove sed operation from build.gradle

The sed logic is now baked into the Dockerfile in
performance-analyzer-rca so it's no longer necessary here.

* Restore JsonKeyTests

* PA will no longer crash when SecurityManager says no (#113)

* PA will no longer crash when SecurityManager says no

PA attempts to set the default SSL Socket Factory (which defines rules
for the SSL Sockets it creates) as well as default hostname verification
rules when it is initialized by the Elasticsearch plugin loader.

However, this behavior would result in an AccessControlException when
run alongside the opendistro-security plugin. This commit is a simple
fix which allows these two plugins to work together.

* Update logging to WARN level

* Calculate rejection increase and emit the delta increase of rejection as metric  (#124)

* Enable spotbugs, address spotbug warnings (#126)

* Fix cluster state when pa is enabled from controller (#125)

* Fix cluster state when pa is Enabled from controller

* Add license info

* Move PA files to subdir owned by elasticsearch user (#146)

* IT improvements (#143)

* Use true/false instead of null/present for integTest props

integTest is a gradle task which runs our integration tests. It uses
system properties like -Dtests.useDockerCluster to decide whether or not
to perform certain actions like spinning up a docker cluster for
testing. The task would previously perform the property's action if the
property was present.

This commit makes the integTest task only execute a system property
action if that property is set to "true"

* Make IT port number configurable

The PerformanceAnalyzerIT class previously assumed that the Performance
Analyzer Webservice would always be listening on port 9600 for any
deployment of PerformanceAnalyzer.

Since this isn't always the case, this commit makes the port number
configurable through a gradle property.

* Allow logging to be enabled for ensurePaAndRcaEnabled

* cache max size metric collector (#145)

* Adding changes to collect Cache Max Size metric

* Updating the Cache Max Size Dimension to use toString (#153)

* Fixing checkstyle build failure (#158)

* Add an IT which verifies that the RCA REST endpoint can be queried (#157)

* Add an IT which verifies that the RCA REST endpoint can be queried

* Add try-catch to handle 404 exceptions

* Add initial support for dynamic config overriding (#148)

* Add initial support for dynamic config overriding

* Use helper to serialize/deserialize instead of the wrapper

* Add licence header to new files

* Update licence year to 2020

* Node collector split (#162)

Node Collector split is created based on the metrics which are required for all the shards on the node and other which can be collected on a few number of shards per iteration.

Built the Jar from this Patch and applied on the AES cluster. The Cache related metrics which should be collected for all the shards irrespective of the value of shardsPerCollection value are getting collected.

Tested with a zero value of this parameter (shardsPerCollection).

* Use the correct ctor for NodeDetailsCollector (#166)

* Use the correct ctor for NodeDetailsCollector

* Check for null ConfigOverrides wrapper while appending timestamps

* Add unit test for null cluster setting (#167)

* Use the correct ctor for NodeDetailsCollector

* Check for null ConfigOverrides wrapper while appending timestamps

* Add unit test for null cluster setting for config overrides

* Split capacity/latency collecting logic into separate try/catch block (#168)

* Update PULL_REQUEST_TEMPLATE.md

* Fix invalid cluster state (#172)

* Fix invalid cluster state

* Address PR comments

* Skip RCA tests when building PA in Github workflows (#177)

* Build against elasticsearch 7.9 and resolve dependency conflicts

* Add licenses for dependencies

* Add licenses for dependencies

* Change minor version

* Add release notes and contributors

* Changed licenses

* Modify github workflows

* Fix merge conflicts

* Fix jarHell around log4j

Co-authored-by: Karthik Kumarguru <[email protected]>
Co-authored-by: Karthik Kumarguru <[email protected]>
Co-authored-by: Partha Kanuparthy <[email protected]>
Co-authored-by: Partha Kanuparthy <[email protected]>
Co-authored-by: Adithya Chandra <[email protected]>
Co-authored-by: Venkata Jyothsna Donapati <[email protected]>
Co-authored-by: Palash Hedau <[email protected]>
Co-authored-by: Joydeep Sinha <[email protected]>
Co-authored-by: Balaji <[email protected]>
Co-authored-by: khushbr <[email protected]>
Co-authored-by: Chandra <[email protected]>
Co-authored-by: Pardeep Singh <[email protected]>
Co-authored-by: Ruizhen <[email protected]>
Co-authored-by: Joydeep Sinha <[email protected]>
Co-authored-by: Joydeep Sinha <[email protected]>
Co-authored-by: Ruizhen Guo <[email protected]>
Co-authored-by: Aditya Jindal <[email protected]>
Co-authored-by: Aditya Jindal <[email protected]>
Co-authored-by: Ricardo L. Stephen <[email protected]>
Co-authored-by: Sid Narayan <[email protected]>
Co-authored-by: Sid Narayan <[email protected]>
Co-authored-by: Peter Zhu <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support dynamic enable/disable of rca, decider, action nodes.
3 participants