OCM/ACM integration #370

Merged · 10 commits merged on Jan 10, 2023

@josecastillolema (Contributor) commented Dec 6, 2022

Adds support for injecting faults into clusters managed by Open Cluster Management (OCM) and Red Hat Advanced Cluster Management for Kubernetes (ACM, not tested) through ManagedCluster scenarios.

ManagedCluster scenarios leverage ManifestWorks to inject faults into the managed clusters.
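
To illustrate the ManifestWork mechanism, here is a minimal Python sketch (not the code from this PR) that wraps arbitrary Kubernetes manifests in a ManifestWork. It relies only on the public OCM work API: a ManifestWork created on the hub in the namespace named after the ManagedCluster is applied on that cluster by its work agent. The Job payload and the names `build_manifestwork`, `example-fault` and `managedcluster-stop` are hypothetical placeholders.

```python
from typing import Any, Dict, List


def build_manifestwork(name: str, managedcluster: str,
                       manifests: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap Kubernetes manifests in an OCM ManifestWork.

    The ManifestWork is created on the hub in the namespace named after the
    target ManagedCluster; the work agent on that cluster applies every
    entry of spec.workload.manifests locally.
    """
    return {
        "apiVersion": "work.open-cluster-management.io/v1",
        "kind": "ManifestWork",
        "metadata": {"name": name, "namespace": managedcluster},
        "spec": {"workload": {"manifests": manifests}},
    }


# Hypothetical fault payload: a one-shot Job running a placeholder command.
# The manifests Kraken actually ships for each scenario are not shown here.
fault_job: Dict[str, Any] = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "example-fault", "namespace": "default"},
    "spec": {
        "template": {
            "spec": {
                "restartPolicy": "Never",
                "containers": [{
                    "name": "fault",
                    "image": "busybox",
                    "command": ["sh", "-c", "echo 'fault goes here'"],
                }],
            }
        }
    },
}

# Hypothetical ManifestWork name; namespace "cluster1" matches the config.
work = build_manifestwork("managedcluster-stop", "cluster1", [fault_job])
```

On a real hub, the resulting dict could be submitted with the Kubernetes Python client's `CustomObjectsApi().create_namespaced_custom_object(group="work.open-cluster-management.io", version="v1", namespace="cluster1", plural="manifestworks", body=work)` and removed again with `delete_namespaced_custom_object`, which would presumably correspond to the "Deleting manifestworks" step in the log.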

Example of a complete run with the following configuration:

```yaml
managedcluster_scenarios:
  - actions:                                                        # ManagedCluster chaos scenarios to be injected
    - managedcluster_stop_start_scenario
    managedcluster_name: cluster1                                   # ManagedCluster on which the scenario has to be injected; multiple names can be set, separated by commas
    # label_selector:                                               # When managedcluster_name is not specified, a ManagedCluster with a matching label_selector is selected for ManagedCluster chaos scenario injection
    instance_count: 1                                               # Number of ManagedClusters matching the label selector on which to perform the action
    runs: 1                                                         # Number of times to inject each scenario under actions (performed on the same ManagedCluster each time)
    timeout: 420                                                    # Duration to wait for the ManagedCluster scenario injection to complete
                                                                    # For OCM to detect a ManagedCluster as unavailable, you have to wait 5*leaseDurationSeconds
                                                                    # (default leaseDurationSeconds = 60 sec)
  - actions:
    - stop_start_klusterlet_scenario
    managedcluster_name: cluster1
    # label_selector:
    instance_count: 1
    runs: 1
    timeout: 60
```

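
The semantics of the configuration fields can be sketched in Python as follows. This is an illustrative reading of the YAML comments, not Kraken's actual selection logic: the helper names, the simple `key=value` selector parsing and the deterministic "first N sorted matches" choice are all assumptions. The second helper spells out the arithmetic behind `timeout: 420`: with the default `leaseDurationSeconds` of 60, OCM needs 5 * 60 = 300 seconds to flag a cluster as unavailable, so 420 leaves 120 seconds of slack.

```python
from typing import Dict, List


def resolve_targets(scenario: Dict, clusters: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the ManagedCluster names one managedcluster_scenarios entry targets.

    `clusters` maps each ManagedCluster name to its labels (hypothetical input).
    """
    names = scenario.get("managedcluster_name")
    if names:
        # Several names may be given, separated by commas.
        return [n.strip() for n in names.split(",") if n.strip()]
    # Assumed selector syntax: "key=value[,key=value...]".
    selector = scenario.get("label_selector") or ""
    wanted = dict(pair.split("=", 1) for pair in selector.split(",") if pair)
    matching = sorted(
        name for name, labels in clusters.items()
        if all(labels.get(k) == v for k, v in wanted.items())
    )
    count = scenario.get("instance_count", 1)
    if len(matching) < count:
        raise ValueError(f"only {len(matching)} ManagedCluster(s) match {selector!r}")
    return matching[:count]


def min_unavailable_timeout(lease_duration_seconds: int = 60) -> int:
    """Seconds OCM needs before it flags a ManagedCluster as unavailable."""
    return 5 * lease_duration_seconds


# First entry of the example configuration, as a Python dict.
scenario = {
    "actions": ["managedcluster_stop_start_scenario"],
    "managedcluster_name": "cluster1",
    "instance_count": 1,
    "runs": 1,
    "timeout": 420,
}
print(resolve_targets(scenario, {"cluster1": {}}))  # ['cluster1']
print(scenario["timeout"] - min_unavailable_timeout())  # 120 seconds of slack
```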
```
% kind get clusters
cluster1
hub

% kubectx
kind-cluster1
kind-hub   <=======

% kubectl get managedclusters
NAME       HUB ACCEPTED   MANAGED CLUSTER URLS                  JOINED   AVAILABLE   AGE
cluster1   true           https://cluster1-control-plane:6443   True     True        4h59m
```
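
The `AVAILABLE` column above reflects the ManagedCluster's `ManagedClusterConditionAvailable` status condition, which the log below reports as `Available`/`Unavailable`. Here is a minimal polling sketch in Python, assuming that `fetch` returns the ManagedCluster object as a dict and that any condition status other than `"True"` (OCM reports `"Unknown"` when the lease is not renewed) counts as unavailable; the function names and the 5-second default interval are hypothetical.

```python
import time
from typing import Callable, Dict


def managedcluster_status(managedcluster: Dict) -> str:
    """Map the ManagedClusterConditionAvailable condition to a status string."""
    for cond in managedcluster.get("status", {}).get("conditions", []):
        if cond.get("type") == "ManagedClusterConditionAvailable":
            return "Available" if cond.get("status") == "True" else "Unavailable"
    return "Unknown"  # condition not reported yet


def wait_for_status(fetch: Callable[[], Dict], expected: str,
                    timeout: float, interval: float = 5.0) -> bool:
    """Poll fetch() until the ManagedCluster reaches `expected` or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        if managedcluster_status(fetch()) == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Offline example: a ManagedCluster whose lease has expired.
stale = {"status": {"conditions": [
    {"type": "ManagedClusterConditionAvailable", "status": "Unknown"}]}}
print(managedcluster_status(stale))  # Unavailable
```

Against a live hub, `fetch` could be `lambda: CustomObjectsApi().get_cluster_custom_object("cluster.open-cluster-management.io", "v1", "managedclusters", "cluster1")` from the Kubernetes Python client.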

```
% python3 run_kraken.py --config config/config_kubernetes.yaml
:0: UserWarning: You do not have a working installation of the service_identity module: 'No module named 'service_identity''.  Please install it from <https://pypi.python.org/pypi/service_identity> and make sure all of its dependencies are satisfied.  Without the service_identity module, Twisted can perform only rudimentary TLS client hostname verification.  Many valid certificate/hostname mappings may be rejected.
 _              _              
| | ___ __ __ _| | _____ _ __  
| |/ / '__/ _` | |/ / _ \ '_ \ 
|   <| | | (_| |   <  __/ | | |
|_|\_\_|  \__,_|_|\_\___|_| |_|                          

2022-12-06 16:11:32,274 [INFO] Starting kraken
2022-12-06 16:11:32,282 [INFO] Initializing client to talk to the Kubernetes cluster
2022-12-06 16:11:32,430 [INFO] Publishing kraken status at http://0.0.0.0:8081
2022-12-06 16:11:32,430 [INFO] Publishing kraken status at http://0.0.0.0:8081
2022-12-06 16:11:32,433 [INFO] Starting http server at http://0.0.0.0:8081
2022-12-06 16:11:32,433 [INFO] Fetching cluster info
2022-12-06 16:11:32,440 [INFO] Cluster version CRD not detected, skipping
2022-12-06 16:11:32,441 [INFO] Server URL: https://127.0.0.1:56860
2022-12-06 16:11:32,441 [INFO] Generated a uuid for the run: c05cdfc3-f97d-4f3c-a4a3-a7c8ec89494c
2022-12-06 16:11:32,441 [INFO] Daemon mode not enabled, will run through 1 iterations
2022-12-06 16:11:32,441 [INFO] Executing scenarios for iteration 0
2022-12-06 16:11:32,441 [INFO] connection set up
127.0.0.1 - - [06/Dec/2022 16:11:32] "GET / HTTP/1.1" 200 -
2022-12-06 16:11:32,442 [INFO] response RUN
2022-12-06 16:11:32,442 [INFO] Running managedcluster scenarios
2022-12-06 16:11:32,448 [INFO] Starting managedcluster_stop_start_scenario injection
2022-12-06 16:11:32,448 [INFO] Starting managedcluster_stop_scenario injection
2022-12-06 16:11:32,478 [INFO] managedcluster_stop_scenario has been successfully injected!
2022-12-06 16:11:32,478 [INFO] Waiting for the specified timeout: 420
2022-12-06 16:12:50,748 [INFO] Status of managedcluster cluster1: Unavailable
2022-12-06 16:12:50,749 [INFO] Deleting manifestworks
2022-12-06 16:13:00,759 [INFO] Starting managedcluster_start_scenario injection
2022-12-06 16:13:00,788 [INFO] managedcluster_start_scenario has been successfully injected!
2022-12-06 16:13:00,788 [INFO] Waiting for the specified timeout: 420
2022-12-06 16:13:06,815 [INFO] Status of managedcluster cluster1: Available
2022-12-06 16:13:06,815 [INFO] Deleting manifestworks
2022-12-06 16:13:06,822 [INFO] managedcluster_stop_start_scenario has been successfully injected!
2022-12-06 16:13:06,822 [INFO] Waiting for the specified duration: 60
2022-12-06 16:16:07,193 [INFO] Starting stop_start_klusterlet_scenario injection
2022-12-06 16:16:07,193 [INFO] Starting stop_klusterlet_scenario injection
2022-12-06 16:16:07,216 [INFO] stop_klusterlet_scenario has been successfully injected!
2022-12-06 16:16:37,217 [INFO] Deleting manifestworks
2022-12-06 16:16:47,226 [INFO] Starting start_klusterlet_scenario injection
2022-12-06 16:16:47,247 [INFO] start_klusterlet_scenario has been successfully injected!
2022-12-06 16:17:17,249 [INFO] Deleting manifestworks
2022-12-06 16:17:17,261 [INFO] stop_start_klusterlet_scenario has been successfully injected!
2022-12-06 16:17:17,261 [INFO] Waiting for the specified duration: 60 
2022-12-06 16:18:06,854 [INFO] Successfully finished running Kraken. UUID for the run: c05cdfc3-f97d-4f3c-a4a3-a7c8ec89494c. Report generated at /Users/jlema/krkn/kraken.report. Exiting
```
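
Judging only from the log timestamps, a `*_stop_start_*` scenario appears to run as: inject the stop action, wait, delete the ManifestWorks, pause about 10 seconds, inject the start action, wait, and delete the ManifestWorks again. The hypothetical Python sketch below reproduces that ordering with injected callables so it runs offline; it is a reconstruction from the log, not the PR's implementation, and `stop_start` and its parameters are illustrative names. For the klusterlet scenario the log shows a fixed 30-second gap instead of a status check, which could be modelled by passing a `wait_for` that simply sleeps and returns `True`.

```python
import time
from typing import Callable


def stop_start(inject: Callable[[str], None],
               delete_manifestworks: Callable[[], None],
               wait_for: Callable[[str], bool],
               stop_action: str, start_action: str,
               pause: float = 10.0) -> None:
    """Hypothetical stop/start composite reconstructed from the run log."""
    steps = ((stop_action, "Unavailable"), (start_action, "Available"))
    for action, expected in steps:
        inject(action)  # e.g. create the scenario's ManifestWork(s)
        if not wait_for(expected):
            raise TimeoutError(f"{action}: ManagedCluster never became {expected}")
        delete_manifestworks()  # "Deleting manifestworks" in the log
        if action == stop_action:
            time.sleep(pause)  # ~10 s gap before the start action in the log


# Offline dry run that records the order of operations.
calls = []
stop_start(
    inject=lambda action: calls.append(("inject", action)),
    delete_manifestworks=lambda: calls.append(("delete",)),
    wait_for=lambda state: calls.append(("wait", state)) is None,
    stop_action="managedcluster_stop_scenario",
    start_action="managedcluster_start_scenario",
    pause=0,
)
```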

Signed-off-by: José Castillo Lema <[email protected]>

@yogananth-subramanian (Contributor) left a comment

LGTM!
Tested the patch with a kind cluster.
I guess the patch is specific to Kubernetes and is not expected to target managed OpenShift RHOSAK clusters.

@chaitanyaenr (Collaborator) left a comment

LGTM! Nice addition @josecastillolema!

@chaitanyaenr (Collaborator) commented

@josecastillolema, could you rebase when you get time, please? We can merge it once the rebase is done. Thanks.

@josecastillolema (Contributor, Author) commented Jan 10, 2023

Done, @chaitanyaenr. Thanks!

Labels: None yet
Projects: None yet
4 participants