©2024 Niklaus Hirt / IBM
- Support for 4.8
- Topology Stability
- LAGS stability
- New Elasticsearch instance
- I'm sure there are errors
- I'm sure it's not complete
- It clearly can be improved
Please contact me if you have feedback or if you find glitches or problems.
- on Slack: @niklaushirt or
- by Mail: [email protected]
The installation has been tested on OpenShift 4.16 on:
- OpenShift Cluster (OCP-V) - IBM Cloud (https://techzone.ibm.com/my/reservations/create/66576e78d3aaab001ef9aa8d)
- OpenShift VMWare Cluster - UPI - Deployer - V2 (https://techzone.ibm.com/collection/tech-zone-certified-base-images/journey-pre-installed-software)
It should also work on other OpenShift platforms (ROKS, Fyre, ...).
These are non-production installations and are suited only for demo and PoC environments. Please refer to the official IBM Documentation for production-ready installations.
The idea of this repo is to provide optimised, complete, pre-trained Demo-in-a-Box environments for IBM IT Automation Solutions that are self-contained (i.e. they can be deployed in a single cluster).
It contains the following components (which can be installed independently):
IBM AIOps
IBM AIOps Demo Content (optional)
- OpenLDAP & Register with IBM AIOps
- Runbooks AWX (Open Source Ansible Tower) with preloaded Playbooks and AIOps Runbooks
- AI Models - Load and Train
- Load Training Data (LAGS, SNOW, MET, TG)
- Create Training Definitions (TG, LAGS, CR, SI, MET. Turn off RSA)
- Train Models (TG, LAGS, CR, SI, MET)
- Topology
- Live Demo Apps (RobotShop, SockShop)
- Create IBM AIOps Topology and Applications (RobotShop, SockShop, ACME, London Underground, Telecom FiberCut)
- Dedicated DemoUI that allows you to trigger different scenarios
- Custom Icons (styling and dynamic)
- Configs
- Policies for Incident creation
- Custom Alert View
IBM Concert
IBM Concert Demo Content
IBM Turbonomic
IBM Turbonomic Demo Content
IBM Instana
IBM Instana Demo Content
- It's way faster
- You don't have to install all the tooling locally
- You don't need a connection to the cluster during the installation (fire and forget)
So this could basically be done from an iPhone or iPad.
Basically:
- Get an OpenShift Cluster
- Get your entitlement key/pull token
- Paste the install file into the OpenShift web UI and insert your entitlement key
- Grab a coffee and come back after 2-3 hours, depending on the modules you're installing
Troubleshooting
Already have a cluster? Dive right in:
- Demo the Solution
- Demo Setup - Explained
- Create a custom Scenario
Prerequisites
Get a temporary cluster from TechZone:
- OpenShift Cluster (OCP-V) - IBM Cloud (https://techzone.ibm.com/my/reservations/create/66576e78d3aaab001ef9aa8d)
- OpenShift VMWare Cluster - UPI - Deployer - V2 (https://techzone.ibm.com/collection/tech-zone-certified-base-images/journey-pre-installed-software)
Recommended size:
- 4x worker nodes with 32 CPU / 128 GB
- 3x worker nodes with 16 CPU / 64 GB for IBM Concert
You might get away with less if you don't install some components, but there is no guarantee.
To create the cluster (the screenshots are slightly outdated and differ between the TechZone offerings, but the basic choices remain the same):
- Create a cluster for Practice/Self Education or Test if you don't have an Opportunity Number
- Select your preferred Geography
- Select the maximum end date that fits your needs (you can extend the duration once after creation)
- Select OpenShift Storage
- Storage OCS/ODF Size: 1TB or Managed NFS 2TB
- OpenShift Version: 4.15 or 4.16
- Select the Cluster Size: Worker node count 4, Flavour 32 vCPU X 128 GB
- If you want to install IBM AIOps and Turbonomic you must select 5 x 32 vCPU X 128 GB
- Click Submit
- Once the cluster is provisioned, don't forget to extend it as needed.
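The sizing rules above boil down to simple arithmetic: 4 x 32 vCPU / 128 GB gives 128 vCPU and 512 GB in total, and 5 such nodes give 160 vCPU and 640 GB. The following Python sketch compares a cluster's aggregate worker capacity against those figures. The `Node` type and the `total_capacity` / `meets_recommendation` helpers are hypothetical; they only compare totals, so they do not guarantee that every Pod fits on an individual node.

```python
# Hypothetical sizing check based on the TechZone recommendations above.
# It only compares aggregate capacity; it does not model per-node scheduling.
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    vcpu: int
    memory_gb: int


def total_capacity(nodes):
    """Return the summed (vCPU, memory in GB) of all worker nodes."""
    return sum(n.vcpu for n in nodes), sum(n.memory_gb for n in nodes)


def meets_recommendation(nodes, node_count, vcpu_per_node=32, memory_gb_per_node=128):
    """True if the aggregate capacity reaches node_count x the recommended flavour."""
    vcpu, memory_gb = total_capacity(nodes)
    return vcpu >= node_count * vcpu_per_node and memory_gb >= node_count * memory_gb_per_node


aiops_cluster = [Node(32, 128)] * 4
print(total_capacity(aiops_cluster))           # (128, 512)
print(meets_recommendation(aiops_cluster, 4))  # True: IBM AIOps only
print(meets_recommendation(aiops_cluster, 5))  # False: AIOps + Turbonomic needs 5 nodes
```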
You can get the entitlement key (registry pull token) from https://myibm.ibm.com/products-services/containerlibrary.
This allows the images to be pulled from the IBM Container Registry.
Important remarks before you start
These remarks are based on feedback and problem reports I have received from the field.
These scripts have been tested thoroughly on different environments and have proven to be VERY reliable.
If you think that you have hit a problem:
- If you have provisioned a cluster with Managed NFS 2TB and you have Pods in 0/0 state, verify that the nfs-provisioner Pod is running. If it is not (this is a bug in TechZone), please apply ./tools/98_maintenance/troubleshooting/nfs-provisioner.yaml. The installation should then continue. If it does not, please re-run the installer Pod.
- Make sure that you have provisioned a cluster with 4 worker nodes with 32 CPU and 128 GB each. If you have Pods in 0/0 state, check the Events. If you get "Not enough CPU", delete the cluster and provision one with the correct size.
- If you want to install IBM AIOps and Turbonomic you must select 5 worker nodes with 32 CPU and 128 GB each.
- The complete installation takes about 1.5 to 8 hours, depending on your region and the platform you deployed to.
- If you see Pods in CrashLoop or other error states, try to wait it out (this can be due to dependencies on other components that are not ready yet). Chances are that the deployment will eventually go through. If you are still stuck after 8 hours, ping me.
Here is a quick video that walks you through the installation process.
2.1.1 What will be installed
This installation contains:
- IBM AIOps
- IBM Catalog
- IBM Operator
- IBM AIOps Instance
- IBM AIOps Demo Content
- OpenLDAP & Register with IBM AIOps
- AWX (Open Source Ansible Tower) with preloaded Playbooks
- AI Models - Load and Train
- Create Training Definitions (TG, LAGS, CR, SI. Turn off RSA)
- Create Training Data (LAGS, SNOW)
- Train Models (TG, LAGS, CR, SI)
- Topology
- RobotShop Demo App
- SockShop Demo App
- ACME Air Demo App
- Create K8s Observer
- Create ASM merge rules
- Load Overlay Topology
- Create IBM AIOps Application
- Misc
- Policies for Incident creation
- Custom Alert View
- Creates valid certificate for Ingress (Slack)
- External Routes (Flink, Topology, ...)
- Disables ASM Service match rule
- Create Policy Creation for Stories and Runbooks
- Demo Service Account
2.1.2 Installation Instructions
- In the OpenShift Web UI, click on the + sign in the upper right corner
- Copy and paste the content from this file
- Replace <REGISTRY_TOKEN> at the top of the file with your entitlement key from step 1.1.2 (line 69, the entitlement key from https://myibm.ibm.com)
- Replace the default password global_password: CHANGEME with a password of your choice (line 82; do NOT use the "-" character and do NOT leave it empty)
- Accept the license by setting accept_all_licenses to True (line 92)
- Optionally, change the name of your Demo Environment by setting environment_name to one of the provided characters (line 89)
- Click Create
If you get a warning (orange or red bar on top), please re-run the installer Pod until everything is green.
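The password rules above are easy to get wrong, so here is a small Python sketch that checks a candidate value for global_password before you paste it into the install file. It encodes only the two rules stated in the instructions (not empty, no "-") plus a check that the CHANGEME placeholder has actually been replaced. The function name is hypothetical, and the installer may enforce additional constraints that are not documented here.

```python
def is_valid_global_password(password: str) -> bool:
    """Check a candidate global_password against the documented rules.

    Hypothetical helper: rejects empty or whitespace-only values, any value
    containing "-", and the unchanged CHANGEME placeholder.
    """
    if not password.strip():
        return False  # must not be left empty
    if "-" in password:
        return False  # the "-" character is not allowed
    return password != "CHANGEME"  # the default must be replaced


for candidate in ["", "my-secret", "CHANGEME", "MySecret2024"]:
    print(repr(candidate), is_valid_global_password(candidate))
```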
2.1.3 Follow the installation progress
- The blue notification at the top gives you basic information about the running installation (Name, Version, ...). You can open and follow the installation logs by clicking on Open Logs.
- In addition, the notifications at the bottom show the current step of the installation.
- When the installation has succeeded, you get the green notification bar at the top. You can directly open the DemoUI by clicking on the link, or go to the chapter Demo the Solution to learn how to run an efficient demo.
You also get this message in the logs.
2.1.4 Connecting for the first time
To access the demo environment:
- In the green notification bar at the top, click on the link to open the DemoUI
- Log in with the provided password
- You will find links and passwords for all installed components here
- Note the username and password
- Click on the blue IBM AIOps button
- Select Enterprise LDAP
- Log in as user demo with the password selected at installation and shown in the DemoUI
If you are using IBM TechZone clusters you will get certificate errors when trying to open CP4AIOPS or Turbonomic:
- Open the links in a Private/Incognito window and select proceed
- In Chrome you can type thisisunsafe when on the "Your connection is not private" page. There is no visual feedback, but if you type it correctly the page will then load.
2.1.5 Post Install
- In the IBM AIOps "Hamburger" menu select Operate/AI Model Management
- Check that the trainings are displayed as follows
- If any of the trainings (particularly Temporal grouping or Metric anomaly detection) displays an error, please re-run the training. This is often due to a lack of resources at install time.
- Open the Training definition and check that the problem was a lack of resources
- Run the training by clicking on Train Models
- You should get around 500+ models
- In the IBM AIOps "Hamburger" menu select Operate/Alerts
- Click on the Cog in the top right corner
- Select User preferences
- Select DEMO Incidents View for Default view
- Select DEMO Incidents View for Default view for alerts in incidents
- Enable Row Coloring
Now you're ready to Demo the Solution.
2.1.6 Detailed Check
If any of the checks is not right, please refer to Troubleshooting.
Check that the green notification bar is displayed as follows
- In the IBM AIOps "Hamburger" menu select Operate/AI Model Management
- Check that the trainings are displayed as follows
If any of the trainings (particularly Temporal grouping or Metric anomaly detection) displays an error, please re-run the training. This is often due to a lack of resources at install time.
- In the IBM AIOps "Hamburger" menu select Operate/Automations
- Select the Policies tab
- Enter DEMO into the search field
- Check that you have 5 Policies as shown below
- In the IBM AIOps "Hamburger" menu select Operate/Automations
- Select the Runbooks tab
- Check that you have 4 Runbooks as shown below
- In the IBM AIOps "Hamburger" menu select Operate/Automations
- Select the Actions tab
- Enter DEMO into the search field
- Check that you have some Actions present as shown below
- In the IBM AIOps "Hamburger" menu select Operate/Resource management
- Check that the Applications are displayed as follows
- In the IBM AIOps "Hamburger" menu select Define/Integrations
- Check that the Connections are displayed as follows
If any of the checks is not right, please refer to Troubleshooting.
2.1.7 Characters to choose from
In the Quick Install file you can also adapt the name of your environment (default is Bear):
environment_name: Bear
You can choose from the following:
- Adam
- Aajla
- AIOPS
- Alicent
- Amy
- Anakin
- Angus
- Arya
- Austin
- Barney
- Bart
- Batman
- Bear
- Bob
- Bono
- Bran
- Brienne
- Cara
- Cassian
- Cersei
- Cersei1
- Chewbacca
- CP4AIOPS
- Curt
- Daenerys
- Daffy
- Darth
- Demo
- Dexter
- Dilbert
- Edge
- Finn
- Fred
- Freddie
- Grogu
- Groot
- Hagrid
- Han
- Harley
- Harry
- Hodor
- Hofstadter
- Howard
- Hulk
- James
- Jimmy
- John
- Joker
- Jyn
- King
- Kirk
- Kurt
- Lando
- Leia
- Larry
- Lemmy
- Liam
- Luke
- Nightking
- Obiwan
- Padme
- Paul
- Penny
- Picard
- Prince
- Raj
- Rey
- Robin
- Robot1
- Robot2
- Robot3
- Robot4
- Robot5
- Ron
- Sabine
- Sansa
- Sheldon
- Sherlock
- Slash
- Spiderman
- Spock
- Strange
- Superman
- Tormund
- Tyrion
- Walker
- Watson
- Wedge
2.2.1 What will be installed
This installation contains:
- IBM Concert
- IBM Concert Instance
- IBM Concert Demo Content
- Default Demo Content
- SBOMs
- App, Build and Deploy for RobotShop
- Certificates
- Certificates for RobotShop
- Compliance
- Custom NIST Demo Compliance
- Demo Applications
- RobotShop Demo App
- SockShop Demo App
2.2.2 Installation Instructions
- In the OpenShift Web UI, click on the + sign in the upper right corner
- Copy and paste the content from this file
- Replace <REGISTRY_TOKEN> at the top of the file with your entitlement key from step 1.1.2 (line 49, the entitlement key from https://myibm.ibm.com)
- Replace the default password global_password: CHANGEME with a password of your choice (line 62; do NOT use the "-" character and do NOT leave it empty)
- Accept the license by setting accept_all_licenses to True (line 69)
- Click Create
If you get a warning (orange or red bar on top), please re-run the installer Pod until everything is green.
2.2.3 Follow the installation progress
- The blue notification at the top gives you basic information about the running installation (Name, Version, ...). You can open and follow the installation logs by clicking on Open Logs.
- In addition, the notifications at the bottom show the current step of the installation.
- When the installation has succeeded, you get the green notification bar at the top. You can directly open IBM Concert by clicking on the link.
2.3.1 What will be installed
This installation contains:
- IBM Turbonomic
- IBM Turbonomic Instance
- IBM Turbonomic Demo Content
- Demo Applications
- RobotShop Demo App
- SockShop Demo App
2.3.2 Installation Instructions
- In the OpenShift Web UI, click on the + sign in the upper right corner
- Copy and paste the content from this file
- Enter your Turbonomic License on line 69
- Replace the default password global_password: CHANGEME with a password of your choice (line 62; do NOT use the "-" character and do NOT leave it empty)
- Accept the license by setting accept_all_licenses to True (line 72)
- Optionally, change the name of your Demo Environment by setting environment_name to one of the provided characters (line 69)
- Click Create
If you get a warning (orange or red bar on top), please re-run the installer Pod until everything is green.
2.3.3 Follow the installation progress
- The blue notification at the top gives you basic information about the running installation (Name, Version, ...). You can open and follow the installation logs by clicking on Open Logs.
- In addition, the notifications at the bottom show the current step of the installation.
- When the installation has succeeded, you get the green notification bar at the top. You can directly open IBM Turbonomic by clicking on the link.
2.4.1 What will be installed
This installation contains:
- IBM Instana
- IBM Instana Instance
- IBM Instana Demo Content
- TBD
- Demo Applications
- RobotShop Demo App
- SockShop Demo App
2.4.2 Installation Instructions
- In the OpenShift Web UI, click on the + sign in the upper right corner
- Copy and paste the content from this file
- Enter your Turbonomic License on lines 142/143
- Replace the default password global_password: CHANGEME with a password of your choice (line 60; do NOT use the "-" character and do NOT leave it empty)
- Accept the license by setting accept_all_licenses to True (line 70)
- Optionally, change the name of your Demo Environment by setting environment_name to one of the provided characters (line 67)
- Click Create
If you get a warning (orange or red bar on top), please re-run the installer Pod until everything is green.
2.4.3 Follow the installation progress
- The blue notification at the top gives you basic information about the running installation (Name, Version, ...). You can open and follow the installation logs by clicking on Open Logs.
- In addition, the notifications at the bottom show the current step of the installation.
- When the installation has succeeded, you get the green notification bar at the top. You can directly open IBM Instana by clicking on the link.
Please use the Demo Script to prepare for the demo.
The Click Through PPT provides you with a simple PPT-based demo - "feels like the real thing"(TM).
I have also added a short Demo Walkthrough video that you can watch to get an idea of how to do the demo (based on 3.2).
3.1.1 Access the Environment
To access the demo environment:
3.1.2 Login to IBM AIOps as demo User
- Click on the blue IBM AIOps button
- Log in as user demo with the password selected at installation
If you are using IBM TechZone clusters you will get certificate errors when trying to open CP4AIOPS or Turbonomic:
- Open the links in a Private/Incognito window and select proceed
- In Chrome you can type thisisunsafe when on the "Your connection is not private" page. There is no visual feedback, but if you type it correctly the page will then load.
3.1.3 Demo the Solution
Please use the Demo Script to prepare for the demo.
Then start the demo from the same Demo Script.
3.2.1 Basic Architecture
The environment (Kubernetes, Applications, ...) creates logs that are fed into a Log Management Tool (ELK in this case).
- External Systems generate Alerts and send them into IBM AIOps for Event Grouping.
- At the same time, IBM AIOps ingests the raw logs coming from the Log Management Tool (ELK) and looks for anomalies in the stream based on the trained model.
- It also ingests Metric Data and looks for anomalies.
- If it finds an anomaly (logs and/or metrics), it forwards it to the Event Grouping as well.
- Out of this, IBM AIOps creates an Incident that is enriched with Topology (Localization and Blast Radius) and with Similar Incidents that might help correct the problem.
- The Incident is then sent to Slack.
- A Runbook is available to correct the problem but is not launched automatically.
3.2.2 Optimized Demo Architecture
The idea of this repo is to provide an optimised, complete, pre-trained demo environment that is self-contained (i.e. it can be deployed in a single cluster).
It contains the following components (which can be installed independently):
- IBM AIOps
- IBM Operator
- IBM AIOps Instance
- IBM AIOps Demo Content (optional)
- OpenLDAP & Register with IBM AIOps
- AWX (Open Source Ansible Tower) with preloaded Playbooks
- AI Models - Load and Train
- Create Training Definitions (TG, LAD, CR, SI. Turn off RSA)
- Create Training Data (LAD, SNOW)
- Train Models (TG, LAD, CR, SI)
- Topology
- RobotShop Demo App
- Create K8s Observer
- Create ASM merge rules
- Load Overlay Topology
- Create IBM AIOps Application
- Misc
- Creates valid certificate for Ingress (Slack)
- External Routes (Flink, Topology, ...)
- Disables ASM Service match rule
- Create Policy Creation for Stories and Runbooks
- Demo Service Account
- Turbonomic (optional)
- Turbonomic Demo Content (optional)
- Demo User
- RobotShop Demo App with synthetic metric
- Instana target (if Instana is installed - you have to enter the API Token Manually)
- Groups for vCenter and RobotShop
- Groups for licensing
- Resource Hogs
For this specific demo environment:
- ELK is not needed, as I am using pre-canned logs for training and for the anomaly detection (inception)
- The same goes for Metrics: I am using pre-canned metric data for training and for the anomaly detection (inception)
- The Events are also created from pre-canned content that is injected into IBM AIOps
- There are also pre-canned ServiceNow Incidents if you don't want to do the live integration with SNOW
- The Webpages that are reachable from the Events are static and hosted on my GitHub
- The same goes for ServiceNow Incident pages if you don't integrate with live SNOW
This allows us to:
- Install the whole Demo Environment in a self-contained OCP Cluster
- Trigger the Anomalies reliably
- Get Events from sources that would normally not be available (Instana, Turbonomic, Log Aggregator, Metric Provider, ...)
- Show some examples of SNOW integration without a live system
3.2.3 Training
Loading training data is done at the lowest possible level (for efficiency and speed):
- Logs: Loading Elasticsearch indexes directly into ES - two days of logs for March 3rd and 4th 2022
- SNOW: Loading Elasticsearch indexes directly into ES - synthetic data with 15k change requests and 5k incidents
- Metrics: Loading Cassandra dumps of metric data - 3 months of synthetic data for 13 KPIs
The models can be trained directly on the data that has been loaded as described above.
3.2.4 Incident creation
Incidents are created by using the high-level APIs in order to simulate a real-world scenario.
- Events: Pre-canned events are injected through the corresponding REST API
- Logs: Pre-canned anomalous logs for a 30 min time range are injected through Kafka
- Metrics: Anomalous metric data are generated on the fly and injected via the corresponding REST API
You can find a more detailed presentation about how the automation works here: PDF.
This feature allows you to easily create custom scenarios for the IBM AIOps Demo UI.
By default the custom scenario is disabled. To enable it, you have to modify the ibm-aiops-demo-ui-config-custom ConfigMap in the ibm-aiops-demo-ui Namespace.
The Topology will be loaded only the first time. Once the Application exists it will not be updated.
If you want to update the Topology after modifying the custom ConfigMap, you can use the Reload Topology button on the About tab.
Topology
To create a complete Topology/Application, you have to define the following variables:
- CUSTOM_TOPOLOGY_APP_NAME: Name for the Application (if this is left empty, no Application is created)
- CUSTOM_TOPOLOGY_TAG: Tag used to create the Topology Template (if this is left empty, no Template is created)
- CUSTOM_TOPOLOGY: Topology definition, loaded through a File Observer (make sure that you have a corresponding tag to create the Template)
IMPORTANT: The complete topology is loaded each time the DemoUI Pod restarts.
You can get more details here.
A typical Vertex (Entity):
V:{
"name": "test01", "uniqueId": "test01-id",
"entityTypes": ["device"],
"matchTokens":["test01","test01-id"], <-- This should contain the resource name of the event to be matched to
"mergeTokens":["test01","test01-id"],
"tags":["tag1","app:custom-app"], "app":"test" ,
"city":"Richmond", "area": "Broad Meadows", "geolocation": { "geometry": { "coordinates": [-77.56121810464228, 37.64360674606608],"type": "Point"}},
"_references": [{"_toUniqueId":"test02-id","_edgeType":"connectedTo"}],
"fromFile":"true", "_operation": "InsertUpdate"
}
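Because each vertex must sit on a single V: line, hand-editing long JSON is error-prone. Here is a minimal Python sketch that builds such a line with json.dumps. The vertex_line helper is hypothetical: it copies matchTokens into mergeTokens as in the example above and covers only a subset of the fields (location fields such as city or geolocation can be added the same way).

```python
import json


def vertex_line(unique_id, name, entity_type, match_tokens, tags, references=()):
    """Build one single-line 'V:{...}' vertex entry for CUSTOM_TOPOLOGY (hypothetical helper)."""
    vertex = {
        "uniqueId": unique_id,
        "name": name,
        "entityTypes": [entity_type],
        "tags": list(tags),
        # matchTokens must contain the resource name used by your events
        "matchTokens": list(match_tokens),
        "mergeTokens": list(match_tokens),
        "_operation": "InsertUpdate",
        "fromFile": "true",
        "_references": [
            {"_toUniqueId": target, "_edgeType": "connectedTo"} for target in references
        ],
    }
    # json.dumps without indent produces a single line, as the format requires
    return "V:" + json.dumps(vertex)


print(vertex_line("test01-id", "test01", "device", ["test01", "test01-id"],
                  ["tag1", "app:custom-app"], references=["test02-id"]))
```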
Events
Inject Events to simulate the Custom Scenario.
CUSTOM_EVENTS: List of Events to be injected sequentially (order is respected)
{
"id": "1a2a6787-59ad-4acd-bd0d-000000000000", <-- Optional
"occurrenceTime": "MY_TIMESTAMP", <-- Do not modify
"summary": "Summary - Problem test01", <-- The text of the event
"severity": 6,
"expirySeconds": 6000000,
"links": [{
"linkType": "webpage",
"name": "LinkName",
"description": "LinkDescription",
"url": "https://ibm.com/index.html"
}],
"sender": {
"type": "host",
"name": "SenderName",
"sourceId": "SenderSource"
},
"resource": {
"type": "host",
"name": "test01", <-- This is the resource name that will be matched to Topology (see MatchTokens)
"sourceId": "ResourceSorce"
},
"details": {
"Tag1Name": "Tag1",
"Tag2Name": "Tag2"
},
"type": {
"eventType": "problem",
"classification": "EventType"
}
}
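In the example ConfigMap in this section, CUSTOM_EVENTS holds one JSON event per line. Here is a minimal Python sketch that turns event dictionaries into that one-event-per-line form while leaving the MY_TIMESTAMP placeholder untouched, as required. The make_event and events_block helpers are hypothetical and fill in only a subset of the template fields (links, details and the optional id are omitted; add them if your scenario needs them).

```python
import json


def make_event(summary, severity, resource_name):
    """Build an event dict following the template above (hypothetical helper)."""
    return {
        "occurrenceTime": "MY_TIMESTAMP",  # placeholder, do not modify
        "summary": summary,
        "severity": severity,
        "expirySeconds": 6000000,
        "sender": {"type": "host", "name": "SenderName", "sourceId": "SenderSource"},
        # resource.name is matched against the topology's matchTokens
        "resource": {"type": "host", "name": resource_name, "sourceId": "ResourceSource"},
        "type": {"eventType": "problem", "classification": "EventType"},
    }


def events_block(events):
    """Serialise events one per line, in injection order, for CUSTOM_EVENTS."""
    return "\n".join(json.dumps(event) for event in events)


print(events_block([
    make_event("Summary - Problem test01", 6, "test01"),
    make_event("Summary - Problem test02", 5, "test02"),
]))
```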
Metrics
Inject Metrics to simulate the Custom Scenario.
CUSTOM_METRICS: List of Metrics to be simulated
IMPORTANT: You need a trained Metric Model for this to create anomalies.
You can get more details here.
Each entry has the format ResourceName,MetricName,GroupName,BaseValue,Variance and entries are separated by ";":
- ResourceName: The resource name that will be matched to Topology (see matchTokens)
- MetricName: Name of the Metric (e.g. MemoryUsageAverage)
- GroupName: Name of the Metric Group (e.g. MemoryUsage)
- BaseValue: Mean value
- Variance: Variance around the mean value
Example:
- BaseValue: 97
- Variance: 3
- Will create random values between 94 and 100
test10,DemoMetric1,DemoGroup1,0,1;
test11,DemoMetric2,DemoGroup2,50,25
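To sanity-check a CUSTOM_METRICS value before putting it in the ConfigMap, here is a minimal Python parser for the format described above. It assumes entries are separated by ";" and fields by "," (as in the example) and computes the BaseValue ± Variance range stated in the text. How the DemoUI actually samples values inside that range is not documented here, and the MetricSpec and parse_custom_metrics names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricSpec:
    resource: str
    metric: str
    group: str
    base: float
    variance: float

    @property
    def value_range(self):
        """Range of simulated values: BaseValue - Variance to BaseValue + Variance."""
        return (self.base - self.variance, self.base + self.variance)


def parse_custom_metrics(text):
    """Parse 'Resource,Metric,Group,BaseValue,Variance;...' into MetricSpec objects."""
    specs = []
    for entry in text.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing ';' or blank lines
        fields = [field.strip() for field in entry.split(",")]
        if len(fields) != 5:
            raise ValueError(f"expected 5 fields, got {len(fields)}: {entry!r}")
        resource, metric, group, base, variance = fields
        specs.append(MetricSpec(resource, metric, group, float(base), float(variance)))
    return specs


example = """test10,DemoMetric1,DemoGroup1,0,1;
test11,DemoMetric2,DemoGroup2,50,25"""
for spec in parse_custom_metrics(example):
    print(spec.resource, spec.metric, spec.value_range)
```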
Logs
Inject Logs to simulate the Custom Scenario.
CUSTOM_LOGS: List of Log lines to be injected sequentially (order is respected)
IMPORTANT: You need a trained Log Model for this to create anomalies.
A typical Log line:
{
"timestamp": MY_EPOCH, <-- Do not modify
"utc_timestamp": "MY_TIMESTAMP", <-- Do not modify
"instance_id": "test20", <-- This is the resource name that will be matched to Topology (see MatchTokens)
"message": "Demo Log Message", <-- The text of the log line
"entities": {
"pod": "test20",
"cluster": null,
"container": "test20",
"node": "test21"
},
"application_group_id": "1000",
"application_id": "1000",
"level": 1,
"type": "StandardLog",
"features": [],
"meta_features": []
}
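Note that MY_EPOCH is unquoted, so a log line only becomes valid JSON once the placeholders have been filled in. The Python sketch below fills them with example values and parses the result with json.loads, which is a quick way to catch a missing quote or brace before you paste a line into CUSTOM_LOGS. The substitution format (epoch in milliseconds, ISO 8601 UTC string) is an assumption for illustration only and is not necessarily what the DemoUI uses; render_log_line is a hypothetical name. The trailing comma seen in the example ConfigMap is stripped before parsing.

```python
import json
from datetime import datetime, timezone


def render_log_line(template, now):
    """Fill MY_EPOCH / MY_TIMESTAMP with example values and parse the result.

    Assumption: epoch in milliseconds and an ISO 8601 UTC timestamp; the
    DemoUI's real substitution format may differ.
    """
    line = template.strip().rstrip(",")  # the example line ends with a comma
    line = line.replace("MY_EPOCH", str(int(now.timestamp() * 1000)))
    line = line.replace("MY_TIMESTAMP", now.strftime("%Y-%m-%dT%H:%M:%S.000Z"))
    return json.loads(line)  # raises json.JSONDecodeError if the line is malformed


template = ('{"timestamp": MY_EPOCH,"utc_timestamp": "MY_TIMESTAMP", "features": [], '
            '"meta_features": [],"instance_id": "test20","application_group_id": "1000",'
            '"application_id": "1000","level": 1,"message": "Demo Log Message",'
            '"entities": {"pod": "test20","cluster": null,"container": "test20",'
            '"node": "test21"},"type": "StandardLog"},')
record = render_log_line(template, datetime(2024, 1, 1, tzinfo=timezone.utc))
print(record["instance_id"], record["timestamp"], record["utc_timestamp"])
```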
Properties
Simulate a change in the properties of a Topology object.
- CUSTOM_PROPERTY_RESOURCE_NAME: The name of the resource to be affected
- CUSTOM_PROPERTY_RESOURCE_TYPE: The type of the resource to be affected
- CUSTOM_PROPERTY_VALUES_NOK: The values to be added/created when the Incident is being simulated
- CUSTOM_PROPERTY_VALUES_OK: The values to be added/created when the Incident is being mitigated
A typical Entry:
CUSTOM_PROPERTY_RESOURCE_NAME: 'test01'
CUSTOM_PROPERTY_RESOURCE_TYPE: 'device'
CUSTOM_PROPERTY_VALUES_NOK: '{"test1": "NOK","test2": "NOK","test3": "NOK"}'
CUSTOM_PROPERTY_VALUES_OK: '{"test1": "OK","test2": "OK","test3": "OK"}'
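Both value strings are JSON objects embedded in single-quoted YAML strings. The Python sketch below checks that they parse and that the OK set touches the same property keys as the NOK set, so that mitigating the incident resets everything the simulation changed. That key-symmetry rule is an assumption for illustration, not a documented DemoUI requirement, and check_property_values is a hypothetical name.

```python
import json


def check_property_values(values_nok, values_ok):
    """Parse the NOK/OK property strings and return the property names they set.

    Assumption: the OK values should reset exactly the keys the NOK values change.
    """
    nok = json.loads(values_nok)
    ok = json.loads(values_ok)
    if not isinstance(nok, dict) or not isinstance(ok, dict):
        raise ValueError("both values must be JSON objects")
    mismatched = set(nok) ^ set(ok)  # keys present in only one of the two objects
    if mismatched:
        raise ValueError(f"keys not present in both objects: {sorted(mismatched)}")
    return sorted(nok)


print(check_property_values('{"test1": "NOK","test2": "NOK","test3": "NOK"}',
                            '{"test1": "OK","test2": "OK","test3": "OK"}'))
```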
Example
This is a small example containing a Topology, Events, Metrics, Logs and a property change.
kind: ConfigMap
apiVersion: v1
metadata:
  name: ibm-aiops-demo-ui-config-custom
  namespace: ibm-aiops-demo-ui
data:
  CUSTOM_NAME: 'Custom Demo'
  CUSTOM_EVENTS: |-
    { "id": "1a2a6787-59ad-4acd-bd0d-000000000000", "occurrenceTime": "MY_TIMESTAMP", "summary": "Summary - Problem test01", "severity": 6, "type": { "eventType": "problem", "classification": "EventType" }, "expirySeconds": 6000000, "links": [ { "linkType": "webpage", "name": "LinkName", "description": "LinkDescription", "url": "https://pirsoscom.github.io/git-commit-mysql-vm.html" } ], "sender": { "type": "host", "name": "SenderName", "sourceId": "SenderSource" }, "resource": { "type": "host", "name": "test01", "sourceId": "ResourceSorce" }, "details": { "Tag1Name": "Tag1", "Tag2Name": "Tag2" }}
    { "id": "1a2a6787-59ad-4acd-bd0d-000000000000", "occurrenceTime": "MY_TIMESTAMP", "summary": "Summary - Problem test02", "severity": 5, "type": { "eventType": "problem", "classification": "EventType" }, "expirySeconds": 6000000, "links": [ { "linkType": "webpage", "name": "LinkName", "description": "LinkDescription", "url": "https://pirsoscom.github.io/git-commit-mysql-vm.html" } ], "sender": { "type": "host", "name": "SenderName", "sourceId": "SenderSource" }, "resource": { "type": "host", "name": "test02", "sourceId": "ResourceSorce" }, "details": { "Tag1Name": "Tag1", "Tag2Name": "Tag2" }}
    { "id": "1a2a6787-59ad-4acd-bd0d-000000000000", "occurrenceTime": "MY_TIMESTAMP", "summary": "Summary - Problem test03", "severity": 4, "type": { "eventType": "problem", "classification": "EventType" }, "expirySeconds": 6000000, "links": [ { "linkType": "webpage", "name": "LinkName", "description": "LinkDescription", "url": "https://pirsoscom.github.io/git-commit-mysql-vm.html" } ], "sender": { "type": "host", "name": "SenderName", "sourceId": "SenderSource" }, "resource": { "type": "host", "name": "test03", "sourceId": "ResourceSorce" }, "details": { "Tag1Name": "Tag1", "Tag2Name": "Tag2" }}
  CUSTOM_METRICS: |-
    test10,DemoMetric1,DemoGroup1,0,1;
    test11,DemoMetric2,DemoGroup2,50,25
  CUSTOM_LOGS: |-
    {"timestamp": MY_EPOCH,"utc_timestamp": "MY_TIMESTAMP", "features": [], "meta_features": [],"instance_id": "test20","application_group_id": "1000","application_id": "1000","level": 1,"message": "Demo Log Message","entities": {"pod": "test20","cluster": null,"container": "test20","node": "test21"},"type": "StandardLog"},
  CUSTOM_TOPOLOGY_FORCE_RELOAD: 'False'
  CUSTOM_TOPOLOGY_APP_NAME: 'Custom Demo Application'
  CUSTOM_TOPOLOGY_TAG: 'app:custom-app'
  CUSTOM_TOPOLOGY: |-
    V:{"uniqueId": "test01-id", "name": "Deployment1", "entityTypes": ["deployment"], "tags":["tag1","app:custom-app"],"matchTokens":["test01","test01-id"],"mergeTokens":["test01","test01-id"], "city":"Richmond", "area": "Broad Meadows", "geolocation": { "geometry": { "coordinates": [-77.56121810464228, 37.64360674606608],"type": "Point"}},"_operation": "InsertUpdate", "app":"test", "fromFile":"true", "_references": [{"_toUniqueId":"test02-id","_edgeType":"connectedTo"},{"_toUniqueId":"test03-id","_edgeType":"connectedTo"}]}
    V:{"uniqueId": "test02-id", "name": "VM1", "entityTypes": ["vm"], "tags":["tag1","app:custom-app"],"matchTokens":["test02","test02-id"],"mergeTokens":["test02","test02-id"], "city":"Richmond", "area": "Broad Meadows", "geolocation": { "geometry": { "coordinates": [-77.56121810464228, 37.64360674606608],"type": "Point"}},"_operation": "InsertUpdate", "app":"test", "fromFile":"true", "_references": [{"_toUniqueId":"test03-id","_edgeType":"connectedTo"}]}
    V:{"uniqueId": "test03-id", "name": "Database1", "entityTypes": ["database"], "tags":["tag1","app:custom-app"],"matchTokens":["test03","test03-id"],"mergeTokens":["test03","test03-id"], "city":"Richmond", "area": "Broad Meadows", "geolocation": { "geometry": { "coordinates": [-77.56121810464228, 37.64360674606608],"type": "Point"}},"_operation": "InsertUpdate", "app":"test", "fromFile":"true", "_references": []}
  CUSTOM_PROPERTY_RESOURCE_NAME: 'Deployment1'
  CUSTOM_PROPERTY_RESOURCE_TYPE: 'deployment'
  CUSTOM_PROPERTY_VALUES_NOK: '{"test1": "NOK","test2": "NOK","test3": "NOK"}'
  CUSTOM_PROPERTY_VALUES_OK: '{"test1": "OK","test2": "OK","test3": "OK"}'
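A common reason for an event not being attached to the topology is a resource name that does not appear in any vertex's matchTokens. The Python sketch below cross-checks a CUSTOM_EVENTS block against a CUSTOM_TOPOLOGY block and lists the event resource names without a matching token. It assumes one JSON event per line and one V:{...} vertex per line, as in the example above; unmatched_resources is a hypothetical helper, and the real matching in IBM AIOps may consider more than exact token equality.

```python
import json


def unmatched_resources(custom_events, custom_topology):
    """Return event resource names that match no vertex matchToken (hypothetical check)."""
    tokens = set()
    for line in custom_topology.splitlines():
        line = line.strip()
        if line.startswith("V:"):
            tokens.update(json.loads(line[2:]).get("matchTokens", []))
    unmatched = []
    for line in custom_events.splitlines():
        line = line.strip()
        if not line:
            continue
        name = json.loads(line)["resource"]["name"]
        if name not in tokens:
            unmatched.append(name)
    return unmatched


topology = '''V:{"uniqueId": "test01-id", "matchTokens": ["test01", "test01-id"]}
V:{"uniqueId": "test02-id", "matchTokens": ["test02", "test02-id"]}'''
events = '''{"occurrenceTime": "MY_TIMESTAMP", "resource": {"name": "test01"}}
{"occurrenceTime": "MY_TIMESTAMP", "resource": {"name": "test99"}}'''
print(unmatched_resources(events, topology))
```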
In general: if there is an error or something is missing, re-run the installer Pod.
CP4AIOPS Base installation failing at 10-20 Pods
If you have provisioned a cluster with Managed NFS 2TB and you have Pods in 0/0 state in the ibm-aiops Namespace, verify that the nfs-provisioner Pod is running. If it is not (this is a bug in TechZone), please apply ./tools/98_maintenance/troubleshooting/nfs-provisioner.yaml. The installation should then continue.
If it does not, please re-run the installer Pod.
CP4AIOPS Base installation failing at 60-90 Pods
If your CP4AIOPS installation gets stuck at 60-90 Pods in the ibm-aiops Namespace, there is not much I can do to help; this is not a problem with the scripts!
Please try this YAML.
I'm getting a certificate error when opening CP4AIOPS or Turbonomic
If you are using IBM TechZone clusters you will get certificate errors when trying to open CP4AIOPS or Turbonomic:
- Open the links in a Private/Incognito window and select proceed
- Or, in Chrome, you can type thisisunsafe when on the "Your connection is not private" page. There is no visual feedback, but if you type it correctly the page will then load.
Installation error Notification
If you get a red notification saying "FATAL ERROR: Please check the Installation Logs and re-run the installer by deleting the Pod", please re-run the installer Pod.
Missing stuff in CP4AIOps
If you have missing elements:
- Incomplete Topology
- Missing Policies
- Missing Runbooks
Please re-run the installer Pod.
Training not done or incomplete
If you have missing or incomplete Training, please re-run the installer Pod.
For a deeper understanding of the problem you can check the logs of the Data Load Pods.
To re-run the installer Pod:
- Go to your OpenShift UI
- Select the Namespace ibm-installer
- Select Workloads/Pods
- You should see something like this
- Click on the three dots at the end of the line for the Pod ibm-aiops-install-aiops-xxx
- Select Delete
- Confirm
This will restart the complete installation process, but it will be much faster as it is mainly incremental.
For the system to work you need to follow those steps:
- Create Slack Workspace
- Create Slack App
- Create Slack Channels
- Create Slack Integration
- Get the Integration URL
- Create Slack App Communications
- Slack Reset
π₯ Detailed Instructions
- Create a Slack workspace by going to https://slack.com/get-started#/createnew and logging in with an email which is not your IBM email. Your IBM email is part of the IBM Slack enterprise account and you will not be able to create an independent Slack workspace outside if the IBM slack service.
- After authentication, click Create a Workspace ->
- Name your Slack workspace. Give it a unique name such as aiops-<yourname>.
- Describe the workspace's current purpose. This is free text; you may simply write "demo for IBM AIOps" or whatever you like.
- You may add team members to your new Slack workspace or skip this step.
At this point, you have created your own Slack workspace, where you are the administrator and can perform all the necessary steps to integrate with CP4WAOps.
Note: This Slack workspace is outside the control of IBM and must be treated as a completely public environment. Do not place any confidential material in this Slack workspace.
- Create a Slack app by going to https://api.slack.com/apps and clicking Create New App.
- Select From an app manifest.
- Select the workspace that you created before and click Next.
- Copy and paste the content of the file ./doc/slack/slack-app-manifest.yaml. Don't bother with the URLs just yet; we will adapt them as needed.
- Click Next.
- Click Create.
- Scroll down to Display Information and name your IBMAIOPS app.
- You can add an icon to the app (there are some sample icons in the ./tools/4_integrations/slack/icons folder).
- Click Save Changes.
- In the Basic Information menu, click Install to Workspace, then click Allow.
- In Slack, add two new channels:
  - aiops-demo-reactive
  - aiops-demo-proactive
- Right-click each channel and select Copy Link. This should get you something like https://xxxx.slack.com/archives/C021QOY16BW. The last part of the URL is the channel ID (here, C021QOY16BW). Jot down the IDs of both channels.
- Under Apps, click Browse Apps.
- Select the app you just created.
- Invite the application to each of the two channels by typing @<MyAppname>
- Select Add to channel.
You should get a message saying that the app was added to #<your-channel> by ...
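If you prefer not to pick the channel ID out of the copied link by eye, the "last part of the URL" rule from the Copy Link step can be expressed as a tiny helper. This is a minimal sketch; the function name channel_id_from_link is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: extract the channel ID (the last path segment) from a copied Slack channel link.
channel_id_from_link() {
  local link="${1%/}"            # drop a trailing slash, if any
  printf '%s\n' "${link##*/}"    # keep everything after the last "/"
}

# Example:
# channel_id_from_link https://xxxx.slack.com/archives/C021QOY16BW   # prints C021QOY16BW
```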
In the Slack app:
- In the Basic Information menu, get the Signing Secret (not the Client Secret!) and jot it down.
- In OAuth & Permissions, get the Bot User OAuth Token (not the User OAuth Token!) and jot it down.
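Before pasting the Bot User OAuth Token into IBM AIOps, you can optionally check that Slack accepts it. This is not part of the documented procedure: it is a minimal sketch that calls the Slack Web API method auth.test, which reports whether a token is valid. The helper name check_slack_token is hypothetical, and the token in the example is a placeholder.

```shell
#!/usr/bin/env bash
# Optional sanity check: ask Slack's auth.test method whether a token is valid.
check_slack_token() {
  local token="$1" response
  response=$(curl -s -X POST -H "Authorization: Bearer ${token}" https://slack.com/api/auth.test)
  # Slack answers with compact JSON; a valid token yields "ok":true.
  if printf '%s' "$response" | grep -q '"ok":true'; then
    echo "Token OK"
  else
    echo "Token rejected: $response" >&2
    return 1
  fi
}

# Usage (placeholder token):
# check_slack_token "xoxb-your-bot-token"
```

The same check works for the User OAuth Token (xoxp-...) used later by the reset scripts.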
In IBM AIOps (IBMAIOPS):
- In the IBM AIOps "Hamburger" menu, select Define/Integrations.
- Click Add connection.
- Name it "Slack".
- Paste the Signing Secret from above.
- Paste the Bot User OAuth Token from above.
- Paste the channel IDs from the channel creation step into the respective fields.
- Test the connection and click Save.
In IBM AIOps (IBMAIOPS), copy the Request URL of the Slack integration you just created.
This is the URL you will be using for step 6.
Return to the browser tab for the Slack app.
- Select Event Subscriptions.
- In the Enable Events section, click the slider to enable events.
- For the Request URL field, use the Request URL from step 5, e.g.:
  https://<my-url>/aiops/aimanager/instances/xxxxx/api/slack/events
- After you paste the value into the field, a Verified message should be displayed. If you get an error, please check 5.7.
- Verify that the Subscribe to bot events section contains the app_mention and member_joined_channel events.
- Click the Save Changes button.
- Select Interactivity & Shortcuts.
- In the Interactivity section, click the slider to enable interactivity. For the Request URL field, use the URL from above. There is no automatic verification for this form.
- Click the Save Changes button.
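If the Verified message does not appear, one thing to rule out is that the Request URL cannot be reached at all. This is a minimal sketch, not part of the documented procedure; the helper name probe_url is hypothetical. Any status code other than 000 only shows that the route answers from the machine where you run it; Slack still has to reach it from the internet and accept its certificate.

```shell
#!/usr/bin/env bash
# Sketch: print the HTTP status code returned by a URL.
# "000" means curl could not connect at all (DNS, routing or firewall problem).
# -k skips certificate verification for this local test only; it says nothing
# about whether Slack itself will accept the certificate.
probe_url() {
  curl -k -s -o /dev/null -w '%{http_code}' "$1"
}

# Example (replace the placeholders with your Request URL):
# probe_url "https://<my-url>/aiops/aimanager/instances/xxxxx/api/slack/events"
```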
Now, configure the welcome slash command. With this command, you can trigger the welcome message again if you closed it.
- Select Slash Commands.
- Click Create New Command to create a new slash command. Use the following values:
  - Command: /welcome
  - Request URL: the URL from above
  - Short Description: Welcome to IBM AIOps
- Click Save.
The Slack app must be reinstalled, as several permissions have changed.
- Select Install App.
- Click Reinstall to Workspace.
Once the workspace request is approved, the Slack integration is complete.
If you run into problems validating the Event Subscription in the Slack application, see 5.2.
The following setup is needed for the reset scripts to empty/reset the Slack channels.
The scripts are based on Slack Cleaner2, which you might have to install:
pip3 install slack-cleaner2
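To avoid reinstalling the package every time, the install can be made conditional. This is a minimal sketch, assuming the reset scripts use the same python3/pip3 as your shell; the helper name ensure_slack_cleaner2 is hypothetical. The package is imported in Python as slack_cleaner2.

```shell
#!/usr/bin/env bash
# Sketch: install slack-cleaner2 only if Python cannot import it yet.
ensure_slack_cleaner2() {
  if python3 -c 'import slack_cleaner2' 2>/dev/null; then
    echo "slack-cleaner2 is already installed"
  else
    pip3 install slack-cleaner2
  fi
}

# Usage:
# ensure_slack_cleaner2
```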
In your Slack app:
- In OAuth & Permissions, get the User OAuth Token (not the Bot User OAuth Token this time!) and jot it down.
In the file ./tools/98_maintenance/scripts/13_reset-slack.sh:
- Replace the value not_configured of the SLACK_TOKEN parameter with the token.
- Adapt the channel name in the SLACK_REACTIVE parameter.
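Instead of editing the script by hand, the parameters can be set with sed. This is a minimal sketch, assuming each parameter is assigned on its own line as NAME=... or export NAME=... (check the script first); the helper name set_script_param is hypothetical and the token in the example is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set NAME="VALUE" in a shell script.
# Assumes the parameter is assigned on its own line as NAME=... or export NAME=...;
# everything after "=" on that line (including any comment) is replaced.
set_script_param() {
  local file="$1" name="$2" value="$3" esc
  # Escape characters that are special in the sed replacement: \ & and the | delimiter.
  esc=$(printf '%s' "$value" | sed -e 's/[\\&|]/\\&/g')
  # -i.bak keeps a backup copy (works with both GNU and BSD sed).
  sed -E -i.bak "s|^(export )?${name}=.*|\\1${name}=\"${esc}\"|" "$file"
}

# Example (placeholder token):
# set_script_param ./tools/98_maintenance/scripts/13_reset-slack.sh SLACK_TOKEN "xoxp-your-user-token"
# set_script_param ./tools/98_maintenance/scripts/13_reset-slack.sh SLACK_REACTIVE "aiops-demo-reactive"
```

The same helper can be used for ./tools/98_maintenance/scripts/14_reset-slack-changerisk.sh with SLACK_TOKEN and SLACK_PROACTIVE.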
In your Slack app:
- In OAuth & Permissions, get the User OAuth Token (not the Bot User OAuth Token this time!) and jot it down (same token as above).
In the file ./tools/98_maintenance/scripts/14_reset-slack-changerisk.sh:
- Replace the value not_configured of the SLACK_TOKEN parameter with the token.
- Adapt the channel name in the SLACK_PROACTIVE parameter.
Run either of the scripts above to reset the corresponding channel:
./tools/98_maintenance/scripts/13_reset-slack.sh
or
./tools/98_maintenance/scripts/14_reset-slack-changerisk.sh