Carlos can see a dashboard that shows how a team is doing against key metrics so that he can understand where there might be areas of improvement #50

Open
seansund opened this issue Feb 3, 2020 · 16 comments
Labels: devops tools (Tools installed to support devops, e.g. Jenkins, Nexus, etc); Persona: SRE (Carlos the SRE persona); spike (need time to investigate)

Comments

@seansund
Member

seansund commented Feb 3, 2020

The Accelerate book and the State of DevOps report from which it was derived identify four key metrics that are indicative of high-performing teams:

  • deployment frequency
  • cycle time (time from feature requested to deployed to production)
  • failure rate (how many deployments in production result in an error)
  • mean time to recover (when an error occurs in production, how long does it take to recover)
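
To make the four definitions above concrete, here is a minimal Python sketch of how they might be computed from a list of deployment records. The `Deployment` record, its field names, and the `key_metrics` function are assumptions made for illustration; they do not come from Accelerate or from any tool discussed in this issue.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    requested_at: datetime                   # when the feature was requested
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # did it result in a production error?
    recovered_at: Optional[datetime] = None  # when service was restored, if it failed


def key_metrics(deployments: List[Deployment], window_days: int) -> dict:
    """Compute the four metrics over a reporting window of `window_days` days."""
    if not deployments:
        raise ValueError("need at least one deployment")
    failures = [d for d in deployments if d.failed]
    # Only failures that have already been recovered contribute to the recovery mean.
    recoveries = [d.recovered_at - d.deployed_at for d in failures if d.recovered_at]
    cycle_times = [d.deployed_at - d.requested_at for d in deployments]
    return {
        "deployments_per_day": len(deployments) / window_days,
        "mean_cycle_time": sum(cycle_times, timedelta()) / len(cycle_times),
        "failure_rate": len(failures) / len(deployments),
        "mean_time_to_recover": (
            sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
        ),
    }
```

A dashboard would need the pipeline to record these four timestamps and flags somewhere queryable; the sketch only shows the arithmetic once that data exists.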
@seansund seansund added the devops tools Tools installed to support devops (e.g. Jenkins, Nexus, etc) label Feb 3, 2020
@seansund
Member Author

seansund commented Feb 3, 2020

Hygieia looks like a good tool to deliver a lot of this value (and more)

@seansund seansund changed the title As a ?, I would like to have a dashboard that shows how a team is doing against key metrics so that I can understand where there might be areas of improvement As a SRE, I would like to have a dashboard that shows how a team is doing against key metrics so that I can understand where there might be areas of improvement Feb 3, 2020
@seansund seansund changed the title As a SRE, I would like to have a dashboard that shows how a team is doing against key metrics so that I can understand where there might be areas of improvement Susan sees a dashboard that shows how a team is doing against key metrics so that she can understand where there might be areas of improvement Feb 25, 2020
@seansund seansund added the Persona: SRE Carlos the SRE persona label Feb 25, 2020
@seansund seansund changed the title Susan sees a dashboard that shows how a team is doing against key metrics so that she can understand where there might be areas of improvement Susan can see a dashboard that shows how a team is doing against key metrics so that she can understand where there might be areas of improvement Feb 26, 2020
@seansund
Member Author

seansund commented Mar 12, 2020

Start with http://hygieia.github.io/Hygieia/builddocker.html to prototype locally. Hygieia is customized to an environment, so we may end up baking this into Terraform, where we collect the particulars of the environment (Git host, ticket system such as Jira or Trello, etc.).

@lsteck

lsteck commented Mar 19, 2020

Docker Compose doesn't work: hygieia/hygieia#3216

Switched to use Starter Kit: https://github.com/Hygieia/hygieia-starter-kit
This is a single docker image that contains:

  • UI
  • API
  • Mongo
  • Github Collector
  • Sonar Collector
  • Jenkins Collector.

@lsteck

lsteck commented Mar 19, 2020

Got the Starter Kit to work locally, but there is an issue with the Sonar Collector: it doesn't support SonarQube version 8.2, although it works fine with version 6.7.

Error with 8.2
2020-03-18 17:49:19,967 [taskScheduler-1] INFO c.c.d.collector.CollectorTask - Running Collector: Sonar
2020-03-18 17:49:19,979 [taskScheduler-1] INFO c.c.d.collector.CollectorTask - -----------------------------------
2020-03-18 17:49:19,980 [taskScheduler-1] INFO c.c.d.collector.CollectorTask - http://192.168.0.32:9000
2020-03-18 17:49:19,981 [taskScheduler-1] INFO c.c.d.collector.CollectorTask - -----------------------------------
2020-03-18 17:49:20,634 [taskScheduler-1] INFO c.c.d.collector.CollectorTask - Fetched projects 1 0s
2020-03-18 17:49:20,638 [taskScheduler-1] ERROR o.s.s.s.TaskUtils$LoggingErrorHandler - Unexpected error occurred in scheduled task.
java.lang.NullPointerException: null
at com.capitalone.dashboard.model.SonarProject.equals(SonarProject.java:38) ~[sonar-codequality-collector.jar!/:3.1.0]

@lsteck

lsteck commented Mar 19, 2020

I'm going to work on deploying the components individually as services.

Some people are making progress on this upstream:

hygieia/hygieia#3224

@lsteck

lsteck commented Mar 19, 2020

FYI here is the current list of collectors. https://hygieia.github.io/Hygieia/collectors.html

There is no support for Tekton or ArgoCD.

@seansund seansund changed the title Susan can see a dashboard that shows how a team is doing against key metrics so that she can understand where there might be areas of improvement Carlos can see a dashboard that shows how a team is doing against key metrics so that he can understand where there might be areas of improvement Mar 19, 2020
@lsteck

lsteck commented Apr 1, 2020

Notes on where I left off before getting pulled to look at other tasks.

While the Starter Kit is fine for running locally, it would not work in a Kubernetes environment.

Docker Compose does not work.
hygieia/hygieia#3216

Likewise, the commands for building the Docker images don't work. They use an old Maven plugin from Spotify that isn't supported anymore.
https://github.com/spotify/docker-maven-plugin

All projects have a Dockerfile that uses a /docker/properties-builder.sh script to convert environment variables into the properties file and pass it to the Spring jar, so using the Dockerfiles directly works fine.
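
For anyone unfamiliar with the pattern, here is a rough Python sketch of the idea behind such a script: read a fixed set of environment variables and render them as lines of a properties file. This is not the actual properties-builder.sh; the environment variable names, the mapping table, and the `render_properties` function are hypothetical, and each Hygieia component defines its own names.

```python
from typing import Dict, Mapping, Optional

# Hypothetical mapping from environment variable name to property key.
# The real scripts define their own names per component.
ENV_TO_PROPERTY: Dict[str, str] = {
    "MONGODB_HOST": "dbhost",
    "MONGODB_PORT": "dbport",
    "MONGODB_DATABASE": "dbname",
}


def render_properties(
    env: Mapping[str, str],
    mapping: Mapping[str, str] = ENV_TO_PROPERTY,
    defaults: Optional[Mapping[str, str]] = None,
) -> str:
    """Render `key=value` lines for every mapped variable that has a value.

    A variable missing from `env` falls back to `defaults` (keyed by property
    name); if there is no default either, the property is left out.
    """
    defaults = defaults or {}
    lines = []
    for env_name, prop_key in mapping.items():
        value = env.get(env_name, defaults.get(prop_key))
        if value is not None:
            lines.append(f"{prop_key}={value}")
    return "\n".join(lines) + "\n"
```

In a container entrypoint, the rendered text would be written to a file from `os.environ` before the jar is started with that file as its Spring configuration.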

MongoDB:
I just pulled the latest version of Mongo from docker hub and it works.
FYI, I created a startup script that creates the dashboarduser in both the admin and dashboarddb databases, because having it only on the admin database didn't work.
hygieia/hygieia#2877

Sonar
The collector/dashboard does not work with SonarQube 8.2, see comment above.
hygieia/hygieia-starter-kit#8 (comment)

FYI, there is an issue about how to get it working on Kubernetes:
hygieia/hygieia#3224

@seansund
Member Author

We decided that Hygieia takes more work to configure than it would return in value. The code does not seem to be well supported, and the installation model is pre-cloud-native.

@mjperrins
Member

I agree. Collecting the Accelerate metrics is still valuable. I was going to resurrect the original React code that displays the base MTTR and build time UX, link it into the pipeline, and add that UX to the Dashboard, as the dashboard moves from a tool launcher to something that can give a team metrics. The design would use the work @seansund started on collecting the data in a more structured manner: the pipeline sends data to an in-cluster Mongo (with storage), and the metrics are then presented in the UX below. It would also allow detailed navigation to the Git repo, artifact, image, code coverage and health. This could then be added to existing Tekton Tasks, or new Tasks could be added.

image

This is a logical next step to build on top of the base pipeline delivery

@csantanapr

csantanapr commented May 19, 2020

Take a look at Tekton Events. Tekton already generates some events that can be collected, and this can be paired with knative-eventing to convert them into CNCF CloudEvents. From the broker, multiple trigger subscribers can then attach and populate a DB like Mongo, send a Slack message, etc.
https://github.com/tektoncd/pipeline/blob/master/docs/events.md

The idea is to use a generic way to generate "DevOps" events with some sane schema, and then have a subscriber action convert those events for a specific system. For example, if a user is running Tekton on IBM Cloud and uses DevOps Insights, then instead of hard coding the CLI call ibmcloud doi publishtestrecord directly in their Tekton Task, the Task can emit a generic "DevOps" CloudEvent, and a subscriber can handle the event.

https://cloud.ibm.com/docs/ContinuousDelivery?topic=ContinuousDelivery-publishing-test-data
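
A rough Python sketch of what that could look like, assuming nothing about Tekton or knative-eventing themselves: a Task emits one generic event, and any number of subscribers decide what to do with it. The envelope uses the CloudEvents 1.0 attribute names (`specversion`, `id`, `source`, `type`, `time`, `datacontenttype`, `data`); the event type string, the `Broker` class, and the payload fields are made up for illustration.

```python
import uuid
from datetime import datetime, timezone
from typing import Callable, Dict, List


def make_devops_event(event_type: str, source: str, data: dict) -> dict:
    """Build a CloudEvents 1.0 style envelope around a generic DevOps payload."""
    return {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": source,
        "type": event_type,
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        "data": data,
    }


class Broker:
    """In-memory stand-in for an eventing broker with per-type subscribers."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = {}

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._subscribers.setdefault(event_type, []).append(handler)

    def publish(self, event: dict) -> int:
        """Deliver the event to every subscriber of its type; return the count."""
        handlers = self._subscribers.get(event["type"], [])
        for handler in handlers:
            handler(event)
        return len(handlers)
```

One subscriber could write the event to Mongo and another could call a platform-specific tool, without the Task that emitted the event knowing about either.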

@seansund
Member Author

@csantanapr I agree. Would like to see this done in a more extensible way to emit and collect events. Right now most solutions are very platform/tool specific

@csantanapr

I mean, you can even have a CloudEvent trigger another Tekton Task to handle the call to the devops system, like the tasks here: https://github.com/open-toolchain/tekton-catalog/blob/tkn_v1beta1/devops-insights/README.md

@csantanapr

csantanapr commented May 19, 2020

The concept of using Tekton TaskRun as Serverless Function :-)

@lsteck

lsteck commented Jun 2, 2020

Need time to investigate if Hygieia is a good solution

@mjperrins mjperrins self-assigned this Aug 23, 2020
@lsteck

lsteck commented Oct 16, 2020

I'm wondering if we need to look for a different solution, like some sort of eventing solution with a dashboard.

@mjperrins
Member

I would park this story. The new AI Ops cartridge from IBM Hybrid Cloud is going to major on this functionality: it will plug into common SDLC tools and aggregate the key metrics. This would enable IBMers to demonstrate a story. There is also a lot of activity in this space from commercial offerings.


4 participants