Investigate regularly failing workflows #4872
Comments
@ewastempel is working on the |
@SteveLinden picked up the modernisation-platform-terraform-baselines code. Searching for is: failure branch: main returns no results (leaving out the spaces gives an error) |
@SteveLinden looked at modernisation-platform-terraform-s3-bucket. No issues listed with the same search |
@connormaglynn looking at |
Search incorrect. I will look again. |
So for modernisation-platform-terraform-baselines code there are many checkov errors. Need to go through and see which can be removed. |
Added some checkov ignores to modernisation-platform-terraform-baselines: ministryofjustice/modernisation-platform-terraform-baselines#283 |
Repeated the same for the s3-bucket one: ministryofjustice/modernisation-platform-terraform-s3-bucket#252. This was replaced by another issue, #4996 (see comment below) |
Waiting for the above releases to be approved, so I will look at modernisation-platform-terraform-trusted-advisor. Errors are reported in the tfsec run but I am currently unable to locate them. Working on the checkov ones first |
|
Analysis from
Summarising, three tickets to be created:
|
New issue for the modernisation-platform-terraform-s3-bucket created with Ewa's comment in there. #4996 |
Looked at modernisation-platform-cp-network-test. The only failing pipeline on main was the Secure Code Analysis from over 5 months ago, though they've been running fine since. GitHub had disabled the scheduled actions due to inactivity, which I've now re-enabled. This feature is probably something to be mindful of if we're depending on scheduled checks for inactive repos 🤔 |
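A rough sketch of how the inactivity problem above could be spotted across repos. It assumes the GitHub REST API's list-workflows response (GET /repos/{owner}/{repo}/actions/workflows), where each workflow has a state field and disabled_inactivity marks workflows GitHub has switched off. The function names and sample payload are hypothetical, and fetching and authentication are left out.

```python
API_ROOT = "https://api.github.com"


def disabled_for_inactivity(payload: dict) -> list[tuple[int, str]]:
    """Return (workflow id, workflow file path) for workflows disabled due to inactivity."""
    return [
        (workflow["id"], workflow["path"])
        for workflow in payload.get("workflows", [])
        if workflow.get("state") == "disabled_inactivity"
    ]


def enable_url(owner: str, repo: str, workflow_id: int) -> str:
    """Build the URL that re-enables a workflow when called with an authenticated PUT."""
    return f"{API_ROOT}/repos/{owner}/{repo}/actions/workflows/{workflow_id}/enable"


# Hypothetical response body, trimmed to the fields used above.
sample = {
    "workflows": [
        {"id": 1, "path": ".github/workflows/code-scanning.yml", "state": "disabled_inactivity"},
        {"id": 2, "path": ".github/workflows/documentation.yml", "state": "active"},
    ]
}

for workflow_id, path in disabled_for_inactivity(sample):
    print(path, "->", enable_url("ministryofjustice", "modernisation-platform-cp-network-test", workflow_id))
```

Running something like this on a schedule from an active repo would flag disabled checks before they go unnoticed for months.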
modernisation-platform-terraform-ssm-patching - checkov errors relate to calling the test area. I have commented out (#checkov:skip=) the offending items as they do not appear to be related to what we want to do. |
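Since these skips are easy to lose track of, here is a small sketch that lists every checkov skip comment in a Terraform file so each one can be reviewed later. It assumes the inline form #checkov:skip=CHECK_ID:reason. The sample resource, the reasons, and the function name are made up for illustration.

```python
import re

# Matches "#checkov:skip=CHECK_ID" with an optional ":reason" suffix.
SKIP_PATTERN = re.compile(r"#\s*checkov:skip=(?P<check>[A-Z0-9_]+)(?::(?P<reason>.*))?")


def find_skips(terraform_source: str) -> list[tuple[int, str, str]]:
    """Return (line number, check id, reason) for each skip comment found."""
    skips = []
    for line_number, line in enumerate(terraform_source.splitlines(), start=1):
        match = SKIP_PATTERN.search(line)
        if match:
            reason = (match.group("reason") or "").strip()
            skips.append((line_number, match.group("check"), reason))
    return skips


# Hypothetical Terraform snippet, used only to exercise the function.
sample = """resource "aws_s3_bucket" "test" {
  #checkov:skip=CKV2_AWS_64:only used by the test area
  # checkov:skip=CKV_AWS_358
  bucket = "example"
}"""

for line_number, check, reason in find_skips(sample):
    print(line_number, check, reason or "(no reason given)")
```

Skips without a reason are worth chasing first, as they are the hardest to justify when someone revisits the code.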
@ewastempel is picking up
|
Changes to modernisation-platform-terraform-ssm-patching have been pushed through. I will check the results tomorrow to see if that fixes the checkov issues. |
@ewastempel is picking up |
Still trying to fix the tfsec error. Changing the scan_type to "changed" so that, in theory, it ignores the other bits. Bit of a desperate change, more in hope than anticipation. The error doesn't cause an issue but shows up as a fail on the actions list. |
modernisation-platform-terraform-cross-account-access had been failing on the tfsec part of the routine. This has been working for the past 3 weeks so it appears to be fixed. |
Currently looking at modernisation-platform-terraform-bastion-linux. This calls the item I have changed in #4996. If that is successful I will create a new latest release, 7.1.0, apply it to the code and see what impact that has. Assuming it's none, I will check what needs to be changed in the code to cure Also fails on CKV2_AWS_64. Neither of these is listed in the action for modernisation-platform-terraform-s3-bucket so they will need to be checked/skipped or corrected. Current PR for this is |
Applied fixes / addressed security code analysis alerts for the following repositories:
|
I've checked modernisation-platform-github-oidc-role and there are no errors showing other than dependabot issues. Also checked modernisation-platform-incident-response, which had lots of errors, but I have been informed that this should not be on the list and the repository should have been archived. |
The changes made to the modernisation-platform-terraform-bastion-linux have corrected the issue and the check runs successfully now. |
The version of modernisation-platform-terraform-ecs in the library is archived so this has not been looked at - last run was 6 months ago. |
modernisation-platform-configuration-management - only failures are undertaken by user groups so this does not need any changes at the moment. |
Looking at modernisation-platform-terraform-iam-superadmins. There are many vpc errors, related to using * in the ARNs for example, which we are unlikely to be able to fix. If we cannot resolve these, maybe reduce the number of runs? |
modernisation-platform-terraform-member-vpc - this one passes security checks now |
modernisation-platform-terraform-lambda-function has no general failures, but there are infrequent issues with the Go tests, such as being unable to find credentials (Error: No valid credential sources found) or an invalid count (Error: Invalid count argument). It was caused by an error in my config. This has been corrected. |
In modernisation-platform-terraform-lambda-function, the providers.tf code under test/unit-test defaulted to the wrong ID rather than the one I wanted, which was testing-ci-user. I commented out the code that was causing the issue and the test ran with no issues. |
Looking at modernisation-platform-github-oidc-provider which has one checkov failure that should be cured with the addition of another condition (CKV_AWS_358: "Ensure GitHub Actions OIDC trust policies only allows actions from a specific known organization") and should be relatively easy to fix. |
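To make the extra condition concrete, here is a hedged sketch of the kind of rule involved: every subject the trust policy allows should sit under one known organisation. This is not checkov's implementation, just an illustration that assumes the standard token.actions.githubusercontent.com:sub condition key and the repo:ORG/REPO subject format. The policy and function names are hypothetical.

```python
SUB_CLAIM = "token.actions.githubusercontent.com:sub"


def allowed_subjects(trust_policy: dict) -> list[str]:
    """Collect every value the policy conditions place on the OIDC sub claim."""
    subjects = []
    for statement in trust_policy.get("Statement", []):
        for operator, keys in statement.get("Condition", {}).items():
            if operator not in ("StringEquals", "StringLike"):
                continue
            value = keys.get(SUB_CLAIM, [])
            subjects.extend([value] if isinstance(value, str) else value)
    return subjects


def restricted_to_org(trust_policy: dict, org: str) -> bool:
    """True only if a sub condition exists and every allowed subject is under the org."""
    subjects = allowed_subjects(trust_policy)
    return bool(subjects) and all(s.startswith(f"repo:{org}/") for s in subjects)


# Hypothetical trust policy with the added condition.
policy = {
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {"StringLike": {SUB_CLAIM: "repo:ministryofjustice/*"}},
        }
    ]
}

print(restricted_to_org(policy, "ministryofjustice"))
```

A policy with no sub condition at all, or one that allows repo:*, would come back False here, which is the situation the check is meant to catch.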
PR for the above has been posted. It's here |
PR added for documentation workflow error on modernisation-platform-terraform-ec2-instance |
PR added for documentation workflow error on modernisation-platform-terraform-ec2-autoscaling-group |
I'll pick up the S3 bucket checks today |
User Story
As an MP engineer
I want to reduce the number of workflow failures
So that we have more successful deployments
User Type(s)
MP engineers
Value
Fewer deployment failures
More consistent deployments
Assumptions / Hypothesis / Questions / Unknowns
Hypothesis
If we search for failed workflows on main branches in our repositories (excluding environments repo)
Then we can see which ones are regularly failing and try to improve them
Proposal
Search GitHub for failed workflows on main:
https://github.com/ministryofjustice/modernisation-platform/actions?query=is%3Afailure+branch%3Amain+
Go through them and see which ones can be fixed. If they are failing regularly and not being addressed, do we need a Slack alert for them?
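The search above has to be repeated by hand for each repository, so a scripted version may help. This is a minimal sketch assuming the GitHub REST API's list-workflow-runs endpoint (GET /repos/{owner}/{repo}/actions/runs) with its branch and status query parameters. The function names are hypothetical, and fetch_json needs a token with read access to Actions.

```python
import json
import urllib.request
from collections import Counter
from urllib.parse import urlencode

API_ROOT = "https://api.github.com"


def failed_runs_url(owner: str, repo: str, branch: str = "main", per_page: int = 100) -> str:
    """Build the URL listing failed workflow runs on one branch of a repo."""
    query = urlencode({"branch": branch, "status": "failure", "per_page": per_page})
    return f"{API_ROOT}/repos/{owner}/{repo}/actions/runs?{query}"


def fetch_json(url: str, token: str) -> dict:
    """Fetch one page of results; pagination is left out of this sketch."""
    request = urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def failures_by_workflow(payload: dict) -> list[tuple[str, int]]:
    """Count failed runs per workflow name, most frequent first."""
    counts = Counter(run["name"] for run in payload.get("workflow_runs", []))
    return counts.most_common()


# Hypothetical response body, trimmed to the field used above.
sample = {
    "workflow_runs": [
        {"name": "Secure Code Analysis"},
        {"name": "Documentation"},
        {"name": "Secure Code Analysis"},
    ]
}
print(failed_runs_url("ministryofjustice", "modernisation-platform"))
print(failures_by_workflow(sample))
```

Looping this over the list of repositories would give a ranked view of which workflows fail most often, which could also feed a Slack alert if one turns out to be needed.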
Unknowns
We have data from DORA metrics which indicates that around 50% of our workflows fail. We don't know where these failures are now. If there is a lot of work to be done, we may need to split this out into smaller issues.
Definition of done
Reference
How to write good user stories