Add script gathering and reporting health indicators #185

mpg · 2024-11-20T10:55:27Z

This is a script I've been using to get some stats I see as health indicators for the CI.

Signed-off-by: Manuel Pégourié-Gonnard <[email protected]>

gilles-peskine-arm · 2024-11-20T11:48:44Z

ci_health.py

+        gh_username = os.environ["GITHUB_USERNAME"]
+        gh_token = os.environ["GITHUB_API_TOKEN"]


Nice-to-have: I'd like us to standardize on using the GitHub token from gh (the official GitHub CLI) when it's available.

I have this code snippet in one of my scripts, which would need to be adapted because here you need the username as well.

    def try_get_gh_auth_token() -> str:
        """Get the default authentication token from gh (the official GitHub client).

        Return an empty string if there is no such token or if gh is not available.
        """
        # TODO: allow specifying an alternative host name and user name
        try:
            output = subprocess.check_output(['gh', 'auth', 'token'],
                                             stderr=subprocess.DEVNULL)
            return output.strip().decode('ascii')
        except subprocess.CalledProcessError:
            return ''
        except FileNotFoundError:
            return ''
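
One possible adaptation that also yields the username, as a rough sketch only: it assumes `gh api user --jq .login` returns the login of the authenticated account, and it falls back to the two environment variables the script reads today. The names `_run_gh` and `get_gh_credentials` are hypothetical and are not part of ci_health.py.

```python
import os
import subprocess
from typing import Callable, List, Mapping, Optional, Tuple


def _run_gh(args: List[str]) -> str:
    """Run gh with the given arguments.

    Return its stripped standard output, or an empty string if gh is not
    installed or exits with an error (for example when not logged in).
    """
    try:
        output = subprocess.check_output(['gh'] + args,
                                         stderr=subprocess.DEVNULL)
    except (subprocess.CalledProcessError, FileNotFoundError):
        return ''
    return output.decode('utf-8').strip()


def get_gh_credentials(
        run_gh: Callable[[List[str]], str] = _run_gh,
        environ: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (username, token), preferring gh over environment variables.

    Assumption: 'gh api user --jq .login' prints the login of the account
    that 'gh auth token' belongs to.
    """
    if environ is None:
        environ = os.environ
    token = run_gh(['auth', 'token'])
    # Only ask for the login when a token is available.
    username = run_gh(['api', 'user', '--jq', '.login']) if token else ''
    if token and username:
        return username, token
    # Fall back to the variables the script currently requires.
    return environ['GITHUB_USERNAME'], environ['GITHUB_API_TOKEN']
```

Passing `run_gh` and `environ` as parameters is only there to make the fallback logic easy to exercise without a real gh installation.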

gilles-peskine-arm · 2024-11-20T11:55:59Z

ci_health.py

+Currently two indicators are reported:
+    1. Success rate of the nightly jobs. We don't expect "real" failures here,
+    so any failure is likely to be an infra issue or a flaky test.
+    2. Execution time of PR jobs.


Nice-to-have: an indicator for jobs that fail without a reported cause in the failure list. The failure list is an artifact called failures.csv or failures.csv.xz. “Without a reported cause” means: failures.csv.xz doesn't exist, and (failures.csv doesn't exist or failures.csv has size 0).

Even better, exclude jobs where the sole failure is that outcome analysis is unhappy.

With Mbed-TLS/mbedtls#9286, which adds an outcome line for running each component, this would count jobs that fail solely due to infrastructure problems (e.g. timeout, network glitches), as well as jobs that fail in outcome analysis. Thus this indicator could become a proxy for jobs that fail solely due to infrastructure problems.

I would ideally like to have an indicator that detects all infrastructure problems, but that seems hard.
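
The "without a reported cause" condition above could be expressed roughly as follows. This is a sketch under the assumption that each failed job's artifacts are available as a mapping from artifact name to size in bytes; how that mapping is obtained from the CI is left out, and the function names are hypothetical.

```python
from typing import Iterable, Mapping


def has_reported_cause(artifact_sizes: Mapping[str, int]) -> bool:
    """Tell whether a failed job has a usable failure list.

    A cause counts as reported if failures.csv.xz exists, or if
    failures.csv exists and is not empty.
    """
    if 'failures.csv.xz' in artifact_sizes:
        return True
    # A missing failures.csv is treated the same as an empty one.
    return artifact_sizes.get('failures.csv', 0) > 0


def count_unexplained_failures(
        failed_jobs: Iterable[Mapping[str, int]]) -> int:
    """Count failed jobs that have no reported cause."""
    return sum(1 for sizes in failed_jobs if not has_reported_cause(sizes))
```

Excluding jobs whose sole failure is outcome analysis would need the contents of the failure list, which this sketch does not look at.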

Add script gathering and reporting health indicators

2345861

Signed-off-by: Manuel Pégourié-Gonnard <[email protected]>

mpg added enhancement New feature or request needs: review needs: reviewer priority-medium labels Nov 20, 2024

mpg self-assigned this Nov 20, 2024

gilles-peskine-arm reviewed Nov 20, 2024
