omnibus: fail the task when omnibus can't be installed #27730

chouquette · 2024-07-19T09:12:22Z

What does this PR do?

Actually fail the task when omnibus can't be installed.

Motivation

We currently continue when omnibus fails to install, which leads to confusing failures later on: https://gitlab.ddbuild.io/DataDog/datadog-agent/-/jobs/578098364
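The intended behaviour can be sketched roughly as follows. This is only an illustration, not the code in tasks/omnibus.py: `install_with_retries`, `InstallError`, and the `run_install` callable are hypothetical names, and the retry limit is an assumption based on a later commit title in this PR that mentions failing after the maximum number of retry attempts.

```python
class InstallError(Exception):
    """Hypothetical error type; the real task may raise something else."""


def install_with_retries(run_install, max_attempts=3):
    """Call run_install until it succeeds, and fail loudly otherwise.

    run_install is any zero-argument callable that raises on failure
    (an assumption made for this sketch). Returns the attempt number
    that succeeded.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            run_install()
            return attempt
        except Exception as exc:
            last_error = exc
            print(f"install attempt {attempt}/{max_attempts} failed: {exc}")
    # Raise instead of silently continuing, so the failure surfaces at
    # the install step rather than in a later, unrelated-looking step.
    raise InstallError(
        f"install failed after {max_attempts} attempts"
    ) from last_error
```

The point of the sketch is the final `raise`: without it the caller carries on as if the install had worked.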

Additional Notes

Possible Drawbacks / Trade-offs

Describe how to test/QA your changes

tasks/omnibus.py

chouquette · 2024-07-19T09:26:24Z

Rerun of the failing pipeline with this commit included: https://gitlab.ddbuild.io/DataDog/datadog-agent/-/jobs/578140216

tasks/omnibus.py

alopezz · 2024-07-19T09:35:45Z

I would appreciate adding a test to https://github.com/DataDog/datadog-agent/blob/chouquette/fail_install_omnibus/tasks/unit_tests/omnibus_tests.py though I admit it's not super straightforward.

agent-platform-auto-pr · 2024-07-19T09:45:55Z

[Fast Unit Tests Report]

On pipeline 39647000 (CI Visibility). The following jobs did not run any unit tests:

Jobs:

tests_deb-arm64-py3
tests_deb-x64-py3
tests_flavor_dogstatsd_deb-x64
tests_flavor_heroku_deb-x64
tests_flavor_iot_deb-x64
tests_rpm-arm64-py3
tests_rpm-x64-py3
tests_windows-x64

If you modified Go files and expected unit tests to run in these jobs, please double check the job logs. If you think tests should have been executed reach out to #agent-devx-help

pr-commenter · 2024-07-19T10:27:29Z

Regression Detector

Regression Detector Results

Run ID: 9839500d-2d63-4279-b4a2-b889354ed7b4 Metrics dashboard Target profiles

Baseline: bec830a
Comparison: c8f9dc6

Performance changes are noted in the perf column of each table:

✅ = significantly better comparison variant performance
❌ = significantly worse comparison variant performance
➖ = no significant change in performance

No significant changes in experiment optimization goals

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.

Fine details of change detection per experiment

perf	experiment	goal	Δ mean %	Δ mean % CI	links
➖	file_tree	memory utilization	+0.57	[+0.50, +0.64]	Logs
➖	otel_to_otel_logs	ingress throughput	+0.53	[-0.29, +1.34]	Logs
➖	idle	memory utilization	+0.48	[+0.44, +0.51]	Logs
➖	uds_dogstatsd_to_api_cpu	% cpu utilization	+0.34	[-0.55, +1.22]	Logs
➖	tcp_dd_logs_filter_exclude	ingress throughput	+0.00	[-0.01, +0.01]	Logs
➖	uds_dogstatsd_to_api	ingress throughput	-0.00	[-0.00, +0.00]	Logs
➖	tcp_syslog_to_blackhole	ingress throughput	-0.14	[-12.94, +12.66]	Logs
➖	pycheck_1000_100byte_tags	% cpu utilization	-0.22	[-5.03, +4.60]	Logs
➖	basic_py_check	% cpu utilization	-1.09	[-3.63, +1.46]	Logs

Explanation

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
Its configuration does not mark it "erratic".
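As a rough sketch of the decision rule described above (the function and parameter names are hypothetical, not part of the Regression Detector itself), the three criteria combine like this:

```python
def is_regression(delta_mean_pct, ci_low, ci_high, erratic=False,
                  tolerance_pct=5.0):
    """Return True only if all three criteria from the report hold.

    delta_mean_pct: estimated change in the mean, in percent.
    ci_low, ci_high: bounds of the confidence interval for that change.
    erratic: whether the experiment's configuration marks it erratic.
    tolerance_pct: effect size tolerance (5.00% in this report).
    """
    big_enough = abs(delta_mean_pct) >= tolerance_pct
    excludes_zero = ci_low > 0 or ci_high < 0
    return big_enough and excludes_zero and not erratic


# Two rows taken from the table in this report:
# file_tree: the interval excludes zero, but the effect is below 5%.
print(is_regression(0.57, 0.50, 0.64))
# tcp_syslog_to_blackhole: small effect and the interval contains zero.
print(is_regression(-0.14, -12.94, 12.66))
```

Both calls print `False`, which matches the "no significant change" marker on those rows.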

chouquette · 2024-07-19T10:35:46Z

> I would appreciate adding a test to https://github.com/DataDog/datadog-agent/blob/chouquette/fail_install_omnibus/tasks/unit_tests/omnibus_tests.py though I admit it's not super straightforward.

Done, and it allowed me to spot another failure we were silently ignoring 🥳

alopezz

Thanks for the tests! Can you take a look at the suggested simplification for the exception-related assertions?
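The suggested simplification itself is not shown in this conversation. As an assumption about what such a change often looks like with `unittest`, the sketch below contrasts a manual try/except assertion with the `assertRaises` context manager. `InstallError`, `failing_install`, and the test names are hypothetical.

```python
import unittest


class InstallError(Exception):
    """Hypothetical error raised when installation fails."""


def failing_install():
    # Stand-in for the task under test; always fails in this sketch.
    raise InstallError("bundle install failed after 3 attempts")


class TestInstallFailure(unittest.TestCase):
    def test_manual_style(self):
        # Verbose form: catch the exception by hand and fail otherwise.
        try:
            failing_install()
        except InstallError as exc:
            self.assertIn("failed", str(exc))
        else:
            self.fail("InstallError was not raised")

    def test_context_manager_style(self):
        # Shorter form: assertRaises checks the type and exposes the
        # caught exception as cm.exception for further assertions.
        with self.assertRaises(InstallError) as cm:
            failing_install()
        self.assertIn("failed", str(cm.exception))
```

Both tests check the same thing; the second simply lets the framework handle the "was it raised at all" part.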

tasks/unit_tests/omnibus_tests.py

tasks/omnibus.py

Co-authored-by: Alex Lopez <[email protected]>

chouquette · 2024-07-22T07:12:48Z

/merge

dd-devflow · 2024-07-22T07:12:53Z

🚂 MergeQueue: pull request added to the queue

The median merge time in main is 23m.

Use /merge -c to cancel this operation!

chouquette · 2024-07-22T07:13:10Z

/merge --cancel

dd-devflow · 2024-07-22T07:13:18Z

⚠️ MergeQueue: This merge request build was cancelled


If you need support, contact us on Slack #devflow!

chouquette · 2024-07-22T07:13:33Z

cancelling since the CI is currently failing systematically

chouquette · 2024-07-22T07:16:47Z

/merge

dd-devflow · 2024-07-22T07:16:52Z

🚂 MergeQueue: pull request added to the queue

The median merge time in main is 23m.

Use /merge -c to cancel this operation!

omnibus: fail the task when omnibus can't be installed

c2dbdbc

chouquette added the changelog/no-changelog, qa/no-code-change, and team/agent-delivery labels Jul 19, 2024

chouquette requested a review from a team as a code owner July 19, 2024 09:12

f4usto approved these changes Jul 19, 2024

tasks/omnibus.py (outdated)

don't output stderr since it's redirected to stdout
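To illustrate why printing stderr is pointless once it is redirected, here is a small sketch using the standard library's `subprocess` module. It is an assumption that this mirrors the situation in the task, which may use a different runner; `run_merged` is a hypothetical helper.

```python
import subprocess
import sys


def run_merged(cmd):
    """Run cmd with stderr folded into stdout.

    Returns (returncode, stdout, stderr). Because stderr is redirected
    to stdout, the captured stderr is None and all output is in stdout.
    """
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=False,
    )
    return result.returncode, result.stdout, result.stderr


code, out, err = run_merged(
    [sys.executable, "-c", "import sys; sys.stderr.write('oops'); sys.exit(3)"]
)
print(code, repr(out), err)
```

In this setup the error text appears only in `out`, so reporting `err` separately adds nothing.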

c0bca2a

alopezz reviewed Jul 19, 2024

tasks/omnibus.py (outdated)

chouquette added 2 commits July 19, 2024 12:34

omnibus: fail install task after max retry attempts

7f7bc22

omnibus: add bundle install tests

ce6fb39

alopezz reviewed Jul 19, 2024

tasks/unit_tests/omnibus_tests.py (outdated)

tasks/unit_tests/omnibus_tests.py

tasks/omnibus.py (outdated)

chouquette and others added 3 commits July 19, 2024 13:45

Update tasks/unit_tests/omnibus_tests.py

37a8c1d

Co-authored-by: Alex Lopez <[email protected]>

simplify error handling

86a21cd

cleanup and more assertions

c8f9dc6

dd-mergequeue bot merged commit 1d76b10 into main Jul 22, 2024
207 of 208 checks passed

dd-mergequeue bot deleted the chouquette/fail_install_omnibus branch July 22, 2024 07:52

github-actions bot added this to the 7.57.0 milestone Jul 22, 2024
