Enable omnibus build cache #20117

chouquette · 2023-10-13T07:51:29Z

What does this PR do?

This PR enables the omnibus git cache so that we stop rebuilding all of our dependencies in every CI job.
All packages built before the agent are expected to change rarely, so they shouldn't need to be rebuilt every time.
On average, this saves about 15 to 20 minutes per job.
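A simplified model may help explain why caching the packages built before the agent pays off. This sketch is an assumption, not code taken from omnibus: it supposes that each software's cache entry depends on its own definition and on every software built before it. Under that model, a change forces a rebuild of that software and of everything after it. The software names and versions are illustrative only.

```python
import hashlib


def chained_keys(softwares):
    """Compute one cache key per software, each folding in the previous key.

    `softwares` is an ordered list of (name, definition) pairs, in build order.
    """
    keys = []
    previous = b""
    for name, definition in softwares:
        h = hashlib.sha256(previous)  # chain: start from the previous digest
        h.update(f"{name}:{definition}".encode())
        previous = h.digest()
        keys.append((name, h.hexdigest()))
    return keys


def to_rebuild(cached_keys, current_keys):
    """Return the names to rebuild: everything from the first mismatch onward."""
    for i, (cached, current) in enumerate(zip(cached_keys, current_keys)):
        if cached != current:
            return [name for name, _ in current_keys[i:]]
    # All shared entries match; only softwares missing from the cache need building.
    return [name for name, _ in current_keys[len(cached_keys):]]


old = chained_keys([("zlib", "1.3"), ("openssl", "3.0.13"), ("datadog-agent", "abc123")])
new = chained_keys([("zlib", "1.3"), ("openssl", "3.0.13"), ("datadog-agent", "def456")])
print(to_rebuild(old, new))  # ['datadog-agent']
```

If a dependency early in the chain changes (for example `openssl`), the same model rebuilds `openssl` and `datadog-agent`, which is why frequently changing packages are best kept at the end of the build order.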

This also lets individual developers skip rebuilding every dependency when they want to add a new software dependency to the agent. All they need to do is set the OMNIBUS_GIT_CACHE_DIR environment variable to a directory of their choosing.
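Here is a minimal sketch of the opt-in behaviour described above. It assumes caching is switched on only when OMNIBUS_GIT_CACHE_DIR is set to a non-empty value. `git_cache_dir` is a hypothetical helper, not a function from this PR.

```python
import os


def git_cache_dir(environ=None):
    """Return the omnibus git cache directory, or None when caching is disabled.

    Hypothetical helper: caching is treated as enabled only when
    OMNIBUS_GIT_CACHE_DIR is set to a non-empty value.
    """
    environ = os.environ if environ is None else environ
    cache_dir = environ.get("OMNIBUS_GIT_CACHE_DIR", "").strip()
    return cache_dir or None


# A developer opting in locally:
print(git_cache_dir({"OMNIBUS_GIT_CACHE_DIR": "/tmp/omnibus-git-cache"}))  # /tmp/omnibus-git-cache
# Nothing set: behave as before and rebuild everything.
print(git_cache_dir({}))  # None
```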

Motivation

This is part of the ongoing initiative to reduce the median pipeline duration to under 2 hours. This specific investigation is tracked in https://datadoghq.atlassian.net/browse/APL-1805

Associated RFC: https://docs.google.com/document/d/1PSGpd2ixXXMbfzC1j0o514SXypcDiVamxBL3db__Bt0/edit?usp=sharing

Additional Notes

Possible Drawbacks / Trade-offs

Describe how to test/QA your changes

Reviewer's Checklist

pr-commenter · 2023-10-13T10:01:14Z

Bloop Bleep... Dogbot Here

Regression Detector Results

Run ID: 6ad93a5c-18e7-4e1d-a6ad-f271790a3eac
Baseline: d40837d
Comparison: 9f05afb

Performance changes are noted in the perf column of each table:

✅ = significantly better comparison variant performance
❌ = significantly worse comparison variant performance
➖ = no significant change in performance

Experiments with missing or malformed data

basic_py_check

Usually, this warning means that there is no usable optimization goal data for that experiment, which could be a result of misconfiguration.

No significant changes in experiment optimization goals

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

perf	experiment	goal	Δ mean %	Δ mean % CI
➖	file_to_blackhole	% cpu utilization	-0.26	[-6.80, +6.29]

Fine details of change detection per experiment

perf	experiment	goal	Δ mean %	Δ mean % CI
➖	uds_dogstatsd_to_api_cpu	% cpu utilization	+1.21	[-0.25, +2.66]
➖	tcp_syslog_to_blackhole	ingress throughput	+1.17	[+1.12, +1.22]
➖	process_agent_standard_check	memory utilization	+0.67	[+0.63, +0.70]
➖	process_agent_real_time_mode	memory utilization	+0.25	[+0.21, +0.28]
➖	otel_to_otel_logs	ingress throughput	+0.18	[-0.44, +0.81]
➖	idle	memory utilization	+0.09	[+0.05, +0.12]
➖	trace_agent_json	ingress throughput	+0.01	[-0.03, +0.05]
➖	trace_agent_msgpack	ingress throughput	+0.01	[-0.01, +0.02]
➖	uds_dogstatsd_to_api	ingress throughput	+0.00	[-0.00, +0.00]
➖	tcp_dd_logs_filter_exclude	ingress throughput	-0.00	[-0.00, +0.00]
➖	file_tree	memory utilization	-0.17	[-0.23, -0.10]
➖	file_to_blackhole	% cpu utilization	-0.26	[-6.80, +6.29]
➖	process_agent_standard_check_with_stats	memory utilization	-0.39	[-0.42, -0.35]

Explanation

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
Its configuration does not mark it "erratic".
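The three criteria above amount to a simple predicate. The sketch below applies them to rows of the tables in this report. `is_regression` is a hypothetical name and is not the detector's code. The confidence interval is passed as a (low, high) pair of percentages.

```python
def is_regression(delta_mean_pct, ci, erratic, tolerance_pct=5.0):
    """Apply the three regression criteria to one experiment."""
    ci_low, ci_high = ci
    big_enough = abs(delta_mean_pct) >= tolerance_pct
    # The interval "does not contain zero" only if it lies strictly on one side of it.
    ci_excludes_zero = ci_low > 0 or ci_high < 0
    return big_enough and ci_excludes_zero and not erratic


# file_to_blackhole: small effect, wide interval, and marked erratic.
print(is_regression(-0.26, (-6.80, 6.29), erratic=True))  # False
# tcp_syslog_to_blackhole: interval excludes zero, but |Δ| is below 5%.
print(is_regression(1.17, (1.12, 1.22), erratic=False))  # False
```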

They are expected to change almost always and would keep invalidating the cache. Not caching them avoids regenerating the cache when there is no need to, which saves the few minutes it takes to recreate the cache bundle and upload it to S3.


pr-commenter · 2024-03-18T15:24:52Z

Test changes on VM

Use this command from test-infra-definitions to manually test this PR changes on a VM:

inv create-vm --pipeline-id=31521461 --os-family=ubuntu

pr-commenter · 2024-03-18T16:10:28Z

Regression Detector

Regression Detector Results

Run ID: 1eb5b777-6a43-4ad3-99e9-9242695e691e
Baseline: 92d6c0a
Comparison: 1e414d2

Performance changes are noted in the perf column of each table:

✅ = significantly better comparison variant performance
❌ = significantly worse comparison variant performance
➖ = no significant change in performance

No significant changes in experiment optimization goals

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

perf	experiment	goal	Δ mean %	Δ mean % CI
➖	file_to_blackhole	% cpu utilization	+0.24	[-5.66, +6.15]

Fine details of change detection per experiment

perf	experiment	goal	Δ mean %	Δ mean % CI
➖	tcp_syslog_to_blackhole	ingress throughput	+1.32	[+1.24, +1.39]
➖	process_agent_real_time_mode	memory utilization	+0.78	[+0.74, +0.83]
➖	idle	memory utilization	+0.27	[+0.23, +0.31]
➖	file_to_blackhole	% cpu utilization	+0.24	[-5.66, +6.15]
➖	otel_to_otel_logs	ingress throughput	+0.23	[-0.18, +0.64]
➖	process_agent_standard_check_with_stats	memory utilization	+0.23	[+0.19, +0.28]
➖	trace_agent_json	ingress throughput	+0.03	[-0.01, +0.07]
➖	file_tree	memory utilization	+0.01	[-0.10, +0.13]
➖	trace_agent_msgpack	ingress throughput	+0.01	[-0.01, +0.02]
➖	uds_dogstatsd_to_api	ingress throughput	+0.00	[-0.20, +0.20]
➖	tcp_dd_logs_filter_exclude	ingress throughput	+0.00	[-0.02, +0.02]
➖	uds_dogstatsd_to_api_cpu	% cpu utilization	-0.06	[-3.11, +2.98]
➖	process_agent_standard_check	memory utilization	-0.22	[-0.27, -0.17]
➖	basic_py_check	% cpu utilization	-1.62	[-4.19, +0.95]
➖	pycheck_1000_100byte_tags	% cpu utilization	-2.47	[-7.32, +2.39]

Explanation

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
Its configuration does not mark it "erratic".


KSerrania · 2024-04-16T07:42:56Z

tasks/libs/common/omnibus.py

+    for k, v in environment.items():
+        print(f'\tUsing environment variable {k} to compute cache key')
+        h.update(str.encode(f'{k}={v}'))
+    # FIXME: include omnibus-ruby and omnibus-software version once they are pinned


I think this needs to be updated?

Indeed, that's a rather old comment. Removed.

And as you mentioned on Slack, the FIXME was actually valid. I just pushed a commit that fixes it.

Thanks again for noticing
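For illustration, here is a sketch of how the environment variables and the omnibus-ruby / omnibus-software commits could both feed the cache key, following the diff above. Several details are assumptions: the choice of SHA-1, the sorting, and the `compute_cache_key` name and arguments. The PR's actual function is not shown here.

```python
import hashlib


def compute_cache_key(environment, omnibus_commits):
    """Hypothetical sketch of a build cache key.

    environment: the (already filtered) environment variables that affect the build.
    omnibus_commits: e.g. {"omnibus-ruby": "<sha>", "omnibus-software": "<sha>"}.
    """
    h = hashlib.sha1()
    # Sort so the key does not depend on dict insertion order.
    for k, v in sorted(environment.items()):
        print(f'\tUsing environment variable {k} to compute cache key')
        h.update(str.encode(f'{k}={v}'))
    # Pinned omnibus versions: bumping either one must invalidate the cache.
    for repo, commit in sorted(omnibus_commits.items()):
        print(f'\tUsing {repo} commit {commit} to compute cache key')
        h.update(str.encode(f'{repo}={commit}'))
    return h.hexdigest()
```

Including the omnibus commits means that bumping either pinned repository produces a new key, so a stale cache built with older build tooling is never reused.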

dd-devflow · 2024-04-16T10:17:05Z

⚠️ MergeQueue

This merge request was unqueued

If you need support, contact us on Slack #devflow!

KSerrania · 2024-04-16T11:13:00Z

tasks/libs/common/omnibus.py

+
+
+def _get_omnibus_commits(field):
+    release_version = os.environ['RELEASE_VERSION_7']


I think you want to check RELEASE_VERSION, and then RELEASE_VERSION_7 if the first one is not found, because of the way we set these variables currently.

Note for later: we should standardize all builds, make them set RELEASE_VERSION explicitly, to avoid having to use this hack.

Indeed, this was causing a failure on Windows.
I may have some questions about these variables during the summit; they're a bit unclear to me.
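The suggested lookup order (RELEASE_VERSION first, then RELEASE_VERSION_7) can be sketched as follows. `get_release_version` is a hypothetical name. Taking the mapping as a parameter is only there to make the helper easy to test.

```python
import os


def get_release_version(environ=None):
    """Prefer RELEASE_VERSION and fall back to RELEASE_VERSION_7."""
    environ = os.environ if environ is None else environ
    release_version = environ.get('RELEASE_VERSION') or environ.get('RELEASE_VERSION_7')
    if not release_version:
        raise KeyError('neither RELEASE_VERSION nor RELEASE_VERSION_7 is set')
    return release_version


print(get_release_version({'RELEASE_VERSION_7': 'nightly-a7'}))  # nightly-a7
```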

iglendd · 2024-04-16T11:20:21Z

tasks/winbuildscripts/dobuild.bat

@@ -11,6 +11,7 @@ if NOT DEFINED GO_VERSION_CHECK set GO_VERSION_CHECK=%~4

 set OMNIBUS_BUILD=omnibus.build
 set OMNIBUS_ARGS=--python-runtimes "%PY_RUNTIMES%"
+set INSTALL_DIR=opt\datadog-agent


Did it work for the Windows build? Is it a new directory to be created? In which parent directory?

It worked: this is only used to determine the subdirectory in which the cache is located, so it would be created under the directory given by OMNIBUS_GIT_CACHE_DIR, which defaults to C:\TEMP\omnibus-git-cache on Windows.
However, I believe this is actually not needed. I removed it and will check the pipeline results.
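To illustrate what "the subdirectory in which to locate the cache" means, here is a sketch that derives a per-install-directory cache path under the cache root. The sanitization scheme is an assumption: it drops the drive and leading separators, then joins the path components with underscores. The PR's actual scheme is not shown here, and `cache_subdir` is a hypothetical helper.

```python
import ntpath
import posixpath


def cache_subdir(cache_root, install_dir, windows=False):
    """Hypothetical: place the cache for a given install dir under cache_root.

    The install dir is flattened into a single path component so that builds
    with different install directories do not share a cache.
    """
    pathmod = ntpath if windows else posixpath
    # Drop a drive letter (Windows only) and any leading/empty separators.
    _, tail = pathmod.splitdrive(install_dir)
    parts = [p for p in tail.replace('\\', '/').split('/') if p]
    return pathmod.join(cache_root, '_'.join(parts))


print(cache_subdir(r'C:\TEMP\omnibus-git-cache', r'opt\datadog-agent', windows=True))
print(cache_subdir('/tmp/omnibus-git-cache', '/opt/datadog-agent'))
```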

KSerrania

LGTM. If you plan on merging this today, can you sync with @FlorentClarret? This may conflict with the Python linter changes he's making in #24590, so he'll have to rebase.

KSerrania · 2024-04-17T10:01:52Z

tasks/libs/common/omnibus.py

+    buildimages_hash = _get_build_images(ctx)
+    for img_hash in buildimages_hash:
+        h.update(str.encode(img_hash))


Not needed for now, but for debugging purposes, you may want to log explicitly what buildimages entries go in the cache key, like you do for the environment variables.
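Here is a small sketch of the suggested logging, mirroring the environment-variable loop: each buildimages entry is printed as it is fed into the hash. `update_with_buildimages` and the sample entries are hypothetical.

```python
import hashlib


def update_with_buildimages(h, buildimages_hash):
    """Feed each buildimages entry into the hash, logging it for debugging."""
    for img_hash in buildimages_hash:
        print(f'\tUsing buildimages entry {img_hash} to compute cache key')
        h.update(str.encode(img_hash))
    return h


h = update_with_buildimages(hashlib.sha1(), ["v123-abc", "v456-def"])
print(h.hexdigest())
```

As in the diff, entries are concatenated without a separator, so `["ab"]` and `["a", "b"]` would hash identically. That is harmless as long as the entries are fixed-format image identifiers.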

chouquette · 2024-04-17T11:26:10Z

/merge

dd-devflow · 2024-04-17T11:26:18Z

🚂 MergeQueue

This merge request is not mergeable yet, because of pending checks/missing approvals. It will be added to the queue as soon as checks pass and/or get approvals.
Note: if you pushed new commits since the last approval, you may need additional approval.
You can remove it from the waiting list with /remove command.

Use /merge -c to cancel this operation!

chouquette · 2024-04-17T13:24:00Z

/merge

dd-devflow · 2024-04-17T13:24:15Z

🚂 MergeQueue

This merge request is not mergeable yet, because of pending checks/missing approvals. It will be added to the queue as soon as checks pass and/or get approvals.
Note: if you pushed new commits since the last approval, you may need additional approval.
You can remove it from the waiting list with /remove command.

Use /merge -c to cancel this operation!

chouquette · 2024-04-17T13:52:38Z

/merge -c

dd-devflow · 2024-04-17T13:52:44Z

⚠️ MergeQueue

This merge request was unqueued

If you need support, contact us on Slack #devflow!

chouquette · 2024-04-17T13:52:58Z

/merge

dd-devflow · 2024-04-17T13:53:11Z

🚂 MergeQueue

This merge request is not mergeable yet, because of pending checks/missing approvals. It will be added to the queue as soon as checks pass and/or get approvals.
Note: if you pushed new commits since the last approval, you may need additional approval.
You can remove it from the waiting list with /remove command.

Use /merge -c to cancel this operation!

iliakur

Seems fine from the agent integrations perspective, though it would be nice to get rid of the repetitive always_build true.

dd-devflow · 2024-04-17T13:59:18Z

🚂 MergeQueue

Pull request added to the queue.

This build is next! (estimated merge in less than 49m)

Use /merge -c to cancel this operation!

Co-authored-by: alopezz
Co-authored-by: Pythyu

chouquette force-pushed the chouquette/omnibus_cache branch from 79302a8 to f8fd7e1 on October 13, 2023 08:12

chouquette force-pushed the chouquette/omnibus_cache branch from 4784f52 to 216f58e on October 18, 2023 08:38

chouquette force-pushed the chouquette/omnibus_cache branch 2 times, most recently from a532fb1 to 9c889d9 on November 9, 2023 10:42

chouquette force-pushed the chouquette/omnibus_cache branch 4 times, most recently from 86eac62 to e9bdfc8 on January 9, 2024 13:02

chouquette force-pushed the chouquette/omnibus_cache branch from fcfd7cc to 3545f3f on February 29, 2024 09:59

chouquette added 7 commits March 4, 2024 16:03

omnibus: conditionally enable git cache

c5d3e9e

tasks: omnibus_build: add support for omnibus git caching

a3d09bd

always update omnibus cache

8653336

so that we can measure the results until it's merged & further worked on

tasks: generate a cache key to fetch omnibus cache

ac458d2

currate env variables filter

37abbc5

attempt to stop hardcoding install dir

bae0667

chouquette force-pushed the chouquette/omnibus_cache branch from 9f05afb to bae0667 on March 4, 2024 15:14

Pythyu added 5 commits March 15, 2024 14:41

Merge branch 'main' into chouquette/omnibus_cache

0e50cb4

feat(omnibus-cache): configure cache for remote updater OCI packages

45c6717

feat(omnibus-cache): configure cache for remote updater OCI packages

6caa677

feat(test): use omnibus basedir in cache

7d35a23

feat(test): use two different omnibus cache variable

74d010a

Pythyu added 3 commits March 18, 2024 17:13

feat(test): updater omnibus cache suffix

134d81d

feat(updater): added package version into cache path

cea95fa

feat(updater): replace OMNIBUS_GIT_CACHE_SUFFIX with an agent.omnibus-build argument

53185cf

Pythyu added the changelog/no-changelog and qa/no-code-change labels Mar 19, 2024

KSerrania reviewed Apr 16, 2024

alopezz and others added 3 commits April 16, 2024 10:39

simplify install directory sanitization

af4473b

remove old fixme

bdf7369

use omnibus commits in cache key

e9d2dad

chouquette force-pushed the chouquette/omnibus_cache branch from 16020bb to e9d2dad on April 16, 2024 09:38

KSerrania reviewed Apr 16, 2024

iglendd reviewed Apr 16, 2024

chouquette added 2 commits April 16, 2024 13:25

handle both RELEASE_VERSION and RELEASE_VERSION_7

3d78dd7

remove unneeded windows task parameter

590bd4c

chouquette requested a review from KSerrania April 16, 2024 14:13

KSerrania approved these changes Apr 17, 2024

display omnibus commits sha1

98b6a70

Merge branch 'main' into chouquette/omnibus_cache

1e414d2

iliakur approved these changes Apr 17, 2024

steveny91 approved these changes Apr 17, 2024

dd-mergequeue bot merged commit d3f3164 into main Apr 17, 2024
189 checks passed

dd-mergequeue bot deleted the chouquette/omnibus_cache branch April 17, 2024 14:38

github-actions bot added this to the 7.54.0 milestone Apr 17, 2024

CelianR pushed a commit that referenced this pull request Apr 26, 2024

Enable omnibus build cache (#20117)

83b2cb8

Co-authored-by: alopezz
Co-authored-by: Pythyu

alexgallotta pushed a commit that referenced this pull request May 9, 2024

Enable omnibus build cache (#20117)

4e4e9a9

Co-authored-by: alopezz
Co-authored-by: Pythyu
