pkg/cache: optimize peak memory usage during cache build #1281

Conversation

@joelanford (Member) commented Apr 22, 2024

Description of the change:
I extracted a commit from #1278, which can be implemented on its own with our existing caching algorithm. As I mentioned there:

[This commit] changes the way the cache is built. It writes meta objects to a temporary file and records the location of each meta in the file, grouped by package. That way we can later read just the metas for a particular package into memory.

Then we go package by package, building a model, converting it to the package index, and writing API bundles to the cache. The beauty is that only a single package's model is loaded in memory at any given time.
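
As an illustration of that offset-index idea, here is a minimal, self-contained Go sketch. All identifiers here (`metaIndex`, `section`, `add`, `packageMetas`) are invented for this example and are not the PR's actual API; the real change lives in pkg/cache and alpha/declcfg.

```go
package main

import (
	"fmt"
	"os"
)

// section records where one serialized meta blob lives in the temp file.
type section struct {
	offset, length int64
}

// metaIndex appends meta blobs to a single temp file and indexes their
// file sections by package, so one package's metas can be re-read later
// without holding any other package in memory.
type metaIndex struct {
	f       *os.File
	byPkg   map[string][]section
	nextOff int64
}

func newMetaIndex() (*metaIndex, error) {
	f, err := os.CreateTemp("", "cache-metas-")
	if err != nil {
		return nil, err
	}
	return &metaIndex{f: f, byPkg: map[string][]section{}}, nil
}

// add writes one blob and records its location under the owning package.
func (m *metaIndex) add(pkg string, blob []byte) error {
	n, err := m.f.WriteAt(blob, m.nextOff)
	if err != nil {
		return err
	}
	m.byPkg[pkg] = append(m.byPkg[pkg], section{offset: m.nextOff, length: int64(n)})
	m.nextOff += int64(n)
	return nil
}

// packageMetas reads back only the blobs for one package; peak memory is
// bounded by the largest single package rather than the whole catalog.
func (m *metaIndex) packageMetas(pkg string) ([][]byte, error) {
	var out [][]byte
	for _, s := range m.byPkg[pkg] {
		buf := make([]byte, s.length)
		if _, err := m.f.ReadAt(buf, s.offset); err != nil {
			return nil, err
		}
		out = append(out, buf)
	}
	return out, nil
}

func main() {
	idx, err := newMetaIndex()
	if err != nil {
		panic(err)
	}
	defer os.Remove(idx.f.Name())
	if err := idx.add("etcd", []byte(`{"schema":"olm.bundle","package":"etcd"}`)); err != nil {
		panic(err)
	}
	metas, err := idx.packageMetas("etcd")
	if err != nil {
		panic(err)
	}
	fmt.Printf("etcd: %d meta blob(s)\n", len(metas))
}
```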

This may mean that we can stop storing caches in the catalog image and can go back to building them on-the-fly when the container starts!

I noticed that when using an FBC with olm.csv.metadata, startup peak memory and time were basically inconsequential when building a cache on the fly.

In order to maintain cache-build performance, we need to ensure that WalkMetasFS can make use of concurrency in the same way that LoadFS (which the cache builder currently uses) already can. Therefore, the first commit in this PR includes those changes; a sketch of the general pattern follows.
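
As a rough illustration of that pattern, here is a minimal producer/consumer walk over an fs.FS using golang.org/x/sync/errgroup, with one worker per CPU. The function name `walkFiles` and the callback shape are invented for this sketch; they are not the actual WalkMetasFS signature.

```go
package main

import (
	"context"
	"fmt"
	"io/fs"
	"runtime"
	"testing/fstest"

	"golang.org/x/sync/errgroup"
)

// walkFiles fans a directory walk out to runtime.NumCPU() workers, each
// invoking fn on one file at a time. This mirrors the general shape LoadFS
// uses for concurrency; the PR's actual WalkMetasFS changes may differ.
func walkFiles(ctx context.Context, root fs.FS, fn func(path string, data []byte) error) error {
	paths := make(chan string)
	g, ctx := errgroup.WithContext(ctx)

	// Producer: walk the tree and emit regular-file paths.
	g.Go(func() error {
		defer close(paths)
		return fs.WalkDir(root, ".", func(p string, d fs.DirEntry, err error) error {
			if err != nil || d.IsDir() {
				return err
			}
			select {
			case paths <- p:
				return nil
			case <-ctx.Done():
				return ctx.Err()
			}
		})
	})

	// Consumers: a bounded pool, so at most NumCPU files are in flight.
	for i := 0; i < runtime.NumCPU(); i++ {
		g.Go(func() error {
			for p := range paths {
				data, err := fs.ReadFile(root, p)
				if err != nil {
					return err
				}
				if err := fn(p, data); err != nil {
					return err
				}
			}
			return nil
		})
	}
	return g.Wait()
}

func main() {
	root := fstest.MapFS{
		"catalog.json": {Data: []byte(`{"schema":"olm.package"}`)},
	}
	err := walkFiles(context.Background(), root, func(p string, b []byte) error {
		fmt.Printf("%s: %d bytes\n", p, len(b))
		return nil
	})
	if err != nil {
		fmt.Println("walk failed:", err)
	}
}
```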

Motivation for the change:
There have been numerous issues reported about how finicky pre-built caches are. There are cases where a catalog image with a pre-built cache works correctly on one node, but not another. There are other cases where caches built outside the image and then injected in are mangled enough to throw off the digest calculation. While these cases are likely problems with the specific digest algorithm we use, this could all be avoided if we were able to build the cache on-the-fly.

Reviewer Checklist

  • Implementation matches the proposed design, or proposal is updated to match implementation
  • Sufficient unit test coverage
  • Sufficient end-to-end test coverage
  • Docs updated or added to /docs
  • Commit messages sensible and descriptive

openshift-ci bot (Contributor) commented Apr 22, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: joelanford

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 22, 2024
codecov bot commented Apr 22, 2024

Codecov Report

Attention: Patch coverage is 72.72727%, with 45 lines in your changes missing coverage. Please review.

Project coverage is 54.03%. Comparing base (aa0777c) to head (092a36d).

Files                  Patch %   Lines
pkg/cache/json.go      67.90%    18 missing and 8 partials ⚠️
alpha/declcfg/load.go  77.38%    13 missing and 6 partials ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1281   +/-   ##
=======================================
  Coverage   54.02%   54.03%           
=======================================
  Files         108      108           
  Lines       11266    11314   +48     
=======================================
+ Hits         6087     6113   +26     
- Misses       4190     4207   +17     
- Partials      989      994    +5     


@joelanford (Member Author) commented Apr 22, 2024

/hold
I'm going to make this faster by adding concurrency support to WalkMetasFS.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 22, 2024
@joelanford joelanford force-pushed the cache-peak-memory-improvement branch from 77c0299 to 092a36d Compare April 22, 2024 15:22
@joelanford (Member Author) commented:

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 22, 2024
})

var mapMu sync.Mutex
for i := 0; i < runtime.NumCPU(); i++ {
@joelanford (Member Author) commented on this diff:

This means that data for up to runtime.NumCPU() packages will be in memory at once. I noticed the following when building a cache for the operatorhub catalog on my 10-core Mac M1 Pro (a sketch of the worker pattern in this diff follows the table):

Branch   Catalog      GOMAXPROCS   Peak memory   Duration
master   unmigrated   1            778 MB        15 s
master   unmigrated   unset        731 MB        10 s
master   migrated     1            166 MB        2.7 s
master   migrated     unset        141 MB        2.0 s
PR       unmigrated   1            234 MB        16 s
PR       unmigrated   unset        290 MB        6.6 s
PR       migrated     1            115 MB        3 s
PR       migrated     unset        117 MB        1.39 s
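
For readers unfamiliar with the shape the diff fragment comes from: results of concurrent per-package work are accumulated into a shared map guarded by mapMu, with one worker per CPU. Below is a minimal, self-contained sketch of that pattern; everything except mapMu and the runtime.NumCPU() fan-out is invented for illustration, and the real per-package work (loading metas, building the model, writing API bundles) is reduced to a stub.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	pkgs := []string{"etcd", "prometheus", "strimzi", "argocd"}
	work := make(chan string)

	// results holds one entry per package; guarded by mapMu because all
	// workers write to it concurrently.
	results := map[string]int{}
	var mapMu sync.Mutex

	var wg sync.WaitGroup
	for i := 0; i < runtime.NumCPU(); i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for pkg := range work {
				// Stand-in for "read this package's metas, build its
				// model, and write its cache entries": here we just
				// record the name length.
				size := len(pkg)
				mapMu.Lock()
				results[pkg] = size
				mapMu.Unlock()
			}
		}()
	}
	for _, p := range pkgs {
		work <- p
	}
	close(work)
	wg.Wait()
	fmt.Println(results)
}
```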

@joelanford (Member Author) commented:

I'm going to close this one. I think we should focus on #1278.

@joelanford joelanford closed this Apr 24, 2024
@joelanford joelanford deleted the cache-peak-memory-improvement branch April 24, 2024 13:52
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files.