Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 incremental improvement in catalogmetadata cache/client performance #965

Merged

Conversation

joelanford
Copy link
Member

@joelanford joelanford commented Jun 21, 2024

Description

This partially resolves #914 by doing two things:

  1. Exploding the all.json response into separate directories organized by package
  2. Updating the BundleProvider to require a package name to be provided in the request, which results in a much smaller set of FBC that needs to be re-parsed into catalogmetadata objects on each reconcile.

This is still not completely resolved though because step (1) still happens synchronously during the reconciliation of a ClusterExtension. In order to finish the improvement (and to complete #948), we need to implement a controller that watches and reconciles ClusterCatalog changes so that it can create (and delete) caches for operator-controller.

Reviewer Checklist

  • API Go Documentation
  • Tests: Unit Tests (and E2E Tests, if appropriate)
  • Comprehensive Commit Messages
  • Links to related GitHub Issue(s)

@joelanford joelanford requested a review from a team as a code owner June 21, 2024 20:20
Copy link

netlify bot commented Jun 21, 2024

Deploy Preview for olmv1 ready!

Name Link
🔨 Latest commit 82303d7
🔍 Latest deploy log https://app.netlify.com/sites/olmv1/deploys/6679c2f361c8f9000742ebe9
😎 Deploy Preview https://deploy-preview-965--olmv1.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@joelanford joelanford force-pushed the improve-catalog-reconcile-time branch 2 times, most recently from 822f614 to cbac64e Compare June 21, 2024 20:34
Comment on lines +60 to +63
// if the catalog has not been successfully unpacked, report an error. This ensures that our
// reconciles are deterministic and wait for all desired catalogs to be ready.
if !meta.IsStatusConditionPresentAndEqual(catalog.Status.Conditions, catalogd.TypeUnpacked, metav1.ConditionTrue) {
errs = append(errs, fmt.Errorf("catalog %q is not unpacked", catalog.Name))
Copy link
Member Author

@joelanford joelanford Jun 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a change in semantics. If we ignore catalogs that are not unpacked, users could run into issues if the same package exists in two different catalogs and they are expecting a specific result.

The new semantics are similar to the behavior in OLMv0, and there have been some complaints about it. For example, "I want to install CNV, I don't care if the community catalog is having issues". In order to solve this problem, I think we need to have a knob on the ClusterExtension API to allow users to select which ClusterCatalogs to resolve from (likely using a label selector)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have this captured as an issue somewhere or do you anticipate this falling under the umbrella of #740?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am going to make a separate issue for this. Just wanted to get consensus on this approach before I did.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -24,30 +23,30 @@ import (
)

func TestClient(t *testing.T) {
t.Run("Bundles", func(t *testing.T) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this outer/unnecessary t.Run wrapper (mainly because it caused my IDE to not be able to find the individual test cases to run/debug separately).

In order to review changes in this test, I advise ignoring whitespace.

@@ -50,203 +56,196 @@ const (
}`
)

func TestCache(t *testing.T) {
t.Run("FetchCatalogContents", func(t *testing.T) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to my other comment, I removed this unnecessary t.Run wrapper. When reviewing, it's much easier to see actual changes if you ignore whitespace.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that I moved this into internal. I noticed go-apidiff complain that I changed the Bundles() function signature realized we shouldn't have a test utility in our public API, especially since it returns a slice of objects that are also in an internal package.

if err != nil {
return nil, fmt.Errorf("error creating cache file for Catalog %q: %s", catalog.Name, err)
return nil, fmt.Errorf("error creating temporary directory to unpack catalog metadata: %s", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return nil, fmt.Errorf("error creating temporary directory to unpack catalog metadata: %s", err)
return nil, fmt.Errorf("error creating temporary directory to unpack catalog metadata: %v", err)

@joelanford joelanford force-pushed the improve-catalog-reconcile-time branch from cbac64e to 0661a99 Compare June 21, 2024 21:05
@@ -135,7 +135,7 @@ test-ext-dev-e2e: $(OPERATOR_SDK) $(KUSTOMIZE) $(KIND) #HELP Run extension creat
ENVTEST_VERSION := $(shell go list -m k8s.io/client-go | cut -d" " -f2 | sed 's/^v0\.\([[:digit:]]\{1,\}\)\.[[:digit:]]\{1,\}$$/1.\1.x/')
UNIT_TEST_DIRS := $(shell go list ./... | grep -v /test/)
test-unit: $(SETUP_ENVTEST) #HELP Run the unit tests
eval $$($(SETUP_ENVTEST) use -p env $(ENVTEST_VERSION) $(SETUP_ENVTEST_BIN_DIR_OVERRIDE)) && go test -count=1 -short $(UNIT_TEST_DIRS) -coverprofile cover.out
eval $$($(SETUP_ENVTEST) use -p env $(ENVTEST_VERSION) $(SETUP_ENVTEST_BIN_DIR_OVERRIDE)) && CGO_ENABLED=1 go test -count=1 -race -short $(UNIT_TEST_DIRS) -coverprofile cover.out
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CGO is required for the race detector. TIL

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah - got caught on that one rejiggering the v0 Makefile

Copy link

codecov bot commented Jun 22, 2024

Codecov Report

Attention: Patch coverage is 72.97297% with 20 lines in your changes missing coverage. Please review.

Project coverage is 79.86%. Comparing base (6c61a78) to head (82303d7).
Report is 1 commits behind head on main.

Files Patch % Lines
internal/catalogmetadata/cache/cache.go 41.37% 11 Missing and 6 partials ⚠️
internal/catalogmetadata/client/client.go 91.66% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #965      +/-   ##
==========================================
+ Coverage   79.76%   79.86%   +0.09%     
==========================================
  Files          16       17       +1     
  Lines        1107     1157      +50     
==========================================
+ Hits          883      924      +41     
- Misses        155      161       +6     
- Partials       69       72       +3     
Flag Coverage Δ
e2e 58.00% <54.41%> (+0.64%) ⬆️
unit 55.05% <72.97%> (+1.21%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

if _, err = file.Seek(0, 0); err != nil {
return nil, fmt.Errorf("error resetting offset for cache file reader for Catalog %q: %s", catalog.Name, err)
if err := os.Rename(tmpDir, cacheDir); err != nil {
return nil, fmt.Errorf("error moving temporary directory to cache directory: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious to know what happens if fail here? Just run of the mill expbackoff? what happens to resolution during that period? Same thing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes to both.

deprecationsMu sync.Mutex
)

if err := declcfg.WalkMetasFS(ctx, packageFS, func(_ string, meta *declcfg.Meta, err error) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this function now guaranteed to run concurrently? Is there any guarantees on the same *declcfg.Meta object not being processed more than once?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It runs concurrently by default (based on number of CPUs Go thinks are available). There is an exactly once guarantee for an individual *declcfg.Meta object.

Copy link
Member Author

@joelanford joelanford Jun 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ultimately, we're parsing n files at a time (where n is number of goroutines). You may notice that the cache writes metas into their own files. This is on purpose to make as many files as possible so that as many goroutines as possible (up to n, at least) have work to do.

Comment on lines +162 to +170
for _, bundle := range bundles {
slices.SortFunc(bundle.InChannels, func(a, b *catalogmetadata.Channel) int { return cmp.Compare(a.Name, b.Name) })
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate on the purpose of this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of the concurrency introduced in WalkMetasFS the order that channels are added processed is non-deterministic. That non-determinism manifests as non-deterministic ordering of bundle.InChannels. So this function gets us back to a deterministic order.

It doesn't really matter except that tests become more annoying to implement if the order isn't deterministic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind adding a comment just highlighting the why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joelanford joelanford force-pushed the improve-catalog-reconcile-time branch 2 times, most recently from 3df60bb to b86e629 Compare June 24, 2024 18:54
@joelanford joelanford force-pushed the improve-catalog-reconcile-time branch from b86e629 to 82303d7 Compare June 24, 2024 19:03
@joelanford joelanford added this pull request to the merge queue Jun 24, 2024
Merged via the queue into operator-framework:main with commit 2eca31d Jun 24, 2024
15 of 17 checks passed
@skattoju skattoju mentioned this pull request Sep 25, 2024
4 tasks
@joelanford joelanford deleted the improve-catalog-reconcile-time branch October 22, 2024 13:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

catalogmetadata client adds extreme latency to reconciliation
4 participants