
pkg/cache: add preferred pogreb database cache impl #1278

Merged

Conversation

joelanford (Member) commented Apr 18, 2024

Description of the change:
This PR adds a new caching implementation with several improvements over our existing JSON cache implementation. It also maintains backward compatibility, gracefully serving existing JSON-based caches.

I ran a few tests for compatibility:

  1. With an opm from this PR, the following happens:

    • Success when the cache contains JSON files
    • Success when the cache contains a Pogreb DB
    • Success when the cache contains both the JSON and Pogreb files side-by-side (and it uses Pogreb)
  2. With an opm from the current master branch, the following happens:

    • Success when the cache contains JSON files
    • Failure when the cache contains a Pogreb DB
    • Success when the cache contains both the JSON and Pogreb files side-by-side (and it uses JSON). Therefore, adding a Pogreb DB to the cache is forward compatible. (See the sketch after this list.)
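To make the side-by-side behavior concrete, here is a minimal sketch of what backend selection could look like. The constructor names and the "pogreb.v1" directory name are assumptions for illustration only; the actual detection logic in pkg/cache may differ.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// Cache is a stand-in for the real cache interface in pkg/cache.
type Cache interface{}

// openPogrebCache and openJSONCache are hypothetical constructors used
// only to illustrate selection order; they are not the real API.
func openPogrebCache(dir string) (Cache, error) { return "pogreb", nil }
func openJSONCache(dir string) (Cache, error)   { return "json", nil }

// loadCache prefers the pogreb backend when its files are present, so a
// cache directory containing both formats side-by-side is served from
// pogreb, while older opm binaries that only know JSON keep working.
func loadCache(dir string) (Cache, error) {
	if _, err := os.Stat(filepath.Join(dir, "pogreb.v1")); err == nil {
		return openPogrebCache(dir)
	}
	return openJSONCache(dir)
}

func main() {
	c, err := loadCache("/tmp/cache")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("selected backend:", c)
}
```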

Motivation for the change:

  • It stores items in an embedded key/value database (see https://github.com/akrylysov/pogreb), which seems to fit the mold of our use case (infrequent bulk loads, read-heavy clients)
  • It serializes/deserializes data as protobuf rather than JSON (so it's smaller and faster)
  • It uses ~50% less CPU to serve similar request loads
  • It uses ~66% less peak memory to serve similar request loads
  • Its steady-state memory usage is about the same.
  • It is much faster to serve (per local ghz benchmarks with 20 concurrent connections):
    • JSON
      • p10 latency: 1.63s
      • p99 latency: 2.53s
    • Pogreb
      • p10 latency: 681ms
      • p99 latency: 841ms
  • The digest calculation is based purely on the FBC and cache data (the current JSON digest uses the tar format and is pretty finicky); see the sketch after this list
    • For the FBC, walk the meta blobs, writing each one to the digester
    • For the cache, iterate all the items in the k/v store, writing each key+value to the digester
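As a concrete illustration of the cache half of that digest, here is a minimal sketch using pogreb's iterator API. The database path is illustrative, error handling is abbreviated, and the real implementation also folds the FBC meta blobs into the same digester (and must ensure a deterministic ordering).

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"log"

	"github.com/akrylysov/pogreb"
)

func main() {
	// Path is illustrative; the real cache location is managed by opm.
	db, err := pogreb.Open("/tmp/cache/pogreb.v1/db", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	digester := sha256.New()
	it := db.Items()
	for {
		key, val, err := it.Next()
		if err == pogreb.ErrIterationDone {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		// Feed each key+value pair into the digester.
		digester.Write(key)
		digester.Write(val)
	}
	fmt.Printf("cache digest: sha256:%x\n", digester.Sum(nil))
}
```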

Cache size improvements

  • I've been testing with a migrated operatorhub catalog (i.e., one that uses olm.csv.metadata)
  • Pogreb: 70M
  • JSON: 89M

Reviewer Checklist

  • Implementation matches the proposed design, or proposal is updated to match implementation
  • Sufficient unit test coverage
  • Sufficient end-to-end test coverage
  • Docs updated or added to /docs
  • Commit messages sensible and descriptive

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 18, 2024
openshift-ci bot (Contributor) commented Apr 18, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci bot (Contributor) commented Apr 18, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: joelanford

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 18, 2024
@joelanford joelanford force-pushed the pogreb-cache branch 2 times, most recently from db110ee to 90f872d Compare April 19, 2024 20:25
codecov bot commented Apr 19, 2024

Codecov Report

Attention: Patch coverage is 60.86142%, with 209 lines in your changes missing coverage. Please review.

❗ No coverage uploaded for pull request base (master@9fdedc2). Click here to learn what that means.
Report is 4 commits behind head on master.

❗ Current head 0082947 differs from pull request most recent head f9792af. Consider uploading reports for the commit f9792af to get more accurate results

Files Patch % Lines
pkg/cache/cache.go 56.58% 67 Missing and 22 partials ⚠️
pkg/cache/pogrebv1.go 55.08% 33 Missing and 20 partials ⚠️
pkg/cache/json.go 54.16% 32 Missing and 12 partials ⚠️
alpha/declcfg/load.go 77.38% 13 Missing and 6 partials ⚠️
pkg/cache/bundle_key.go 77.77% 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##             master    #1278   +/-   ##
=========================================
  Coverage          ?   50.05%           
=========================================
  Files             ?      135           
  Lines             ?    12993           
  Branches          ?        0           
=========================================
  Hits              ?     6504           
  Misses            ?     5405           
  Partials          ?     1084           


joelanford (Member, Author) commented Apr 20, 2024

I had an epiphany overnight!

The third commit in the PR changes the way the cache is built. It writes meta objects to a temporary file and records the location of each meta in the file, grouped by package. That way, we can later read just the metas for a particular package into memory.

Then we go package by package: build a model, convert it to the package index, and write API bundles to the cache. The beauty is that only a single package's model is loaded in memory at any given time (see the sketch below).
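Here is a minimal sketch of that idea, with hypothetical names (metaLocation, writeMetas, readPackage); the real logic lives in alpha/declcfg/load.go and the cache builder. Each meta blob is appended to a scratch file, and only its (offset, length) is kept in memory, grouped by package:

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// metaLocation records where one meta blob lives in the scratch file.
type metaLocation struct {
	offset int64
	length int
}

// writeMetas appends blobs to f and returns their locations grouped by package.
func writeMetas(f *os.File, metasByPackage map[string][][]byte) (map[string][]metaLocation, error) {
	locs := make(map[string][]metaLocation)
	var offset int64
	for pkg, metas := range metasByPackage {
		for _, m := range metas {
			n, err := f.Write(m)
			if err != nil {
				return nil, err
			}
			locs[pkg] = append(locs[pkg], metaLocation{offset: offset, length: n})
			offset += int64(n)
		}
	}
	return locs, nil
}

// readPackage loads just one package's metas back into memory.
func readPackage(f *os.File, locs []metaLocation) ([][]byte, error) {
	var metas [][]byte
	for _, l := range locs {
		buf := make([]byte, l.length)
		if _, err := f.ReadAt(buf, l.offset); err != nil && err != io.EOF {
			return nil, err
		}
		metas = append(metas, buf)
	}
	return metas, nil
}

func main() {
	// Error handling elided for brevity.
	f, _ := os.CreateTemp("", "metas-*")
	defer os.Remove(f.Name())

	locs, _ := writeMetas(f, map[string][][]byte{
		"pkg-a": {[]byte(`{"schema":"olm.package"}`)},
	})
	metas, _ := readPackage(f, locs["pkg-a"])
	fmt.Printf("%s\n", metas[0])
}
```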

This may mean that we can stop storing caches in the catalog image and can go back to building them on-the-fly when the container starts!

I noticed that when using an FBC with olm.csv.metadata, peak memory and startup time were basically inconsequential when building a cache on the fly.

@joelanford joelanford force-pushed the pogreb-cache branch 2 times, most recently from 69069c2 to 5a1dac5 Compare April 20, 2024 17:35
@joelanford joelanford force-pushed the pogreb-cache branch 2 times, most recently from 8822e99 to e65fa4e Compare April 24, 2024 13:46
@joelanford joelanford marked this pull request as ready for review April 24, 2024 13:52
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 24, 2024
@openshift-merge-robot openshift-merge-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Apr 30, 2024
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 8, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 10, 2024
grokspawn (Contributor) commented:
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 14, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 5b5181b into operator-framework:master May 14, 2024
9 of 10 checks passed
@joelanford joelanford deleted the pogreb-cache branch May 15, 2024 12:18