Filter BlobInfoCache candidates before prioritization, not in transports #2346

Merged
merged 14 commits into from
Apr 4, 2024

mtrmac
Collaborator

@mtrmac mtrmac commented Mar 18, 2024

This:

  • Moves the point where we filter blob reuse candidates based on algorithm (when the user asks for gzip/zstd, or when the user specifies a v2s2 manifest format and Zstd is not acceptable). Before, we would prioritize and return up to 5 candidates regardless of algorithm, and then the transport would filter out unwanted algorithms; in the worst case, not a single returned candidate would use an acceptable algorithm. Now, we first filter out unacceptable candidates, and only afterwards prioritize and return up to 5 candidates to the transport. That should improve the chances of reusing a known blob.
  • Moves the responsibility for turning compression algorithm names into internal objects from the transports to the BlobInfoCache2 implementation. This will, again, allow using c/image/docker as a ~light-weight registry client without dragging in c/storage/pkg/chunked.
  • Structures the code to give us an opportunity to add even more logic to the algorithm filtering/matching. E.g. we will want to return a known-zstd:chunked layer where we don’t trust the TOC as a zstd candidate; and we will want to return trusted zstd:chunked candidates where the user asked for just zstd. That will come in the future.
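
To illustrate the first point, here is a minimal Go sketch of "filter first, then prioritize and trim". It is not the c/image implementation: the `candidate` type, `filterThenPrioritize`, `sampleCandidates` and the most-recent-first ordering are assumptions made up for this example; only the limit of 5 comes from the description above.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// candidate is a hypothetical stand-in for a blob reuse candidate;
// it is NOT the type used by c/image.
type candidate struct {
	Location   string
	Compressor string // e.g. "gzip" or "zstd"
	LastSeen   time.Time
}

// maxCandidates mirrors the "up to 5 candidates" limit from the description.
const maxCandidates = 5

// filterThenPrioritize first drops candidates whose compressor is not
// acceptable, then sorts the remainder most-recent-first (an assumed
// priority order) and trims the result to maxCandidates.
func filterThenPrioritize(all []candidate, acceptable map[string]bool) []candidate {
	kept := make([]candidate, 0, len(all))
	for _, c := range all {
		if acceptable[c.Compressor] {
			kept = append(kept, c)
		}
	}
	sort.SliceStable(kept, func(i, j int) bool {
		return kept[i].LastSeen.After(kept[j].LastSeen)
	})
	if len(kept) > maxCandidates {
		kept = kept[:maxCandidates]
	}
	return kept
}

// sampleCandidates returns six recent zstd entries and one older gzip entry.
func sampleCandidates() []candidate {
	base := time.Date(2024, 3, 18, 0, 0, 0, 0, time.UTC)
	var all []candidate
	for i := 0; i < 6; i++ {
		all = append(all, candidate{
			Location:   fmt.Sprintf("repo/zstd-%d", i),
			Compressor: "zstd",
			LastSeen:   base.Add(-time.Duration(i) * time.Hour),
		})
	}
	all = append(all, candidate{
		Location:   "repo/gzip-old",
		Compressor: "gzip",
		LastSeen:   base.Add(-24 * time.Hour),
	})
	return all
}

func main() {
	// Only gzip is acceptable: the older gzip entry survives because the
	// zstd entries are removed before the list is trimmed.
	for _, c := range filterThenPrioritize(sampleCandidates(), map[string]bool{"gzip": true}) {
		fmt.Println(c.Location, c.Compressor)
	}
}
```

With the old order (trim to 5 first, filter afterwards), the five most recent zstd entries in this sample would fill the list and the gzip entry would never reach the transport.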

See individual commit messages for details.

Cc: @giuseppe

Member

@giuseppe giuseppe left a comment

LGTM

@mtrmac mtrmac force-pushed the bic-filter-first branch 2 times, most recently from 3d75b49 to b940f89 Compare March 20, 2024 19:25
mtrmac added 11 commits March 25, 2024 16:05
Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
Make it more similar to testGenericCandidateLocations, and
include both locations for the same digest on the same line,
to make the relationships between the digests' entries a bit clearer.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
We will add a long/unwieldy CandidateLocations2Options parameter,
and adding that inline would make readability even more challenging.

So, split that out; the new parameter will be added shortly.

Should not change (test) behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
We will soon be looking up the actual compression algorithms in the BIC code, so use
real algorithm names in the tests.

Should not change (test) behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
…internal/manifest

We will want to call this from the pkg/blobinfocache code, and for that we need
to make it dependent on neither internal/private nor internal/blobinfocache
(otherwise the interface definitions would create a dependency loop).

This only moves ~unchanged code, should not change behavior.

Eventually CandidateMatchesTryReusingBlobOptions will be removed entirely.

Signed-off-by: Miloslav Trmač <[email protected]>
... and move the canSubstitute parameter to it.

We will add more options, and we will want to carry them around
as a unit.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
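
As a rough illustration of the commit above (carrying lookup parameters around as a unit), the following Go sketch groups a `canSubstitute`-style flag into an options struct. The names `lookupOptions`, `entry` and `lookup` are hypothetical and simplified; they are not the identifiers or semantics used in c/image, and every entry is assumed to already be a known equivalent of the requested blob.

```go
package main

import "fmt"

// lookupOptions is a hypothetical options struct. New fields can be added
// later without changing the signature of every function that passes it on.
type lookupOptions struct {
	// CanSubstitute allows returning entries recorded under a different
	// digest (assumed here to be equivalent variants of the same content).
	CanSubstitute bool
}

// entry is a hypothetical cache record for this sketch.
type entry struct {
	Digest   string
	Location string
}

// lookup returns entries for the primary digest and, only when
// opts.CanSubstitute is set, entries recorded under other digests too.
func lookup(entries []entry, primary string, opts lookupOptions) []entry {
	var out []entry
	for _, e := range entries {
		if e.Digest == primary || opts.CanSubstitute {
			out = append(out, e)
		}
	}
	return out
}

// sampleEntries returns one entry for the requested digest and one variant.
func sampleEntries() []entry {
	return []entry{
		{Digest: "sha256:compressed", Location: "repo/a"},
		{Digest: "sha256:uncompressed", Location: "repo/b"},
	}
}

func main() {
	strict := lookup(sampleEntries(), "sha256:compressed", lookupOptions{})
	loose := lookup(sampleEntries(), "sha256:compressed", lookupOptions{CanSubstitute: true})
	fmt.Println(len(strict), len(loose))
}
```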
…ions

... into candidateLocations and appendReplacementCandidates.

We will add more conditions there soon.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
This is
- more similar to the other BIC implementations
- probably faster: we hold a transaction open anyway, this is all local,
  so making fewer queries is unlikely to be notably less costly,
  and we only read the value once instead of getting a column all containing
  the same value
- necessary for future changes, where we need to do local checks on compressorName
  before consuming the other data, and those checks are in Go and can't be reasonably
  done by the SQLite query

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
… ones

Append to candidates directly, avoid a separate allocation of a temporary
array.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
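
The pattern described in the commit above, appending to the caller's slice instead of building and copying a temporary one, can be sketched in Go as follows. The `candidate` type and `appendLocations` helper are hypothetical names for this example, not the functions touched by the commit.

```go
package main

import "fmt"

// candidate is a hypothetical record type used only in this sketch.
type candidate struct {
	Digest   string
	Location string
}

// appendLocations appends one candidate per location directly to dst and
// returns the extended slice, so the caller does not need to allocate a
// temporary slice and then copy it over.
func appendLocations(dst []candidate, digest string, locations []string) []candidate {
	for _, l := range locations {
		dst = append(dst, candidate{Digest: digest, Location: l})
	}
	return dst
}

func main() {
	var candidates []candidate
	candidates = appendLocations(candidates, "sha256:aaa", []string{"repo/a", "repo/b"})
	candidates = appendLocations(candidates, "sha256:bbb", []string{"repo/c"})
	for _, c := range candidates {
		fmt.Println(c.Digest, c.Location)
	}
}
```

This follows the same convention as the built-in `append`: the helper takes the destination slice and returns the possibly reallocated result.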
The primary benefit is that we now filter the algorithms _before_ collecting
and trimming the list of candidates, i.e. if the most recent candidates
are not matching the requirements, CandidateLocations2 will return older
candidates instead of returning just some unwanted candidates which will be rejected.

This requires us to look up the actual compression in the BIC code, so this also
changes pkg/blobinfocache/internal/test to use real algorithm names.

Signed-off-by: Miloslav Trmač <[email protected]>
Inline it into the only remaining caller.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
@mtrmac mtrmac force-pushed the bic-filter-first branch from b940f89 to 75a01e9 Compare March 25, 2024 15:08
mtrmac added 3 commits March 25, 2024 18:24
…tions

The implementations (sharing the code in pkg/blobinfocache/internal/prioritize)
already need to look up the algorithm, so just return the value as a part of
a BICReplacementCandidate2, instead of looking it up again in the transport.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
... dropping various unreachable/redundant code paths,
now that the semantics is clear and tested.

Removes the unfortunate dependency of c/image/docker on
c/image/pkg/compression, making it cheaper to use c/image/docker
as a simple registry client.

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
…e field

Should not change behavior.

Signed-off-by: Miloslav Trmač <[email protected]>
@mtrmac mtrmac force-pushed the bic-filter-first branch from 75a01e9 to 7482d25 Compare March 25, 2024 17:24
@rhatdan
Member

rhatdan commented Apr 4, 2024

LGTM

@rhatdan rhatdan merged commit c73b1e4 into containers:main Apr 4, 2024
10 checks passed
@mtrmac mtrmac deleted the bic-filter-first branch April 5, 2024 13:18
3 participants