Check for existence of layers in cache before returning cached base image #3767

Merged: 19 commits, Sep 19, 2022

@emmileaf emmileaf commented Sep 6, 2022

In this PR:

Implementation of a proposed solution in #3733’s discussions, to check for presence of layer files in cache in addition to the manifest json before returning the cached base image(s). This would also address the unauthorized error described in #2220 and also reported in #2007’s comments.

Details:

  • Adds allLayersCached() method to Cache and CacheStorageReader that checks for existence of cached layers for all layers described in the image manifest.
  • Updates PullBaseImageStep.getCachedBaseImages() to invoke this check, and returns no cached images if layers are missing.
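
The directory-existence variant of this check can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: `LayerCacheSketch`, `inTempDirectory`, and `addLayer` are hypothetical names, the manifest is reduced to a plain list of digest strings, and the `layers/{layerDigest}` layout is assumed from the discussion in this thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical stand-in for a cache reader; not Jib's actual class. */
final class LayerCacheSketch {
  private final Path cacheDirectory;

  LayerCacheSketch(Path cacheDirectory) {
    this.cacheDirectory = cacheDirectory;
  }

  /** Creates a sketch cache rooted in a fresh temporary directory. */
  static LayerCacheSketch inTempDirectory() {
    try {
      return new LayerCacheSketch(Files.createTempDirectory("cache-sketch"));
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }

  /** Assumed layout: {cacheDirectory}/layers/{layerDigest}. */
  Path layerDirectory(String layerDigest) {
    return cacheDirectory.resolve("layers").resolve(layerDigest);
  }

  /** Returns true only if every listed layer has a directory in the cache. */
  boolean allLayersCached(List<String> manifestLayerDigests) {
    for (String layerDigest : manifestLayerDigests) {
      if (!Files.isDirectory(layerDirectory(layerDigest))) {
        return false; // one missing layer is enough to treat the image as uncached
      }
    }
    return true;
  }

  /** Simulates a cached layer by creating its (empty) directory. */
  void addLayer(String layerDigest) {
    try {
      Files.createDirectories(layerDirectory(layerDigest));
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }
}
```

A caller would treat a `false` result as a cache miss and fall back to pulling the base image from the registry.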

Two approaches for layer check:

  • Changes up to 755dcfc implement a simpler check, which verifies that the corresponding layer directory exists in the cache (but not that its contents are valid).
  • b231d79 checks layers by making a call to retrieve the layer, though the retrieved layer is unused and the later ObtainBaseImageLayerStep would retrieve it again.

Open to suggestions on what the better approach might be here - the second option does redundant work, but is a more rigorous validation.

@emmileaf emmileaf marked this pull request as ready for review September 6, 2022 16:00
@mpeddada1
Contributor

@emmileaf Took a first look at the changes! To make sure that I'm understanding the two approaches correctly:

  • Approach 1 checks for the existence of {cacheDirectory}/layers/{layerDigest} where the layer digests are read from the manifest.
  • Approach 2 actually tries to grab the cached layer given the digests, and if this results in some sort of IO issue, the layer is deemed not to be present in the cache?

Both approaches are valid, but I'm a little concerned about the performance overhead approach 2 might add, especially since the change will result in the cached layer being retrieved twice. Approach 1 might be a sufficient check. In terms of readability, it also makes it more explicit that we are checking for the presence of the layer. Catching exceptions also works, but it carries a risk if we're not as specific as possible: methods can throw an exception for a number of reasons, and it is usually recommended to let exceptions propagate to the caller or be handled by surrounding frameworks (testing frameworks, for example). Lmk what you think.

@emmileaf
Contributor Author

emmileaf commented Sep 8, 2022

@mpeddada1 Thanks for the review! Your understanding of the two approaches is spot on.

I also agree with the concerns here around approach 2’s redundant work and its reliance on exceptions. The downside of approach 1's check that I was trying to work around is that CacheStorageWriter.writeCompressed() creates the {cacheDirectory}/layers/{layerDigest} directory first, so it is possible for this check to succeed while the layer is only partially written (or not valid for retrieval in some other way). Not sure how unlikely that race condition would be, though.

Maybe there is some way to make this check more rigorous without fully retrieving the layer? I'll revert the last commit to switch back to approach 1, and see if I can make any improvements from there.

@emmileaf
Contributor Author

emmileaf commented Sep 9, 2022

> The downside of approach 1's check I was trying to work around is that, CacheStorageWriter.writeCompressed() creates the {cacheDirectory}/layers/{layerDigest} directory first

Just realized this statement isn't true and I had misunderstood this cache write logic earlier - it looks like a temp folder is created for writing the layer first, and then the contents get moved into {cacheDirectory}/layers/{layerDigest} (ref: #879 (comment)). This addresses the concern with partial layer directories, and I'll go ahead with the switch to approach 1 in this PR. Apologies for the confusion!

@mpeddada1
Contributor

> Just realized this statement isn't true and I had misunderstood this cache write logic earlier - it looks like a temp folder is created for writing the layer first, and then the contents get moved into {cacheDirectory}/layers/{layerDigest} (ref: #879 (comment)). This addresses the concern with partial layer directories, and I'll go ahead with the switch to approach 1 in this PR. Apologies for the confusion!

Ah, so the write only ever produces a layer directory at the intended location in full, which means cache reads never see an incomplete layer directory. That is reassuring, thank you for verifying this!
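
The write-to-temp-then-move pattern described above might look roughly like the sketch below. It is an illustration under assumptions and not Jib's actual writer: `AtomicLayerWriteSketch`, `newTempCache`, `writeLayer`, the `tmp` directory name, and the `layer.bin` file name are all hypothetical, and the atomic move assumes the temporary and final locations sit on the same filesystem.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical illustration of publishing a layer directory only once it is complete. */
final class AtomicLayerWriteSketch {

  /** Creates an empty cache directory for the demo. */
  static Path newTempCache() {
    try {
      return Files.createTempDirectory("cache-sketch");
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }

  /** Writes the layer into a temporary directory, then moves it into place in one step. */
  static Path writeLayer(Path cacheDirectory, String layerDigest, byte[] contents) {
    try {
      Path layersDirectory = Files.createDirectories(cacheDirectory.resolve("layers"));
      Path destination = layersDirectory.resolve(layerDigest);
      if (Files.isDirectory(destination)) {
        return destination; // already cached, nothing to write
      }

      // All partial work happens here, where readers never look.
      Path temporaryParent = Files.createDirectories(cacheDirectory.resolve("tmp"));
      Path temporary = Files.createTempDirectory(temporaryParent, "layer");
      Files.write(temporary.resolve("layer.bin"), contents);

      // Assumption: same filesystem, so the move is a single rename.
      Files.move(temporary, destination, StandardCopyOption.ATOMIC_MOVE);
      return destination;
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }
}
```

With this shape, a directory-existence check on `layers/{layerDigest}` only succeeds after the contents have been fully written.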

* @param manifest the image manifest
* @return a boolean
*/
boolean allLayersCached(ManifestTemplate manifest) {
Contributor

Would it be possible to move this method directly to PullBaseImageStep? That is the only class where it is used, and moving it could help us avoid going through an extra layer of classes to reach this method. Or does this method require any variables that are specific to CacheStorageReader?

Contributor Author

Yeah, I was thinking about this too - this method goes through a few layers because it depends on CacheStorageReader’s cacheStorageFiles variable here.

Contributor

Got it, thanks! Hm, it looks like Cache takes in cacheStorageFiles as a parameter to its constructor, could we use that maybe?

private Cache(CacheStorageFiles cacheStorageFiles) {

Contributor Author

Ah you’re right - I missed that we could just add cacheStorageFiles as a field and move this logic to Cache.

I went ahead and tried to make this change, but noticed that the Cache class is set up in a way that many of its methods are wrappers around calls to either CacheStorageReader or CacheStorageWriter. Looking at the existing test suites, CacheStorageReaderTest is also more straightforward to add to than CacheTest for unit testing the new check.

I am tempted to leave this logic in CacheStorageReader just to stay consistent with the existing setup here - lmk what you think!

Comment on lines +103 to +105
} else {
throw new IllegalArgumentException("Unknown manifest type: " + manifest);
}
Contributor

I wonder if this would break current behavior for those doing multi-platform image building? For context (#2730), we also cache manifest lists in addition to single manifests.

Contributor Author

That’s a good callout, and please let me know if I'm misunderstanding anything here!

Right now the added logic in PullBaseImageStep.getCachedBaseImages() never passes anything of V22ManifestList type into areAllLayersCached() calls explicitly. In the case of manifest lists, this check is made individually when looping over the platform-specific manifests, and returns an overall cache miss if any of the platform-specific manifests has incomplete layers.

But, do you think there is value here to have CacheStorageReader.areAllLayersCached() itself handle the manifest list type, rather than rely on the code calling it? Perhaps instead of throwing an exception here, it can also just return false, and leave the rest of the behavior to existing logic?

Contributor

Ah thanks for the detailed explanation! You're right, it is called for the individual manifests in the manifest list.

> But, do you think there is value here to have CacheStorageReader.areAllLayersCached() itself handle the manifest list type, rather than rely on the code calling it?

Hm, that's a good question. I was initially thinking about this too, but looking at the code for getCachedBaseImage, we would probably still have to iterate through the manifests in the manifest list again to retrieve the collection of images? What you have currently (which is a more fail-fast approach) seems like a better choice.

Contributor Author

Sounds good!
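
The caller-side handling of manifest lists discussed in this thread can be sketched as below: the check runs once per platform-specific manifest, and any incomplete platform turns the whole lookup into a cache miss. This is a simplified illustration, not the PR's code. `ManifestListCacheSketch` and `cachedPlatforms` are hypothetical names, manifests are reduced to lists of digest strings keyed by platform, and the per-layer check is passed in as a predicate.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical illustration of a fail-fast cache lookup over a manifest list. */
final class ManifestListCacheSketch {

  /**
   * Returns the platforms whose images can be served from cache, or an empty list
   * (an overall cache miss) as soon as any platform has a missing layer.
   */
  static List<String> cachedPlatforms(
      Map<String, List<String>> layersByPlatform, Predicate<String> isLayerCached) {
    List<String> cached = new ArrayList<>();
    for (Map.Entry<String, List<String>> platform : layersByPlatform.entrySet()) {
      for (String layerDigest : platform.getValue()) {
        if (!isLayerCached.test(layerDigest)) {
          return Collections.emptyList(); // fail fast: no partial results
        }
      }
      cached.add(platform.getKey());
    }
    return cached;
  }
}
```

Keeping the loop in the caller means the per-manifest check itself never needs to understand the manifest list type.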

Contributor

@mpeddada1 mpeddada1 left a comment

Thanks for the thorough explanations. We're getting close!

Comment on lines 462 to 464
if (!baseImageLayersCache.allLayersCached(Verify.verifyNotNull(manifest))) {
return Collections.emptyList();
}
Contributor

How about we move the Verify.verifyNotNull condition to allLayersCached?

Contributor Author

Moved this to a line above, since this condition is applied again to the manifest in toImage() a few lines down

Comment on lines 492 to 499
}

ManifestTemplate manifest = Verify.verifyNotNull(manifestAndConfigFound.get().getManifest());
// Verify all layers described in manifest are present in cache
if (!baseImageLayersCache.allLayersCached(manifest)) {
return Collections.emptyList();
}

Contributor

Is it possible to combine !manifestAndConfigFound.isPresent() and areAllLayersCached with an &&?

Contributor Author

It gets messier here to combine them I think? (Since grabbing the manifest is conditional on manifestAndConfigFound.isPresent(), plus it also needs to perform the verifyNotNull check)

Contributor

Ah I see! Thanks for trying this out. Hm, I'm thinking of a way in which we could reduce the number of if statements, since they are all testing similar things but just at different levels of granularity. Is it possible to put them in a helper method with a descriptive name?

Contributor Author

You're right, the many if statements did get more cumbersome with the changes added in this PR. I think the refactoring suggestion from @elefeint will help with this here!

Contributor

Excellent refactoring!
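
One way the helper-method idea from this thread could look is sketched below. It is only a guess at the general shape, not the refactoring that was actually merged: `CachedImageLookupSketch`, `getCachedImage`, and the string-based stand-ins for manifests and images are hypothetical. The helper folds the "metadata present" and "all layers cached" checks into a single `Optional`, so the caller needs one condition.

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

/** Hypothetical illustration of collapsing several cache checks into one helper. */
final class CachedImageLookupSketch {

  /** Returns the image only if its metadata was found and every layer is cached. */
  static Optional<String> getCachedImage(
      Optional<List<String>> manifestLayers, String imageName, Predicate<String> isLayerCached) {
    if (!manifestLayers.isPresent()) {
      return Optional.empty(); // no cached manifest or config
    }
    for (String layerDigest : manifestLayers.get()) {
      if (!isLayerCached.test(layerDigest)) {
        return Optional.empty(); // metadata is cached but a layer is missing
      }
    }
    return Optional.of(imageName);
  }

  /** Caller side: a single check replaces the chain of if statements. */
  static List<String> getCachedBaseImages(
      Optional<List<String>> manifestLayers, String imageName, Predicate<String> isLayerCached) {
    Optional<String> cachedImage = getCachedImage(manifestLayers, imageName, isLayerCached);
    if (!cachedImage.isPresent()) {
      return Collections.emptyList();
    }
    return Collections.singletonList(cachedImage.get());
  }
}
```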

public void testAllLayersCached_v21SingleManifest()
throws IOException, CacheCorruptedException, DigestException, URISyntaxException {

setupCachedMetadataV21(cacheDirectory);
Contributor

Minor suggestion: If possible, let's format the test into three blocks: arrange, act and assert, with a space between each of these blocks. The arrange block will take care of all setup, the act block will call the method we're testing and the assert block will do all the verification.

Contributor Author

@emmileaf emmileaf Sep 12, 2022

Thanks for the suggestion, this makes a lot of sense! I tried to follow this idea, though with the act and assert blocks more or less combined, since a few of the tests had asserts both before and after certain actions.

Contributor

How about we store these calls in variables? For example:

Path cacheAfterFirstLayerDirectory = Files.createDirectories(cacheStorageFiles.getLayerDirectory(firstLayerDigest));
boolean areAllLayersCachedAfterFirstLayer = cacheStorageReader.areAllLayersCached(manifest);
Path cacheAfterSecondLayerDirectory = ...
boolean areAllLayersCachedAfterSecondLayer = ...

// Assert block

This is just an example so you can pick a name you think works better.

Contributor Author

Ohh apologies I completely misunderstood earlier! Thank you for the explanation, I see what is meant by having separate act and assert blocks now. Will update the tests 😃
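
For reference, the arrange / act / assert layout suggested above, with intermediate results stored in variables, might be laid out as in the following sketch. It avoids JUnit and Jib's test fixtures so that it stays self-contained: `ArrangeActAssertSketch`, its `allLayersCached` helper, and the digest strings are hypothetical stand-ins for the real test setup.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of a test split into arrange, act, and assert blocks. */
final class ArrangeActAssertSketch {

  /** Simplified check: every digest must have a directory under layersDirectory. */
  static boolean allLayersCached(Path layersDirectory, List<String> layerDigests) {
    return layerDigests.stream()
        .allMatch(digest -> Files.isDirectory(layersDirectory.resolve(digest)));
  }

  static void testAllLayersCached() {
    try {
      // Arrange
      Path layersDirectory = Files.createTempDirectory("layers");
      List<String> layerDigests = Arrays.asList("first", "second");

      // Act: record the result after each step instead of asserting inline
      boolean cachedBeforeAnyLayer = allLayersCached(layersDirectory, layerDigests);
      Files.createDirectories(layersDirectory.resolve("first"));
      boolean cachedAfterFirstLayer = allLayersCached(layersDirectory, layerDigests);
      Files.createDirectories(layersDirectory.resolve("second"));
      boolean cachedAfterSecondLayer = allLayersCached(layersDirectory, layerDigests);

      // Assert
      if (cachedBeforeAnyLayer) throw new AssertionError("expected a miss with no layers");
      if (cachedAfterFirstLayer) throw new AssertionError("expected a miss with one layer");
      if (!cachedAfterSecondLayer) throw new AssertionError("expected a hit with both layers");
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }
}
```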

Contributor

@elefeint elefeint left a comment

Broadly LGTM with my limited knowledge. A couple of random questions inline.

@@ -486,6 +492,11 @@ List<Image> getCachedBaseImages()
}

ManifestTemplate manifest = Verify.verifyNotNull(manifestAndConfigFound.get().getManifest());
// Verify all layers described in manifest are present in cache
Contributor

(optional) there was quite a bit of code duplication in this method even in the past between the section that deals with general and platform-based manifest processing. Could there be an opportunity for refactoring common logic out into a helper method?

Contributor Author

Ah this is a great suggestion - will refactor this and see if I can make the methods cleaner to read.

Contributor

@mpeddada1 mpeddada1 left a comment

LGTM. This fix required in-depth knowledge into Jib's caching mechanism and you really hit it out of the park! Also made great refactorings along the way.

Comment on lines 463 to 465
} else {
return Collections.singletonList(cachedImage.get());
}
Contributor

super nit: we can do away with the else here and return Collections.singletonList(...) outside the if.

@emmileaf
Contributor Author

@mpeddada1 @elefeint @chanseokoh Thank you so much for all the help here with this PR!

Contributor

@elefeint elefeint left a comment

Nice refactoring!

@emmileaf
Contributor Author

emmileaf commented Sep 16, 2022

Thank you! Will update changelog and merge.
(Edit: will merge once main branch's CI status is back in the clear, to avoid more confusion)

@sonarcloud

sonarcloud bot commented Sep 16, 2022

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
0 Code Smells (rating A)

90.9% Coverage
0.0% Duplication

@herder

herder commented Oct 25, 2022

Hi @emmileaf @elefeint - thank you for this fix!

We'd love to get this available for quite a few of our builds that fail from this, so I was wondering if there is a release with this fix planned soon?

@emmileaf
Contributor Author

Hi @herder - we currently have some work in progress on compatibility in our release build environment, but can plan for a release later this week including this change.

@emmileaf
Contributor Author

@herder jib-core 0.23.0, jib-maven-plugin 3.3.1, and jib-gradle-plugin 3.3.1 have been released with this fix.

@herder

herder commented Oct 31, 2022

Great, thank you so much @emmileaf !

@emmileaf emmileaf deleted the base-image-caching branch November 21, 2022 20:42