Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: bubble up image size as pod events #106

Closed
wants to merge 3 commits into from
Closed

feat: bubble up image size as pod events #106

wants to merge 3 commits into from

Conversation

vadasambar
Copy link
Contributor

@vadasambar vadasambar commented Jan 11, 2024

Problem

If the size of the image that warm-metal is pulling is too big, image pulling (and mounting) can fail. The user does not understand why the image pulling and mounting failed.

Solution

This PR introduces a flag --max-image-size which can be set to any value like 1Mi, 1Gi, 10Gi etc., Warm-metal driver will get the image size from the registry and if the image size exceeds the --max-image-size, it will create an Event on the pod where the volume is being mounted.

This is what it looks like in action:
image

The flag has no impact on mounting and unmounting. If there is a failure to fetch image size info from the registry, warm-metal just logs the error (it doesn't terminate the driver).

@vadasambar vadasambar changed the title feat: wip bubble up image size as pod events [DON'T REVIEW] feat: wip bubble up image size as pod events Jan 11, 2024
@vadasambar vadasambar changed the title [DON'T REVIEW] feat: wip bubble up image size as pod events feat: wip bubble up image size as pod events Jan 31, 2024
@vadasambar vadasambar changed the title feat: wip bubble up image size as pod events feat: bubble up image size as pod events Jan 31, 2024
feat: implement logic for warner
- pass down max image size to the warner

refactor: pass max size directly to warner
- change pkg name `imagesizewarner` -> `imagesize`

test: add test cases for image size

refactor: add helpful logs
- fix segfault if warm-metal fails to fetch size from the container registry
@vadasambar vadasambar marked this pull request as ready for review January 31, 2024 14:26
@vadasambar vadasambar requested a review from a team as a code owner January 31, 2024 14:26
pkg/imagesize/imagesize.go Outdated Show resolved Hide resolved

ctx, cancel := context.WithTimeout(context.Background(), ctxTimeout)
defer cancel()
// TODO(vadasambar): set a timeout for the context here
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this already fixed?

pkg/imagesize/imagesize.go Outdated Show resolved Hide resolved
repo := p.image.Name()
imageSpec := &cri.ImageSpec{Image: p.image.String()}
creds, withCredentials := p.keyring.Lookup(repo)
if !withCredentials {

p.checkForImageSize(nil, pod, ns)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible and helpful to add tests for getting image size with and without creds?

Comment on lines +152 to +153
Pod: req.VolumeContext[ctxKeyPodName],
Namespace: req.VolumeContext[ctxKeyPodNamespace],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding these might also help fix the bug in current logs where pod and namespace are empty in the log messages for printing image size.

Comment on lines +77 to +79
if current.Cmp(*w.maxImageSize) <= 0 {
return nil
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this where we check if maxImageSize flag is set, and if not then returning without errors?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If yes, then maybe we should add this check early before calling GetImageSize() function. That would reduce the additional work on the driver if the flag is not enabled.

// fetchImageSize gets the image size from the container registry
func (w *warner) fetchImageSize(c client.RegistryClient, image reference.Named) (*resource.Quantity, error) {
var mlist []manifesttypes.ImageManifest
imageManifest, err := c.GetManifest(context.Background(), image)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that we are setting timeout on the context only in the warn() function and not here. Is this intentional?

var parsedSize *resource.Quantity

for _, m := range mlist {
// TODO(vadasambar): support other OS platforms
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to fix this in the same PR or create a separate issue for tracking this?

"k8s.io/klog/v2"
)

func TestWarn(t *testing.T) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if we could add tests for parsing maxImageSize with invalid or empty inputs.

Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs.

@github-actions github-actions bot added the Stale label Mar 22, 2024
Copy link

Closing this PR after a prolonged period of inactivity. Please create a new PR if the changes of the PR are still relevant.

@github-actions github-actions bot closed this Mar 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants