setting `SOURCE_DATE_EPOCH` changes the `created` metadata for base image layers (which are otherwise immutable 😅) #4614

tianon · 2024-02-02T00:35:00Z

Simple illustration:

$ cat Dockerfile
FROM hello-world
RUN ["/hello"]
$ crane config hello-world | jq '.history[]'
{
  "created": "2023-05-02T16:49:27Z",
  "created_by": "COPY hello / # buildkit",
  "comment": "buildkit.dockerfile.v0"
}
{
  "created": "2023-05-02T16:49:27Z",
  "created_by": "CMD [\"/hello\"]",
  "comment": "buildkit.dockerfile.v0",
  "empty_layer": true
}
$ # "36dd6c7a5ea890622c1515b980b0b0eb792f8d320fc0506fc40c58b6a81c0cc1" here is the config blob of my locally generated image, which is apparently not fully reproducible 😅
$ SOURCE_DATE_EPOCH=1 docker buildx build --builder foo --quiet --output type=oci . | tar -x --to-stdout blobs/sha256/36dd6c7a5ea890622c1515b980b0b0eb792f8d320fc0506fc40c58b6a81c0cc1 | jq '.history[]'
{
  "created": "1970-01-01T00:00:01Z",
  "created_by": "COPY hello / # buildkit",
  "comment": "buildkit.dockerfile.v0"
}
{
  "created": "1970-01-01T00:00:01Z",
  "created_by": "CMD [\"/hello\"]",
  "comment": "buildkit.dockerfile.v0",
  "empty_layer": true
}
{
  "created": "1970-01-01T00:00:01Z",
  "created_by": "RUN /hello # buildkit",
  "comment": "buildkit.dockerfile.v0"
}

This surprised me because the design of container image layering is such that "base" layers are by-design immutable, but here we're mutating part of the metadata of them anyways. 😬

I originally had logic for setting SOURCE_DATE_EPOCH that would set it to the commit timestamp or the timestamp of the parent images, whichever is newer, but I ditched the latter half of that because it means the new layers generated in the build are potentially less reproducible than they would be otherwise (and I care a lot more about reproducible layers than I do reproducible images).
See for example docker-library/golang#505 where we managed to make the "install Go" layer reproducible such that it should be identical no matter which OS / variant you download (ie, switching from golang:alpine3.19 to golang:bookworm or golang:alpine3.18 shouldn't have to redownload the Go layer, or pulling a rebuilt image from a newer base image).

I think this code is responsible for this behavior (specifically that it appears to loop over all of the history object, not just the "new" layers):

buildkit/exporter/containerimage/writer.go

Lines 694 to 700 in 5ac0a34

    
           if epoch != nil { 
        
           	for i, h := range history { 
        
           		if h.Created == nil || h.Created.After(*epoch) { 
        
           			history[i].Created = epoch 
        
           		} 
        
           	} 
        
           }

It'd be great if this behavior could change to only touch the metadata on "new" layers instead (or at least have an option that allows for that behavior). 🙇 ❤️

The text was updated successfully, but these errors were encountered:

tonistiigi · 2024-02-02T01:26:39Z

Our current logic is that timestamps for base/all layers are reset if they are newer than the epoch set by the build (as defined in SOURCE_DATE_EPOCH spec).

As epoch works both on the exporter and frontend side, the other place for this is https://github.com/moby/buildkit/blob/master/frontend/dockerfile/dockerfile2llb/convert.go#L1874 . I'm not sure if there is possible to keep the frontend implementation if the logic would be different as once we get to

buildkit/exporter/containerimage/writer.go

Lines 694 to 700 in 5ac0a34

    
           if epoch != nil { 
        
           	for i, h := range history { 
        
           		if h.Created == nil || h.Created.After(*epoch) { 
        
           			history[i].Created = epoch 
        
           		} 
        
           	} 
        
           }

there is no base image distinction anymore.

Never resetting timestamps for base image layers means that timestamps for these layers can be newer than the top layers of the image.

Theoretically there can also be the case when the base image changes only in timestamp metadata and user may want to use epoch to still keep the build reproducible. I'm not sure how practical that case is though.

Is this weird purely for visual inspection or does the timestamp there mess up some functional aspects as well?

Thoughts? @AkihiroSuda @crazy-max @cpuguy83

tianon · 2024-02-02T18:18:57Z

This affects the functional ability to detect that an image has been rebuilt due to a base image update because all the timestamps are clamped to the SOURCE_DATE_EPOCH of my Dockerfile and the (correctly newer) timestamps of the immutable parent image are lost.

tianon · 2024-02-02T18:20:01Z

(Ideally, in my golang example, it'd also set the created value to the highest timestamp in the layer itself so that it'd be set to the actual useful SOURCE_DATE_EPOCH from Go upstream, but I know that's a much larger ask.)

AkihiroSuda · 2024-02-03T08:23:54Z

We also have to ignore base images in rewriteRemoteWithEpoch

buildkit/exporter/containerimage/writer.go

Lines 383 to 412 in b09a9f1

    
           func (ic *ImageWriter) rewriteRemoteWithEpoch(ctx context.Context, opts *ImageCommitOpts, remote *solver.Remote) (*solver.Remote, error) { 
        
           	if opts.Epoch == nil { 
        
           		bklog.G(ctx).Warn("rewrite-timestamp is specified, but no source-date-epoch was found") 
        
           		return remote, nil 
        
           	} 
        
           	remoteDescriptors := remote.Descriptors 
        
           	cs := contentutil.NewStoreWithProvider(ic.opt.ContentStore, remote.Provider) 
        
           	eg, ctx := errgroup.WithContext(ctx) 
        
           	rewriteDone := progress.OneOff(ctx, 
        
           		fmt.Sprintf("rewriting layers with source-date-epoch %d (%s)", opts.Epoch.Unix(), opts.Epoch.String())) 
        
           	for i, desc := range remoteDescriptors { 
        
           		i, desc := i, desc 
        
           		eg.Go(func() error { 
        
           			if rewrittenDesc, err := rewriteImageLayerWithEpoch(ctx, cs, desc, opts.RefCfg.Compression, opts.Epoch); err != nil { 
        
           				bklog.G(ctx).WithError(err).Warnf("failed to rewrite layer %d/%d to match source-date-epoch %d (%s)", 
        
           					i+1, len(remoteDescriptors), opts.Epoch.Unix(), opts.Epoch.String()) 
        
           			} else if rewrittenDesc != nil { 
        
           				remoteDescriptors[i] = *rewrittenDesc 
        
           			} 
        
           			return nil 
        
           		}) 
        
           	} 
        
           	if err := rewriteDone(eg.Wait()); err != nil { 
        
           		return nil, err 
        
           	} 
        
           	return &solver.Remote{ 
        
           		Provider:    cs, 
        
           		Descriptors: remoteDescriptors, 
        
           	}, nil 
        
           }

AkihiroSuda · 2024-02-06T15:20:30Z

Any thought on how to track the list of the base image blobs?

jedevc · 2024-02-06T15:34:15Z

Could we potentially have some sort of internal annotation that we could add to a descriptor when we pull it for an image in a SourceOp? Then the exporter can find the image that's a base image, and we can make sure to remove it before exporting in RemoveInternalLayerAnnotations.

Edit: wait, that's not going to work with cached operations 😢

tonistiigi · 2024-02-13T07:59:18Z

@AkihiroSuda I noticed you added it to the milestone. Can this be assigned to you?

AkihiroSuda · 2024-02-13T08:57:27Z

@AkihiroSuda I noticed you added it to the milestone. Can this be assigned to you?

Yes, but might need help for how to track the list of the base image blobs

AkihiroSuda · 2024-02-19T12:36:54Z

Two options for tracking the base image blobs:

Let dockerfile2llb.Dockerfile2LLB emit the base image config as metadata that can be accessed by the exporter
Let llb.State.WithImageConfig embed the base image config in the LLB protobuf

The option 2 might be more robust but the current exporter isn't aware of the LLB graph, so the option 1 seems more straightforward.

AkihiroSuda · 2024-02-19T15:18:44Z

PR

Avoid applying SOURCE_DATE_EPOCH to base images #4663

AkihiroSuda added the area/reproducible-builds label Feb 2, 2024

tonistiigi mentioned this issue Feb 7, 2024

Add versioning to exporting #4629

Open

AkihiroSuda added this to the v0.13.0 milestone Feb 9, 2024

AkihiroSuda mentioned this issue Feb 19, 2024

Avoid applying SOURCE_DATE_EPOCH to base images #4663

Merged

tonistiigi closed this as completed in #4663 Feb 24, 2024

AkihiroSuda mentioned this issue Mar 28, 2024

rewrite-timestamp still rewrites base image layers (for some base images) #4805

Closed

AlexanderYastrebov mentioned this issue Apr 29, 2024

docker build does not create reproducible output (even with fixed SOURCE_DATE_EPOCH) moby/moby#43124

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

setting `SOURCE_DATE_EPOCH` changes the `created` metadata for base image layers (which are otherwise immutable 😅) #4614

setting `SOURCE_DATE_EPOCH` changes the `created` metadata for base image layers (which are otherwise immutable 😅) #4614

tianon commented Feb 2, 2024

tonistiigi commented Feb 2, 2024

tianon commented Feb 2, 2024

tianon commented Feb 2, 2024

AkihiroSuda commented Feb 3, 2024

AkihiroSuda commented Feb 6, 2024

jedevc commented Feb 6, 2024 •

edited

Loading

tonistiigi commented Feb 13, 2024

AkihiroSuda commented Feb 13, 2024

AkihiroSuda commented Feb 19, 2024

AkihiroSuda commented Feb 19, 2024

setting SOURCE_DATE_EPOCH changes the created metadata for base image layers (which are otherwise immutable 😅) #4614

setting SOURCE_DATE_EPOCH changes the created metadata for base image layers (which are otherwise immutable 😅) #4614

Comments

tianon commented Feb 2, 2024

tonistiigi commented Feb 2, 2024

tianon commented Feb 2, 2024

tianon commented Feb 2, 2024

AkihiroSuda commented Feb 3, 2024

AkihiroSuda commented Feb 6, 2024

jedevc commented Feb 6, 2024 • edited Loading

tonistiigi commented Feb 13, 2024

AkihiroSuda commented Feb 13, 2024

AkihiroSuda commented Feb 19, 2024

AkihiroSuda commented Feb 19, 2024

setting `SOURCE_DATE_EPOCH` changes the `created` metadata for base image layers (which are otherwise immutable 😅) #4614

setting `SOURCE_DATE_EPOCH` changes the `created` metadata for base image layers (which are otherwise immutable 😅) #4614

jedevc commented Feb 6, 2024 •

edited

Loading